WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback

arXiv cs.CL Papers

Summary

WildFeedback is a novel framework that leverages in-situ user feedback from actual LLM conversations to automatically create preference datasets for aligning language models with human preferences, addressing scalability and bias issues in traditional annotation-based alignment methods.

arXiv:2408.15549v4 Abstract: As large language models (LLMs) continue to advance, aligning these models with human preferences has emerged as a critical challenge. Traditional alignment methods, relying on human- or LLM-annotated datasets, are limited by their resource-intensive nature, inherent subjectivity, misalignment with real-world user preferences, and the risk of feedback loops that amplify model biases. To overcome these limitations, we introduce WildFeedback, a novel framework that leverages in-situ user feedback during conversations with LLMs to create preference datasets automatically. Given a corpus of multi-turn user-LLM conversations, WildFeedback identifies and classifies user feedback to LLM responses between conversation turns. The user feedback is then used to create examples of preferred and dispreferred responses according to users' preferences. Our experiments demonstrate that LLMs fine-tuned on the WildFeedback dataset exhibit significantly improved alignment with user preferences, as evidenced by both traditional benchmarks and our proposed checklist-guided evaluation. By incorporating in-situ feedback from actual users, WildFeedback addresses the scalability, subjectivity, and bias challenges that plague existing approaches, marking a significant step toward developing LLMs that are more responsive to the diverse and evolving needs of their users.

# WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback
Source: https://arxiv.org/html/2408.15549
Taiwei Shi∗†, Zhuoer Wang∗‡, Longqi Yang∗⋄, Ying-Chun Lin∘, Zexue He▽, Mengting Wan⋄, Pei Zhou⋄, Sujay Jauhar⋄, Sihao Chen⋄, Shan Xia⋄, Hongfei Zhang⋄, Jieyu Zhao†, Xiaofeng Xu⋄, Xia Song⋄, Jennifer Neville∗⋄

⋄Microsoft Corporation, ∘Purdue University, ‡Texas A&M University, ▽University of California San Diego, †University of Southern California

∗Corresponding authors. The work was done when Taiwei Shi, Zhuoer Wang, Ying-Chun Lin, and Zexue He were interns at Microsoft Corporation.

## 1 Introduction

Large language models (LLMs) have become a cornerstone of modern natural language processing (NLP) applications, powering a wide range of tasks from conversational agents to content generation. Despite their strengths, aligning LLMs with human preferences remains a challenge (Bai et al., 2022a; Ouyang et al., 2022; OpenAI et al., 2024; Dubey et al., 2024). Traditional alignment methods involve instruction tuning and preference training on curated human or LLM-annotated datasets (Bai et al., 2022a; Ouyang et al., 2022; Cui et al., 2024). However, these approaches face critical limitations: human annotation is resource-intensive and often subjective, while LLM-generated synthetic data risks reinforcing biases instead of capturing diverse human preferences (Gautam and Srinath, 2024; Wyllie et al., 2024; Chen et al., 2024; Poddar et al., 2024).

In response, recent work explores in-situ user feedback (e.g., upvotes, downvotes, engagement) for LLM training (Shi et al., 2022; Lin et al., 2024b; Don-Yehiya et al., 2024). This approach harnesses authentic user feedback during interactions with LLMs, offering a more dynamic and accurate reflection of user preferences. Rather than relying on static, costly, and misaligned pre-collected data, this method adapts to evolving user needs. However, existing work is limited in scope: it either requires explicit, structured feedback from users or fine-tunes models directly on responses that trigger explicit user feedback.

In this paper, we introduce WildFeedback, a novel framework designed to align LLMs with in-situ user interactions and feedback. WildFeedback addresses the limitations of existing approaches by constructing preference datasets from real user-LLM conversations, specifically focusing on user feedback that naturally occurs during these interactions. An overview of the framework is shown in Figure 1. Our framework comprises three key components: (1) Feedback signal identification, which detects and classifies user feedback, distinguishing between positive and negative signals to infer user preferences; (2) Preference data construction, which transforms these signals into structured preference datasets; and (3) Checklist-guided evaluation, which systematically assesses model responses using, as a rubric, an instance-level checklist derived from the extracted user preferences. This grounds the assessment of model improvements in real user expectations rather than predefined heuristics. To demonstrate the effectiveness of WildFeedback, we apply it to WildChat (Zhao et al., 2024), a dataset containing over 148,000 multi-turn conversations between users and ChatGPT (OpenAI et al., 2024) (see details of WildChat in Appendix E). The resulting preference dataset contains 20,281 samples and is publicly available at https://huggingface.co/datasets/microsoft/WildFeedback, providing a rich resource for improving LLM alignment with real-world user preferences.
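
One way to picture checklist-guided evaluation is as scoring a response against per-instance checklist items. The snippet below is a minimal sketch under explicit assumptions: `ChecklistItem`, `keyword_judge`, and `checklist_score` are hypothetical names, and the keyword-matching judge is only a placeholder for whatever judgment procedure an actual evaluator would use. Averaging satisfied items into a single fraction is likewise an illustrative choice, not necessarily how WildFeedback aggregates checklist results.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChecklistItem:
    """One instance-level requirement derived from a user's preference (hypothetical type)."""
    description: str
    keyword: str  # toy proxy: the item counts as satisfied if this keyword appears


def keyword_judge(response: str, item: ChecklistItem) -> bool:
    """Toy judge: case-insensitive substring match on the item's keyword."""
    return item.keyword.lower() in response.lower()


def checklist_score(
    response: str,
    checklist: List[ChecklistItem],
    judge: Callable[[str, ChecklistItem], bool] = keyword_judge,
) -> float:
    """Fraction of checklist items the response satisfies; 0.0 for an empty checklist."""
    if not checklist:
        return 0.0
    satisfied = sum(1 for item in checklist if judge(response, item))
    return satisfied / len(checklist)


if __name__ == "__main__":
    checklist = [
        ChecklistItem("Names who owns the task", "alice"),
        ChecklistItem("States the deadline", "friday"),
        ChecklistItem("Offers a follow-up", "let me know"),
    ]
    print(checklist_score("Alice will send the report by Friday.", checklist))
```

In the example run, the response names the owner and the deadline but offers no follow-up, so two of three items pass and the score is 2/3. Passing the judge in as a parameter keeps the scoring logic separate from how each individual item is checked.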

Through extensive experiments, we demonstrate that models fine-tuned on WildFeedback show significant improvements in aligning with user preferences, both in automated benchmarks and in our proposed checklist-guided evaluation framework. This work represents a step forward in creating more user-centric LLMs, with the potential to enhance user satisfaction across a wide range of applications.

The contributions of this paper are threefold:

1. Introduction of WildFeedback: We present a novel framework that leverages in-situ user feedback to construct preference datasets that better reflect actual human values, addressing the scalability and subjectivity issues inherent in human-annotated datasets and the biases in synthetic data.
2. Robust Data Construction: We adapt and expand on existing user satisfaction estimation techniques to identify feedback signals in natural conversations. This enables the creation of a nuanced preference dataset that includes both user preferences and corresponding responses, enhancing the effectiveness of fine-tuning LLMs to better align with user expectations.
3. Checklist-Guided Evaluation: We propose a checklist-guided evaluation methodology that aligns the assessment of model performance with real user preferences, providing a more accurate benchmark for evaluating LLMs' alignment with human values.

## 2 Related Work

#### Feedback Learning for LLMs

Incorporating human feedback has been shown to be an effective strategy for aligning LLMs with human preferences (Ouyang et al., 2022; Bai et al., 2022a; Dubey et al., 2024). However, relying on human annotators to provide feedback is inefficient and resource-intensive, which makes it difficult to scale. Additionally, human preferences are highly subjective, so a small set of annotators may not represent broader preferences. Accordingly, some researchers aim to use models to supervise AI models (Bai et al., 2022b; Lee et al., 2023; Madaan et al., 2023; Burns et al., 2023; Li et al., 2023a; Shi et al., 2026). For instance, Bai et al. (2022b) introduced constitutional AI, in which LLMs are prompted to self-refine their own generations according to a human-written set of principles (a "constitution"). However, relying on a model's own feedback can create a feedback loop in which the model's outputs increasingly reflect its own biases rather than diverse and authentic human perspectives. Recently, researchers have begun mining user preferences from natural human-LLM interactions (Shi et al., 2022; Lin et al., 2024b; Don-Yehiya et al., 2024; Buening et al., 2026). These approaches capture real-time user feedback for more accurate preference alignment. Our work builds on this trend by leveraging in-situ user interactions to create preference datasets that better align with actual human values, addressing the limitations of both synthetic and human-annotated preference datasets.

#### Data for LLM Alignment

LLM alignment typically consists of two steps: instruction tuning and preference training. Instruction tuning, or supervised fine-tuning (SFT), fine-tunes models on a set of instruction-response pairs. Early works incorporated various NLP tasks for instruction tuning, demonstrating that LLMs could generalize well across different tasks (Wang et al., 2022; Chung et al., 2022; Ouyang et al., 2022). Subsequent research focused on constructing instruction data by directly distilling from capable LLMs (Wang et al., 2023; Xu et al., 2023). Researchers later recognized that preference training could further boost model performance across various tasks (Ouyang et al., 2022; Dubey et al., 2024). Preference training uses preferred and dispreferred responses, annotated either by humans (Bai et al., 2022a) or by LLMs (Cui et al., 2024). Beyond general-purpose preference datasets, some datasets focus on specific tasks, such as summarization (Wu et al., 2021), model safety (Ji et al., 2023; Shi et al., 2024; Pan et al., 2025), and mathematics (Lightman et al., 2023; Song et al., 2025). However, these approaches often rely on curated datasets that are either manually annotated by human experts or generated by models like GPT-4 (OpenAI et al., 2024). While these datasets provide a useful foundation, they may not fully capture the complexity and diversity of real-world user interactions. Our work addresses this gap by introducing a framework that leverages real-time feedback from actual users, allowing for more authentic and context-sensitive alignment of LLMs with true human preferences.
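
The role of preferred and dispreferred responses in preference training can be made concrete with the pairwise loss used to train reward models on such comparisons, as in Ouyang et al. (2022). This is a sketch of the general idea only, not the objective used with WildFeedback; `score_preferred` and `score_dispreferred` stand for hypothetical scalar scores that a model assigns to the two responses.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_preference_loss(score_preferred: float, score_dispreferred: float) -> float:
    """Bradley-Terry style pairwise loss: -log(sigmoid(r_preferred - r_dispreferred)).

    The loss shrinks toward 0 as the preferred response's score rises above the
    dispreferred one's, and grows roughly linearly when the ordering is reversed.
    """
    margin = score_preferred - score_dispreferred
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


if __name__ == "__main__":
    print(round(pairwise_preference_loss(2.0, 0.0), 4))  # 0.1269
    print(round(pairwise_preference_loss(0.0, 2.0), 4))  # 2.1269
```

Writing the loss through a stable softplus avoids overflow in `math.exp` when the scores are far apart, which a direct `log(1 + exp(-margin))` would hit for large negative margins.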

## 3 WildFeedback

Existing preference datasets often suffer from a mismatch between actual human preferences and those of the annotators (Chen et al., 2024; Poddar et al., 2024). Synthetic preference datasets, such as UltraFeedback (Cui et al., 2024), rely solely on GPT-4 to generate rankings and determine which responses are preferred or dispreferred. However, this approach may not accurately capture real human values or nuanced preferences. Relying on synthetic data can create a feedback loop where the model's outputs increasingly reflect its own biases rather than diverse and authentic human perspectives. On the other hand, preference datasets annotated by human annotators are difficult to scale due to time and budget constraints (Bai et al., 2022a; Ouyang et al., 2022; Dubey et al., 2024). Moreover, human annotators' preferences can be highly subjective, often differing significantly from those of real users (Zhang et al., 2024; Fleisig et al., 2023).

To address these challenges, we introduce WildFeedback, a framework designed to align LLMs with in-situ user interactions and feedback. Unlike previous approaches that rely on synthetic responses, our framework learns preferences directly from real-world users, capturing both explicit and implicit feedback signals. The framework comprises three steps: (1) feedback signal identification, (2) preference data construction, and (3) checklist-guided evaluation. The pipeline is illustrated in Figure 1. We apply this framework to WildChat (Zhao et al., 2024), a corpus of real user-ChatGPT conversations, and obtain the WildFeedback dataset, a preference dataset of 20,281 samples.

### 3.1 Feedback Signal Identification

To construct preference data from natural human-LLM interactions, we first identify conversations that contain feedback signals, which can be done through user satisfaction estimation. In multi-turn conversational sessions, a user may explicitly express satisfaction (e.g., "thank you") or dissatisfaction (e.g., "revise it") in their utterances. Lin et al. (2024b) proposed SPUR, a framework that automatically learns and identifies SAT (satisfaction) and DSAT (dissatisfaction) patterns. SPUR derives generalized SAT/DSAT rubrics by recursively prompting GPT-4 on conversations annotated with thumbs-up/thumbs-down feedback. These rubrics can then be used to score a user's overall satisfaction or dissatisfaction, allowing us to identify utterances that contain feedback signals.
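
The rubric-based identification step can be illustrated with a toy stand-in. In the sketch below, SPUR's learned, GPT-4-scored rubrics are replaced by a few hand-written regular expressions, so it shows only the data flow (rubric patterns in, per-utterance SAT/DSAT labels out), not SPUR's actual method or accuracy. `RUBRICS`, `label_utterance`, and `find_feedback_turns` are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical hand-written rubrics; SPUR instead learns its rubrics by prompting GPT-4.
RUBRICS: Dict[str, List[str]] = {
    "SAT": [r"\bthank(s| you)\b", r"\bperfect\b", r"\bthat works\b"],
    "DSAT": [r"\brevise\b", r"\bwrong\b", r"\bnot what i (asked|wanted)\b"],
}


def label_utterance(text: str) -> Optional[str]:
    """Return "SAT" or "DSAT" for the label with more matching patterns, else None."""
    lowered = text.lower()
    counts = {
        label: sum(1 for pattern in patterns if re.search(pattern, lowered))
        for label, patterns in RUBRICS.items()
    }
    if counts["SAT"] == counts["DSAT"]:
        return None  # no signal, or an exact tie between the two labels
    return "SAT" if counts["SAT"] > counts["DSAT"] else "DSAT"


def find_feedback_turns(user_utterances: List[str]) -> List[Tuple[int, str]]:
    """Return (index, label) for each user utterance that carries a feedback signal."""
    flagged = []
    for index, text in enumerate(user_utterances):
        label = label_utterance(text)
        if label is not None:
            flagged.append((index, label))
    return flagged


if __name__ == "__main__":
    turns = [
        "Write a haiku about rain.",
        "That's wrong, please revise it.",
        "Perfect, thank you!",
    ]
    print(find_feedback_turns(turns))  # [(1, 'DSAT'), (2, 'SAT')]
```

Keyword rubrics like these miss paraphrases, sarcasm, and conversational context, which is one reason to prefer learned rubrics; the structure, however, mirrors the idea of scoring each user turn against SAT and DSAT criteria.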
