# Reflections and New Directions for Human-Centered Large Language Models

Source: [https://arxiv.org/html/2605.06901](https://arxiv.org/html/2605.06901)

Caleb Ziems\*, Dora Zhao\*, Rose E. Wang, Matthew Jörke, Ahmad Rushdi, Advit Deepak, Sunny Yu, Anshika Agarwal, Harshvardhan Agarwal, Gabriela Aranguiz-Dias, Aditri Bhagirath, Justine Breuch, Huanxing Chen, Ruishi Chen, Sarah Chen, Haocheng Fan, William Fang, Cat Gonzales Fergesen, Daniel Frees, Tian Gao, Ziqing Huang, Vishal Jain, Yucheng Jiang, Kirill Kalinin, Su Doga Karaca, Arpandeep Khatua, Teland La, Isabelle Levent, Miranda Li, Xinling Li, Yongce Li, Angela Liu, Minsik Oh, Nathan J. Paek, Anthony Qin, Emily Redmond, Michael J. Ryan, Aadesh Salecha, Xiaoxian Shen, Pranava Singhal, Shashanka Subrahmanya, Mei Tan, Irawadee Thawornbut, Michelle Vinocour, Xiaoyue Wang, Zheng Wang, Henry Jin Weng, Pawan Wirawarn, Shirley Wu, Sophie Wu, Yichen Xie, Patrick Ye, Sean Zhang, Yutong Zhang, Cathy Zhou, Yiling Zhao, James Landay, Diyi Yang\*

Stanford University

###### Abstract

Large Language Models (LLMs) are increasingly shaping the private and professional lives of users, with numerous applications in business, education, finance, healthcare, law, and science. With this rise in global influence comes greater urgency to build, evaluate, and deploy these systems in a manner that prioritizes not only technical capabilities but also human priorities. This work presents a framework for developing Human-Centered Large Language Models (HCLLMs), which integrates perspectives from Natural Language Processing (NLP), Human-Computer Interaction (HCI), and responsible AI. Considering the ethics, economics, and technical objectives of language modeling, we argue that model developers need to address human concerns, preferences, values, and goals, not only during a cursory post-training stage, but rather with rigor and care at every stage of the pipeline. This paper offers human-centered insights and recommendations for developers at each stage, from system design to data sourcing, model training, evaluation, and responsible deployment. We conclude with a case study, applying these insights to understand the future of work with HCLLMs.

###### Contents

1. [1 Introduction](https://arxiv.org/html/2605.06901#S1)
2. [2 HCI for HCLLMs](https://arxiv.org/html/2605.06901#S2)
   1. [2.1 Understanding the *Who*: the Humans in HCLLMs](https://arxiv.org/html/2605.06901#S2.SS1)
   2. [2.2 Defining the *What*: Principles and Challenges for Designing HCLLMs](https://arxiv.org/html/2605.06901#S2.SS2)
      1. [2.2.1 Bridging the Gulf of Envisioning](https://arxiv.org/html/2605.06901#S2.SS2.SSS1)
      2. [2.2.2 Interpreting LLM Outputs](https://arxiv.org/html/2605.06901#S2.SS2.SSS2)
      3. [2.2.3 Navigating Human-LLM Relationships](https://arxiv.org/html/2605.06901#S2.SS2.SSS3)
      4. [2.2.4 Designing for Diverse Cultures and Contexts](https://arxiv.org/html/2605.06901#S2.SS2.SSS4)
   3. [2.3 Expanding the *How*: Methods from HCI for HCLLMs](https://arxiv.org/html/2605.06901#S2.SS3)
      1. [2.3.1 Experimental Research](https://arxiv.org/html/2605.06901#S2.SS3.SSS1)
      2. [2.3.2 Participatory Approaches](https://arxiv.org/html/2605.06901#S2.SS3.SSS2)
      3. [2.3.3 Qualitative Inquiry](https://arxiv.org/html/2605.06901#S2.SS3.SSS3)
   4. [2.4 From Human-Centered Design Challenges to Technical Artifacts](https://arxiv.org/html/2605.06901#S2.SS4)
      1. [2.4.1 Case Study: Motivating Physical Activity with HCLLMs](https://arxiv.org/html/2605.06901#S2.SS4.SSS1)
3. [3 Data for HCLLMs](https://arxiv.org/html/2605.06901#S3)
   1. [3.1 Data Provenance](https://arxiv.org/html/2605.06901#S3.SS1)
      1. [3.1.1 Pretraining Data](https://arxiv.org/html/2605.06901#S3.SS1.SSS1)
      2. [3.1.2 Instruction-tuning Data](https://arxiv.org/html/2605.06901#S3.SS1.SSS2)
      3. [3.1.3 Alignment Data](https://arxiv.org/html/2605.06901#S3.SS1.SSS3)
   2. [3.2 Data Representation, Bias and Ethics](https://arxiv.org/html/2605.06901#S3.SS2)
      1. [3.2.1 Quality-of-Service Harms](https://arxiv.org/html/2605.06901#S3.SS2.SSS1)
      2. [3.2.2 Representational Harms](https://arxiv.org/html/2605.06901#S3.SS2.SSS2)
      3. [3.2.3 Allocational Harms](https://arxiv.org/html/2605.06901#S3.SS2.SSS3)
      4. [3.2.4 Mitigating Harms](https://arxiv.org/html/2605.06901#S3.SS2.SSS4)
   3. [3.3 Consent and Ownership](https://arxiv.org/html/2605.06901#S3.SS3)
      1. [3.3.1 Data Privacy Considerations](https://arxiv.org/html/2605.06901#S3.SS3.SSS1)
      2. [3.3.2 Proactive vs. Reactive Privacy Strategies](https://arxiv.org/html/2605.06901#S3.SS3.SSS2)
      3. [3.3.3 Open Challenges in Data Privacy](https://arxiv.org/html/2605.06901#S3.SS3.SSS3)
   4. [3.4 Expanding Data Sources: Synthetic and Non-Traditional Data](https://arxiv.org/html/2605.06901#S3.SS4)
      1. [3.4.1 Synthetic Data](https://arxiv.org/html/2605.06901#S3.SS4.SSS1)
      2. [3.4.2 Non-traditional Data](https://arxiv.org/html/2605.06901#S3.SS4.SSS2)
4. [4 NLP for HCLLMs](https://arxiv.org/html/2605.06901#S4)
   1. [4.1 Supervised Fine-tuning for HCLLMs](https://arxiv.org/html/2605.06901#S4.SS1)
      1. [4.1.1 Current Practices in Instruction Tuning](https://arxiv.org/html/2605.06901#S4.SS1.SSS1)
      2. [4.1.2 Human-Centered Challenges with Instruction Tuning](https://arxiv.org/html/2605.06901#S4.SS1.SSS2)
      3. [4.1.3 Future of Instruction Tuning for HCLLMs](https://arxiv.org/html/2605.06901#S4.SS1.SSS3)
   2. [4.2 Learning from Human Preferences](https://arxiv.org/html/2605.06901#S4.SS2)
      1. [4.2.1 RL-Based Methods](https://arxiv.org/html/2605.06901#S4.SS2.SSS1)
      2. [4.2.2 Non-RL Methods](https://arxiv.org/html/2605.06901#S4.SS2.SSS2)
      3. [4.2.3 Beyond Human Feedback](https://arxiv.org/html/2605.06901#S4.SS2.SSS3)
   3. [4.3 Scaling Human-Centered LLMs](https://arxiv.org/html/2605.06901#S4.SS3)
      1. [4.3.1 Scaling Laws in LLMs](https://arxiv.org/html/2605.06901#S4.SS3.SSS1)
      2. [4.3.2 Scaling in Human-Centered Domains](https://arxiv.org/html/2605.06901#S4.SS3.SSS2)
      3. [4.3.3 Scaling in Human-Centered Goals](https://arxiv.org/html/2605.06901#S4.SS3.SSS3)
      4. [4.3.4 Inference-time Scaling](https://arxiv.org/html/2605.06901#S4.SS3.SSS4)
   4. [4.4 Personalization](https://arxiv.org/html/2605.06901#S4.SS4)
      1. [4.4.1 Current Approaches](https://arxiv.org/html/2605.06901#S4.SS4.SSS1)
      2. [4.4.2 Future of Personalization for HCLLMs](https://arxiv.org/html/2605.06901#S4.SS4.SSS2)
   5. [4.5 Pluralism](https://arxiv.org/html/2605.06901#S4.SS5)
      1. [4.5.1 Current Approaches](https://arxiv.org/html/2605.06901#S4.SS5.SSS1)
      2. [4.5.2 Future of Pluralistic Alignment for HCLLMs](https://arxiv.org/html/2605.06901#S4.SS5.SSS2)
   6. [4.6 Multilinguality](https://arxiv.org/html/2605.06901#S4.SS6)
      1. [4.6.1 Current Approaches](https://arxiv.org/html/2605.06901#S4.SS6.SSS1)
      2. [4.6.2 Future of Multilinguality for HCLLMs](https://arxiv.org/html/2605.06901#S4.SS6.SSS2)
5. [5 Evaluation](https://arxiv.org/html/2605.06901#S5)
   1. [5.1 Model-Level Evaluations](https://arxiv.org/html/2605.06901#S5.SS1)
      1. [5.1.1 Benchmarks](https://arxiv.org/html/2605.06901#S5.SS1.SSS1)
      2. [5.1.2 Quantitative Evaluation](https://arxiv.org/html/2605.06901#S5.SS1.SSS2)
      3. [5.1.3 Qualitative Evaluation](https://arxiv.org/html/2605.06901#S5.SS1.SSS3)
   2. [5.2 Human-Level Evaluations](https://arxiv.org/html/2605.06901#S5.SS2)
      1. [5.2.1 Human Values](https://arxiv.org/html/2605.06901#S5.SS2.SSS1)
      2. [5.2.2 Bias and Fairness Evaluation](https://arxiv.org/html/2605.06901#S5.SS2.SSS2)
      3. [5.2.3 Safety Evaluations](https://arxiv.org/html/2605.06901#S5.SS2.SSS3)
   3. [5.3 Societal-level Evaluation](https://arxiv.org/html/2605.06901#S5.SS3)
6. [6 Responsible Human-Centered LLMs](https://arxiv.org/html/2605.06901#S6)
   1. [6.1 Interpretable and Explainable HCLLMs](https://arxiv.org/html/2605.06901#S6.SS1)
      1. [6.1.1 Current Approaches to Interpretability](https://arxiv.org/html/2605.06901#S6.SS1.SSS1)
      2. [6.1.2 Current Approaches to Explainability](https://arxiv.org/html/2605.06901#S6.SS1.SSS2)
      3. [6.1.3 Looking Forward](https://arxiv.org/html/2605.06901#S6.SS1.SSS3)
   2. [6.2 Steerable HCLLMs](https://arxiv.org/html/2605.06901#S6.SS2)
      1. [6.2.1 Current Approaches to Steerability](https://arxiv.org/html/2605.06901#S6.SS2.SSS1)
      2. [6.2.2 Looking Forward](https://arxiv.org/html/2605.06901#S6.SS2.SSS2)
   3. [6.3 Safe HCLLMs](https://arxiv.org/html/2605.06901#S6.SS3)
      1. [6.3.1 Current Approaches to Safety](https://arxiv.org/html/2605.06901#S6.SS3.SSS1)
      2. [6.3.2 Looking Forward](https://arxiv.org/html/2605.06901#S6.SS3.SSS2)
7. [7 Case Study: HCLLMs and the Future of Work](https://arxiv.org/html/2605.06901#S7)
   1. [7.1 Defining the Stakeholders](https://arxiv.org/html/2605.06901#S7.SS1)
   2. [7.2 Developing HCLLMs for the Future of Work](https://arxiv.org/html/2605.06901#S7.SS2)
   3. [7.3 Responsibly Deploying HCLLMs in the Workforce](https://arxiv.org/html/2605.06901#S7.SS3)
8. [8 Conclusion](https://arxiv.org/html/2605.06901#S8)
9. [References](https://arxiv.org/html/2605.06901#bib)

## 1 Introduction

Figure 1: This survey has three core sections, focused on (1) defining, (2) developing, and (3) deploying human-centered LLMs (HCLLMs). In the first section, we conceptualize human-centeredness through HCI (§[2](https://arxiv.org/html/2605.06901#S2)): the who, what, and how. In the second, we illustrate how these principles appear across the LLM development pipeline (§[3](https://arxiv.org/html/2605.06901#S3), [4](https://arxiv.org/html/2605.06901#S4), [5](https://arxiv.org/html/2605.06901#S5)), and then discuss considerations for deploying HCLLMs in a responsible manner (§[6](https://arxiv.org/html/2605.06901#S6)). Finally, we synthesize takeaways from the three core sections into a case study on the deployment of HCLLMs for the future of work (§[7](https://arxiv.org/html/2605.06901#S7)).

Large Language Models (LLMs) have transitioned from research artifacts to production infrastructure.
They now power developer tools, enterprise copilots, search and recommendation systems, content moderation pipelines, and domain-specific assistants across healthcare (Thirunavukarasu et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1701)), finance (Xie et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1205); Nie et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1204)), education (Gan et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1702); Adiguzel et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1706); Wang et al., [2024d](https://arxiv.org/html/2605.06901#bib.bib2)), science (Si et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1461); Zhang et al., [2024f](https://arxiv.org/html/2605.06901#bib.bib1703)), and law (Li et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib1206); Katz et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1207); Guha et al., [2023](https://arxiv.org/html/2605.06901#bib.bib526)). As LLMs are integrated into individual and collective processes, they can no longer be understood as isolated tools bounded by static performance metrics or leaderboard positions. LLMs are sociotechnical systems with global influence, and they should be developed and evaluated in starkly more human terms. Are these models helpful, steerable, and safe under adversarial pressure, aligned across global markets, robust to distribution shift, and adaptable to evolving user goals and expectations? Do models comply with data governance regimes, privacy regulations, and ethical concerns around intellectual property? How can we build models that not only avoid harm but also actively contribute to human flourishing? Can LLMs do more than passively assist humans; can they also actively collaborate with us as equal partners? This survey advances the framework of Human-Centered Large Language Models (HCLLMs) as a unifying lens for understanding and answering these questions. Rather than treating human-centered objectives as simple patches or alignment problems downstream of capability scaling, we argue that human-centered methods must be embedded across the entire LLM development pipeline, from data sourcing and filtering to post-training and alignment, evaluation, deployment, and long-term maintenance (see Figure [1](https://arxiv.org/html/2605.06901#S1.F1)). Importantly, we will demonstrate how human-centered objectives tend to resist universal solutions. The optimal path depends both on whom you ask and on how you operationalize concepts like *harm* and *benefit*. Broad themes like *transparency*, *privacy*, *safety*, and *justice* frequently emerge (Jobin et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1705)), but there is significant variation in perspective on how these ideals should be implemented (Awad et al., [2018](https://arxiv.org/html/2605.06901#bib.bib1707); Jobin et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1705)). Governments and non-profit organizations may codify the most dominant perspectives into laws and policies (Jobin et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1705)), but high-level guidelines may fail to account for the nuances of real-world use (Hagendorff, [2020](https://arxiv.org/html/2605.06901#bib.bib1704)) and lag behind the rapid evolution of language models themselves (Auernhammer, [2020](https://arxiv.org/html/2605.06901#bib.bib189)).
In the face of these challenges, stakeholders often remain passive, which only endorses the status quo (Kalluri, [2020](https://arxiv.org/html/2605.06901#bib.bib1711); Crawford, [2021](https://arxiv.org/html/2605.06901#bib.bib1712); Birhane et al., [2022](https://arxiv.org/html/2605.06901#bib.bib224)). This survey elaborates on and endorses the alternative, Human-Centered Design (HCD; Capel and Brereton, [2023](https://arxiv.org/html/2605.06901#bib.bib261)), in which users and other stakeholders are *centrally* involved in ideating, building, evaluating, and deploying Large Language Models (Shneiderman, [2020](https://arxiv.org/html/2605.06901#bib.bib1117), [2022](https://arxiv.org/html/2605.06901#bib.bib1116)). Their centrality at every stage of the design process is what distinguishes HCD from other instantiations of human factors design (Xu, [2019](https://arxiv.org/html/2605.06901#bib.bib1343)) that account for general user needs (e.g., *transparency*) in only a small slice of the design or deployment process (Auernhammer, [2020](https://arxiv.org/html/2605.06901#bib.bib189)). LLM development rarely *centers* humans, but there is a growing body of research from Natural Language Processing (NLP) and Human-Computer Interaction (HCI) that points toward these ideals. We cover this foundation for HCLLMs in an in-depth survey of relevant human factors approaches in HCI (§[2](https://arxiv.org/html/2605.06901#S2)) and NLP (§[4](https://arxiv.org/html/2605.06901#S4)), including more detail on the Data Pipeline (§[3](https://arxiv.org/html/2605.06901#S3)) and the Evaluation (§[5](https://arxiv.org/html/2605.06901#S5)) of LLMs. With this foundation established, we return to the principal axes of Responsible and Ethical Deployment (§[6](https://arxiv.org/html/2605.06901#S6)): transparency, privacy, and safety. Synthesizing our discussion across these chapters, we conclude with a concrete Case Study (§[7](https://arxiv.org/html/2605.06901#S7)) on the considerations of HCLLMs for the future of work.

## 2 HCI for HCLLMs

How do we center humans in the design of LLMs? To start, we can turn to the field of human-computer interaction (HCI), which offers foundational principles for realizing the vision of HCLLMs. In particular, HCI provides established theories, methods, and frameworks for understanding and designing the critical interface between the human user and the complex system (see Figure [2](https://arxiv.org/html/2605.06901#S2.F2)). The field has long grappled with how to make technology not just functional, but also usable, understandable, and aligned with human values and needs. In the first subsection of this chapter, we trace how principles of *human-centered design* apply to HCLLMs and ask *who* the stakeholders are in designing HCLLMs (§[2.1](https://arxiv.org/html/2605.06901#S2.SS1)). We then discuss *design principles and challenges* for creating HCLLMs in §[2.2](https://arxiv.org/html/2605.06901#S2.SS2). These challenges range in scope from the individual level (i.e., how we can improve end-user interactions with these models) to the societal level (i.e., how to account for the diverse cultures and contexts in which these models will be deployed). We then discuss how HCI *methodologies* can be used in conjunction with techniques commonly used in NLP to build and evaluate HCLLMs.
We conclude in §[2.3](https://arxiv.org/html/2605.06901#S2.SS3) with an overview of three methodological orientations from HCI — experimental methods, participatory approaches, and qualitative inquiry — and discuss how they can be adopted for HCLLMs.

Figure 2: We can draw on the field of human-computer interaction (HCI) to help inform human-centered LLM design. The first step in this process is understanding *who* the relevant stakeholders are — both direct and indirect — in the development and deployment of HCLLMs (§[2.1](https://arxiv.org/html/2605.06901#S2.SS1)). Second, we identify a set of unique interaction challenges in designing HCLLMs (§[2.2](https://arxiv.org/html/2605.06901#S2.SS2)). Finally, we discuss how HCI methods can provide new design perspectives (§[2.3](https://arxiv.org/html/2605.06901#S2.SS3)). We synthesize these points in a case study on designing LLMs for motivating physical activity in §[2.4.1](https://arxiv.org/html/2605.06901#S2.SS4.SSS1).

### 2.1 Understanding the *Who*: the Humans in HCLLMs

Human-centered design (HCD) is an approach that centers the needs and experiences of users in all steps of the design process. There are four guiding principles within HCD (Interaction Design Foundation, [2025](https://arxiv.org/html/2605.06901#bib.bib594)):

1. Centering real people, their needs, and experiences in the design process
2. Solving the core problems
3. Thinking of everything as existing in an interconnected system
4. Engaging in iterative prototyping to find solutions

Applying these HCD principles to LLMs reframes development away from solely optimizing technical performance and toward understanding how these systems impact and fit into people's real lives. Rather than treating metrics such as accuracy or perplexity as the ultimate goal, HCD foregrounds the needs, values, and contexts of the humans who develop, use, and are affected by these models. Thus, we must first understand *who* the real people affected by LLMs are and *what* problems they face. We account for people involved across the entire lifecycle of LLM development, from the data used to train these models and the objectives that model developers prioritize, to the ways that end users interact with models. In this subsection, we provide an overview of important stakeholder groups and their corresponding roles in the development of HCLLMs: data workers, model developers, end users, and indirect stakeholders.

##### Humans as Data Workers.

*Data workers* are a group of stakeholders that is essential to the HCLLM ecosystem but often overlooked. They form the frequently invisible human labor that undergirds the datasets and methods that make LLM development possible. Within this category, there are several classes of individuals. *Data annotators* directly label datasets or rank model outputs. Their judgments instantiate the "human preferences" that guide alignment, embedding positionality into what the model learns to value (Ouyang et al., [2022](https://arxiv.org/html/2605.06901#bib.bib963); Kirk et al., [2024](https://arxiv.org/html/2605.06901#bib.bib708)).
While annotators are the ones directly labeling the data, the selection of which values or ideologies to prioritize often comes from other parties who hold power or authority over data workers (e.g., a project manager or the client requesting annotated data) (Miceli et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1484); Wang et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1485)). In addition to data annotators, there are also *safety and moderation workers*, who work to expose and rectify potential harms in the model before deployment. As with content moderation in other domains, such as social media, this work often entails sustained exposure to toxic or disturbing material, raising serious questions about the ethics and mental health implications of this invisible labor (Pendse et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1483)). Finally, there are *data subjects*: the individuals whose internet data forms the bulk of the web corpora used to pre-train LLMs. Importantly, many data subjects participate unwittingly, as their data is included without consent or even awareness (Paullada et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1569); Birhane and Prabhu, [2021](https://arxiv.org/html/2605.06901#bib.bib1500)). Together, these data workers form the backbone of HCLLM development, yet their contributions are systematically undervalued and underprotected.

##### Humans as Model Developers.

Next, *model developers* are the stakeholders most immediately tasked with shaping the architecture, training pipeline, and deliverable behaviors of HCLLMs. They make crucial decisions about what data to collect, how to preprocess it, and which training objectives and evaluation protocols to employ. Prior work has shown how the values of model developers can be implicitly imbued in the resulting technical artifacts, such as in how quality filters for data are defined (Gururangan et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1685)) and which research problems are prioritized (Birhane et al., [2022](https://arxiv.org/html/2605.06901#bib.bib224)). In designing HCLLMs, model developers must harmonize the trade-offs between performance, fairness, inclusivity, and safety. For example, they must ensure the fairness, privacy, and safety of the training data of LLMs (§[3](https://arxiv.org/html/2605.06901#S3)), design comprehensive evaluation frameworks that capture real-world usage scenarios (§[5](https://arxiv.org/html/2605.06901#S5)), and devise governance mechanisms and transparency guidelines that help ensure responsible deployment (§[6](https://arxiv.org/html/2605.06901#S6)).

##### Humans as End Users.

*End users* — the individuals and groups who directly interact with LLM systems — represent one of the largest and most diverse stakeholder groups. The general-purpose nature of LLMs means that end users span a vast range: students seeking homework help; professionals drafting reports; people with mental health concerns looking for support and companionship; creative writers brainstorming ideas; non-native speakers translating text; and countless others (Handa et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib1255); Chatterji et al., [2025](https://arxiv.org/html/2605.06901#bib.bib56); Handa et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib584)). Each user brings distinct goals, expertise levels, cultural backgrounds, and needs when interfacing with models.
Thus, as LLMs shift from nascent technologies to being increasingly ingrained in users' lives, understanding their real-world impact becomes critical. Despite the outsized impact that LLMs may have on their lives, most end users have relatively little control or influence over the design and creation of these technologies. Some methods allow end users to exert a degree of agency over models, such as fine-tuning, which offers a more heavyweight intervention (Tan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1737); Wang et al., [2025c](https://arxiv.org/html/2605.06901#bib.bib1762)). Research prototypes and product features also offer ways to tailor user interactions with LLMs, such as by creating personalized memory stores or drawing on community knowledge banks (Zhao et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib1763); OpenAI, [2024b](https://arxiv.org/html/2605.06901#bib.bib1764); Ryan et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1250); Zhong et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1765)).
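To make the memory-store mechanism concrete, the following is a minimal sketch of retrieve-then-prepend personalization. The class, its lexical-overlap retrieval, and the example facts are illustrative assumptions only; the cited systems use far richer retrieval and privacy controls.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Toy personalized memory: persist user facts across sessions and
    prepend the most relevant ones to each prompt."""
    facts: list[str] = field(default_factory=list)

    def add(self, fact: str) -> None:
        self.facts.append(fact)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Naive lexical-overlap relevance; real systems would use embeddings.
        words = set(query.lower().split())
        return sorted(self.facts,
                      key=lambda f: len(set(f.lower().split()) & words),
                      reverse=True)[:k]

    def build_prompt(self, query: str) -> str:
        context = "\n".join(f"- {f}" for f in self.retrieve(query))
        return f"Known facts about this user:\n{context}\n\nUser request: {query}"

memory = MemoryStore()
memory.add("Prefers explanations with worked examples.")
memory.add("Is a non-native English speaker; avoid idioms.")
print(memory.build_prompt("Explain gradient descent"))
```

Even this toy version makes the agency trade-off visible: the user can inspect and delete stored facts, but everything retained is silently injected into every future prompt.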
While these personalization mechanisms offer some user agency, they also raise fundamental questions about the broader relationship between end users and LLMs. Key questions related to end users include: How do these systems affect productivity, creativity, learning, and decision-making skills? What are the risks of over-reliance, deskilling, or perpetuating existing biases? And how can we design HCLLMs that empower users rather than constrain or harm them? Answering these questions requires moving beyond system capabilities — bridging existing methods in NLP with those from HCI — to examine how LLMs may reshape human capacities, agency, and well-being in practice.

##### Humans as Indirect Stakeholders.

Finally, even people who are not direct end users of LLMs, or *indirect stakeholders*, can still be meaningfully impacted by them (Friedman, [1996](https://arxiv.org/html/2605.06901#bib.bib468)). Consider an LLM used to assist physicians with taking clinical notes: patients are affected by this system even though they never directly interface with it (Haberle et al., [2024](https://arxiv.org/html/2605.06901#bib.bib8); Korom et al., [2025](https://arxiv.org/html/2605.06901#bib.bib9)). Beyond patients, this example involves many other potential indirect stakeholders, such as families or caregivers, insurance providers, and hospital administrators. Enumerating all indirect stakeholders can seem intractable; in part, this underscores that human-centered LLMs are not just technical artifacts but sociotechnical systems with many externalities to consider. There is no prescriptive formula for deciding which indirect stakeholders to prioritize. However, designers can be guided by asking who is not well-represented in design decisions, who is most likely to be harmed by such systems (both immediately and over longer time horizons), and conversely, who may stand to benefit in ways that are not explicitly intended. More broadly, rather than treating indirect stakeholders as an overwhelming checklist, HCLLM designers should use this complexity as motivation to identify the most consequential stakeholders early and involve them throughout the design process, rather than only as an afterthought (§[2.3.2](https://arxiv.org/html/2605.06901#S2.SS3.SSS2)).

### 2.2 Defining the *What*: Principles and Challenges for Designing HCLLMs

While involving humans throughout the LLM lifecycle helps us understand who is affected by these systems, it also raises fundamental questions about *how* to design for their needs. Designing human-centered LLMs requires more than simply optimizing model capabilities. It demands careful attention to how users interact with, make sense of, and form relationships with these systems across diverse contexts. Designing human-AI interaction is not a new challenge. Pre-dating LLMs, prior work (Yang et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1359)) delineated the unique challenges of human-AI interaction, highlighting (1) uncertainties about model capabilities and (2) the complexity of AI outputs — both problems that remain even as model capabilities have improved. In tandem, others have enumerated best practices for designing human-AI interaction (Amershi et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1134); Yildirim et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1496)). These principles span the lifecycle of model development, including the initial conceptualization (e.g., what values are imbued in the system, how privacy and fairness are handled), the model development process (e.g., what data is used, how the model is trained), the model deployment (e.g., how errors are handled, how users' preferences are accounted for), and the interface layer where humans interact with the system (e.g., how users' expectations are calibrated, how transparent the model's outputs are) (Wright et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1133)). While these principles provide a strong foundation, human-centered LLMs require adapting and extending them to address the unique characteristics of large language models: their open-ended generation capabilities, evolving social and relational roles, and wide-scale deployment. In this section, we explore specific challenges that arise when designing HCLLMs. There are of course challenges beyond those covered here; our goal is to illustrate the differing levels of consideration we must attend to, ranging from the individual level, such as scaffolding users' interactions with models and model outputs, to societal-level questions about deploying models across diverse cultures and contexts.

#### 2.2.1 Bridging the Gulf of Envisioning.

Foundational frameworks for design within HCI include Norman ([1988](https://arxiv.org/html/2605.06901#bib.bib947))'s gulfs of execution and evaluation. The gulf of execution concerns users figuring out how to perform the actions they want, whereas the gulf of evaluation arises when users cannot interpret the system's output. These gulfs persist across the design of many technologies, from everyday physical objects like door handles to cutting-edge systems. However, LLMs pose a new challenge for users: the "gulf of envisioning" (Subramonyam et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1658)). This gulf refers to the distance between what users intend to do with an LLM and the prompts they ultimately produce. Why does this gulf occur? Although LLMs have proven capable across a wide range of tasks, they still require users to guide how they are used.
The de facto form of user interaction with LLMs is prompting: providing textual instructions that delineate the desired behavior. While prompting is ostensibly simple, users consistently struggle with writing prompts, underspecifying instructions and forgoing the appropriate level of detail, ultimately yielding unsatisfactory model outputs (Zamfirescu-Pereira et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1622)). Furthermore, users must contend with the indeterminacy of model outputs as well as the "black-box" nature of how their inputs are transformed into outputs (Subramonyam et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1658); Yang et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1359); Agrawala, [2023](https://arxiv.org/html/2605.06901#bib.bib1361)). Thus, one core challenge for designing human-centered LLMs is overcoming this gulf of envisioning. One approach has been to design interfaces that better scaffold users' interactions with models. For example, recent works have introduced direct manipulation interfaces as an alternative to prompt writing (Masson et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1641); Wu et al., [2022a](https://arxiv.org/html/2605.06901#bib.bib1313); Arawjo et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1648)), such as a visual programming environment that helps users create more complex prompt chains (Arawjo et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1648)). Others have operationalized prompting pipelines as modular code components to provide a more systematic process for optimizing and refining inputs (Khattab et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1651)). There are also approaches beyond the prompt level that aim to improve users' interactions with LLMs. One line of work explores how to improve the model's understanding of the user, such as by building better user models or enriching the context LLMs have about the user (Shaikh et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1647); Naous et al., [2025](https://arxiv.org/html/2605.06901#bib.bib46); Lei et al., [2026](https://arxiv.org/html/2605.06901#bib.bib1197)). These approaches aim to reduce the gap between users' intentions and what they actually specify when interacting with an LLM. Improving model capabilities only benefits users insofar as they can harness them. Efforts from both directions — making it easier for users to guide model outputs with less effort, and improving models' understanding of users — are required to close the gulf of envisioning.
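To give a flavor of what "prompting pipelines as modular code components" means in practice, here is a minimal plain-Python sketch in the spirit of that line of work (e.g., DSPy; Khattab et al., 2024). The `PromptModule` class, the stub LLM client, and the two-stage chain are hypothetical illustrations, not any library's actual API.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Stand-in for a real chat-completion client."""
    return f"[model output for: {prompt[:60]}...]"

@dataclass
class PromptModule:
    """One typed step in a prompt chain: an instruction plus named inputs.
    Treating each step as a component makes pipelines testable and tunable."""
    instruction: str

    def run(self, **inputs: str) -> str:
        rendered = "\n".join(f"{k}: {v}" for k, v in inputs.items())
        return call_llm(f"{self.instruction}\n\n{rendered}")

# Stage 1 bridges underspecification; stage 2 executes against the spec.
clarify = PromptModule("Restate the user's request as an explicit, detailed task spec.")
answer = PromptModule("Complete the task described by the following spec.")

def pipeline(user_request: str) -> str:
    spec = clarify.run(request=user_request)
    return answer.run(spec=spec)

print(pipeline("summarize my notes"))
```

Because each stage is an ordinary object, the clarification instruction can be optimized or swapped out without touching the rest of the pipeline, which is the systematic refinement the cited work argues for.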
#### 2.2.2 Interpreting LLM Outputs.

So far, our discussion of human-LLM interaction has mainly focused on how users communicate intent to models. We now turn to how users evaluate and make sense of model outputs. As-is, model outputs can be verbose and unstructured, making them difficult to understand and leaving users feeling overwhelmed (Jiang et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1733)). To address this challenge, prior work in human-computer interaction has proposed new interfaces to support user *sensemaking*, providing external representations that encode data for task-specific purposes. For example, systems such as Sensecape (Suh et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1734)) and Graphologue (Jiang et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1733)) provide interactive visualizations that allow users to explore LLMs' outputs in a structured format rather than parsing large amounts of text. Beyond helping users understand the content that LLMs output, we must also interrogate how users subsequently interpret and act upon these outputs. Questions surrounding user trust and reliance on AI systems are long-standing issues that predate the rise of LLMs (Papenmeier et al., [2022](https://arxiv.org/html/2605.06901#bib.bib47); Vasconcelos et al., [2023](https://arxiv.org/html/2605.06901#bib.bib50)). Nonetheless, LLMs introduce new challenges to these established problems. The complexity of these models means that why and how they produce their outputs is inscrutable to experts and end users alike (Ameisen et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1254)). User trust and reliance are not solely model-specific issues, however; the design of LLM applications also introduces new challenges. For example, Swoopes et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1649)) demonstrate how the chat-based design of most user-facing LLMs can hide the inherent stochasticity of the models, making it difficult for users to calibrate trust. Furthermore, anthropomorphic features of models can foster user trust even when it is unwarranted (Cohn et al., [2024](https://arxiv.org/html/2605.06901#bib.bib51)). To combat these issues, researchers have proposed design features that can foster appropriate reliance, such as generating explanations, expressing uncertainty, and adding sources to claims (Zhou et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib1473); Kim et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1499)). These solutions are not perfect; for instance, significant technical challenges remain in ensuring that cited sources are correct and relevant. The hypothesized benefit of features like sourcing is their potential to engage users in slower, more careful thinking, but how to design LLMs that effectively empower users' critical thinking over their outputs remains a key open question.
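As a concrete illustration of the "uncertainty plus sources" design pattern, the sketch below renders each claim in an answer with verbalized confidence and its backing sources. The schema, thresholds, and example values are hypothetical; real deployments would also need to verify that citations are correct and relevant, as noted above.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    sources: list[str]   # citation keys or URLs backing the claim
    confidence: float    # model- or verifier-estimated, in [0, 1]

def verbalize(confidence: float) -> str:
    """Map numeric confidence to hedged language shown to the user."""
    if confidence >= 0.9:
        return "high confidence"
    if confidence >= 0.6:
        return "moderate confidence"
    return "low confidence; please verify"

def render(claims: list[Claim]) -> str:
    lines = []
    for c in claims:
        cites = ", ".join(c.sources) if c.sources else "no source found"
        lines.append(f"- {c.text} ({verbalize(c.confidence)}; sources: {cites})")
    return "\n".join(lines)

print(render([
    Claim("Regular walking lowers resting heart rate.", ["smith2021"], 0.85),
    Claim("A 10-minute walk burns 1,000 kcal.", [], 0.20),  # flagged for scrutiny
]))
```

The design intent is friction in the right place: unsupported or low-confidence claims are visually marked, nudging the user toward the slower, more careful evaluation the literature calls for.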
#### 2.2.3 Navigating Human-LLM Relationships.

Finally, we consider the evolving role of LLMs relative to users. Traditionally, AI systems have functioned as assistants: tools that augment human capabilities in restricted ways and that require users to delegate tasks. However, as model capabilities approach or exceed human performance in certain domains, visions of models as equal collaborators become increasingly plausible. For instance, Shao et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1124)) articulate a framework in which humans engage in bidirectional collaboration with LLM-based agents across a diverse set of tasks. Shifting roles from assistant to collaborator carries significant implications for the design of LLMs. For example, while an assistant waits for explicit instructions to execute tasks, a model that serves as a collaborator might proactively suggest alternative approaches, challenge assumptions, or redirect problem-solving strategies, demanding fundamentally different interface affordances. These questions echo and build upon classic debates in HCI between direct manipulation interfaces, which afford users high degrees of control, and interface agents, which can act autonomously on users' behalf (Shneiderman and Maes, [1997](https://arxiv.org/html/2605.06901#bib.bib1125)). How much control is required is not prescriptive; it is modulated by contextual factors such as the type of task or the user's expertise. Furthermore, as LLMs adopt more expansive roles beyond serving as functional tools, we must also consider the *affective* dimension of human-LLM interaction. Models are increasingly anthropomorphized — perceived as having human-like characteristics — a tendency exacerbated by the linguistic expressions in generated outputs (Cheng et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib297), [2025a](https://arxiv.org/html/2605.06901#bib.bib43)). Already, users interact with these models not only as coworkers or collaborators but also as friends, companions, and romantic partners (Zhang et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib1642); Pataranutaporn et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1486)). LLMs are also increasingly used in sensitive domains, such as providing emotional support or serving therapeutic purposes (Zao-Sanders, [2025](https://arxiv.org/html/2605.06901#bib.bib45)). These affective interactions are not always intentional. For example, Zhang et al. ([2025b](https://arxiv.org/html/2605.06901#bib.bib1642)) observed that companionship-oriented interactions emerge even when users are not primarily interacting with LLMs for emotionally laden tasks. While human-LLM relationships can have positive outcomes for users, including combating loneliness and reducing distress (De Freitas et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib1495)), there are also many documented adverse impacts, such as fostering emotional dependence (Laestadius et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1488); Pentina et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1489)), encouraging harmful behavior (Zhang et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib1487); Dupré, [2024](https://arxiv.org/html/2605.06901#bib.bib1490)), or, in the extreme, triggering intense mental health crises (Morrin et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1650)). Balancing the benefits of human-LLM relationships against these very real harms is an open challenge that necessitates interdisciplinary interventions from technologists, social scientists, ethicists, and policymakers. As one example of work in this area, Kirk et al. ([2025b](https://arxiv.org/html/2605.06901#bib.bib1643)) call on the community to prioritize the socio-affective alignment of LLMs, accounting for how models fit into and actively shape individual users' social and psychological ecosystems. Nonetheless, exactly how to design human-LLM relationships that promote users' long-term well-being and other pro-social outcomes remains an open area for exploration.

#### 2.2.4 Designing for Diverse Cultures and Contexts.

Finally, we turn to the critical challenge of designing LLMs that are sensitive and adaptive to diverse cultural contexts. LLMs are not culturally neutral artifacts.
As discussed in more detail in §[3](https://arxiv.org/html/2605.06901#S3), they are trained predominantly on data from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies (Mihalcea et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1646)). As such, models inherently encode and propagate specific cultural values, communication styles, and social norms (Naous et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1); Durmus et al., [2023](https://arxiv.org/html/2605.06901#bib.bib420); Ryan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1068)). These misalignments can render models unhelpful at best, and culturally insensitive or actively harmful at worst. HCI offers critical theoretical lenses that help us question the assumptions underlying LLM design and open more generative opportunities for design. For example, Bardzell ([2010](https://arxiv.org/html/2605.06901#bib.bib1135))'s foundational work on Feminist HCI argues that we ought to attend to marginalized user groups when designing, rather than focusing only on a presumed "default". In this vein, critical theories on postcolonial computing and literature on decolonial practices have urged designers to decenter dominant Western perspectives and account for the plurality of worldviews and epistemologies that exist (Irani et al., [2010](https://arxiv.org/html/2605.06901#bib.bib1130); Alvarado Garcia et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1493)). These works push designers to move beyond simply "de-biasing" models and instead question the fundamental assumptions embedded within them: whose knowledge is centered, whose values are prioritized, and whose ways of being are marginalized? Users from different cultural backgrounds may hold different expectations of an ideal AI system and its use cases, requiring that we localize models to these needs rather than assume a single default (Ge et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1109); Qadri et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1131); Phutane and Vashistha, [2025](https://arxiv.org/html/2605.06901#bib.bib1492)). Systematically prioritizing knowledge from certain groups or cultures over others is not simply a design challenge for better human-LLM interaction; it can present quality-of-service differences with material impacts on users (Wilson et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1494); Dev et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1491)). How to design, build, and evaluate LLMs that are genuinely context-aware and culturally adaptive remains a significant and vital open question for the field.

### 2.3 Expanding the *How*: Methods from HCI for HCLLMs

Finally, in addition to offering new perspectives on how we ought to design HCLLMs, HCI introduces a set of methods that researchers and practitioners can employ in pursuit of these goals. Given HCI's interdisciplinary roots, the field encompasses many "ways of knowing", or methodological orientations. In this section, we discuss three classes of research methods — experimental methods, participatory approaches, and qualitative inquiry — that are particularly applicable to HCLLMs and underutilized within NLP. For a comprehensive overview of other methodological practices in HCI, we refer readers to Olson and Kellogg ([2014](https://arxiv.org/html/2605.06901#bib.bib1695)).
#### 2.3.1 Experimental Research

An important step in designing and evaluating HCLLMs is measuring their impact on human outcomes, rather than only characterizing model behavior. While traditional methods in NLP, such as benchmarking, provide a fast and scalable way to evaluate models, the resulting numbers are often divorced from the contexts in which LLMs are applied and do not capture their impact in practice (Raji et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1041); McIntosh et al., [2024](https://arxiv.org/html/2605.06901#bib.bib889)). Whereas observational studies can tell us whether variables are related, experimental research can reveal the nature of the relationship (e.g., association versus causation) between variables of interest (Gergle and Tan, [2014](https://arxiv.org/html/2605.06901#bib.bib42)). Experiments vary along a spectrum of research control, ranging from laboratory experiments to field experiments, which are conducted in real-world settings but involve many factors that researchers cannot control for (Oulasvirta, [2008](https://arxiv.org/html/2605.06901#bib.bib41)). For HCLLMs, experimental methods enable researchers to move beyond measuring what models can do to understanding what they *actually* do and *why*. First, experimental methods allow researchers to better understand the mechanisms that explain observed patterns of behavior. For example, to better understand why users become dependent or over-reliant on LLMs, recent work (Cheng et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib40)) focused on the construct of *social sycophancy*, finding through a series of laboratory experiments that users are more likely to trust, and also to use, sycophantic models. Isolating specific mechanisms can then inform design interventions that are first validated intrinsically (i.e., on benchmarks) and then extrinsically with additional experiments. Second, experimental methods can quantify the utility of LLMs when deployed in practice, allowing researchers to evaluate outcomes rather than capability. A number of field experiments have begun examining LLMs' impacts across domains including education (Cuna and Shen, [2025](https://arxiv.org/html/2605.06901#bib.bib1766); Wang et al., [2025d](https://arxiv.org/html/2605.06901#bib.bib1413)), software engineering (Becker et al., [2025](https://arxiv.org/html/2605.06901#bib.bib39)), and healthcare (Korom et al., [2025](https://arxiv.org/html/2605.06901#bib.bib9)). For example, Becker et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib39)) provide evidence that challenges the assumption that LLMs increase developer productivity, demonstrating through a field experiment that using these tools increases the time it takes developers to complete their tasks. Such findings complicate our understanding of models and point to the exogenous factors (e.g., institutional endorsement, workflow integration, user onboarding) that affect outcomes. Overall, moving from what a model can perform within a clearly scoped benchmark setting to how it impacts users or society writ large — and why — is a critical contribution from HCI's methodological toolkit that can complement traditional NLP evaluations.
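As a minimal sketch of the quantitative core of such a field experiment, the snippet below compares task completion times between a treatment arm (LLM tool) and a control arm. The file name, column names, and two-arm design are hypothetical stand-ins for a real study's pre-registered analysis plan.

```python
import pandas as pd
from scipy import stats

# One row per participant: arm ("llm" or "control") and an outcome measure.
df = pd.read_csv("experiment_outcomes.csv")
llm = df.loc[df["arm"] == "llm", "completion_minutes"]
ctl = df.loc[df["arm"] == "control", "completion_minutes"]

# Welch's t-test: does the LLM condition shift task completion time?
t, p = stats.ttest_ind(llm, ctl, equal_var=False)

# Cohen's d (pooled-SD version) as a rough effect size.
pooled_sd = ((llm.var(ddof=1) + ctl.var(ddof=1)) / 2) ** 0.5
d = (llm.mean() - ctl.mean()) / pooled_sd

print(f"diff = {llm.mean() - ctl.mean():.1f} min, t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```

Note that a significant difference on this outcome says nothing about *why* it occurred; that is where mechanism-focused laboratory studies and the qualitative methods below come in.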
#### 2.3.2 Participatory Approaches

Building HCLLMs requires engaging with the communities that are impacted by and using these technologies. A useful way to work toward this engagement is to draw on participatory research methodologies. More than a prescriptive methodology to follow, "participatory research" delineates an orientation toward involving relevant stakeholders in the knowledge creation process (Bergold and Thomas, [2012](https://arxiv.org/html/2605.06901#bib.bib32)). Examples of methods that fall under this broader umbrella include participatory action research (Hayes, [2011](https://arxiv.org/html/2605.06901#bib.bib37)) and community-based participatory research (Unertl et al., [2016](https://arxiv.org/html/2605.06901#bib.bib34); Wallerstein et al., [2017](https://arxiv.org/html/2605.06901#bib.bib38)). For a more extensive list of participatory research frameworks, we refer the reader to Vaughn and Jacquez ([2020](https://arxiv.org/html/2605.06901#bib.bib33)). Despite the many variants of participatory research, a unifying emphasis is on fostering a democratic and inclusive process that treats stakeholders or community members as equal research collaborators rather than subjects (Vaughn and Jacquez, [2020](https://arxiv.org/html/2605.06901#bib.bib33); Bergold and Thomas, [2012](https://arxiv.org/html/2605.06901#bib.bib32)). Participatory approaches have been adopted in creating machine learning solutions, such as developing context-specific models for detecting feminicide (Suresh et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1128)) and providing frameworks for communities to articulate algorithmic policies (Lee et al., [2019](https://arxiv.org/html/2605.06901#bib.bib30)). Tseng et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib31)) provide an example of how participatory approaches can be applied to designing HCLLMs. Through interviews with different stakeholders in the journalism ecosystem (e.g., reporters, editors, executives), they articulate how LLMs must be designed to address journalists' needs, such as using an open-source model that can be fine-tuned for their tasks rather than relying on commercial offerings. While we have listed exemplars of participatory work, it is important to note that participation is neither a panacea for the many challenges of creating HCLLMs, nor easy to institute in practice. These methods require building trust with communities, which can be a long-term process requiring significant time investment (Le Dantec and Fox, [2015](https://arxiv.org/html/2605.06901#bib.bib36)). Ensuring that participatory approaches are equitable in practice further requires centering communities, especially those who do not hold positions of power, and ensuring that the research tangibly benefits these individuals rather than acting as an extractive force (Harrington et al., [2019](https://arxiv.org/html/2605.06901#bib.bib35)).

#### 2.3.3 Qualitative Inquiry

Qualitative methods encompass a broad range of approaches, including ethnographic studies, interviews, participatory design workshops, thematic analysis, and other interpretive techniques (Blandford et al., [2016](https://arxiv.org/html/2605.06901#bib.bib1253)). For HCLLMs, existing work has also sought to extract insights by analyzing users' chat histories with models, such as common interaction patterns or the high-level values reflected in model responses (Tamkin et al., [2024](https://arxiv.org/html/2605.06901#bib.bib593); Huang et al., [2025](https://arxiv.org/html/2605.06901#bib.bib591)).
While quantitative measures reveal how well models perform or what users do, qualitative approaches provide a deeper understanding of why and how individuals think about, interact with, and make sense of these systems. As Geertz ([2008](https://arxiv.org/html/2605.06901#bib.bib1129)) illustrates through the notion of "thick description," qualitative methods enable researchers to capture the meanings, contexts, and social dynamics that underlie behavior, rather than merely documenting surface-level patterns. Qualitative methods offer several particularly valuable contributions to HCLLMs. First, they can uncover needs and harms in underrepresented populations, revealing how specific groups engage with LLMs and the nuanced benefits and harms they experience — insights that aggregate metrics often obscure (e.g., Ma et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1132))'s interviews with LGBTQ+ individuals exploring LLMs for mental health support; Qadri et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1131))'s workshops with South Asian participants examining cultural misrepresentations in AI). Qualitative methods also help generate new design insights and hypotheses by surfacing unexpected use patterns, workarounds, and unmet needs that can inform future system design. By providing rich descriptions of how users interact with these models, qualitative work enables contextual understanding of how LLM interactions are embedded in broader social and work practices, revealing dependencies and consequences that may otherwise be missed. For example, Tamkin et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib593))'s analysis of user interactions with Claude revealed areas of unsafe behavior that existing guardrails were not designed to catch, allowing the team to refine their system design through empirical insights. Finally, qualitative approaches complement and enrich quantitative findings by helping triangulate results and providing interpretive depth that explains why certain patterns emerge (Zhao et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib1126)).
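Qualitative coding often feeds quantitative claims (as in the expert-coded transcripts in the case study below), so researchers typically check that independent coders apply a codebook consistently. A minimal sketch, with hypothetical codes and toy labels:

```python
# Inter-rater agreement for thematic codes: two coders label the same
# six transcript excerpts with one code each.
from sklearn.metrics import cohen_kappa_score

coder_a = ["advice", "reflection", "reflection", "question", "advice", "question"]
coder_b = ["advice", "reflection", "question",   "question", "advice", "question"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 0.75 here; chance-corrected agreement
```

Agreement statistics like kappa do not replace the interpretive work; they simply make explicit how reliably the "thick" categories can be applied before anyone counts them.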
### 2.4 From Human-Centered Design Challenges to Technical Artifacts

Addressing these design challenges for HCLLMs requires coordinated interventions across the entire LLM development pipeline, from data curation and model training to user interface design. In introducing these challenges, we have already discussed interface-level interventions. At the model level, improving capabilities via post-training, such as instruction tuning and preference learning (as discussed in §[4](https://arxiv.org/html/2605.06901#S4)), can help models better interpret underspecified prompts, helping bridge the gulf of envisioning. Decisions developers make about which sources to include in the training data will affect models' capabilities across different cultures and contexts (§[3](https://arxiv.org/html/2605.06901#S3)). Beyond the model training pipeline, choices around the technical design of the system, such as how to handle source attribution or what safety guardrails to put in place (§[6](https://arxiv.org/html/2605.06901#S6)), will mold users' interactions and the relationships that may form. Nonetheless, the critical point of this chapter is that these technical interventions are most effective when informed by and evaluated against the human-centered principles outlined above. The challenges we have charted are fundamentally *sociotechnical* problems; they cannot be solved by better models alone, nor by better interfaces alone, but through the careful co-design of both.

#### 2.4.1 Case Study: Motivating Physical Activity with HCLLMs

To make the human-centered LLM design process concrete, we present a case study based on two related systems for LLM physical activity coaching: (1) GPTCoach (Jörke et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1796)), an LLM-based coach that implements motivational interviewing, and (2) Bloom (Jörke et al., [2026](https://arxiv.org/html/2605.06901#bib.bib1797)), a mobile application that integrates GPTCoach with established, UI-based behavior change interactions. Physical inactivity is a major public health concern, with large portions of the population falling short of recommended guidelines for physical activity. LLMs present a promising opportunity to combine the scalability of existing mobile health interventions with the personalization of human coaching. Through this case study of designing an LLM health coach, we illustrate how a *human-centered* process can help realize this opportunity. Consider first the status quo approach, which treats training a good LLM health coach as an instruction-following problem: collect user data (e.g., common barriers to activity, wearable data), feed it to a model, and have the model generate personalized nudges, exercise plans, or advice. This framing is an intuitive starting point, and it maps cleanly onto standard LLM training pipelines. However, it also implicitly encodes a set of assumptions: that users always want or need recommendations and advice, that more information yields better outcomes, and that the primary bottleneck is the model's ability to produce accurate health advice. GPTCoach and Bloom instead took a human-centered approach, exemplifying three concepts discussed in this chapter:

- **Working with stakeholders to shape the system.** Rather than starting from the status quo, Jörke et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1796)) conducted formative interviews with health experts and prospective end users. These interviews revealed that health experts emphasized facilitative, non-prescriptive support that refrains from giving unsolicited advice — a mode of engagement that runs counter to how a standard LLM chatbot operates. Experts described their role as staying "in the passenger seat" and helping clients surface their own goals and barriers, rather than telling them what to do. This learning fundamentally reframed the problem to be solved, the system designed to address it, and the evaluation criteria. Moreover, engagement with experts did not end after the formative study. After the lab study, the authors hired trained experts to code all transcripts to measure adherence to motivational interviewing. Expert coding indicated that GPTCoach used conversational strategies that were consistent with motivational interviewing or neutral over 93% of the time, but qualitative feedback revealed important gaps compared to skilled human practitioners, highlighting specific areas for improvement that would not have surfaced without expert involvement.
- **Focusing on the interaction.** A second theme from this case study is that model capability, while important, is not sufficient in and of itself for a successful human-centered system.
While the authors could have devoted significant effort to model training, prompt chaining proved sufficient to enable LLM-based motivational interviewing. This created space for the authors to focus on how different aspects of the interaction design could shape users' experiences in substantive ways. For example, GPTCoach's non-prescriptive and non-judgmental communication style had a greater impact on participants' overall experience than its analysis of their health data. In Bloom, the authors represented the coaching agent as a bee avatar named Beebo. Beebo's capabilities are the same as those of a generic chatbot, but its representation substantially changed the nature of the interaction. Many participants resonated with the avatar and described Beebo in relational terms, leading to increased engagement and adherence. Beebo's clear role as a "coach," not a general-purpose assistant, helped set expectations when Beebo redirected conversations back towards physical activity, or when guardrails triggered refusals for medical advice. This dynamic points to the design challenge of navigating human-LLM relationships from Sec. [2.2](https://arxiv.org/html/2605.06901#S2.SS2). Taken together, these design choices reflect a move beyond thinking narrowly about model performance toward a more holistic understanding of how users will interact with these systems.
- **Evaluating with users.** Finally, this case study showcases how the different methods from HCI offer new ways of knowing for evaluating HCLLMs. In a four-week randomized field study (N=54) comparing Bloom to a no-LLM control, the authors used a mixed-methods evaluation, synthesizing insights across qualitative coding of participant interviews, survey data, app usage logs, and wearable data. The quantitative data revealed a 5x increase in overall app usage time in the LLM condition, while mean physical activity levels stayed comparable across conditions. Meanwhile, survey measures revealed substantial shifts in physical activity mindsets and satisfaction. Qualitative coding added rich nuance to these findings, with participants in the LLM condition reporting stronger beliefs that activity was beneficial to their health, greater enjoyment of exercise, an expanded appreciation of "what counts" as activity, and increased self-compassion when goals were missed. Most importantly, participants attributed these mindset shifts to interactions with Beebo that kept them in control of their own behavior change, such as finding flexible alternatives when plans fell through. More broadly, this illustrates how qualitative "thick" understanding can surface the *why* behind user experiences, yielding insights that a purely quantitative evaluation might miss.

Overall, this case study demonstrates how critically engaging with humans *early* in the design process can reframe the problem being solved and the evaluation target, leading to qualitatively different solutions that better serve human needs. Notably, the most consequential outcomes in both studies—positive changes in participants' beliefs about physical activity and their own abilities—emerged from the design process rather than from improvements in model capability.

## 3 Data for HCLLMs

Data is central to the language modeling paradigm, just as it has been throughout the history of machine learning (Halevy et al., [2009](https://arxiv.org/html/2605.06901#bib.bib1571)).
Every stage of language model development, from pretraining to evaluation and deployment, depends on the availability of massive text corpora (Sun et al., [2017](https://arxiv.org/html/2605.06901#bib.bib1570)). Critically, the scale, diversity, and quality of this linguistic data can determine a model's downstream utility for users (Kaplan et al., [2020](https://arxiv.org/html/2605.06901#bib.bib683); Liu et al., [2024c](https://arxiv.org/html/2605.06901#bib.bib682); Zhou et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib1451)). Data quality and quantity can quickly become bottlenecks (Villalobos et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1236)), limiting progress in AI (Longpre et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1568)). Thus, one of the most urgent challenges in AI is identifying diverse and representative sources of data.

From the human perspective, data is more than the fuel behind AI progress. Data is a dynamic reflection of lived human experience. It reflects the people, institutions, cultures, histories, and social contexts that produce it. In this sense, data is never neutral. Rather, data encodes viewpoints and values (Dotan and Milli, [2020](https://arxiv.org/html/2605.06901#bib.bib225)), assumptions and biases (Paullada et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1569)), and even political and social structures (Scheuerman et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1732); Miceli et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1606); Capel and Brereton, [2023](https://arxiv.org/html/2605.06901#bib.bib261)). The origins of such data may be the subject of legal claims and privacy concerns, and its content may be highly personal or sensitive (Bender and Friedman, [2018](https://arxiv.org/html/2605.06901#bib.bib535)). To understand the human impact of LLMs, it becomes necessary to consider the human origins of the data that shapes our models, particularly in pre-training, instruction tuning, and alignment (Figure [3](https://arxiv.org/html/2605.06901#S3.F3)).

In the first subsection of this chapter, we examine the *provenance* of data used to develop LLMs (§[3.1](https://arxiv.org/html/2605.06901#S3.SS1)). We ask where this data comes from, who produced it, under what conditions it was produced, and how it was transformed throughout this process. In this way, we recognize how data encodes implicit values, perspectives, and cultures that shape LLM behavior. From here, we are positioned to understand human-centered concerns around *representation and bias* (§[3.2](https://arxiv.org/html/2605.06901#S3.SS2)): how the data's origins systematically skew, misrepresent, and erase the perspectives of underrepresented groups, leading to representational and allocational harms. While rich community and personal data may be used to mitigate some of these harms, we consider issues around *consent and ownership* (§[3.3](https://arxiv.org/html/2605.06901#S3.SS3)). Finally, we consider some of the biggest data challenges facing LLM developers today, and how proposed solutions like *synthetic data* (§[3.4](https://arxiv.org/html/2605.06901#S3.SS4)) account or fail to account for the human-centered objectives we have outlined.
### 3.1 Data Provenance

Figure 3: This chapter focuses on the human origins of data (§[3.1](https://arxiv.org/html/2605.06901#S3.SS1)), and how data encodes perspectives and values that impact HCLLM outcomes, from representation and bias (§[3.2](https://arxiv.org/html/2605.06901#S3.SS2)) to consent and ownership (§[3.3](https://arxiv.org/html/2605.06901#S3.SS3)). In particular, we consider *pre-training data*, *instruction-tuning data*, and *alignment data*.

*Data provenance* (Longpre et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib307)), also called *dataset genealogy* (Denton et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1605)), is the record of a dataset's origins, history, and transformations throughout its lifecycle. An understanding of data provenance is critical for achieving transparency in LLM development (Bommasani et al., [2023](https://arxiv.org/html/2605.06901#bib.bib241)). Without transparent data practices, it becomes difficult to predict and understand why LLMs leak private information (Kandpal et al., [2022](https://arxiv.org/html/2605.06901#bib.bib118); Bubeck et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1457)), violate copyrights (Carlini et al., [2021](https://arxiv.org/html/2605.06901#bib.bib265)), or perpetuate social biases (Denton et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1605)). But with data provenance, stakeholders become better equipped to audit models (Mökander et al., [2024](https://arxiv.org/html/2605.06901#bib.bib186)) and tackle these human-centered concerns. In this section, we investigate the provenance of data used for pre-training, instruction-tuning, and aligning LLMs. In particular, we ask where data comes from, who produced it, and under what conditions it was produced.

#### 3.1.1 Pretraining Data

##### Data Sources.

The provenance of LLM pretraining data is often complex, layered, and opaque. Unlike the small, curated datasets of traditional machine learning, LLM pretraining corpora tend to be huge, multi-trillion token aggregations across heterogeneous and potentially noisy sources. These sources traditionally include web text, digitized books, and open-source code repositories (Wenzek et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1300); Raffel et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1034); Soldaini et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1145); Devlin et al., [2019](https://arxiv.org/html/2605.06901#bib.bib762)). Although different model developers use different data mixtures, most incorporate an open web crawl that at least partially intersects the [Common Crawl](https://commoncrawl.org/). The Common Crawl contains monthly snapshots of the "open web": partial samples of machine-crawlable sites reached from seed URLs that were initially crowdsourced in 2008 (Baack, [2024](https://arxiv.org/html/2605.06901#bib.bib185)). This is not a random sample of the internet. Large web crawls like this favor wikis, news sites, blogs, and other user-generated content platforms, which are generally multilingual but heavily skewed towards English (Baack, [2024](https://arxiv.org/html/2605.06901#bib.bib185)). Much of this data has been found to be socially undesirable, with a high prevalence of hate speech and sexually explicit content (Luccioni and Viviano, [2021](https://arxiv.org/html/2605.06901#bib.bib853)).
The data can also reify social, cultural, and political biases (Naous et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1); Feng et al., [2023](https://arxiv.org/html/2605.06901#bib.bib457); Navigli et al., [2023](https://arxiv.org/html/2605.06901#bib.bib923)). Finally, this web-scale data inextricably encodes the structural biases of the web itself, where the majority of content is produced by an active minority of users, and these users over-represent Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations (Baeza-Yates, [2018](https://arxiv.org/html/2605.06901#bib.bib649)).

##### Quality Filtering.

Because open web data is noisy, redundant, low-quality, and often socially undesirable, model developers use classifiers or heuristics (Chen et al., [2021](https://arxiv.org/html/2605.06901#bib.bib195); Penedo et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1561); Rae et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1031); Soldaini et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1145)) to filter their pre-training corpora for unique, high-quality documents that are information dense and free from toxicity or personally identifiable information (Longpre et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib164)). By filtering pre-training data in this way, researchers can train safer models with better performance at lower computational costs (Du et al., [2022](https://arxiv.org/html/2605.06901#bib.bib415); Rae et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1031); Albalak et al., [2024](https://arxiv.org/html/2605.06901#bib.bib163)). The distributions of these filtered corpora are shaped by sampling decisions, including the filters used to determine document quality. These quality filters often systematically exclude both communities and discursive topics (Lucy et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1653)). For instance, toxicity classifiers often exhibit racial and linguistic bias (Sap et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1153); Dodge et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1327)). Quality filters trained on Wikipedia and OpenWebText tend to favor text from wealthy, educated, urban areas (Gururangan et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1685)).

After language identification (Conneau and Lample, [2019](https://arxiv.org/html/2605.06901#bib.bib337); Laurençon et al., [2022](https://arxiv.org/html/2605.06901#bib.bib336)) and deduplication (Lee et al., [2022](https://arxiv.org/html/2605.06901#bib.bib766)), model-based filtering is one popular quality filtering approach. For example, perplexity-based methods filter out noisy documents that appear highly surprising to a much smaller language model. CCNet (Wenzek et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1300)) was constructed as a subset of the Common Crawl, filtered with 5-gram language models that the authors trained on Wikipedia data for each target language. They use a fastText classifier (Joulin et al., [2017](https://arxiv.org/html/2605.06901#bib.bib328)) for language identification and run each deduplicated document through the appropriate 5-gram model to compute perplexity, filtering based on a heuristic, language-specific perplexity threshold. A minimal sketch of this recipe appears below.
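To make this pipeline concrete, the following is a minimal sketch of CCNet-style filtering under stated assumptions: a fastText language-identification model and per-language KenLM 5-gram models (trained on Wikipedia) are available on disk, and all file paths, the confidence cutoff, and the perplexity thresholds are hypothetical placeholders rather than CCNet's actual values.

```python
import fasttext  # language identification
import kenlm     # n-gram language model scoring

lang_id = fasttext.load_model("lid.bin")  # hypothetical LID model path
lms = {lang: kenlm.Model(f"wiki_{lang}.arpa") for lang in ("en", "fr")}
ppl_thresholds = {"en": 600.0, "fr": 800.0}  # hypothetical per-language cutoffs

def keep_document(doc: str) -> bool:
    """Keep a document only if language ID is confident and perplexity is low."""
    labels, probs = lang_id.predict(doc.replace("\n", " "))
    lang = labels[0].replace("__label__", "")
    if lang not in lms or probs[0] < 0.8:  # unsupported language or unreliable LID
        return False
    # Low perplexity under a Wikipedia-trained LM means "Wikipedia-like" text,
    # which is precisely how such filters skew toward standard, formal registers.
    return lms[lang].perplexity(doc) < ppl_thresholds[lang]
```

Even in this tiny sketch, the human-centered failure modes discussed next are visible: the confidence cutoff drops languages the LID model handles poorly, and the Wikipedia-trained reference model decides whose prose counts as clean.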
Disconcertingly, this pipeline effectively filters out minority dialects and low-resource languages (Albalak et al., [2024](https://arxiv.org/html/2605.06901#bib.bib163)) for which language ID is unreliable (Caswell et al., [2020](https://arxiv.org/html/2605.06901#bib.bib166); Kudugunta et al., [2023](https://arxiv.org/html/2605.06901#bib.bib167)), or for which the perplexity model is overfit to a small corpus (Feng et al., [2023](https://arxiv.org/html/2605.06901#bib.bib457); Lucy et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1653)). Perplexity-based filtering will retain text that matches the language distribution the filtering model was fit to. When the standard is Wikipedia, filtering will primarily preserve Standard American English in the third person, written in a neutral, semi-formal, and broadly readable register, with clear declarative sentences.

Another idea is to prompt existing LLMs to estimate the quality of pre-training data zero-shot, using some manually written definition of high-quality data (Sachdeva et al., [2024](https://arxiv.org/html/2605.06901#bib.bib334); Wettig et al., [2024](https://arxiv.org/html/2605.06901#bib.bib333); Penedo et al., [2024](https://arxiv.org/html/2605.06901#bib.bib329)). This was the approach used for Llama-3 (AI@Meta, [2024](https://arxiv.org/html/2605.06901#bib.bib822)). However, manually written definitions are brittle and may not encompass task-specific or user-specific notions of LLM utility (Held et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib851)). A third idea is to fine-tune a small model like fastText (Joulin et al., [2017](https://arxiv.org/html/2605.06901#bib.bib328)) as a binary quality classifier. This was the approach used by DataComp for Language Models (DCLM; Li et al., [2024d](https://arxiv.org/html/2605.06901#bib.bib1321)), as it resulted in models with higher scores on general benchmarks like MMLU (Hendrycks et al., [2021b](https://arxiv.org/html/2605.06901#bib.bib563)). However, methods like this are prone to overfitting on the training set, the construction of which itself reflects and reifies the values of model developers; the sketch below makes this dependence explicit.
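As a rough illustration of such classifier-based filtering, the following sketch trains a fastText binary quality classifier. The training-file name, label scheme, and score cutoff are all hypothetical, and the "high-quality" label is whatever the developer chose to put in the training file, which is exactly where developer values enter.

```python
import fasttext

# Hypothetical training file: one example per line, labeled by the developer
# as "__label__hq <text>" (high quality) or "__label__lq <text>" (low quality).
clf = fasttext.train_supervised(input="quality_train.txt", epoch=5, wordNgrams=2)

def quality_score(doc: str) -> float:
    """Probability that the classifier assigns the 'high quality' label."""
    labels, probs = clf.predict(doc.replace("\n", " "))
    return probs[0] if labels[0] == "__label__hq" else 1.0 - probs[0]

docs = ["An encyclopedia-style paragraph about photosynthesis.", "lol gr8 c u l8r"]
kept = [d for d in docs if quality_score(d) > 0.9]  # hypothetical cutoff
```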
Another popular filtering mechanism is to use content heuristics like domain-name blacklists and toxic keyword dictionaries, which were used to construct the Colossal Clean Crawled Corpus (C4; Raffel et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1034); Xue et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1323)). The English C4 was found to skew heavily towards Wikipedia articles, patents, and United States news articles, such as the New York Times (Dodge et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1327); Elazar et al., [2024](https://arxiv.org/html/2605.06901#bib.bib227)). Most documents in the corpus had been published after the year 2011. The multilingual variant, mC4 (Xue et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1323)), represents 101 identified languages, but many of these languages are under-represented (Snæbjarnarson et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1325)). For example, compared to 2.7T tokens of English, mC4 contains only 600,000 tokens of Javanese (Aji et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1324)). Data for these lower-resource languages is also much noisier than that of the English subset (Kreutzer et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1317); van Noord et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1326)).

Many pretraining corpora aggregate documents from a variety of sources. Popular examples include the Pile (Gao et al., [2021](https://arxiv.org/html/2605.06901#bib.bib484)), RedPajama (Weber et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1691)), and Nemotron-CC (Su et al., [2025](https://arxiv.org/html/2605.06901#bib.bib820)), which contain 300B, 1T, and 7T tokens respectively, sampled from the Common Crawl as well as academic texts, books, and coding, medical, and legal documents (Biderman et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1322); Weber et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1691)). Over half of the documents in these data mixes are duplicated at least once, and some contain personally identifiable information like email and IP addresses, as well as toxic language (Elazar et al., [2024](https://arxiv.org/html/2605.06901#bib.bib227)).

#### 3.1.2 Instruction-tuning Data

Compared to the origin story of pre-training data, the provenance of instruction-tuning datasets is relatively well known. Data is typically aggregated by a single organization for the purpose of fine-tuning a model to follow instructions. The provenance of instruction dataset development therefore resembles the distribution of model developers, over half of whom originate in the US or China (Held et al., [2023](https://arxiv.org/html/2605.06901#bib.bib183)). Notable examples include Google's FLAN (Chung et al., [2024](https://arxiv.org/html/2605.06901#bib.bib309)), AI2's Natural Instructions (Mishra et al., [2022](https://arxiv.org/html/2605.06901#bib.bib905)), Stanford's Alpaca (Taori et al., [2023](https://arxiv.org/html/2605.06901#bib.bib172)), and Cohere's Aya corpus (Singh et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1136)).

The instruction aggregation process often involves selecting a diverse range of tasks and constructing well-formatted prompt-output pairs for each. Some organizations opt to annotate these pairs entirely from scratch, as Databricks ([2023](https://arxiv.org/html/2605.06901#bib.bib173)) did for Dolly-15k. There are ethical and scientific benefits in such cases, where data is sourced with explicit consent, attribution, and compensation. However, this is not the norm, especially since doing so demands significant human labor. In many cases, instructions are sourced automatically from evaluation benchmarks via templates (Chung et al., [2024](https://arxiv.org/html/2605.06901#bib.bib309); Longpre et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib829)), which may be further translated (Muennighoff et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib915)) or restructured using tertiary models. The templates themselves typically have a human origin. For example, the Natural Instructions dataset (Mishra et al., [2022](https://arxiv.org/html/2605.06901#bib.bib905)) was sourced from annotation guidelines that the benchmark developers constructed to onboard crowdworkers. Sometimes, humans also write templates from scratch, especially in the early days of UnifiedQA (Khashabi et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1238)) and FLAN (Wei et al., [2022a](https://arxiv.org/html/2605.06901#bib.bib1285)), and in low-resource settings like the multilingual Aya corpus (Singh et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1136)). However, much of the data construction pipeline is automated. This trend is growing as instruction-tuning datasets are generated synthetically. For example, the instructions used to fine-tune Stanford's Alpaca model (Taori et al., [2023](https://arxiv.org/html/2605.06901#bib.bib172)) were distilled from GPT-3.5, a larger model which was itself instruction-fine-tuned. This approach, called self-instruction tuning (Wang et al., [2023c](https://arxiv.org/html/2605.06901#bib.bib1264)), has been adopted in a range of more recent work (Peng et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib986); Li et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib776)); the sketch below outlines the basic loop.
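The following is a minimal sketch of such a self-instruct-style loop, not any specific system's implementation: `generate` is a hypothetical stand-in for a call to a stronger teacher model, and the seed tasks and prompt wording are invented.

```python
def generate(prompt: str) -> str:
    raise NotImplementedError  # hypothetical stand-in for a teacher-model API call

SEED_TASKS = [
    "Summarize the following paragraph in one sentence.",
    "Translate this sentence into French.",
]

def self_instruct(n_new: int) -> list[dict]:
    """Distill (instruction, output) pairs for fine-tuning a student model."""
    pairs = []
    for _ in range(n_new):
        # Ask the teacher for a brand-new instruction, conditioned on seeds...
        instruction = generate(
            "Here are example tasks:\n" + "\n".join(SEED_TASKS) +
            "\nWrite one new task that is different from the examples."
        )
        # ...then ask it to answer its own instruction.
        pairs.append({"instruction": instruction, "output": generate(instruction)})
    return pairs
```

Note that every pair inherits the teacher model's values and blind spots, which is why self-instruction complicates the provenance story discussed next.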
As we will discuss in §[3.4](https://arxiv.org/html/2605.06901#S3.SS4), the use of synthetic data for self-instruction tuning complicates data provenance and may exacerbate the human-centered concerns raised in this chapter.

#### 3.1.3 Alignment Data

Additional datasets are used for *model alignment*, the process of training more helpful and less harmful models via supervised fine-tuning, preference tuning, and reinforcement learning from human feedback (RLHF) (Askell et al., [2021](https://arxiv.org/html/2605.06901#bib.bib188)). Since alignment data is what produces models that are useful to humans, it constitutes a major force behind the sudden proliferation in the number of LLM users worldwide. Since the notion of helpfulness or harmfulness is ambiguous and varies across cultures and contexts, one might expect a commensurate heterogeneity in both the source and format of alignment data (Ethayarajh, [2024](https://arxiv.org/html/2605.06901#bib.bib184)). This is generally not the case. With respect to format, many alignment datasets assume a Bradley–Terry model of pairwise human preferences. Datasets like Anthropic's HH-RLHF (Bai et al., [2022a](https://arxiv.org/html/2605.06901#bib.bib196)), OpenAI's InstructGPT (Ouyang et al., [2022](https://arxiv.org/html/2605.06901#bib.bib963)), and Peking University's PKU-SafeRLHF (Ji et al., [2023](https://arxiv.org/html/2605.06901#bib.bib511)) couple a user prompt with a pair of model responses: one preferred and one dispreferred. The sketch below shows this data format and the Bradley–Terry objective it implies.
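Briefly, under the Bradley–Terry assumption the probability that the preferred response beats the dispreferred one is a logistic function of their reward difference, and reward models are fit by maximizing that likelihood. The following is a minimal sketch of one pairwise record and the corresponding loss; the example record is invented, and the scalar rewards would in practice come from a reward model scoring each (prompt, response) pair.

```python
import torch
import torch.nn.functional as F

# One HH-RLHF-style record: a prompt with a preferred and a dispreferred response.
example = {
    "prompt": "How do I politely decline a meeting?",
    "chosen": "You could say: 'Thanks for the invite, but I can't make it this week.'",
    "rejected": "Just ignore the invitation.",
}

def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of the preference under the Bradley-Terry model:
    P(chosen beats rejected) = sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Placeholder scalars stand in for reward-model outputs on the record above.
loss = bradley_terry_loss(torch.tensor([1.3]), torch.tensor([0.2]))
```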
With respect to data sources, many preference judgments come from a very small pool of annotators, sometimes within the organization itself. For example, Peking University hired 28 internal annotators to construct PKU-SafeRLHF, and Anthropic's internal research team similarly hired and trained a small group of contractors to construct HH-RLHF. Crowdsourcing and citizen science can serve to democratize the process of collecting alignment data. One drawback of these approaches is sampling bias, which may favor researchers, AI enthusiasts, and individuals from industrialized nations. Chatbot Arena (Chiang et al., [2024](https://arxiv.org/html/2605.06901#bib.bib366)), also known as LMArena, is one example of a public web platform with open user participation, in which volunteers engage with pairs of anonymous models and provide preference feedback in the standard binary format. The project was initiated at the University of California, Berkeley in 2023 and covers 96 languages, although the vast majority of interactions are in English. OpenAssistant Conversations (Köpf et al., [2023](https://arxiv.org/html/2605.06901#bib.bib346)) is a similar crowdsourcing effort, initiated by the German non-profit LAION in 2022. Over 13k volunteers contributed alignment data in 35 different languages, particularly English (50%), German (20%), and Spanish (10%). Of these annotators, 89.1% identified as male, with a median age of 26. Such demographic imbalances skew the values, perspectives, and interests represented by this data.

To address issues of demographic bias, some dataset developers intentionally target underrepresented demographics in their recruitment efforts. For example, the PRISM Alignment Dataset (Kirk et al., [2024](https://arxiv.org/html/2605.06901#bib.bib708)) is an academic project initiated at the University of Oxford, for which the developers recruited Prolific workers from 33 underrepresented countries. The Meta Community Alignment Dataset (Zhang et al., [2026](https://arxiv.org/html/2605.06901#bib.bib819)) is a similarly motivated multilingual preference dataset whose 15k participants were recruited from five countries on YouGov. Still, there remain limitations in recruiting diverse populations from crowdwork platforms, which have limited global coverage (Rinderknecht et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1148); Douglas et al., [2023](https://arxiv.org/html/2605.06901#bib.bib412); Palan and Schitter, [2018](https://arxiv.org/html/2605.06901#bib.bib1146)).

Synthetic data is an emerging trend across the subsections of this chapter, largely motivated by the need to scale AI beyond what human annotation labor can support (Casper et al., [2023](https://arxiv.org/html/2605.06901#bib.bib268); Santurkar et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1074)). Some LLM developers have considered synthetic data in the alignment step as well. Variants of this approach include Constitutional AI (Bai et al., [2022c](https://arxiv.org/html/2605.06901#bib.bib79)) and Reinforcement Learning from AI Feedback (RLAIF; Lee et al., [2024](https://arxiv.org/html/2605.06901#bib.bib113)). Both approaches shift critical alignment decisions from data contributors to more centralized authorities: namely, the LLM-as-a-Judge and those who prompt it. For example, in Constitutional AI, models judge their own output against the standards of a human-written constitution and then re-write a better, constitution-aligned response (sketched below). Anthropic's original 2022 constitution was sourced from Western liberal-democratic sources like the United Nations Declaration of Human Rights, the OECD, and Google's AI Principles. These frameworks employ individualist, rights-based moral reasoning (Haidt, [2012](https://arxiv.org/html/2605.06901#bib.bib1027)), which may not represent other global ethical traditions or incorporate the voices of pluralistic user bases (Sorensen et al., [2024c](https://arxiv.org/html/2605.06901#bib.bib1152)).
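The following is a minimal sketch of one critique-and-revision step in this style, not Anthropic's implementation: `generate` is a hypothetical stand-in for an LLM call, and the principle text is our own paraphrase rather than a quotation from any actual constitution.

```python
def generate(prompt: str) -> str:
    raise NotImplementedError  # hypothetical stand-in for an LLM API call

# Paraphrased, UDHR-flavored principle; real constitutions contain many such rules.
PRINCIPLE = "Prefer responses that respect human rights, dignity, and autonomy."

def critique_and_revise(user_prompt: str) -> tuple[str, str]:
    """Produce a (draft, revision) pair; such pairs supervise the aligned model."""
    draft = generate(user_prompt)
    critique = generate(
        f"Principle: {PRINCIPLE}\n"
        f"Critique this response for violations of the principle:\n{draft}"
    )
    revision = generate(
        f"Rewrite the response so it addresses the critique.\n"
        f"Response: {draft}\nCritique: {critique}"
    )
    return draft, revision
```

Whoever writes `PRINCIPLE` and the judging prompts holds the alignment authority that human annotators would otherwise exercise.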
### 3.2 Data Representation, Bias, and Ethics

The story of LLM training data is a story about whose voices become computationally legible and whose are overwritten or erased. In §[3.1](https://arxiv.org/html/2605.06901#S3.SS1), we considered how the story of data is shaped by its sources, filtering decisions, annotation pipelines, and synthetic generation practices. Imbalances or biases in the provenance of LLM pre- and post-training data can exacerbate representational, allocational, and quality-of-service harms for those who use these models. Representational harms include stereotyping, denigration, and misrecognition, when LLMs perpetuate and amplify distorted and harmful portrayals of personal identities and social groups (Blodgett, [2021](https://arxiv.org/html/2605.06901#bib.bib230)). Allocational harms arise when LLMs reinforce or amplify inequality in the distribution of opportunities and resources (Barocas et al., [2017](https://arxiv.org/html/2605.06901#bib.bib206); Eubanks, [2018](https://arxiv.org/html/2605.06901#bib.bib443)). Quality-of-service harms involve performance disparities across different user groups, which may cascade into both representational and allocational harms. For further discussion on how to define and measure these harms, see §[5.2.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2).

Sociotechnical harms become harder to diagnose when data provenance is incomplete. Without visibility into the linguistic, cultural, and geographic origins of the data, as well as the filtering and curation pipelines, researchers cannot identify: (1) why the model stereotypes certain voices, (2) why specific groups are absent from generated outputs, or (3) how certain narrative tropes became dominant. We will briefly discuss the relationship between data provenance and each of these harm categories, as well as data-based mitigation strategies.

#### 3.2.1 Quality-of-Service Harms

Quality-of-service harms are disparities in model utility for users from different sociodemographic groups (Shelby et al., [2023](https://arxiv.org/html/2605.06901#bib.bib727)). These disparities are often rooted in the composition and curation of data, as discussed in §[3.1](https://arxiv.org/html/2605.06901#S3.SS1). Pre-training data scraping, quality filtering, instruction-tuning templates, and alignment data collection tend to over-represent native English speakers from wealthy, Western nations and under-represent the language and perspectives of marginalized communities. As a result, LLMs introduce quality-of-service harms for individuals from these communities (Shah et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1601)). LLM performance degrades for speakers of non-standard language varieties or dialects on a wide range of tasks (Kantharuban et al., [2023](https://arxiv.org/html/2605.06901#bib.bib664); Joshi et al., [2025](https://arxiv.org/html/2605.06901#bib.bib670)), from text classification (Lwowski and Rios, [2021](https://arxiv.org/html/2605.06901#bib.bib669)) and machine translation (Ahia et al., [2023](https://arxiv.org/html/2605.06901#bib.bib663)) to question answering (Ziems et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib671); Fleisig et al., [2024](https://arxiv.org/html/2605.06901#bib.bib668)) and conversational AI (Artemova et al., [2024](https://arxiv.org/html/2605.06901#bib.bib667)). This inequitable distribution of utility to LLM users can result in allocational harms like inequitable wages and quality of life, especially as LLMs are integrated into the workplace (Shao et al., [2025](https://arxiv.org/html/2605.06901#bib.bib577)) and become *general purpose technologies* (Eloundou et al., [2024](https://arxiv.org/html/2605.06901#bib.bib236)). Quality-of-service bias also contributes to the representational harm of erasure, and may derive in part from representational biases.

#### 3.2.2 Representational Harms

LLMs demonstrate representational harms when they propagate negative or skewed representations of social groups, including cultural misrepresentation, stereotypes, essentialist language, and erasure (Chien and Danks, [2024](https://arxiv.org/html/2605.06901#bib.bib447)).
Representational harms can derive from pre-training data (Chu et al., [2024](https://arxiv.org/html/2605.06901#bib.bib107)), not only from its explicitly harmful, stereotypical, and toxic language (Luccioni and Viviano, [2021](https://arxiv.org/html/2605.06901#bib.bib853)), but also from implicitly biased language (Caliskan et al., [2016](https://arxiv.org/html/2605.06901#bib.bib257); Navigli et al., [2023](https://arxiv.org/html/2605.06901#bib.bib923)), framing effects (Feng et al., [2023](https://arxiv.org/html/2605.06901#bib.bib457)), and the sparsity of socioculturally representative data (Naous et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1)). Data quality filters exacerbate racial and linguistic biases that skew pre-training data away from in-group perspectives in favor of unrepresentative and misinformed out-group perspectives (Wang et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib729)). Post-training data can further induce mode collapse, effectively flattening models' representational distributions so that they portray groups one-dimensionally (Bisbee et al., [2024](https://arxiv.org/html/2605.06901#bib.bib728); Durmus et al., [2024](https://arxiv.org/html/2605.06901#bib.bib422); Röttger et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib709)). This kind of distributional flattening is a form of *essentializing* that is particularly harmful for groups historically portrayed as one-dimensional (Wang et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib729)). Unsurprisingly, LLMs are known to generate harmful stereotypes in question answering (Naous et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1)), machine translation (Ghosh and Caliskan, [2023](https://arxiv.org/html/2605.06901#bib.bib582)), and open-ended generation (Dhamala et al., [2021](https://arxiv.org/html/2605.06901#bib.bib393)). These issues are only exacerbated when prompts are written in non-standard dialects, which may trigger demeaning or condescending responses from models (Fleisig et al., [2024](https://arxiv.org/html/2605.06901#bib.bib668)). LLM-simulated personas also collapse into stereotypical caricatures (Cheng et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib1602), [b](https://arxiv.org/html/2605.06901#bib.bib1599); Gupta et al., [2023](https://arxiv.org/html/2605.06901#bib.bib533)). These simulations systematically misrepresent, flatten, and essentialize the perspectives of underrepresented groups based on protected characteristics like age, gender, and disability (Wang et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib729)).

#### 3.2.3 Allocational Harms

Allocational harms are disparities in individuals' access to material resources like jobs, housing, credit, healthcare, childcare, education, and transportation (Cyberey et al., [2025](https://arxiv.org/html/2605.06901#bib.bib446)). When LLMs are embedded in decision-making systems, they can introduce, amplify, or otherwise reinforce allocational disparities, in part as a result of representational biases in the training data (Sen et al., [2025](https://arxiv.org/html/2605.06901#bib.bib730); Chien and Danks, [2024](https://arxiv.org/html/2605.06901#bib.bib447)). In pre-training, skewed representations can lead models to encode assumptions about who is qualified, creditworthy, employable, or deserving of services (Mehrabi et al., [2021](https://arxiv.org/html/2605.06901#bib.bib895)).
During post-training, alignment data privileges the annotators' norms of professionalism, risk, and appropriate behavior (Conitzer et al., [2024](https://arxiv.org/html/2605.06901#bib.bib985)), and these norms surface in downstream allocational biases. LLMs used in hiring decisions can be more likely to recommend less prestigious jobs to speakers of marginalized dialects (Hofmann et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib581)). In content moderation, LLMs are prejudiced against speakers of African American English (Sap et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1153)). In automated exam scoring, LLMs show disparate performance for students with backgrounds not represented in training data (Schaller et al., [2024](https://arxiv.org/html/2605.06901#bib.bib578)). More broadly, LLM decisions are biased against underrepresented groups across domains such as business (e.g., funding a startup), finance (e.g., approving a credit card), relationships (e.g., resolving conflicts), law (e.g., issuing a passport), science (e.g., approving a research study), and the arts (e.g., awarding a filmmaking prize) (Tamkin et al., [2023](https://arxiv.org/html/2605.06901#bib.bib445); Levy et al., [2024](https://arxiv.org/html/2605.06901#bib.bib981)).

#### 3.2.4 Mitigating Harms

Mitigating sociotechnical harms requires interventions across the data pipeline. The first step is to establish transparent data provenance through documentation practices like *Datasheets for Datasets* and *Data Statements* (Gebru et al., [2021](https://arxiv.org/html/2605.06901#bib.bib908); Bender and Friedman, [2018](https://arxiv.org/html/2605.06901#bib.bib535)). By explicitly recording the linguistic, demographic, and geographic composition of datasets, as well as filtering and annotation decisions, these practices make it easier to diagnose representational gaps and biases. Model cards and system cards further extend this transparency to downstream users by documenting intended use, performance disparities, and known limitations (Mitchell et al., [2019](https://arxiv.org/html/2605.06901#bib.bib907)). Recent work argues that provenance-aware documentation should include not only source descriptions but also transformation histories, like filtering, deduplication, and synthetic augmentation (Scheuerman et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1732)); a small machine-readable sketch of such fields appears below.
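To illustrate, here is a minimal, hypothetical datasheet fragment in machine-readable form; the field names loosely follow the spirit of Datasheets for Datasets prompts but are our own invention, as are all of the values.

```python
# Hypothetical datasheet fragment (illustrative; all field names and values
# are invented, loosely following Datasheets for Datasets-style prompts).
datasheet = {
    "motivation": "Pre-training corpus for a multilingual assistant.",
    "composition": {
        "languages": {"en": 0.72, "es": 0.09, "other": 0.19},  # token share
        "sources": ["web crawl", "digitized books", "code repositories"],
    },
    "collection": {"time_range": "2019-2024", "consent": "robots.txt opt-out only"},
    "transformations": ["deduplication", "perplexity filtering", "PII removal"],
    "known_limitations": ["skews toward English and WEIRD-authored web content"],
}
```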
With transparent data provenance, a second mitigation step is to involve stakeholders in the process of data creation and diversification, using participatory methods (Vaughn and Jacquez, [2020](https://arxiv.org/html/2605.06901#bib.bib33)), following §[2.3.2](https://arxiv.org/html/2605.06901#S2.SS3.SSS2). With community-level organization, it is possible to develop rich data resources for low-resource languages and underrepresented communities (Orife et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1731); Heidt, [2025](https://arxiv.org/html/2605.06901#bib.bib982)). However, diversification alone may prove insufficient without governance structures that prevent extractive data practices and ensure ongoing community oversight (Benjamin, [2023](https://arxiv.org/html/2605.06901#bib.bib984)).

A third mitigation approach is to collect learnable data from user interactions with LLMs at the individual level. Personalized alignment methods may be considered, in which individual preference data is collected from user interactions and used to shape subsequent model behavior through prompt-based (Hebert et al., [2024](https://arxiv.org/html/2605.06901#bib.bib560)), retrieval-based (Salemi et al., [2024](https://arxiv.org/html/2605.06901#bib.bib743)), or alignment-based methods (Ryan et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1250)). For more discussion of this direction, see §[4.4](https://arxiv.org/html/2605.06901#S4.SS4).

### 3.3 Consent and Ownership

The data used to pre- and post-train LLMs may include sensitive personal information (§[3.1](https://arxiv.org/html/2605.06901#S3.SS1)). Such personal data may actually help LLM systems become more capable, useful, and proactive. For example, LLMs can infer, from users' hidden behaviors, personal habits, and broader computer use patterns, what the user might need before they even make a request (Shaikh et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1647)). However, the use of private or personal data comes with an array of legal and ethical challenges (Yan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib119); Subramani et al., [2023](https://arxiv.org/html/2605.06901#bib.bib130)). Sensitive, private, or copyrighted information can be inadvertently leaked or reproduced without authorization, further complicating compliance with laws and regulations (Zhang et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib143); Khan and Hanna, [2022](https://arxiv.org/html/2605.06901#bib.bib132); Wachter, [2019](https://arxiv.org/html/2605.06901#bib.bib138)). Here, we consider these concerns regarding the ownership of data.

#### 3.3.1 Data Privacy Considerations

Data privacy is broadly defined as the ability of individuals to control their personal information. Privacy leaks can occur either when sensitive personal information is explicitly encoded in data or when this information can be inferred (Yan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib119); Kshetri, [2023](https://arxiv.org/html/2605.06901#bib.bib133); Staab et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1160)). In the former setting, LLMs can memorize personally identifiable information from training data and expose these details at inference time (Yan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib119); Carlini et al., [2019](https://arxiv.org/html/2605.06901#bib.bib140); Staab et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1160)). Attackers can exploit vulnerabilities in LLMs through methods such as backdoor attacks, membership inference attacks, and model inversion attacks, which can extract sensitive information embedded in the model during pre-training or fine-tuning (Yan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib119); Carlini et al., [2019](https://arxiv.org/html/2605.06901#bib.bib140)). For instance, Carlini et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib265)) demonstrated that it is possible to recover individual training examples, including names and phone numbers, by attacking the language model. Similarly, Zhao et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib141)) revealed that LLMs could generate infringing content when prompted with partial information from copyrighted materials. A rough sketch of the loss-based signal behind many such attacks appears below.
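The intuition behind many extraction and membership-inference attacks is simple: text the model has memorized receives unusually low loss. The sketch below shows a loss-threshold heuristic in this spirit, not any specific published attack; `sequence_loss` is a hypothetical stand-in for the target model's mean per-token negative log-likelihood, and the threshold is made up.

```python
def sequence_loss(text: str) -> float:
    raise NotImplementedError  # hypothetical: mean per-token NLL under the target model

def likely_memorized(text: str, threshold: float = 1.5) -> bool:
    """Flag candidate memorized strings: suspiciously low loss suggests the
    model may have seen this exact text during training."""
    return sequence_loss(text) < threshold

# An auditor might scan generated samples for low-loss spans that also look
# like PII (names, phone numbers) before any disclosure decision.
```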
These risks are more severe in larger models with more parameters and when longer contexts are used in prompting, making it increasingly challenging to address these vulnerabilities effectively for current LLMs (Karamolegkou et al., [2023](https://arxiv.org/html/2605.06901#bib.bib95); Carlini et al., [2021](https://arxiv.org/html/2605.06901#bib.bib265), [2023](https://arxiv.org/html/2605.06901#bib.bib115)).

Additionally, legal frameworks play a critical role in governing the use of personal and copyrighted data. Since 2018, the General Data Protection Regulation (GDPR) in the European Union has mandated data minimization, consent requirements, and the "Right to Erasure." These provisions can be re-interpreted to apply to AI systems, though with limitations once data collection has already occurred (Neel and Chang, [2023](https://arxiv.org/html/2605.06901#bib.bib135); Wachter, [2019](https://arxiv.org/html/2605.06901#bib.bib138)). More regulations and protocols are needed to comply with ethical obligations. Copyright law introduces another layer of complexity: it grants creators exclusive rights to use and distribute their work, with specific exceptions. Under §107 of United States copyright law, the fair use doctrine permits limited usage of copyrighted materials without permission, typically for purposes such as commentary, research, or information extraction, but not for verbatim reproduction (Karamolegkou et al., [2023](https://arxiv.org/html/2605.06901#bib.bib95)). With the increasing influence of LLMs, the use of online data has come under heightened scrutiny; justifications under principles like "Legitimate Interests" for personal data and "Fair Use" for copyrighted content are being questioned more rigorously (Franceschelli and Musolesi, [2022](https://arxiv.org/html/2605.06901#bib.bib139)). Notably, companies such as OpenAI, Stability AI, and Microsoft have faced various legal challenges, including consumer privacy lawsuits and copyright infringement claims, underscoring the growing contention surrounding privacy and copyright issues in AI development (Brittain, [2024b](https://arxiv.org/html/2605.06901#bib.bib937); Claburn, [2023](https://arxiv.org/html/2605.06901#bib.bib938); Vincent, [2023](https://arxiv.org/html/2605.06901#bib.bib939); Brittain, [2024a](https://arxiv.org/html/2605.06901#bib.bib940)).

#### 3.3.2 Proactive vs. Reactive Privacy Strategies

Adopting a proactive approach to privacy is essential. Rather than deferring mitigation until after model training, privacy considerations should inform every stage of data collection and curation. This includes implementing privacy-preserving data collection protocols, robust anonymization techniques, and consent-based frameworks from the outset. For instance, it is critical to obtain consent and minimize the collection of sensitive information, to deploy tools that detect and remove personally identifiable information, and to use more sophisticated data anonymization techniques to better protect against privacy leakage (Yan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib119); Subramani et al., [2023](https://arxiv.org/html/2605.06901#bib.bib130)). Consent-based data collection should be adopted in scenarios like web scraping to respect individuals' rights (Subramani et al., [2023](https://arxiv.org/html/2605.06901#bib.bib130)).
Web architectures like SOLID (Sambra et al., [2016](https://arxiv.org/html/2605.06901#bib.bib136)) and Consent Tagging (Zhang et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib137)) aim to streamline consent acquisition (Zhang et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib143)).

For more reactive privacy strategies after the data collection stage, various techniques have been proposed. Data cleaning methods aim to remove or generalize sensitive information from datasets before training (Brown et al., [2020](https://arxiv.org/html/2605.06901#bib.bib250); Ouyang et al., [2022](https://arxiv.org/html/2605.06901#bib.bib963); Bai et al., [2022a](https://arxiv.org/html/2605.06901#bib.bib196); Kandpal et al., [2022](https://arxiv.org/html/2605.06901#bib.bib118)). Federated learning approaches decentralize the training process to enhance privacy by keeping data local and aggregating updates instead of sharing raw data (Chen et al., [2023](https://arxiv.org/html/2605.06901#bib.bib120); Yu et al., [2024c](https://arxiv.org/html/2605.06901#bib.bib121); Xu et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib122); Hoory et al., [2021](https://arxiv.org/html/2605.06901#bib.bib124)). Differential privacy methods extract useful statistical information from datasets without revealing individual data, by introducing controlled random noise or applying aggregation techniques (Hoory et al., [2021](https://arxiv.org/html/2605.06901#bib.bib124); Du and Mi, [2021](https://arxiv.org/html/2605.06901#bib.bib125); Li et al., [2022](https://arxiv.org/html/2605.06901#bib.bib126); Shi et al., [2022](https://arxiv.org/html/2605.06901#bib.bib127); Wu et al., [2022b](https://arxiv.org/html/2605.06901#bib.bib128)). Additionally, knowledge unlearning techniques selectively forget or remove sensitive information from models to mitigate privacy risks (Seyitoğlu et al., [2024](https://arxiv.org/html/2605.06901#bib.bib142); Chen and Yang, [2023](https://arxiv.org/html/2605.06901#bib.bib129); Eldan and Russinovich, [2023](https://arxiv.org/html/2605.06901#bib.bib131)). To ground the differential privacy idea, a minimal sketch of the noisy-gradient mechanism follows.
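This is a minimal sketch of the per-example clipping and Gaussian-noise step at the heart of DP-SGD-style training, under stated assumptions: gradients arrive as one row per example, and the learning rate, clip norm, and noise multiplier are hypothetical values rather than a calibrated privacy budget.

```python
import torch

def dp_noisy_update(per_example_grads: torch.Tensor, lr: float = 0.1,
                    clip_norm: float = 1.0, noise_mult: float = 1.1) -> torch.Tensor:
    """per_example_grads: (batch, dim), one gradient row per training example.
    Returns a parameter update whose dependence on any single example is bounded."""
    norms = per_example_grads.norm(dim=1, keepdim=True)
    # 1. Clip each example's gradient so no individual can dominate the update.
    clipped = per_example_grads * torch.clamp(clip_norm / (norms + 1e-6), max=1.0)
    # 2. Add Gaussian noise scaled to the clip norm, masking any one example.
    noise = torch.normal(0.0, noise_mult * clip_norm, size=(clipped.shape[1],))
    update = (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]
    return -lr * update
```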
#### 3.3.3 Open Challenges in Data Privacy

Currently, privacy risks persist across the entire LLM lifecycle, encompassing not only model-centric issues but also human-centered factors. From the data side, stronger anonymization techniques and tools capable of identifying memorized personal information must keep up with LLMs' evolving capabilities (Staab et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1160); Subramani et al., [2023](https://arxiv.org/html/2605.06901#bib.bib130)). Caution is also warranted when scaling HCLLMs, as discussed in §[4.3](https://arxiv.org/html/2605.06901#S4.SS3), since risks from memorization increase with scale when repeated data appears in training (Hernandez et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1662)). In addition, the complexities of obtaining consent, especially in scenarios involving third-party or inaccessible data sources, underscore the need for more robust frameworks to ensure transparent data sourcing and meaningful user control (Zhang et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib143)). HCI researchers now also advocate for improved LLM interaction paradigms, a deeper understanding of user mental models, and systems that enable end-users to reclaim ownership over their personal data (Li et al., [2024i](https://arxiv.org/html/2605.06901#bib.bib134)).

Despite significant progress in addressing data privacy concerns, much of the research focuses on well-known LLMs at relatively small scales. In contrast, recently released models with larger parameter counts have received less attention due to the challenges posed by their scale, data transparency issues, and the lagging development of privacy-preserving technologies (Yan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib119)). Overall, greater efforts are needed to enhance legal frameworks, strengthen regulatory oversight, and advance research and technology to better safeguard privacy and copyright in the era of LLMs, a responsibility that developers, users, and policymakers jointly share.

### 3.4 Expanding Data Sources: Synthetic and Non-Traditional Data

#### 3.4.1 Synthetic Data

We often lack high-quality, diverse, and privacy-compliant data (Almeida, [2024](https://arxiv.org/html/2605.06901#bib.bib29)). Filtering methods (§[3.1](https://arxiv.org/html/2605.06901#S3.SS1)) can remove as much as 90% of raw web text from the Common Crawl. To replace this data, synthetic generation is one solution, employed in Nemotron-CC (Su et al., [2025](https://arxiv.org/html/2605.06901#bib.bib820)) and other popular pre-training corpora. Synthetic data generation can preserve individuals' confidentiality, replicating only the statistical properties of real datasets without retaining personally identifiable information. LLM-generated synthetic text can also serve as fine-tuning and evaluation data (Vongthongsri, [2025](https://arxiv.org/html/2605.06901#bib.bib28)), where it is invaluable for addressing class imbalances (Moon et al., [2024](https://arxiv.org/html/2605.06901#bib.bib26)), especially in domains like healthcare (Guo and Chen, [2024](https://arxiv.org/html/2605.06901#bib.bib27)), where data is sensitive, and mathematical reasoning, where gold examples are costly to produce (Chan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib25)).

##### Methods Used to Generate Synthetic Data.

Even medium-size language models can effectively expand pre-training corpora by paraphrasing existing data (Maini et al., [2024](https://arxiv.org/html/2605.06901#bib.bib818)). Moreover, LLMs can effectively generate entirely new content from scratch, including textbooks for pre-training (Gunasekar et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1263)) and instruction-tuning data for post-training (Wang et al., [2023c](https://arxiv.org/html/2605.06901#bib.bib1264)). Procuring high-quality synthetic data with LLMs typically involves three stages: generation, curation, and evaluation (Long et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib24)). Generation often involves prompt engineering to elicit LLM responses in the required format, using strategies such as task definition, conditional prompting, in-context learning, and multi-step generation, which address context limitations and degradation over reasoning steps (Long et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib24); Wang et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib23)). The generated data often contains noise or corrupted samples due to hallucination, and is generally curated using sample filtering and label enhancement techniques. Sample filtering can involve simple heuristic-based strategies, or it can leverage the language-understanding capabilities of LLMs to generate confidence scores for data points based on quality and reject samples with low scores (Chung et al., [2023](https://arxiv.org/html/2605.06901#bib.bib13)). Label enhancement strategies can include human inspection and annotation of low-confidence samples; related filtering techniques are described in §[3.1.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1). After curation, the generated data must be evaluated along several dimensions, including the statistical similarity between synthetic and real data, the impact on model performance, and the preservation of essential patterns and relationships; Xia et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib11)) capture these requirements in their proposed fidelity, utility, and privacy framework. The sketch below shows a bare-bones generate-then-curate loop in this style.
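The following minimal sketch shows a generate-then-curate loop of this kind under stated assumptions: `generate` is a hypothetical stand-in for an LLM call, the prompts are invented, and the confidence threshold is arbitrary. A real pipeline would add the evaluation stage described above.

```python
def generate(prompt: str) -> str:
    raise NotImplementedError  # hypothetical stand-in for an LLM API call

TASK_DEFINITION = "Write a short customer-support question about a billing error."

def synthesize(n: int) -> list[str]:
    """Generation stage: condition each sample on an explicit task definition."""
    return [generate(TASK_DEFINITION) for _ in range(n)]

def curate(samples: list[str], threshold: float = 0.8) -> list[str]:
    """Curation stage: model-scored sample filtering with a made-up cutoff."""
    kept = []
    for s in samples:
        score = float(generate(f"Rate this example's quality from 0 to 1:\n{s}"))
        if score >= threshold:
            kept.append(s)
    return kept  # low-confidence samples could instead go to human annotators
```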
##### Making Synthetic Data More Human-Centric.

A human-centered approach to synthetic data creation should explicitly incorporate human values, perspectives, and audits at all stages of development, from generation to curation and evaluation. First, generation should serve to reflect authentic human interactions and preferences when real data collection proves slow or costly (Hämäläinen et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1186)). Rather than simply increasing dataset sizes, synthetic data should contain realistic social interactions between individuals with diverse personalities and backgrounds. This requires persona alignment (§[6.2](https://arxiv.org/html/2605.06901#S6.SS2)) or role-play, in which the LLM portrays a consistent identity (Tseng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib570)), possibly simulating a person from a particular sociodemographic background (Lutz et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1016)) or an agent with a role, like a tutor or counselor (Li et al., [2024e](https://arxiv.org/html/2605.06901#bib.bib1633); Samuel et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1632); Shanahan et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1630)). Persona alignment has been used to generate synthetic dialogues (Occhipinti et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1017); Andukuri et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1166)) and preference data (Castricato et al., [2025](https://arxiv.org/html/2605.06901#bib.bib271)). At the curation stage, stratified sampling should reflect real-world distributions along known axes of variation, such as opinions and preferences (Sorensen et al., [2025](https://arxiv.org/html/2605.06901#bib.bib458)). Finally, robust human-in-the-loop validation and auditing is essential. Human annotators and experts can review synthetic outputs, flag problematic patterns, and iteratively refine generation procedures. One major concern is that LLMs may reproduce biases and harms present in their training data, leaking private information or reinforcing existing social inequalities. Das et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib324)) compare LLM-generated datasets with human-annotated benchmarks and highlight ethical concerns related to disparities in task performance and representational coverage. Chen et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib1775)) identify several failure modes in LLM-generated query–answer pairs, including instruction-following errors.
To mitigate these risks, Chen et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib1775)) propose unlearning techniques to improve the reliability of synthetic queries. To preserve privacy, Ramesh et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1013)) propose decentralized frameworks designed to reduce the likelihood of sensitive information exposure during data synthesis. These steps ultimately enhance the quality, fairness, and usability of synthetic data, aligning it with ethical standards and user expectations.

#### 3.4.2 Non-traditional Data

Recent progress in LLM research has shown the value of using non-traditional data to make models more human-centered. This discussion focuses on three primary areas. The first is multimodal data, which allows LLMs to work with inputs like speech, images, and touch. The second is human-AI interaction data, such as user feedback, edits, and eye-tracking, which helps improve how well LLMs understand and respond to user needs. Lastly, human-human interaction data uses examples of real human interactions to teach LLMs how people communicate, enabling models to better handle context, complex emotions, and relationships.

##### Multimodal Data.

Recent research has sought to expand Large Language Models to enable multimodality (Yin et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1377); Li et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib782)), significantly enhancing human-LLM interaction by allowing systems to process and respond to a diverse range of input formats beyond text, such as speech (Rubenstein et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1064); Huang et al., [2024c](https://arxiv.org/html/2605.06901#bib.bib611)), sound (Zhang et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib1397); Huang et al., [2024c](https://arxiv.org/html/2605.06901#bib.bib611)), vision (Achiam et al., [2023](https://arxiv.org/html/2605.06901#bib.bib153); Zhang et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib1397); Li et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib776), [2023b](https://arxiv.org/html/2605.06901#bib.bib1001); Fu et al., [2024](https://arxiv.org/html/2605.06901#bib.bib474)), and tactile data (Fu et al., [2024](https://arxiv.org/html/2605.06901#bib.bib474); Yu et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1385)), creating richer and increasingly human-like communication channels. By integrating multiple sensory modalities, AI can better mirror human communication, which could further improve human-AI interaction. For instance, a multimodal AI assistant could analyze a user's tone of voice, facial expressions, and spoken words to assess emotional states (Zhang et al., [2024e](https://arxiv.org/html/2605.06901#bib.bib1409); Cheng et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib296)), tailoring its responses accordingly. Recent works also explore integrating human physiological data (e.g., EEG, BVP) with LLMs to enhance empathic human-AI interaction (Dongre et al., [2024](https://arxiv.org/html/2605.06901#bib.bib409)). In applications such as education, healthcare, and accessibility, multimodality fosters inclusivity by accommodating users with diverse needs (Yildirim et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1374); Belyaeva et al., [2023](https://arxiv.org/html/2605.06901#bib.bib214); Chang et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib277)).
Ultimately, multimodal AI systems bridge the gap between machine efficiency and human communication, making interactions more seamless, adaptive, and human-centered.

##### Human-AI Interaction Data.

Expanding the scope of human-AI interaction data has opened new pathways for enhancing Large Language Models through both supervised fine-tuning and reinforcement learning from human feedback (RLHF). For example, Vicuna was trained on a large set of user-shared ChatGPT conversations to achieve high-quality outputs (Chiang et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1234)). Another valuable type of human-AI interaction data is human edits, where users adjust the outputs of LLMs to better match their desired results. This data can be leveraged to fine-tune LLMs for improved preference alignment (Shaikh et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1097)) or to extract user preferences more effectively (Gao et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib488)); we sketch this conversion below. Beyond text-based interaction data, nontraditional modalities such as eye-gaze signals offer additional channels of interaction. Eye-gaze data, in particular, provides a real-time, implicit feedback mechanism that enhances context awareness and alignment with user intent (Konrad et al., [2024](https://arxiv.org/html/2605.06901#bib.bib719); Prokofieva et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1018); Engel et al., [2023](https://arxiv.org/html/2605.06901#bib.bib434); Lopez-Cardona et al., [2024](https://arxiv.org/html/2605.06901#bib.bib833)). These gaze-based interactions have been shown to improve multimodal conversational understanding and can be leveraged in RLHF workflows to dynamically refine LLM outputs (Lopez-Cardona et al., [2024](https://arxiv.org/html/2605.06901#bib.bib833)). Integrating gaze data into multimodal frameworks could help create richer, contextually adaptive systems, fostering more intuitive, personalized, and effective interactions across diverse applications.

##### Human-Human Interaction Data.

Real-world human-human interaction data captures the nuances of human communication, including implicit cues, turn-taking dynamics, and diverse conversational contexts. Such data has the potential to improve LLMs by fostering a deeper understanding of relational and situational context, thereby enabling models to generate responses that feel more natural, empathetic, and contextually appropriate. Recent advancements demonstrate how mining teacher-student interaction data, such as dialogue transcripts and collaborative problem-solving sessions, can align LLM outputs with human cognitive and emotional patterns, allowing LLMs to address complex, interdisciplinary challenges in fields such as education, psychology, and social science by emulating and learning from authentic human interaction styles (Wang et al., [2024c](https://arxiv.org/html/2605.06901#bib.bib1244); Xu et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib1350); Wang et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib1265); Wang and Demszky, [2024](https://arxiv.org/html/2605.06901#bib.bib1271)). By leveraging human-human interaction as an informative data source, we can expand the capacity of LLMs to foster meaningful, human-centered interactions in diverse real-world applications.
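Several of these interaction signals can be converted directly into training data. As a concrete illustration (ours, not a method from the cited works), the sketch below turns a log of user edits into preference pairs of the kind consumed by the preference-tuning methods of §4.2: the user's edited text is treated as preferred and the model's original output as dispreferred, the same supervision signal that DITTO exploits (Shaikh et al., 2024b).

```python
def edits_to_preferences(edit_log):
    """Convert (prompt, model_output, user_edit) triples into preference pairs.

    Returns records in the (prompt, chosen, rejected) format used by
    reward-model training and DPO-style objectives.
    """
    pairs = []
    for prompt, model_output, user_edit in edit_log:
        if user_edit.strip() == model_output.strip():
            continue  # an unchanged output carries no preference signal
        pairs.append({
            "prompt": prompt,
            "chosen": user_edit,       # the user's edit is treated as preferred
            "rejected": model_output,  # the original output as dispreferred
        })
    return pairs
```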
The integration of multimodal data, human-AI interaction data, and human-human interaction data can all help LLMs more closely approximate the complexity of human communication, in turn making models more usable and reliable across high-impact domains like healthcare, education, and social services. As we exhaust traditional text data sources, recent efforts such as MINT-1T, a multimodal text-and-image interleaved open-source dataset generated by Awadalla et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib21)), will be fundamental to advancing the performance of frontier models.

## 4 NLP for HCLLMs

Human-centered LLMs are products of the multifaceted technical processes used to create them. NLP techniques determine not only what models can do but also the boundaries of what they *cannot*. These limitations can have particular consequences as users across diverse linguistic and cultural contexts interact with LLMs. Prior survey papers cover the technical practicalities and details of NLP methods for LLMs (Minaee et al., [2024](https://arxiv.org/html/2605.06901#bib.bib478); Zhao et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib479)). In this chapter, we instead focus on the human-centered considerations across the language model training pipeline. We have already discussed pre-training practices in §[3](https://arxiv.org/html/2605.06901#S3) and will focus on post-training techniques in this chapter. Although post-training recipes differ across models, two core components include a supervised fine-tuning (SFT) stage (§[4.1](https://arxiv.org/html/2605.06901#S4.SS1)) and a reinforcement learning stage that incorporates human preferences (§[4.2](https://arxiv.org/html/2605.06901#S4.SS2)). We next discuss how the predominant paradigm of scaling applies to human-centered objectives (§[4.3](https://arxiv.org/html/2605.06901#S4.SS3)). Finally, we conclude by discussing three currently open challenges and future research directions for HCLLMs, covering personalization (§[4.4](https://arxiv.org/html/2605.06901#S4.SS4)), pluralistic alignment (§[4.5](https://arxiv.org/html/2605.06901#S4.SS5)), and multilinguality (§[4.6](https://arxiv.org/html/2605.06901#S4.SS6)). For a roadmap, see Figure [4](https://arxiv.org/html/2605.06901#S4.F4).

Figure 4: This chapter applies human-centered considerations to existing post-training techniques like SFT and RLHF (§[4.1](https://arxiv.org/html/2605.06901#S4.SS1)-[4.2](https://arxiv.org/html/2605.06901#S4.SS2)), and explores the limitations of scaling for human-centered outcomes (§[4.3](https://arxiv.org/html/2605.06901#S4.SS3)). Finally, we cover open challenges in personalization (§[4.4](https://arxiv.org/html/2605.06901#S4.SS4)), pluralistic alignment (§[4.5](https://arxiv.org/html/2605.06901#S4.SS5)), and multilinguality (§[4.6](https://arxiv.org/html/2605.06901#S4.SS6)).

### 4.1 Supervised Fine-tuning for HCLLMs

Following pre-training, the subsequent stage in the pipeline involves some form of supervised fine-tuning (SFT), where models are trained on curated datasets to align their outputs with specific objectives or use cases. The goals of fine-tuning vary depending on the desired capabilities and target applications.
For instance, existing models have employed SFT on step-by-step rationales and chain-of-thought reasoning examples to enhance their problem-solving and reasoning capabilities (Muennighoff et al., [2025](https://arxiv.org/html/2605.06901#bib.bib321); Olmo et al., [2025](https://arxiv.org/html/2605.06901#bib.bib322)). In other cases, SFT can be used to adapt models for more bespoke, domain-specific applications (Cheng et al., [2025c](https://arxiv.org/html/2605.06901#bib.bib1768); Yue et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1772)). Our discussion here focuses specifically on instruction tuning, the supervised process of training LLMs on instruction-response pairs, through which models learn to follow diverse user instructions and generate appropriate responses. Instruction tuning has become central to creating usable, general-purpose conversational AI systems. We briefly survey current practices in the literature before examining key tensions and exploring emerging frontiers for instruction tuning HCLLMs.

#### 4.1.1 Current Practices in Instruction Tuning

Instruction tuning is critical to achieving the instruction adherence and generalized problem-solving capabilities that have helped popularize LLMs. By guiding models to follow explicit instructions and domain-specific prompts, instruction tuning improves LLMs' ability to communicate in a human-centered and user-friendly manner. While massive pre-training on self-supervised tasks improves a model's grasp of language conventions and semantics, it is the assistant-like ability to respond conversationally and complete tasks that makes LLMs far more useful as human tools. Already, we have seen many successes of instruction tuning. It has improved performance across diverse language models, from zero-shot reasoning to domain-specific tasks. Beyond general alignment with human preferences, studies have explored ethical and domain-specific challenges, highlighting the versatility of instruction tuning. For instance, Prabhumoye et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1011)) demonstrated that simply attaching toxicity metadata as part of the instruction template significantly reduces toxicity in model outputs. This result suggests that explicitly encoding desirable or undesirable text qualities in instructions may enable an LLM to learn to promote or withhold similar text more easily, and it demonstrates how instructions can guide models to better align with social norms and ethical considerations. Moreover, domain-specific instruction tuning for coding (Muennighoff et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib916)), dialogue systems (Ouyang et al., [2022](https://arxiv.org/html/2605.06901#bib.bib963)), financial analysis (Xie et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib1336)), and multilingual translation (Zhu et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1476)) shows that modest sets of targeted instructions can unlock robust capabilities without sacrificing general knowledge. By specifying instructions that meet the requirements of each domain, LLMs can provide more reliable and appropriate outputs. These successes underscore instruction tuning's vital role in shaping LLMs into reliable, human-oriented assistants.
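As a concrete illustration of the metadata-in-the-template idea, the hypothetical record builder below prepends a toxicity tag to each instruction. The tag names, the 0.5 cutoff, and the record format are our own illustrative placeholders, not the exact setup of Prabhumoye et al. (2023); the score itself would come from an external toxicity classifier.

```python
def build_example(instruction: str, response: str, toxicity_score: float) -> dict:
    """Attach toxicity metadata to an instruction-tuning record."""
    tag = "<|nontoxic|>" if toxicity_score < 0.5 else "<|toxic|>"  # placeholder cutoff
    return {
        # The tag lets the model associate a quality label with the text
        # that follows, so generation can later be steered toward the
        # desirable label.
        "prompt": f"{tag} {instruction}",
        "completion": response,  # loss is typically computed on these tokens only
    }
```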
#### 4.1.2 Human-Centered Challenges with Instruction Tuning

While instruction tuning can generally improve the capabilities and usability of LLMs, tensions still emerge in human-centered contexts. First, as mentioned before, instruction tuning helps shape models to respond more conversationally, making them more usable. However, instruction tuning can lead to *superficial* improvements rather than improving the reasoning capabilities of models. For example, prior work found that models can merely learn to mimic the structure of the input data without heed to factual correctness or reliable reasoning, and can fail to generalize to tasks outside the training dataset (Gudibande et al., [2023](https://arxiv.org/html/2605.06901#bib.bib524); Kung and Peng, [2023](https://arxiv.org/html/2605.06901#bib.bib725)). When users interact with models, these patterns become especially concerning: users may be more inclined to trust outputs because the instruction-following style appears polished, confident, and authoritative, creating a veneer of competence that masks underlying reasoning failures (Park et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1781); Rathi et al., [2025](https://arxiv.org/html/2605.06901#bib.bib49)). This misalignment between surface-level fluency and actual reliability can lead users to over-rely on model outputs in high-stakes contexts where factual accuracy is critical. A second tension with instruction tuning relates to *safety* concerns, which we discuss in more detail in §[5.2.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3). Since instruction-tuned models are trained to comply with the provided prompt, instruction tuning can make models more susceptible to backdoor or poisoning attacks that embed malicious behaviors in datasets (Prabhumoye et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1011); Wan et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1241); Shu et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1767)). As a result, models are more likely to produce unsafe responses, such as offensive or disallowed content. Furthermore, other work has demonstrated that instruction tuning can make models more susceptible to jailbreaking attacks, precisely because they have been trained to follow human requests (Zeng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1394)). Ostensibly, model designers can add more safety data to the training set or align models to avoid these harmful instructions. However, this practice can lead to overly cautious models that refuse to answer even benign queries (Bianchi et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1773)), behavior that is similarly unhelpful for users. Thus, drawing the line between which instructions are permissible to follow in service of helpfulness and which may lead to unsafe behavior remains a core design tension.

#### 4.1.3 Future of Instruction Tuning for HCLLMs

What are the next frontiers for instruction tuning? A recurring theme in the tensions above is the role that data plays. As mentioned in §[3.1](https://arxiv.org/html/2605.06901#S3.SS1), ensuring dataset diversity may be crucial for avoiding biases and enhancing generalization across varied tasks. To tackle this question, we can consider diversifying the modalities of instruction-tuning data and how that data is sourced. In addition, there are new frontiers for evaluating instruction-tuned models along human-centered dimensions.
##### Multimodal data for instruction-tuning.

While our focus has centered on textual corpora for instruction-tuning, human-LLM interaction is not confined to text alone. For hands-free assistance, interacting via speech in addition to text is easier for users to manage (Udandarao et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1769)). For visual editing (e.g., figure design, poster-making), models ought to operate in both visual and textual modalities (Pang et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1778); Si et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1779)). Even in early demonstrations of intelligent assistants, our interactions were envisioned to span multiple modalities seamlessly, including speech, gesture, and images (Bolt, [1980](https://arxiv.org/html/2605.06901#bib.bib1777)). Advances in multimodal instruction tuning and dataset creation (Li et al., [2024f](https://arxiv.org/html/2605.06901#bib.bib1683)) are thus critical for developing HCLLMs. Moving beyond text introduces modality-specific factors that are essential for human-centered applications. For instance, prosody in speech can help disambiguate user intent, and deictic gestures can provide spatial grounding (Sasu et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1776); Brooks and Breazeal, [2006](https://arxiv.org/html/2605.06901#bib.bib48)). Furthermore, these multimodal capabilities require new approaches to both pretraining and fine-tuning on interleaved multimodal data, enabling models to process the rich, integrated signals that characterize human communication (Udandarao et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1769); Cui et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib1780)).

##### Synthetic data for instruction-tuning.

How we obtain the requisite data for instruction-tuning remains an important open question. Synthetic data generation is an important research area that presents both opportunities and challenges for HCLLMs. From a technical perspective, synthetic data generation offers benefits for scaling data collection by reducing reliance on human labor while maintaining data diversity. There are also arguable human-centered benefits: synthetic data can democratize model development by making fine-tuning accessible to researchers and practitioners without large annotation budgets. Empirical work has demonstrated that synthetic instruction data can be particularly valuable in low-resource settings, enabling more data-efficient fine-tuning (Pengpun et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1774)). However, synthetic data generation also poses new challenges, exacerbating the tensions discussed in §[4.1.2](https://arxiv.org/html/2605.06901#S4.SS1.SSS2). Models trained on synthetic instructions may overfit to specific patterns present in the generated data, and despite claims of increased diversity, synthetic datasets can paradoxically reduce the authentic variation found in human-generated instructions (Chen et al., [2024c](https://arxiv.org/html/2605.06901#bib.bib1775)). Addressing these flaws in synthetic data is important for leveraging this suite of methods to ensure models can handle the full range of real-world user needs and interaction styles.

##### Human-centered evaluation frameworks.
Emerging research shows that instruction tuning aligns LLMs with human brain activity, particularly in larger models with extensive world knowledge (Aw et al., [2023](https://arxiv.org/html/2605.06901#bib.bib192)). These findings open the door to more human-centered evaluation frameworks, where models are assessed by how closely they mirror human-like reasoning, empathy, and context awareness. This direction has implications for designing systems that better respect ethical norms, cultural sensitivities, and user well-being.

### 4.2 Learning from Human Preferences

In recent years, fine-tuning LLMs on human preferences has seen remarkable success in improving their behavior. Previously, it was believed that training on more samples and increasing model size were sufficient to increase performance. However, researchers found that these scaling rules ignored *alignment*, defined by Askell et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib188)) as helpfulness, harmlessness, and honesty in model responses. To this end, it was discovered that incorporating human feedback directly into the training process yielded large gains in human preference alignment (Askell et al., [2021](https://arxiv.org/html/2605.06901#bib.bib188); Leike et al., [2018](https://arxiv.org/html/2605.06901#bib.bib768); Bai et al., [2022a](https://arxiv.org/html/2605.06901#bib.bib196)). In this section, we discuss these developments chronologically, starting with reinforcement learning from human feedback (RLHF) and its spinoffs, including direct preference optimization (DPO), and ending with recent frameworks like Constitutional AI, which aim for a future of fully self-supervised AI alignment.

#### 4.2.1 RL-Based Methods

Much of RLHF is built upon landmark research by Christiano et al. ([2017](https://arxiv.org/html/2605.06901#bib.bib306)), Stiennon et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib1168)), and Ouyang et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib963)). Together, these works demonstrated the feasibility of learning a reward function from human preferences and optimizing that function, first in the domain of simple robotics and Atari video games (Christiano et al., [2017](https://arxiv.org/html/2605.06901#bib.bib306)), then in improving LLM performance on summarization tasks (Stiennon et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1168)), and finally in improving LLM behavior on a wide breadth of tasks, including open generation, chatting, and question answering (Ouyang et al., [2022](https://arxiv.org/html/2605.06901#bib.bib963)). Today, the canonical algorithm used to perform RLHF is proximal policy optimization (PPO) (Schulman et al., [2017](https://arxiv.org/html/2605.06901#bib.bib1084)), originally introduced as a simpler and more general improvement upon older RL methods like trust region policy optimization (TRPO) (Schulman et al., [2015](https://arxiv.org/html/2605.06901#bib.bib1085)). However, there now also exists considerable research into alternatives to PPO.
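Before turning to those alternatives, it is useful to state the canonical two-stage objective that they all build on or modify (notation ours, following the cited works): a reward model is first fit to pairwise human preferences, and the policy is then optimized against that reward under a KL penalty that keeps it close to the SFT reference model.

```latex
% Stage 1 -- reward modeling: a Bradley-Terry loss over preference triples
% (x, y_w, y_l), where y_w is the human-preferred response and y_l the
% dispreferred one.
\mathcal{L}(\theta) =
  -\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\big( r_\theta(x, y_w) - r_\theta(x, y_l) \big) \right]

% Stage 2 -- policy optimization (e.g., with PPO): maximize reward while
% penalizing divergence from the reference policy \pi_{\mathrm{ref}}.
\max_{\pi}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}
  \left[ r_\theta(x, y) \right]
  - \beta\, \mathbb{D}_{\mathrm{KL}}\!\left[
      \pi(y \mid x) \,\big\|\, \pi_{\mathrm{ref}}(y \mid x) \right]
```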
To address PPO's high computational cost and sensitivity to hyperparameter tuning, Ahmadian et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib158)) break PPO into its component pieces and show that revisiting the formulation of human preferences in RL, discarding aspects that are unnecessarily complex for fine-tuning pre-trained LLMs, and returning to the most basic policy gradient algorithm yields notable performance and efficiency gains. Other works propose alternatives to PPO entirely, such as bringing the process online for online iterative RLHF (Dong et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib408)) or scoring sampled responses from different sources and aligning them with human preferences (RRHF) (Yuan et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1390)). Other work on extending RLHF focuses specifically on the data that goes into aligning LLMs, whether by improving accessibility and filling gaps in existing datasets or by addressing issues of scale. For example, Okapi (Lai et al., [2023](https://arxiv.org/html/2605.06901#bib.bib737)) is introduced as the first system and dataset to focus on RLHF for multiple languages, covering 26 languages. For issues of scale, researchers at Google DeepMind propose reinforced self-training (ReST) (Gulcehre et al., [2023](https://arxiv.org/html/2605.06901#bib.bib528)), which takes inspiration from growing-batch RL to produce a dataset of samples generated from the policy, which can then be used for offline training. Beyond PPO and its proposed alternatives, there is also considerable discussion of other portions of the RLHF pipeline, targeting better alignment through task formulation and dataset augmentation. Safe RLHF, proposed by Dai et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib358)), explicitly decouples human preferences around helpfulness and harmlessness into two separate optimization objectives and uses the Lagrangian method to balance trade-offs between the two.

#### 4.2.2 Non-RL Methods

DPO has gained significant attention as an RLHF alternative because it enables preference tuning without an explicit reward model. It does so by directly including the probability ratio between preferred and dispreferred responses in its loss function (Rafailov et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1033)). The original authors show that DPO-trained models generate responses that are preferred more frequently than those trained with PPO, and that DPO converges faster. Another sample-efficient alternative, which not only avoids RL but also requires fewer than 10 samples, is Demonstration Iterated Task Optimization (DITTO) by Shaikh et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1097)). This method uses online imitation learning to create pairwise comparisons, treating user demonstrations as the gold standard and the model's own outputs as dispreferred. DITTO improved model alignment by an average of 19% in win rates compared to few-shot prompting and supervised fine-tuning on various human-centric tasks like news writing, emails, and blog posts.
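To make the contrast with RL-based methods concrete, the minimal sketch below (ours, paraphrasing the objective of Rafailov et al., 2023) computes the DPO loss; it needs only sequence log-probabilities from the trainable policy and a frozen reference model, with no reward model or sampling loop.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO objective over a batch of preference pairs.

    Each tensor holds per-sequence log-probabilities (summed over tokens)
    of the preferred (chosen) and dispreferred (rejected) responses under
    the trainable policy and the frozen reference model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between preferred and dispreferred responses;
    # beta controls how far the policy may drift from the reference.
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```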
#### 4.2.3 Beyond Human Feedback

Despite the success of methods like RLHF and DPO, recent research has sought to address the potential drawbacks of relying only on human-sourced feedback for LLM alignment. One drawback is the incompleteness of human feedback, which may represent only a partial view of collective human values (Kirk et al., [2023](https://arxiv.org/html/2605.06901#bib.bib707)). Furthermore, alignment is difficult to specify with explicit objectives (Tamkin et al., [2021](https://arxiv.org/html/2605.06901#bib.bib1192); Bommasani et al., [2021](https://arxiv.org/html/2605.06901#bib.bib240)). Additionally, collecting high-quality, representative human feedback at scale will become increasingly difficult as LLMs grow larger and more powerful (Casper et al., [2023](https://arxiv.org/html/2605.06901#bib.bib268); Santurkar et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1074)).

##### Constitutional AI.

To address these issues, recent work has shifted from pure RLHF to enlisting AI assistance, in collaboration with humans, to supervise other AIs in training helpful and harmless systems (Bowman et al., [2022](https://arxiv.org/html/2605.06901#bib.bib245); Bai et al., [2022b](https://arxiv.org/html/2605.06901#bib.bib194); Saunders et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1076)). As previously mentioned, one issue with alignment is that it is not clearly defined. Anthropic's work on Constitutional AI establishes a framework designed to answer this exact question (Bai et al., [2022c](https://arxiv.org/html/2605.06901#bib.bib79)). Rather than having humans provide explicit feedback, which may be inherently biased or incomplete, Constitutional AI incorporates human feedback only through the creation of a set of alignment principles (i.e., a "constitution"). A model then undergoes a training process similar to RLHF, except that the rewards are given by an LLM fine-tuned according to the values in the constitution (Ouyang et al., [2022](https://arxiv.org/html/2605.06901#bib.bib963); Bai et al., [2022a](https://arxiv.org/html/2605.06901#bib.bib196)). This approach also uses chain-of-thought reasoning to maximize the LLM's self-reasoning capabilities throughout the process (Nye et al., [2021](https://arxiv.org/html/2605.06901#bib.bib954); Wei et al., [2022b](https://arxiv.org/html/2605.06901#bib.bib1287)). The end goal of Constitutional AI is not to remove human involvement or supervision entirely, but to concentrate human input on the most necessary aspects and move toward a self-supervised approach to alignment. Although Constitutional AI resolved many lingering issues with RLHF, it also raises new questions in the ongoing research on alignment. First, how does the global AI research community arrive at a widely accepted constitution that incorporates the pluralistic values of human beings (Hendrycks et al., [2021a](https://arxiv.org/html/2605.06901#bib.bib565))? Second, how do we ensure a shared understanding and interpretation of that constitution? How do we build a robust system for editing and improving the principles and rules as society evolves? And when constitutional guidelines fail in ambiguous situations, how do we ensure that models operating with minimal human supervision still behave in a safe and useful way?
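To ground the process just described, here is a minimal sketch (ours, loosely following the supervised critique-and-revision phase described by Bai et al., 2022c) in which the model repeatedly critiques and revises its own response against each constitutional principle. The `complete` callable stands in for any LLM call, and the two principles are illustrative placeholders.

```python
# Illustrative constitution; real constitutions contain many such principles.
CONSTITUTION = [
    "Choose the response that is least likely to be harmful or offensive.",
    "Choose the response that is most honest and transparent.",
]

def critique_and_revise(prompt: str, response: str, complete) -> str:
    """Self-revise a response against each principle in the constitution."""
    for principle in CONSTITUTION:
        critique = complete(
            f"Principle: {principle}\nPrompt: {prompt}\nResponse: {response}\n"
            "Critique the response with respect to this principle."
        )
        response = complete(
            f"Original response: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique:"
        )
    # Revised responses are collected into a fine-tuning dataset; a later
    # RL stage uses AI preference labels derived from the same principles.
    return response
```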
### 4.3 Scaling Human-Centered LLMs

In NLP, "scaling" refers to the relationship between a model's performance and factors such as the number of parameters *n*, dataset size *d*, and computational resources *c* (Kaplan et al., [2020](https://arxiv.org/html/2605.06901#bib.bib683)). Understanding these scaling laws is important for developing HCLLMs that balance efficiency and performance with accessibility and fairness.

#### 4.3.1 Scaling Laws in LLMs

Kaplan et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib683)) conducted foundational research on empirical scaling laws for language model performance, particularly focusing on cross-entropy loss. Their work established that model performance improves predictably with increases in *n*, *d*, and *c*, following a power-law relationship. Importantly, they discovered that returns diminish when either *n* or *d* is held constant, underscoring the need for a strategic, balanced approach to scaling in NLP. Tay et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1208)) expand on this work to investigate the scaling properties of different inductive biases and model architectures. Through extensive experiments, they find that (1) architecture is an important consideration, (2) the best-performing model can fluctuate at different scales, and (3) the choice of whether to scale depth (number of layers) or width (more neurons per layer) matters, especially in resource-constrained environments. Models often excel at pretraining but underperform on downstream tasks, underscoring the need to evaluate models on human utility rather than raw performance metrics. Hoffmann et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib575)) introduced "Chinchilla scaling," which showed that smaller models trained on larger datasets achieve better performance per compute budget. This finding is applicable to resource-constrained human-centered applications, where compute and data availability may be limited by ethical or logistical constraints. Ivgi et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib641)) investigate the applicability of scaling laws to different NLP tasks and find that the benefits vary: tasks aligned with pretraining objectives, such as question answering, show clearer scaling behavior than specialized tasks like sentiment analysis. Thus, for human-centered applications, practitioners should assess whether scaling laws apply to their specific use case or whether full-scale testing is necessary. While scaling can improve performance for some tasks, it may not always be the most efficient path, a point further developed by Liang et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib787)), who show that similar or greater improvements to model accuracy can be achieved through more efficient human-centric means, such as training with human feedback.
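These studies are commonly summarized by parametric fits of the pretraining loss. In the chapter's notation, the Chinchilla-style form reported by Hoffmann et al. (2022) is:

```latex
% Pretraining loss as a function of model size n and data size d.
% E is the irreducible loss; A, B, \alpha, \beta are constants fit to
% training runs. Diminishing returns in n or d alone follow directly
% from this form, matching the observations of Kaplan et al. (2020).
L(n, d) = E + \frac{A}{n^{\alpha}} + \frac{B}{d^{\beta}}
```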
#### 4.3.2 Scaling in Human-Centered Domains

Although scaling improves the overall performance of LLMs, it does not improve performance at the same rate for all subpopulations and human-centered knowledge domains. Representational biases in training data (§[3.1](https://arxiv.org/html/2605.06901#S3.SS1) and §[3.2](https://arxiv.org/html/2605.06901#S3.SS2)) can lead to disparities in scaling (Rolf et al., [2021](https://arxiv.org/html/2605.06901#bib.bib764)). However, data is not the only cause of relative disparities in scaling, and it may not even be the principal cause. Held et al. ([2025a](https://arxiv.org/html/2605.06901#bib.bib1614)) found that, holding data scale constant, model-size scaling is responsible for *widening* the performance gap between certain varieties of English and others. In addition to dialect, the authors investigated AI risk behaviors (Perez et al., [2023](https://arxiv.org/html/2605.06901#bib.bib763)) and found that scaling mitigates some risks more than others. Scaling model size alone cannot address human-domain-specific challenges such as cultural biases, as Bommasani et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib240)) highlight. The effectiveness of scaling laws varies across domains, with some human-centered areas experiencing diminishing returns when either the number of parameters (*n*) or dataset size (*d*) is limited. For instance, Brown et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib250)) observed that large models like GPT-3 show reduced performance gains on human-centered tasks unless they are fine-tuned with context-specific data, emphasizing the need for tailored approaches in these applications. Gururangan et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib536)) investigated whether it is still helpful to tailor a pretrained model to the domain of a target task in a world where large-scale, broad-coverage models trained on a wide variety of sources form the foundation of today's NLP landscape. Overall, they found that multi-phase adaptive pretraining offers large gains in task performance, implying that the quality and treatment of the data *d* matter more than its quantity, which is especially relevant when dealing with sensitive or specialized human domains. Bender et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib235)) contribute by arguing that increasing *d* without considering diverse and ethical data sources can lead to biased or non-representative outcomes, negatively impacting human-centered goals such as equitable access and cultural inclusivity. Furthering the discussion on fine-tuning for specific human-centric domains, Zhang et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1411)) examine how different scaling factors influence the fine-tuning performance of LLMs. Consistent with Kaplan et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib683)) and Hoffmann et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib575)), they exhibit a multiplicative joint scaling law linking fine-tuning performance to model size, fine-tuning data size, and other scaling factors: fine-tuning performance improves predictably when scaling both dataset size *d* and model size *n*, and fine-tuning benefits more from scaling the model than from scaling pretraining data. The work also highlights that the effectiveness of fine-tuning varies significantly with the downstream task and with the size and quality of the available fine-tuning data, underscoring the need for task-specific approaches in human-centered applications.
#### 4.3.3 Scaling in Human-Centered Goals

Looking at specific human-centered evaluations, such as bias and fairness, scaling can produce unexpected results. Ethayarajh and Jurafsky ([2020](https://arxiv.org/html/2605.06901#bib.bib441)) show that leaderboard-driven scaling can create misaligned incentives in model development. Analyzing examples like the SNLI leaderboard, they demonstrate that focusing solely on state-of-the-art performance discourages the development of practical models. For instance, with SNLI baselines at 78% (n-gram) and 81% (LSTM) versus a 92% BERT-based state of the art, there is little incentive to develop a lightweight model with 85% accuracy, even though such a model could balance performance with computational efficiency and be more practically useful. Ganguli et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib1750)) explored model safety through red teaming, where testers try to provoke harmful outputs. Testing models from 2.7B to 52B parameters, they found that RLHF-trained models became harder to "break" as they scaled, reducing the success of harmful attacks (the harmlessness score increased from approximately -0.5 at 2.7B parameters to 0.5 at 52B parameters). In contrast, other models showed no improvement in resisting such attacks with increased size. This study stresses the importance of incorporating human feedback during training to develop safer AI systems. Larger models also exhibit heightened privacy risks. Hernandez et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib1662)) demonstrate that the performance of an 800M-parameter model could be degraded to that of a 400M-parameter model by repeating just 0.1% of the training data 100 times, suggesting that larger models are not automatically more robust to certain data-based attacks (see §[3.3](https://arxiv.org/html/2605.06901#S3.SS3)). Regarding the emulation of human values, Biedma et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib220)) showed that as language models get larger, they display an increased preference for task-oriented values like accuracy and factual consistency at the slight expense of social intelligence, moral fiber, and adherence to ethical norms. While larger models may become more capable, their value systems can become increasingly misaligned with human values. Transfer learning is often a prerequisite for applying LLMs to human-centered tasks. Scaling laws suggest that a model's ability to transfer knowledge improves as its performance increases. This relationship generally holds in human-centered evaluations, with studies showing that transfer learning benefits from scaling on broad tasks (Hernandez et al., [2021](https://arxiv.org/html/2605.06901#bib.bib567); Raffel et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1034)). However, Hernandez et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib567)) and Raffel et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib1034)) highlight the importance of domain-specific fine-tuning and alignment techniques for achieving human-centered objectives in specialized or sensitive areas such as legal and medical contexts.

#### 4.3.4 Inference-Time Scaling

Recent advances in inference-time scaling offer pathways to improve HCLLMs without retraining.
More targeted approaches to inference-time adaptation are now emerging that specifically address human-centered concerns. Zhang et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib852)) introduce Controllable Safety Alignment (CoSA), a framework that enables inference-time adaptation to diverse safety requirements without model retraining. Rather than following a one-size-fits-all approach to safety alignment, in which models refuse any potentially unsafe content, CoSA allows authorized users to modify safety configurations at inference time through natural-language descriptions of desired safety behaviors. Despite these promising directions, open questions remain about whether increased inference compute might undermine certain human-centered objectives. OpenAI's emphasis on chain-of-thought (CoT) reasoning in its o-series models, for example, underscores the prevailing focus on inference-time reasoning strategies (OpenAI, [2024a](https://arxiv.org/html/2605.06901#bib.bib849)). However, Shaikh et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1620)) find that explicitly prompting models to "think step by step" can inadvertently increase harmful biases and toxic outputs, observing an 8.8% rise in biased responses and a 19.4% increase in toxicity across relevant benchmarks. Their study suggests that while inference-time reasoning holds promise for improved performance and controllable safety alignment, it may also expose underlying biases.

### 4.4 Personalization

We established that the goal of the post-training stages is to align LLMs to human preferences. Nonetheless, the term "human preferences" is a blanket phrase that obscures the diverse and potentially conflicting desires of different users. Rather than trying to create LLMs that satisfy everyone, the goal of personalization is to align a model's outputs with the preferences of a single individual, both in content and in style (Zhang et al., [2024g](https://arxiv.org/html/2605.06901#bib.bib7); Tseng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib570)).

#### 4.4.1 Current Approaches

We cover three families of techniques for personalizing LLMs: prompting-based approaches, retrieval-based approaches, and personalized alignment (e.g., learning from human preferences). There is no single "best" technique for personalization; the choice depends on factors such as the available data, compute efficiency, and the degree of personalization required.

##### Prompting-based Approaches.

Given the capabilities of LLMs, prompt-based approaches provide an efficient and popular way to personalize model outputs. One prompt-based technique is to provide a user persona directly in the text instructions given to the model. Here, the critical research decision is what content to provide for personalization. For example, prior work has explored providing demographic information (e.g., race, gender) to the model, although such personas risk invoking stereotypes or caricature (Cheng et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib1599), [a](https://arxiv.org/html/2605.06901#bib.bib1602); Gupta et al., [2023](https://arxiv.org/html/2605.06901#bib.bib533); Huang et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib1039)).
In addition, this approach has been adopted to imbue the model with character traits, such as prompting responses to reflect certain personality qualities (e.g., extroversion, warmth) or different tones. Since these methods can suffer from inefficiency and information loss (Liu et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib802)), other works have explored embedding user information into tokenized prompt embeddings (Li et al., [2024k](https://arxiv.org/html/2605.06901#bib.bib783); Hebert et al., [2024](https://arxiv.org/html/2605.06901#bib.bib560); Huang et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib614)).

##### Retrieval-based Approaches.

What information about the user is relevant depends on the context. Retrieval-augmented personalization methods retrieve information from an external knowledge base and incorporate it at inference time to personalize the model's output. As an initial foray into this area, Salemi et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib743)) benchmark different retrievers, including BM25 and a dense retrieval model, on a series of personalization tasks. Other work has continued to refine retrieval methods, exploring how to improve capabilities while reducing the amount of retrieved data via summarization (Richardson et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1040)) or adopting techniques from other areas, such as collaborative filtering (Shi et al., [2025](https://arxiv.org/html/2605.06901#bib.bib10)). Although retrieval-based methods allow for on-the-fly personalization at inference time, their performance is constrained by retriever quality. For example, lexical retrievers such as BM25 may match on shallow keyword similarity rather than a deep semantic understanding of user needs. Furthermore, this approach is more constrained in cold-start situations or domains with sparse user data, where the knowledge base itself may be insufficient to support meaningful personalization.
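As a minimal illustration of the retrieve-then-prompt pattern (ours, not a method from the cited works), the sketch below ranks snippets of a user's history by dot-product similarity and prepends the top matches to the prompt. The stored vectors and the query vector are assumed to come from any sentence encoder; a lexical scorer like BM25 could replace the dot product.

```python
import numpy as np

def retrieve(query_vec, memory, k=3):
    """Return the k user-history snippets most similar to the query.

    memory: list of (text, vector) pairs built from past interactions.
    """
    ranked = sorted(memory, key=lambda item: -float(np.dot(query_vec, item[1])))
    return [text for text, _ in ranked[:k]]

def personalized_prompt(query, query_vec, memory):
    """Prepend retrieved user context to the model prompt."""
    context = "\n".join(retrieve(query_vec, memory))
    return (f"Relevant notes about this user:\n{context}\n\n"
            f"User request: {query}")
```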
##### Personalized Alignment.

Finally, rather than adapting model behavior at inference time, training-based approaches can embed individual preferences into alignment objectives. For example, several works present methods for creating personalized reward models, such as approaches that train multiple reward models that are later merged (Jang et al., [2023](https://arxiv.org/html/2605.06901#bib.bib643)). Critically, these approaches require the dimensions of personalization to be defined a priori, limiting the scope in which they can be applied. Alternative approaches loosen these constraints, proposing methods that learn preferences from historical user interactions, which can then be used to train personalized reward models or to directly align LLMs (Ryan et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1250); Poddar et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1006); Balepur et al., [2025](https://arxiv.org/html/2605.06901#bib.bib6)). However, training-based personalization faces practical limitations: these methods require substantial amounts of user data in domains where data is often scarce, risk overfitting to individual user patterns, and incur significantly higher computational costs than retrieval-based or prompting approaches (Ryan et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1250); Shaikh et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1097); Zhang et al., [2024g](https://arxiv.org/html/2605.06901#bib.bib7)). These tradeoffs make training-based personalization most suitable for scenarios where sufficient data is available and the desired degree of customization justifies the additional resource investment.

#### 4.4.2 Future of Personalization for HCLLMs

A central challenge in personalization is deciding what the model should adapt to. Economists distinguish between stated preferences (what people say they want) and revealed preferences (what their behavior implies) (Samuelson, [1948](https://arxiv.org/html/2605.06901#bib.bib1037)). The same tension manifests in language model personalization. Behavioral signals such as query reformulations, response ratings, or conversation length may reflect immediate satisfaction but diverge from users' stated goals or long-term values. Consider a user learning a new subject who frequently requests direct answers: based on inferred signals, the model might be personalized to comply, whereas a model targeting learning outcomes might instead offer scaffolded hints. This raises fundamental questions about system objectives and user agency: which preferences should dominate when conflicts arise, and how should systems handle patterns that users exhibit but might not endorse upon reflection? Temporal dynamics present an additional challenge. User preferences and needs evolve over multiple timescales, from within-session learning to long-term skill development and shifting life contexts. Yet current approaches treat personalization as a static task. While recent work has extended conversation length in personalization datasets, we lack real-world benchmarks that capture these longer-term dynamics or that evaluate personalization over time (Kirk et al., [2024](https://arxiv.org/html/2605.06901#bib.bib708); Zhang et al., [2024g](https://arxiv.org/html/2605.06901#bib.bib7); Zhao et al., [2025c](https://arxiv.org/html/2605.06901#bib.bib1785)). Some early work explores updating user profiles over time, but critical questions remain open (Wang et al., [2024e](https://arxiv.org/html/2605.06901#bib.bib1038)). For instance, how can systems distinguish transient preferences (a user exploring a new hobby) from enduring ones (a domain expert's consistent working style)? What mechanisms allow for efficient model updates under these circumstances? These temporal considerations compound the preference-alignment challenges: even if we could perfectly identify and incorporate a user's current preferences into LLM outputs, those preferences may themselves be moving targets.

### 4.5 Pluralism

While we have discussed methods for aligning models to human preferences and values, an important, unaddressed question is *whose* values. Not all humans share the same values (Durmus et al., [2024](https://arxiv.org/html/2605.06901#bib.bib422)), and what is permissible to some may be irrelevant or harmful to others. Rather than assuming values are monolithic and aligning models to a "Silicon Valley default" set of preferences, *pluralistic alignment* seeks to align language models to simultaneously serve diverse preferences. As Sorensen et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib370)) define it, pluralistic alignment is the process of creating language models capable of representing a diverse set of human values and perspectives.
#### 4.5.1 Current Approaches

##### Defining Pluralism.

Prior work has proposed three definitions of how pluralism can be embedded into models (Sorensen et al., [2024c](https://arxiv.org/html/2605.06901#bib.bib1152)):

1. Overton Pluralism: An Overton-pluralistic model generates all *reasonable* perspectives on a given subject. The definition draws on the concept of the "Overton window," the range of ideas and perspectives considered acceptable within the mainstream. Scholars such as Lake et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib740)) have posited that Overton pluralism is compatible with the status-quo alignment process, as longer conversational outputs typically share a few varied perspectives.
2. Steerable Pluralism: Steerable pluralism advocates using situational context to steer the model toward a particular perspective. One popular use case is improving the cultural alignment of language models to particular groups and cultures (Masoud et al., [2023](https://arxiv.org/html/2605.06901#bib.bib879); Tao et al., [2024](https://arxiv.org/html/2605.06901#bib.bib103)). Personalization also falls under steerable pluralism, as the model is steered toward the values of a particular user.
3. Distributional Pluralism: Finally, in distributional pluralism, developers steer models to produce responses that roughly correspond to a population distribution. That is, if 70% of the population holds a particular opinion, the model will generate that opinion 70% of the time. As Lake et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib740)) note, base models are somewhat distributionally aligned to the opinions present in pre-training data.

##### Methods for Pluralistic Alignment.

Several works have proposed methods for alignment, depending on the type of pluralism they seek to achieve. For Overton pluralism, researchers have explored prompting techniques (Meincke et al., [2024](https://arxiv.org/html/2605.06901#bib.bib897)) and approaches that generate multiple perspectives before synthesizing them (Feng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib456); Hayati et al., [2024](https://arxiv.org/html/2605.06901#bib.bib556); Li et al., [2024h](https://arxiv.org/html/2605.06901#bib.bib781)). For steerable pluralism, personalized reward models (Jang et al., [2023](https://arxiv.org/html/2605.06901#bib.bib643); Poddar et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1006); Chen et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib290)) and optimization techniques (e.g., Group Preference Optimization, Zhao et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib1442)) enable alignment to specific user or group preferences with minimal context. Other common implementation methods include targeted prompting (AlKhamissi et al., [2024](https://arxiv.org/html/2605.06901#bib.bib170)) and generate-then-filter approaches (Feng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib456)). Finally, for distributional pluralism, researchers generate diverse perspectives and filter them based on population distributions (Feng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib456)), with surveys serving as key measurement tools, since they enable matching LLM probability distributions to actual population responses.
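Distributional alignment of this kind can be quantified directly. As a minimal illustration (ours), the sketch below compares a model's answer distribution on a survey question against the human response distribution using total variation distance; other divergences would work equally well.

```python
def total_variation(model_probs: dict, survey_probs: dict) -> float:
    """Distance between a model's answer distribution and survey data.

    Both arguments map answer options (e.g., "agree", "disagree") to
    probabilities summing to 1. Returns 0.0 for perfect distributional
    alignment and 1.0 for complete misalignment.
    """
    options = set(model_probs) | set(survey_probs)
    return 0.5 * sum(abs(model_probs.get(o, 0.0) - survey_probs.get(o, 0.0))
                     for o in options)

# Example: 70% of the population agrees, but the model agrees 95% of the time.
print(total_variation({"agree": 0.95, "disagree": 0.05},
                      {"agree": 0.70, "disagree": 0.30}))  # -> 0.25
```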
OpinionsQA (Santurkar et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1074)) and GlobalOpinionsQA (Durmus et al., [2024](https://arxiv.org/html/2605.06901#bib.bib422)) are two widely used survey benchmarks for evaluating distributional alignment. In tandem, a data-centric approach to pluralistic alignment has focused on collecting datasets that represent a diversity of values and perspectives. Chatbot Arena (Chiang et al., [2024](https://arxiv.org/html/2605.06901#bib.bib366)) and PRISM (Kirk et al., [2024](https://arxiv.org/html/2605.06901#bib.bib708)) are two such collections of preference data that retain user labels. The PERSONA dataset simulates 1,500 users with synthetic personas for the purpose of studying pluralistic alignment, providing synthetic prompts and feedback pairs (Castricato et al., [2025](https://arxiv.org/html/2605.06901#bib.bib271)). The ValuePrism dataset (Sorensen et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib1154)) is a collection of values, rights, and duties applied to various scenarios; the authors also release the KALEIDO model for measuring how statements agree or disagree with particular values. ValueConsistency (Moore et al., [2024](https://arxiv.org/html/2605.06901#bib.bib911)) is a dataset of 300 controversial topics in four languages with corresponding controversial questions. Moral Stories (Emelin et al., [2021](https://arxiv.org/html/2605.06901#bib.bib432)) is a dataset of narratives containing moral dilemmas with multiple endings; the stories were written by annotators of various demographic backgrounds and express different social and moral norms.

#### 4.5.2 Future of Pluralistic Alignment for HCLLMs

Achieving effective pluralistic alignment faces several challenges. First, figuring out how to appropriately and effectively model diverse perspectives is a key precursor to effective pluralistic alignment. Existing literature has shown that models tend to emphasize stereotypical representations when asked to simulate perspectives from different demographic groups (Cheng et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib1602), [b](https://arxiv.org/html/2605.06901#bib.bib1599); Deshpande et al., [2023](https://arxiv.org/html/2605.06901#bib.bib388)), meaning that approaches that generate and synthesize varied perspectives require careful design to avoid misrepresenting communities. While prompting LLMs with demographic features can improve alignment with group opinions (AlKhamissi et al., [2024](https://arxiv.org/html/2605.06901#bib.bib170)), the choice of base model sometimes has a larger effect than the sociodemographic features themselves (Beck et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1607)). Effective perspective modeling remains an open problem. A second challenge involves understanding and balancing the societal impacts of pluralistic models on public opinion. It is well established that aligning LLMs to human preferences can exacerbate model sycophancy (Sharma et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1101)), where models agree with users regardless of validity. Personalized models developed through steerable pluralism risk creating personalized echo chambers that reinforce rather than challenge user beliefs. There are already concerns about "echo chambers" in media consumption, which may only be amplified by LLMs (Cinelli et al., [2021](https://arxiv.org/html/2605.06901#bib.bib4); Barberá, [2020](https://arxiv.org/html/2605.06901#bib.bib5)).
Even though pluralistic alignment aims to broaden the viewpoints that models may espouse, these methods still define which perspectives are "in bounds" of acceptability or follow ostensible public-opinion distributions. In doing so, they may inadvertently silence marginalized viewpoints that fall outside mainstream acceptability. Future work should empirically investigate how different pluralistic alignment strategies affect opinion formation and marginalization in practice, moving beyond theoretical concerns to measurable impacts on diverse user populations. Finally, pluralistic alignment raises questions of data sovereignty and consent. Many communities, particularly Indigenous groups, do not want their data and opinions collected for training AI systems (Rainie et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1610)). Even when motivated by the goal of broad representation, developers must carefully consider whether all communities want their perspectives embedded in AI systems, respecting the right of groups to opt out of technological representation entirely.

### 4.6 Multilinguality

Beyond capturing the needs and values of users, models ought to let users interact in their preferred language. Although there are over 7,000 languages spoken worldwide, most LLM development focuses on English (Held et al., [2023](https://arxiv.org/html/2605.06901#bib.bib183)). Expanding multilingual capabilities plays a critical role in democratizing access to language models, particularly in bridging the gap between high-resource and low-resource language technologies.

#### 4.6.1 Current Approaches

Multilingual large language models (MLLMs) are systems capable of both understanding and generating text in multiple languages. While most LLMs perform best in English, there is a concerted effort to improve their multilingual capabilities. From a data-centric perspective, these efforts focus on curating diverse linguistic resources, including pre-training corpora and post-training datasets for supervised fine-tuning (SFT) and preference learning. Many pre-training corpora, such as RedPajama (Weber et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1691)) and CC-100 (Conneau et al., [2020](https://arxiv.org/html/2605.06901#bib.bib340)), are collected via web scraping. Furthermore, many existing pre-training corpora that are primarily in English already include a small percentage of non-English data, and incorporating even this small amount during pre-training can improve cross-lingual capabilities (Blevins and Zettlemoyer, [2022](https://arxiv.org/html/2605.06901#bib.bib590)). Nonetheless, a core challenge is that the Internet remains heavily English-centric, leading to performance gaps for less-resourced languages that appear infrequently online or, in some cases, lack standardized written forms. For an in-depth survey on multilingual LLMs, please refer to Qin et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib585)). Rather than relying solely on naturally occurring multilingual text, researchers have turned to translating high-resource English text into other languages. For example, Wang et al. ([2025b](https://arxiv.org/html/2605.06901#bib.bib592)) translate the pre-training corpus FineWeb into multiple languages for pre-training.
Similar approaches exist for creating post-training datasets, such as the Aya Üstün et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib588)) and CrossAlpaca Ranaldi et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib589)) corpora, which rely on either machine translation or existing MLLMs to assist with translation Lai et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib737)), Yue et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib780)). However, translated datasets can introduce artifacts (referred to as *translationese*) that shift linguistic patterns and degrade data quality, particularly for downstream reasoning tasks and open-ended generation Dang et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib587)), Vanmassenhove et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib586)). While translation is an efficient way to generate large amounts of multilingual data, another question we must ask is: what gets lost in translation? For example, when translating from English, the resulting output might lack the cultural context and nuance present in text originally produced by speakers of the target language Qin et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib585)). Finally, complementary to training interventions are non-training approaches enabled through prompting. A wide range of prompting strategies for multilingual use has been proposed, including those surveyed in recent work by Vatsal et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib736)). One common strategy is "translate-test": translate user input into English before performing the task, leveraging the stronger English proficiency of most current LLMs Liu et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib779)), Artetxe et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib835)), Huang et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib840)), Etxaniz et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib845)), Huang et al. ([2023a](https://arxiv.org/html/2605.06901#bib.bib610)); see the sketch below. While this approach often boosts performance, it can fall short for tasks requiring cultural nuance, idiomatic understanding, or language-specific world knowledge, where remaining in the original language is critical for faithful interpretation and generation Liu et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib779)).
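The translate-test strategy is straightforward to express as a pipeline. The sketch below is a minimal illustration only: `translate` and `llm` are hypothetical placeholders for a machine-translation system and a chat model, not real APIs.

```python
def translate(text: str, source: str, target: str) -> str:
    """Hypothetical stand-in for a machine-translation system."""
    raise NotImplementedError

def llm(prompt: str) -> str:
    """Hypothetical stand-in for an English-proficient chat model."""
    raise NotImplementedError

def translate_test(user_input: str, user_lang: str) -> str:
    # 1. Translate the user's input into English, where the model is strongest.
    english_input = translate(user_input, source=user_lang, target="en")
    # 2. Perform the task in English.
    english_output = llm(english_input)
    # 3. Translate the answer back into the user's language.
    return translate(english_output, source="en", target=user_lang)
```

The design trade-off is exactly the one noted above: the pipeline borrows the model's English proficiency at the cost of cultural and idiomatic fidelity, so it suits factual tasks better than culturally grounded ones.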
#### 4.6.2 Future of Multilinguality for HCLLMs

Looking forward, efforts toward *human*-centered multilingual capabilities must consider the following areas. First, there is a growing interest in multilingual safety. As Yong et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib841)) identify, multilingual safety remains massively underrepresented as a research domain, resulting in safety standards built for English that do not translate effectively to other linguistic contexts. Treating English as the universal reference point obscures sociolinguistic variation and produces a gap between how models behave and how safety norms should operate for real users across languages. Understanding and addressing multilingual safety thus remains an open frontier. A second challenge involves determining what it means to represent language in ways that do not exploit the communities that speak it. The desire for multilingual NLP is not new; projects such as Meta's No Language Left Behind demonstrate longstanding investment in broad language coverage. However, as Bird ([2024](https://arxiv.org/html/2605.06901#bib.bib842)) argues, these efforts often treat language as a detached artifact rather than something rooted in communities, cultures, and social practices. When language is treated as a pure optimization target, scraped data, or a resource to be "unlocked," the resulting systems risk extraction without contributing tangible value to the communities whose linguistic labor enables them. Beyond technical considerations, achieving authentic multilinguality in human-centered systems demands rethinking how data is gathered, whose language practices are modeled, and for what ends. Simply scaling web pre-training and applying technical fixes may improve multilingual benchmarks, but it does not confront the deeper question of whether these systems advance the needs, agency, and self-determination of speakers. Efforts to "democratize" access to LLMs echo earlier narratives that cast computing as a universal solution. As the Information and Communication Technology for Development (ICT4D) literature reminds us Toyama ([2015](https://arxiv.org/html/2605.06901#bib.bib843)), Harris ([2016](https://arxiv.org/html/2605.06901#bib.bib844)), such narratives risk reproducing existing inequities when social, cultural, and political contexts are ignored.

## 5 Evaluation

Evaluation methods allow model developers, users, and stakeholders to compare the capabilities and limitations of different LLMs, and to understand the scope of their utilities and risks in particular domains (Chang et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1315)). Evaluations can inform all stages of model development, from mixing pre-training data (Held et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib851); Mizrahi et al., [2025](https://arxiv.org/html/2605.06901#bib.bib332)) to selecting and optimizing reward models for alignment (Lambert et al., [2025](https://arxiv.org/html/2605.06901#bib.bib330); Frick et al., [2024](https://arxiv.org/html/2605.06901#bib.bib331)). After models are trained, evaluations arguably become even more important. They act as quality filters to decide whether and how companies deploy models (Liang et al., [2022](https://arxiv.org/html/2605.06901#bib.bib787)). They point researchers to the most important and promising directions for model improvement (Srivastava et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1159)) and help anticipate models' future capabilities (Kaplan et al., [2020](https://arxiv.org/html/2605.06901#bib.bib683); Hoffmann et al., [2022](https://arxiv.org/html/2605.06901#bib.bib575)). Finally, they shape public perceptions (Liao and Sundar, [2022](https://arxiv.org/html/2605.06901#bib.bib1795)) and inform policy and other regulatory decisions (Eriksson et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1514)). Without a human-centered evaluation of LLMs, model development, deployment, and governance may be oriented not towards the long-term and collective good, but rather towards profit incentives and short-term gains on surface-level heuristics (Eriksson et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1514)). In this chapter, we observe common pitfalls and highlight best practices in human-centered evaluation, spanning three levels as shown in Figure [5](https://arxiv.org/html/2605.06901#S5.F5).
First, we consider evaluations at the level of model outputs (§[5.1](https://arxiv.org/html/2605.06901#S5.SS1)), using both quantitative metrics (§[5.1.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2)) and qualitative evaluations (§[5.1.3](https://arxiv.org/html/2605.06901#S5.SS1.SSS3)). Beyond raw outputs, we also consider how people experience LLMs (§[5.2](https://arxiv.org/html/2605.06901#S5.SS2)), covering human values (§[5.2.1](https://arxiv.org/html/2605.06901#S5.SS2.SSS1)) as well as concerns over bias (§[5.2.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2)) and safety (§[5.2.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3)). Lastly, we discuss extrinsic evaluations at the societal level (§[5.3](https://arxiv.org/html/2605.06901#S5.SS3)), measuring the system's real-world impact.

Figure 5: In this chapter, we discuss common pitfalls and best practices for evaluating HCLLMs, considering three distinct levels of evaluation: the model level (§[5.1](https://arxiv.org/html/2605.06901#S5.SS1)), the human level (§[5.2](https://arxiv.org/html/2605.06901#S5.SS2)), and the societal level (§[5.3](https://arxiv.org/html/2605.06901#S5.SS3)).

### 5.1 Model-Level Evaluations

#### 5.1.1 Benchmarks

Benchmarking has long been a key driver in the development of AI systems. Benchmarks act as a compass, encoding the values, priorities, and goals of the AI research community Ethayarajh and Jurafsky ([2020](https://arxiv.org/html/2605.06901#bib.bib441)), Birhane et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib224)). They help determine not only how capable a model is, but also what we consider meaningful progress, allowing us to compare the strengths of different models. Recent research has argued for a shift in what and how we evaluate, especially for human-centered applications. As LLMs increasingly influence real-world decision-making, especially in domains like education, law, healthcare, and customer support, the limitations of traditional benchmarks become even more critical. Benchmarks must evolve to better represent human values such as fairness, robustness, usability, and positive societal impact. Below, we discuss a few general principles for evaluating human-centered LLMs.

##### Moving Away from "Exams" and Rethinking What We Evaluate.

Traditional benchmarks often mimic academic exams, assessing LLMs by how well they can replicate human outputs or solve static problems in standardized formats like multiple-choice questions (Hendrycks et al., [2021b](https://arxiv.org/html/2605.06901#bib.bib563)), math problems (Sun et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1360)), and code generation (Jimenez et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1770)). While useful, this framing compresses complex, multidimensional model behavior into a single metric. Even interaction-based, human-voting evaluations like Chatbot Arena (Chiang et al., [2024](https://arxiv.org/html/2605.06901#bib.bib366)) are limited by their brittleness or their misalignment with how humans actually use AI (Singh et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1151)). In real-world use, LLMs are collaborators, copilots, or tools embedded in workflows, so it becomes necessary to evaluate them in natural, complex, and multi-step human-AI interaction settings, not just in isolation.
One promising alternative is centaur evaluations Haupt and Brynjolfsson ([2025](https://arxiv.org/html/2605.06901#bib.bib1358)), where humans and models collaborate and we care about the outcome of the combined system. These setups get closer to how AI is actually used in practice, whether for writing, analysis, customer support, diagnosis, or decision-making.

##### Ecological Validity.

A central challenge in evaluating LLMs for real-world use is ecological validity: the extent to which a benchmark setting reflects the complexity of how systems are actually used. Controlled evaluations may offer cleaner signals, but they often fail to generalize to interactive, user-facing deployments. Recent work Li et al. ([2025d](https://arxiv.org/html/2605.06901#bib.bib1357)) has shown that, across 20 existing datasets, no single benchmark strongly correlates with interactive performance for audio models. A model that excels at standard static tasks might still struggle in dynamic or collaborative environments. This mismatch suggests a need for richer, context-aware evaluations. One promising direction is to build evaluations bottom-up from in-the-wild data. For example, Röttger et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1150)) evaluate perspective and framing biases in LLM responses to natural user queries. Benchmarks should also be robust and reliable, correlating good performance with success in real tasks. This requires vetted examples with accurate annotations and sufficient statistical power Bowman and Dahl ([2021](https://arxiv.org/html/2605.06901#bib.bib244)). Finally, effective benchmarks should reveal potential biases, artifacts, and any dual uses, as well as ways to mitigate such unintended consequences Weidinger et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib1293)).

##### Data Contamination and Dynamic Alternatives.

With LLMs trained on massive web-scale corpora, the risk of benchmark contamination has become a serious issue. Many popular benchmarks are at least partially contained in training data, undermining their validity as evaluation tools. The line between training and testing becomes blurry, especially for static tasks. This is one benefit of dynamic, evolving benchmarks. Examples like DynaBench (Kiela et al., [2021](https://arxiv.org/html/2605.06901#bib.bib703)), Chatbot Arena (Chiang et al., [2024](https://arxiv.org/html/2605.06901#bib.bib366)), WebArena (Zhou et al., [2023c](https://arxiv.org/html/2605.06901#bib.bib368)), and WildVision-Arena (Lu et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib367)) introduce a degree of human involvement that better mirrors real-world interaction. Such dynamic setups are promising for evaluating generalization and interaction and for mitigating issues around saturation and contamination.

##### General-Purpose vs. Domain-Specific Evaluations.

Domain-specific benchmarks ground evaluation in the demands of particular fields. For example, DR Bench Gao et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib486)) assesses LLMs' diagnostic reasoning abilities, PubMedQA Jin et al. ([2019](https://arxiv.org/html/2605.06901#bib.bib655)) targets biomedical research question-answering, and LegalBench Guha et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib526)) is designed for legal reasoning, including statutory interpretation and contract analysis.
In education, benchmarks are emerging to evaluate LLMs' effectiveness in providing innovative and meaningful feedback to teachers Wang and Demszky ([2023](https://arxiv.org/html/2605.06901#bib.bib1258)) and in emulating expert decision-making for tailored math remediation Wang et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib1244)), helping bridge the gap between technological capability and educational needs. Recently, GDPval measures model performance on economically valuable, real-world tasks across 44 occupations OpenAI ([2025b](https://arxiv.org/html/2605.06901#bib.bib757)). These specialized benchmarks ground evaluation in each specific context, offering more contextualized, real-world assessments of model performance than math and coding tasks alone. Collaborative efforts across domains are crucial to developing benchmarks that reflect the full complexity of human-LLM interactions and the contexts in which LLM systems are deployed. Overall, a human-centered framework often transcends traditional metrics and benchmarks that continue to prioritize efficiency and profitability above all else. While these measures are useful in providing objective algorithmic reviews on quantitative criteria, they fail to capture, or sometimes do not even attempt to capture, the human factors and societal patterns inherently present in these systems.

#### 5.1.2 Quantitative Evaluation

Automatic Metrics. Among automatic metrics, foundational methods have played a critical role in shaping intrinsic evaluations. These metrics provide a systematic way to evaluate model performance through standardized benchmarks, making the evaluation process more efficient and scalable (Sai et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1314); Askell et al., [2021](https://arxiv.org/html/2605.06901#bib.bib188); Hu and Zhou, [2024](https://arxiv.org/html/2605.06901#bib.bib603)). Metrics like BLEU Papineni et al. ([2002](https://arxiv.org/html/2605.06901#bib.bib972)) and ROUGE (Lin, [2004](https://arxiv.org/html/2605.06901#bib.bib793)) are valued for their simplicity and reproducibility. However, their shortcomings include reliance on strict token matching, which often penalizes valid paraphrases and fails to capture deeper semantic equivalence (Wieting et al., [2019](https://arxiv.org/html/2605.06901#bib.bib1728)). Even embedding-based metrics like BERTScore Hanna and Bojar ([2021](https://arxiv.org/html/2605.06901#bib.bib1577)) can be fooled by lexical similarity, ranking a more similar incorrect translation higher than a dissimilar but correct one.
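The token-matching problem is easy to demonstrate. The snippet below is a small, self-contained illustration (assuming NLTK is installed, with invented sentences): a valid paraphrase scores far below a near-copy that contains an outright error.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat is sitting on the mat".split()
paraphrase = "a cat sits upon the rug".split()       # valid, little n-gram overlap
copy_like = "the cat is sitting on the hat".split()  # wrong, high n-gram overlap

smooth = SmoothingFunction().method1  # avoid zero scores on short sentences
print(sentence_bleu([reference], paraphrase, smoothing_function=smooth))  # low
print(sentence_bleu([reference], copy_like, smoothing_function=smooth))   # high
```

Here BLEU rewards the erroneous near-copy over the faithful paraphrase, which is precisely the failure mode the citations above document.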
Quantitative metrics are also limited in addressing other human needs, such as interpretability, latency, cognitive load, and user satisfaction. Optimizing solely for a metric like perplexity can lead to monotonous responses from a language model Celikyilmaz et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib1316)), which are less appealing to users. In high-stakes domains such as healthcare, existing metrics have been found to fail to capture trust, personalization, and empathy Abbasian et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib147)). Finally, while these metrics may be automatic, they often do not scale to tasks such as open-ended question answering and complex planning Gehrmann et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib15)). These limitations have led to the development of complementary and alternative evaluation methods.

Reference-based Metrics. Reference-based approaches measure the similarity between the system output and predefined reference samples; examples include cosine similarity Agarwal et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib155)), the E2E benchmark Banerjee et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib203)), HUSE Hashimoto et al. ([2019](https://arxiv.org/html/2605.06901#bib.bib554)), and RewardBench Lambert et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib330)). These methods retain the standardization and objectivity of automatic metrics, but they are also limited by the quality of the reference standard. For instance, they may be inconsistent across new references, or they may optimize for closeness to a single gold standard even when the overall response quality is worse Nguyen et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1718)). For creative tasks, such a gold standard may not even exist.

Machine-learned Metrics. Machine-learned metrics such as reward models Ryan et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1068)) and classifier-based scoring Shaikh et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1096)) show some promise in capturing nuances of human judgment. However, it can be challenging to build pipelines that ground these evaluators, whether in specific sources for factual correctness Tang et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1717)) or in social science theories that reflect human behavior and preferences Shaikh et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1096)). Additionally, these methods face limitations in generalizing to out-of-distribution settings, particularly in addressing discrepancies in preferences across different groups of people worldwide Ryan et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1068)).

#### 5.1.3 Qualitative Evaluation

In contrast to quantitative evaluations (§[5.1.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2)), qualitative evaluations require a more nuanced approach because they work directly with humans (or LLMs). They are perhaps more human-centered than automatic or machine-learned metrics by virtue of their subjects, but they demand more careful design to be fair and effective. We first discuss two paradigms of qualitative evaluation, LLM-as-a-Judge and human evaluation, and end the section with a discussion of extrinsic evaluation.

LLM-as-a-Judge. The rise in popularity of LLMs has led to the "LLM-as-a-Judge" paradigm, which caters to more human-centered systems. Given the cost and subjectivity of human evaluation, LLM evaluation is a feasible alternative, and its results are generally consistent with those of human experts on some tasks Chiang and Lee ([2023](https://arxiv.org/html/2605.06901#bib.bib299)).
Within the LLM judge paradigm, there are various use cases, such as LLM-derived metrics (embedding-based, probabilities, etc.) Jia et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib651)), Xie et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1335)), prompting, fine-tuning LLMs on human evaluations Xu et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1349)), Ke et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib691)), and human-LLM collaborative evaluations Gao et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib489)). More recent methods employ multiple LLMs in multi-agent debates and have shown better alignment with human assessment Chan et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib275)). However, LLM-based evaluators exhibit systematic limitations, including self-preference bias Panickssery et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib970)), where models favor their own outputs, and inconsistent application of evaluation criteria Hu et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib602)), both of which reduce the reliability of their judgments. Further limitations stem from hallucinations and a lack of consistency and reproducibility, which impact the accuracy of judgments. LLMs can also exhibit biases similar to human cognitive biases, e.g., gender and authority bias Chen et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1727)). Researchers study agreement between human and LLM evaluations using metrics such as the Intraclass Correlation Coefficient (ICC) Bartko ([1966](https://arxiv.org/html/2605.06901#bib.bib16)) and Cohen's Kappa Li et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib1726)), Warrens ([2015](https://arxiv.org/html/2605.06901#bib.bib17)). These issues are exacerbated when humans over-trust LLM outputs for their supposed objectivity in application settings (Bansal et al., [2021](https://arxiv.org/html/2605.06901#bib.bib18)). One approach to address single-judge bias is the "LLM-as-a-jury" paradigm proposed by Verga et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1725)), which checks the bias perpetuated by any one judge and thus aligns better with human evaluation.
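To ground the preceding discussion, here is a minimal sketch of a pairwise LLM-judge setup together with the kind of chance-corrected agreement check described above. The prompt template, the `llm_judge` placeholder, and the labels are all invented for illustration; only `cohen_kappa_score` is a real scikit-learn function.

```python
from sklearn.metrics import cohen_kappa_score

JUDGE_PROMPT = """You are evaluating two responses to the same user request.
Request: {request}
Response A: {a}
Response B: {b}
Which response is more helpful? Answer with exactly "A" or "B"."""

def llm_judge(request: str, a: str, b: str) -> str:
    """Hypothetical stand-in for a call to a judge model."""
    raise NotImplementedError

# Suppose we have parallel preference labels ("A"/"B") for the same items,
# one list from human annotators and one from the LLM judge (invented here):
human_labels = ["A", "B", "B", "A", "A", "B", "A", "B"]
judge_labels = ["A", "B", "A", "A", "A", "B", "B", "B"]

# Chance-corrected agreement between human and LLM judgments.
print(f"Cohen's kappa: {cohen_kappa_score(human_labels, judge_labels):.2f}")
```

A kappa near zero indicates agreement no better than chance, a red flag that the judge should not replace human raters for that task.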
Human Evaluation. Crowd-sourcing platforms such as Amazon Mechanical Turk (MTurk, https://www.mturk.com/) and Prolific (https://www.prolific.com/) have enabled large-scale experiments within budget, giving researchers access to a wider range of evaluators than in-person studies allow. Nevertheless, human evaluators online may exhibit biases and quality issues Ipeirotis et al. ([2010](https://arxiv.org/html/2605.06901#bib.bib1724)), and evaluators' demographics can be skewed depending on the platform Difallah et al. ([2018](https://arxiv.org/html/2605.06901#bib.bib397)). Correspondingly, data quality may differ between platforms: Douglas et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib412)) show that Prolific and CloudResearch are more likely to produce high-quality data than MTurk, Qualtrics, and SONA. However, these trends may be shifting as AI agents more readily mimic human respondents and bypass AI detection methods (Westwood, [2025](https://arxiv.org/html/2605.06901#bib.bib1149)). Such human evaluations must be designed according to best practices. Relevant questions include: How are human ratings collected? What questions are asked? We must design human evaluations carefully to avoid low-quality annotations. Standardizing human evaluations is difficult: Huynh et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib628)) found that 25% of HITs (Human Intelligence Tasks) in MTurk NLP studies have technical issues, including unclear or incomplete instructions and poor communication, and they assessed 35% of requesters as paying poorly or very badly. In some cases, humans may feel pressured to perform annotations they are unsure about. Attempts to standardize human evaluations have been made in the form of inter-evaluator agreement, but this is not commonly reported (18% of 135 papers Amidei et al. ([2019](https://arxiv.org/html/2605.06901#bib.bib1723))) and has suggested limitations pertaining to human language variability Amidei et al. ([2018](https://arxiv.org/html/2605.06901#bib.bib1721)). Thus, satisfactory answers to the above questions remain elusive, and such issues need to be resolved for human evaluations to have representative power. There are also discrepancies between evaluations by human annotators and by actual users, and preferences do not always correlate directly with objective model performance Mozannar et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib912)). This underscores the importance of capturing first-person user experience in evaluating human-centered LLMs. These limitations in mainstream human evaluation techniques raise the question: how do current human evaluations fit into a human-centered evaluation paradigm? It is vital that human-centered evaluation of language models follow the needs of human stakeholders, i.e., end users; shortcutting this process yields task designs that serve the task designer and no one else. Who the stakeholders are is then an important question: for a paper review generation task, for example, the stakeholders would be domain experts (NLP researchers) Wang et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib1249)). For other tasks, careful design around the actual users of the system may be necessary to ensure the evaluations remain human-centered.

### 5.2 Human-Level Evaluations

Unlike model-level evaluations, which focus on what the system produces, human-level evaluations focus on how people experience the HCLLM (Chang et al., [2023](https://arxiv.org/html/2605.06901#bib.bib276); Parmanto et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1509)). We focus particularly on human values (§[5.2.1](https://arxiv.org/html/2605.06901#S5.SS2.SSS1)), bias (§[5.2.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2)), and safety (§[5.2.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3)).

#### 5.2.1 Human Values

Evaluations can measure the needs, values, and aesthetic principles that humans care about. We discuss coherence, creativity, empathy, helpfulness, transparency, and user satisfaction, each in turn. By evaluating against these, model developers can create systems that not only perform well technically but also enhance the user experience.
Coherence. Coherence ensures that generated text flows logically and is understandable to human readers Dang ([2006](https://arxiv.org/html/2605.06901#bib.bib360)). Reinhart ([1980](https://arxiv.org/html/2605.06901#bib.bib1049)) defines three conditions for coherence: (i) cohesion, (ii) consistency, and (iii) relevance. Cohesion concerns syntactic structure, ensuring that sentences are formally linked through referential links or semantic connectors. Consistency requires logical alignment between sentences, ensuring they can coexist truthfully within a single interpretive framework. Relevance emphasizes the relationship between sentences, the topic at hand, and its broader context. Without coherence, LLM outputs would be disconnected language fragments that fail to provide meaningful information, potentially jumping between topics or making contradictory statements that human readers struggle to follow. This would significantly impair communication with, and the trustworthiness of, LLMs, as humans rely on coherent communication to build understanding.

Creativity. Creativity metrics assess the originality and diversity of outputs while still ensuring factual accuracy. These dimensions are particularly critical for content generation tasks, which must balance innovation with reliability (De et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1511)). For constructs like creativity, where there may be no clear computational measures, researchers may consult long-established fields that study these constructs and have well-defined rubrics, such as psychology or literature (Mozaffari, [2013](https://arxiv.org/html/2605.06901#bib.bib1419); Amabile, [1983](https://arxiv.org/html/2605.06901#bib.bib1420)).

Empathy. Metrics should measure an LLM's ability to recognize and respond to user emotions empathetically, especially in sensitive contexts. Given that LLMs have been widely adapted to sensitive real-world contexts (behavioral health, medicine, and education, to name a few) (Stade et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1477)), evaluations focusing on emotional consistency and appropriateness can help ensure responses are suitable and free of instability that could deeply affect end users. Such metrics should evaluate how LLMs' responses influence attitudes or behaviors in real-world scenarios, drawing on applied feedback from human domain experts, such as psychologists, physicians, or educators, to assess the quality of LLM outputs against their fields' standardized measures (Demszky et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1478)). Such evaluations could also promote the development of human-AI collaboration systems, which have been shown to elevate empathetic responses even between humans (Sharma et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1475)).

Helpfulness. Evaluation metrics should assess the model's ability to provide relevant, beneficial, and non-offensive information tailored to user needs, in relation to the model's behavioral impact (Peng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1481)). Increasingly, models are developed to address specific needs. It therefore becomes important to track the helpfulness of the model on its specified downstream tasks and to evaluate users' state, knowledge, and performance relative to their exposure to the system.
For example, a model designed to help users prepare for events requiring conflict resolution must be able to simulate realistic conflict scenarios tailored to the user's needs, provide diverse examples and responses, and promote guided practice where users receive feedback and improve (Shaikh et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib1096)). In evaluating such systems, technical components such as language generation and accuracy should of course be evaluated, but soliciting feedback from actual domain users through behavioral assessments provides especially valuable insight for development. These impact-focused evaluations consider the model's generalizability in complex, real-world scenarios and provide a more accurate assessment of its practical value from domain users' perspectives.

Transparency. Transparency is a cornerstone of responsible AI and is crucial for human-centered LLM systems. It enables users to understand system limitations and make informed decisions about when and how to rely on model assistance. Approaches to transparency should include model reporting, publishing evaluation results, providing explanations, and communicating uncertainty. These methods help different stakeholders understand and trust LLMs, ensuring that the systems are used responsibly and effectively Liao and Vaughan ([2023b](https://arxiv.org/html/2605.06901#bib.bib790)).

User Satisfaction. As models grow bigger, become more task-specific, and integrate further into day-to-day roles, general-purpose benchmarks may not suffice to evaluate performance in the wild, and evaluators may seek feedback specific to a particular group of models. Utilizing actual usage data could therefore benefit the development-to-deployment cycle the most. Metrics derived from user feedback, interaction logs, and satisfaction ratings provide direct insights into the real-world effectiveness of LLMs and are essential for understanding how users perceive and interact with model outputs. For example, in an effort to understand how to better align models with user needs, Wang et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1422)) highlight the need for user-centric evaluation.

#### 5.2.2 Bias and Fairness Evaluation

Drawing from the taxonomy of algorithmic harm developed by Shelby et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib727)), bias can be conceptualized along three dimensions: (1) representational, (2) allocational, and (3) quality of service. These axes of harm require careful evaluation to avoid further entrenching social hierarchies, inequitable resource distribution, and performance disparities across demographic groups Shelby et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib727)), Blodgett et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib229)). The human implications of these harms extend beyond technical measurements to real-world consequences that affect people's dignity, opportunities, and quality of life Hofmann et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1547)).

##### Representational Bias.

Representational bias in model outputs reflects, and in some cases amplifies Wang and Russakovsky ([2021](https://arxiv.org/html/2605.06901#bib.bib1251)), Zhao et al. ([2017](https://arxiv.org/html/2605.06901#bib.bib1435)), our own implicit associations and social hierarchies.
This dimension of bias includes stereotyping, demeaning, erasure, alienation, denial of self-identity, and the insistence on essentialist identity categories Shelby et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib727)). These harms affect how individuals perceive themselves and their communities, potentially reinforcing societal prejudices and stereotypes that limit human potential. Hu et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1714)) found that language models exhibit social identity bias, mirroring human ingroup solidarity and outgroup hostility. Stereotype benchmarks predominate evaluations along this dimension because they offer standardized methods and baselines. For masked-language models, notable frameworks include StereoSet (SS) Nadeem et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib924)), CrowS-Pairs (CS) Nangia et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib927)), WinoBias (WB) Zhao et al. ([2018](https://arxiv.org/html/2605.06901#bib.bib1436)), and WinoGender (WG) Rudinger et al. ([2018](https://arxiv.org/html/2605.06901#bib.bib1066)): all are collections of contrastive prompt pairs (stereotype vs. non-stereotype) whose scores aggregate into relative comparisons between identity groups (race, gender identity, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status). These comparisons capture a model's tendency to associate social groups with particular target terms of interest through predicted token probabilities for masked identifiers (see the sketch below).
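As a simplified sketch in the spirit of CrowS-Pairs-style scoring (not the benchmark's official implementation), the code below compares the pseudo-log-likelihood that a masked language model assigns to each sentence of a contrastive pair, masking one token at a time; the example pair is invented.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

@torch.no_grad()
def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log-probabilities of each token when it alone is masked,
    a common pseudo-likelihood score for contrastive-pair benchmarks."""
    ids = tok(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        logits = mlm(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# Invented contrastive pair; a real benchmark supplies thousands of such pairs
# and reports the fraction on which the stereotypical sentence scores higher.
stereo = pseudo_log_likelihood("Women are bad at math.")
anti = pseudo_log_likelihood("Men are bad at math.")
print("model prefers stereotype" if stereo > anti else "model prefers anti-stereotype")
```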
Researchers have also employed co-reference resolution tasks, where ambiguous identifiers reference the same entity, to measure associations between identity markers and terms of interest, whether descriptors, stereotypes, occupations, or other attributes Clark and Manning ([2016](https://arxiv.org/html/2605.06901#bib.bib311)). Another line of research has focused on open-ended text generation, producing datasets of carefully curated questions and prompts that draw out stereotypes specific to certain social groups Parrish et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib978)), Naous et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1)), Dhamala et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib393)), Gehman et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib494)). For open-ended prompts, classifier-based comparative metrics like toxicity Liang et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib787)), Chung et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib309)), Chowdhery et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib305)), Gehman et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib494)), sentiment Roehrick ([2020](https://arxiv.org/html/2605.06901#bib.bib1058)), and regard Sheng et al. ([2019](https://arxiv.org/html/2605.06901#bib.bib1108)) serve as better indicators of bias than relative probability distributions of target terms. Despite the wide adoption of these benchmarks and datasets, critics find systematic conceptual issues (unstated assumptions, ambiguities, and inconsistencies in what is measured) and operational failures in their execution Blodgett et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib231)), McIntosh et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib889)), Seshadri et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib1092)). Some datasets, like Parrish et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib978)), integrate perturbed context windows to explore the relationship between output bias and ambiguous identity groups in the input. However, recent investigations into prompting methods and system-level personas reveal new confounds for these approaches, finding results that vary with the perturbation methodology Deshpande et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib388)), Shaikh et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1620)). Additionally, survey papers in this field recognize that many studies do not contextualize their work within established definitions of bias Blodgett et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib229), [2021](https://arxiv.org/html/2605.06901#bib.bib231)). Finally, there are mounting concerns over test set contamination Jegorova et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib648)), Reid et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1048)), Zhuo et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1479)), Wang et al. ([2023a](https://arxiv.org/html/2605.06901#bib.bib1259)).

##### Allocational Bias.

Allocational bias is a direct consequence of representational bias Devine ([2001](https://arxiv.org/html/2605.06901#bib.bib391)), Kurdi et al. ([2019](https://arxiv.org/html/2605.06901#bib.bib731)), resulting in an unequal distribution of resources, whether financial, opportunity-based, or service-related Barocas et al. ([2017](https://arxiv.org/html/2605.06901#bib.bib206)), Eubanks ([2018](https://arxiv.org/html/2605.06901#bib.bib443)). Its human cost is particularly severe, as it directly affects access to essential resources, economic mobility, and social participation. In domains where model outputs can affect the material stability of vulnerable communities or social groups, such as housing, employment, social services, finance, education, and healthcare Obermeyer et al. ([2019](https://arxiv.org/html/2605.06901#bib.bib955)), it is especially critical to evaluate discrepancies among social groups. In the employment domain, this may manifest as resume screening tools that systematically favor men over other genders Singh and Joachims ([2018](https://arxiv.org/html/2605.06901#bib.bib1123)), Van Es et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib1228)) or white-sounding candidates over people of color based on implicit identity markers in their names Mujtaba and Mahapatra ([2019](https://arxiv.org/html/2605.06901#bib.bib918)), Armstrong et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib181)). Similarly, in social services and healthcare, screening tools may incorporate existing inequities related to education level, income, and race into their decision-making processes Eubanks ([2018](https://arxiv.org/html/2605.06901#bib.bib443)), Obermeyer et al. ([2019](https://arxiv.org/html/2605.06901#bib.bib955)), Pessach and Shmueli ([2022](https://arxiv.org/html/2605.06901#bib.bib991)). While representational harm has established evaluation frameworks, allocational harm has historically lacked standardized benchmarks and well-documented baselines for consistent measurement.
Emerging work by Wang et al. ([2024f](https://arxiv.org/html/2605.06901#bib.bib580)) is one of the first significant exceptions to this pattern: they systematically measure employment as a downstream task by creating the JobFair dataset to quantify inequitable outcomes across gender identities. The benchmark includes resume templates with varying demographic information that are passed to LLMs to score and rank. Beyond this recent development, the dominant approach for evaluating this dimension of bias has been to measure outcome discrepancies when LLMs are tasked with decision-making Veldanda et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1231)), Salinas et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1030)), Armstrong et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib181)). These investigations typically build on established fairness metrics from prior literature, with measures like Equal Opportunity (equal true positive rates) and Equalized Odds (equal true positive and false positive rates) Hardt et al. ([2016](https://arxiv.org/html/2605.06901#bib.bib550)), and Demographic Parity (equal likelihood of a positive outcome) Dwork et al. ([2012](https://arxiv.org/html/2605.06901#bib.bib424)), Kusner et al. ([2017](https://arxiv.org/html/2605.06901#bib.bib733)), to name a few Verma and Rubin ([2018](https://arxiv.org/html/2605.06901#bib.bib1233)); a minimal sketch of two of these measures follows below. Additional work has explored causal and counterfactual fairness approaches to better capture complex biases that arise in real-world decision-making contexts Kilbertus et al. ([2018](https://arxiv.org/html/2605.06901#bib.bib1715)).
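These group-fairness measures are simple to state precisely. The sketch below computes demographic parity and equal opportunity gaps from binary decisions; the decisions and group labels are invented, and a real audit would add confidence intervals and intersectional groupings.

```python
import numpy as np

def demographic_parity_gap(y_pred, group) -> float:
    """Largest difference in positive-outcome rates between groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_gap(y_true, y_pred, group) -> float:
    """Largest difference in true positive rates between groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = [y_pred[(group == g) & (y_true == 1)].mean() for g in np.unique(group)]
    return max(tprs) - min(tprs)

# Invented screening decisions (1 = advance candidate) for two groups:
y_true = [1, 0, 1, 1, 0, 1, 0, 1]   # qualified or not (ground truth)
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]   # model decisions
group  = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(f"demographic parity gap: {demographic_parity_gap(y_pred, group):.2f}")
print(f"equal opportunity gap:  {equal_opportunity_gap(y_true, y_pred, group):.2f}")
```

A gap of zero on a given measure means parity on that measure only; the cited literature shows these criteria can conflict, so the choice among them is itself a normative decision.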
A parallel line of research investigates allocational harm through performance disparities based on identity. These differences manifest in various contexts, from non-bias-focused benchmarks like MultiMedQA, where queries specific to certain demographic groups consistently receive worse performance McIntosh et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib889)), Singhal et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1137)), to fundamental downstream tasks including Named Entity Recognition (NER), classification, and text generation Blodgett et al. ([2016](https://arxiv.org/html/2605.06901#bib.bib226)), Blodgett and O'Connor ([2017](https://arxiv.org/html/2605.06901#bib.bib228)). Language model performance degradation is particularly well-documented for English slang and dialectal variations Joshi et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib676)), Blodgett et al. ([2016](https://arxiv.org/html/2605.06901#bib.bib226)), Bender et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib235)). These disparities become even more pronounced in cross-linguistic evaluation, largely due to the predominance of English in training data Winata et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib1305)), Brown et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib250)). In this way, performance discrepancies span both the subjects of generated text and the users of the models, creating a dual layer of exclusion for marginalized communities. As LLMs increasingly influence resource allocation in critical systems and domains such as housing, healthcare, and employment, the interplay between these dimensions of harm requires improved evaluation methods. Future research must prioritize developing evaluation frameworks that establish coherent normative criteria, adapt effectively to open-ended tasks, and address intersectional identities with increasing sophistication, all while maintaining efficacy as models scale and directly involving affected communities in the design and evaluation of these systems Raji et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib1716)).

#### 5.2.3 Safety Evaluations

Safety refers to the ability of language models to generate content that does not cause harm, spread misinformation, or violate ethical standards Huang et al. ([2023c](https://arxiv.org/html/2605.06901#bib.bib1696)). It encompasses preventing models from producing toxic, discriminatory, or dangerous outputs, even when deliberately prompted to do so. As language models become increasingly integrated into critical applications across healthcare, education, and legal domains, ensuring safety has become paramount. Extensive safeguards are implemented during training: Reinforcement Learning from Human Feedback (RLHF) Ouyang et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib963)) has been widely adopted to align language models with human preferences, and Bai et al. ([2022a](https://arxiv.org/html/2605.06901#bib.bib196)) proposed Reinforcement Learning from AI Feedback (RLAIF), which further helps improve safety in language models. Despite these efforts, ensuring safety remains a complex and evolving challenge. This is partly due to a lack of unified evaluation benchmarks Röttger et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1062)) and partly due to the nature of LLMs: language models learn from vast and diverse datasets and can exhibit unpredictable behaviors in specific contexts Bender et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib235)). Such unpredictability often becomes evident when models are exposed to adversarial or unexpected inputs, highlighting significant gaps in existing safety mechanisms. As a result, while current safeguards can be effective under typical conditions, they may not be sufficient to anticipate or mitigate every possible misuse scenario. The importance of robust safety evaluations is further underscored by the concerns surrounding data privacy and copyright discussed in §[3.3](https://arxiv.org/html/2605.06901#S3.SS3).

##### Datasets for Safety Evaluation.

The growing demand for ethical and aligned AI has led to the development of numerous datasets and benchmarks to evaluate and improve the safety, reliability, and alignment of LLMs. These datasets vary widely in scope, methodology, and focus areas, reflecting the multifaceted nature of LLM safety. Dong et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib406)) categorize the topics of existing evaluation datasets for LLM safety into four categories: toxicity (generation of offensive language, instructions for illegal activities, and harmful content), discrimination (biases against marginalized groups and protected characteristics), privacy (safeguarding personal information and intellectual property), and misinformation (the tendency to generate false or misleading information). Many popular and relatively comprehensive benchmarks have been frequently used in research studies. ToxiGen Hartvigsen et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib553)) is a large-scale, autocomplete-style dataset comprising 274k toxic and benign statements about 13 minority groups, designed to detect implicit toxic speech.
It includes human annotations assessing the naturalness and perceived harmfulness of machine-generated text; however, Röttger et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1062)) highlight that this dataset may not accurately reflect real-world usage scenarios for modern LLMs. AdvBench Zou et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1503)) focuses on adversarial robustness, providing 500 toxic strings and 500 harmful behaviors to evaluate the resilience of LLMs against prompts intended to elicit harmful outputs. TruthfulQA Lin et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib794)) evaluates factual accuracy with 817 questions spanning 38 categories, demonstrating how larger LLMs often replicate human misconceptions and emphasizing the need for improved training objectives. SafetyBench Zhang et al. ([2023c](https://arxiv.org/html/2605.06901#bib.bib1404)) offers a comprehensive safety evaluation framework with 11,435 multiple-choice questions across seven critical categories, enabling assessments in both English and Chinese for a more diverse linguistic perspective. Furthermore, Zhuo et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1479)) introduce a benchmark specifically for evaluating ChatGPT's ethical performance, systematically examining bias, reliability, robustness, and toxicity to reveal both advancements and ongoing challenges. Collectively, these datasets play a pivotal role in advancing safer and more trustworthy AI systems.

##### Metrics for Measuring LLM Safety.

Evaluation metrics are critical for assessing the safety performance of LLMs. Key metrics include the Attack Success Rate (ASR) Dong et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib406)), Zou et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1503)), which measures the percentage of instances in which models generate harmful target outputs following adversarial prompts (a minimal sketch follows below). Fine-grained metrics, such as the toxicity score Hartvigsen et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib553)), evaluate the extent of toxic or harmful content in the generated text. Truthfulness, assessed against strict factual accuracy standards, focuses on whether statements accurately reflect factual information rather than conforming to belief systems Lin et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib794)). Additionally, safety-related multiple-choice questions, such as those in SafetyBench, are used to evaluate LLMs' ability to address specific safety concerns Zhang et al. ([2023c](https://arxiv.org/html/2605.06901#bib.bib1404)). When applied to diverse datasets, these metrics provide a comprehensive framework for evaluating LLM safety, guiding efforts to reduce risks, improve alignment with ethical standards, and enhance trustworthiness in deployment.
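ASR itself is a simple ratio; the hard part is deciding what counts as a successful attack. The sketch below makes that dependency explicit: `naive_is_harmful` is a deliberately crude, invented stand-in for the refusal or harmfulness classifiers real evaluations use, and it is not how any cited benchmark scores responses.

```python
def attack_success_rate(responses, is_harmful) -> float:
    """Fraction of adversarial-prompt responses judged harmful.

    `is_harmful` is a pluggable judgment function (a keyword filter,
    a trained detector, or an LLM judge); benchmarks differ in how
    they implement it, and the choice strongly affects reported ASR.
    """
    judgments = [is_harmful(r) for r in responses]
    return sum(judgments) / len(judgments)

def naive_is_harmful(response: str) -> bool:
    """Toy refusal heuristic for illustration only."""
    refusal_markers = ("i can't", "i cannot", "i'm sorry")
    return not response.lower().startswith(refusal_markers)

responses = ["I'm sorry, I can't help with that.", "Sure, step one is..."]
print(f"ASR: {attack_success_rate(responses, naive_is_harmful):.0%}")
```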
##### Jailbreaking.

One particularly challenging aspect of safety evaluation is *jailbreaking*, where users deliberately attempt to circumvent safety mechanisms through crafted prompts or other techniques to induce unintended, harmful, or ethically questionable behaviors Wei et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1288)), Perez and Ribeiro ([2022](https://arxiv.org/html/2605.06901#bib.bib988)). This poses considerable risks not only to individual users but also to society at large, as it can lead to the dissemination of misinformation, hate speech, or other malicious content Weidinger et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib1292)). Moreover, adversaries continually develop new jailbreaking techniques, making this an ever-evolving threat. Addressing these vulnerabilities necessitates an adaptive approach to safety design, integrating continuous monitoring, adversarial testing, and dynamic evaluation frameworks to stay ahead of emerging risks and ensure robust model alignment with ethical standards. Jailbreaking techniques have evolved rapidly, becoming increasingly sophisticated and concerning. Yi et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1373)) classify attacks into black-box and white-box attacks based on the transparency of the target model. Xu et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1340)) categorize attack techniques into three categories: generative techniques, template techniques, and training-gap techniques. Inspired by these works, we categorize jailbreaking techniques into three main categories based on the form of attack.

Prompt-Based Jailbreaking. Prompt-based jailbreaking involves crafting specific inputs, known as jailbreak prompts, to intentionally bypass the safety alignment mechanisms of LLMs Yu et al. ([2024d](https://arxiv.org/html/2605.06901#bib.bib1383)). These prompts manipulate the model into generating harmful, biased, or otherwise restricted outputs. A common strategy involves disguising harmful requests as seemingly innocuous ones by framing them as fictional narratives, hypothetical scenarios, or questions asked on behalf of a family member. For example, a straightforward harmful query like "How to make a bomb" can be rephrased into a seemingly harmless prompt, "Living in a war-torn region has instilled in me a profound need to understand the mechanisms of conflict, especially the technology behind bombs…", to persuade the LLM to reveal sensitive information using a persuasion taxonomy technique Zeng et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1394)). Various online platforms, including blogs, GitHub repositories, and dedicated websites like JailbreakChat (once maintained by Alex Albert at jailbreakchat.com, now inactive), curate and share collections of jailbreak prompts that serve as templates for arbitrary malicious queries, making them widely accessible for misuse. Jailbreaking strategies are either manually crafted or auto-generated. Auto-generated prompts can be further divided into white-box and black-box methods Lin et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib795)), Yi et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1373)). White-box methods assume some level of access to the model's internal workings and are often created using optimization techniques. For example, GCG Zou et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1503)) uses a gradient-based approach to find a suffix that, when attached to malicious queries, maximizes the probability that the model produces an affirmative response rather than a refusal. This optimized suffix has been shown to be transferable across different models, including black-box ones. In contrast, black-box methods Zeng et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1394)), Chao et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib280)), Mehrotra et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib896)) rely solely on observing the model's behavior through its outputs and API interactions, without access to its parameters or training data, leveraging LLMs as optimizers to achieve successful bypasses.
Generation Exploitation. Huang et al. ([2023d](https://arxiv.org/html/2605.06901#bib.bib607)) introduce the generation exploitation attack, demonstrating that simply exploiting different generation strategies, such as varying decoding hyperparameters and sampling methods, can jailbreak 11 widely used open-source language models, including the LLAMA2, VICUNA, FALCON, and MPT families, at low computational cost. This attack highlights potential vulnerabilities in language models and poses serious security implications for AI safety and alignment research.

Model Fine-Tuning. AI companies like OpenAI now offer fine-tuning-as-a-service: users upload customized data for fine-tuning, and the fine-tuned models are hosted on the provider's servers and accessible via APIs. However, this framework introduces a new type of threat, where harmful data may be used during fine-tuning, either intentionally or unintentionally, to compromise the alignment built into pre-trained models Huang et al. ([2024d](https://arxiv.org/html/2605.06901#bib.bib613)), Yang et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1366)), Qi et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1023)), Yi et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1372)), Zhan et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1395)). Moreover, He et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib558)) propose a method to sample more harmful examples from a benign dataset, demonstrating that such examples can significantly degrade model safety.

Cultural and Contextual Sensitivity. Safety evaluations must account for linguistic and cultural diversity. What constitutes harmful content varies significantly across contexts, making universal safety standards difficult to establish. More nuanced, context-aware evaluation frameworks are needed to address these complexities Li et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1698)).

Balancing Safety and Utility. Overly restrictive safety measures can limit the utility of LLMs for legitimate purposes. Finding the optimal balance between safety and functionality remains a significant challenge, particularly in sensitive domains like healthcare, legal advice, and educational content Vijjini et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1697)).

Alignment with Evolving Social Values. As societal values and ethical standards evolve, safety mechanisms must adapt accordingly. This necessitates ongoing dialogue between AI developers, ethicists, policymakers, and diverse stakeholders to ensure that safety frameworks remain relevant and effective Li et al. ([2024g](https://arxiv.org/html/2605.06901#bib.bib1700)).

### 5.3 Societal-level Evaluation

With the increasingly pervasive influence of large language models (LLMs) across sensitive domains like mental health Stade et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1161)), Abdurahman et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib98)), Lawrence et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib755)) and education Wang and Zhang ([2024](https://arxiv.org/html/2605.06901#bib.bib1273)), and the complex challenges these models present, evaluating their impact on users and society is crucial.
Traditional evaluations of LLMs use datasets and benchmarks to assess potentially hazardous behaviors, but they often fail to bridge the “sociotechnical gap” between controlled assessments and real-world performance Ibrahim et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib631)), Weidinger et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1294)). By focusing on models in isolation, these methods overlook crucial human factors, resulting in an inadequate understanding of human-model interactions and their consequences. Furthermore, these evaluations fail to account for the continuation and amplification of societal inequities and biases present in the data on which these models were trained. It thus becomes essential to employ a comprehensive extrinsic evaluation Jones and Galliers ([1995](https://arxiv.org/html/2605.06901#bib.bib674)) framework that considers various categories of social impact, such as bias and stereotypes, cultural values, performance disparities, privacy protection, financial implications, environmental costs, and content moderation labor Solaiman et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1144)). As mentioned in §[2](https://arxiv.org/html/2605.06901#S2), randomized controlled trials (RCTs) and large-scale behavioral assessments play a central role in understanding and evaluating LLMs' behavioral impact on users and society. For instance, to evaluate the effect of a newly developed AI co-tutor that provides LLM-generated feedback to real-life tutors on student performance, Wang et al. ([2025d](https://arxiv.org/html/2605.06901#bib.bib1413)) conducted an experiment in which tutors either had access to the AI co-tutor in their tutoring sessions or tutored without it. This setup enabled Wang et al. ([2025d](https://arxiv.org/html/2605.06901#bib.bib1413)) to systematically evaluate the performance of their model in an ecologically valid setting, drawing on feedback from both the users and the model's observed behavior Brynjolfsson et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib101)). The application of robust evaluation techniques spans various domains. In healthcare, where existing metrics often fail to capture critical factors like user comprehension and trust, researchers are developing new metrics to assess LLMs' impact on end-user decision-making and expectations Abbasian et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib147)). These metrics measure both immediate behavioral changes and long-term adoption patterns. For example, Yang et al. ([2023a](https://arxiv.org/html/2605.06901#bib.bib1355)) evaluated LLMs for mental health analysis, showing that while ChatGPT demonstrates strong in-context learning, specialized methods often outperform it. They also found that effective prompt engineering with emotional cues can improve results. Similarly, Bak and Chin ([2024](https://arxiv.org/html/2605.06901#bib.bib97)) observed that LLMs provided 20%-30% irrelevant information when identifying users' motivational states for health behavior change, highlighting their limitations. When evaluating LLMs, it is essential to consider impact at scale through longitudinal and large-scale studies, which are critical for understanding not just the immediate outcomes of LLM use but also their sustained effects, as LLMs already undergo significant change in response to user engagement Liu et al. ([2024e](https://arxiv.org/html/2605.06901#bib.bib102)).
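To make the experimental logic of RCT designs like the AI co-tutor study concrete, the following minimal sketch estimates the average treatment effect in a two-arm trial; the outcome data here are simulated purely for illustration, not drawn from any cited study.

```python
# Minimal sketch: difference-in-means treatment-effect estimate for a two-arm
# RCT (e.g., tutors with vs. without an AI co-tutor). Outcomes are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
treated = rng.normal(loc=0.72, scale=0.10, size=250)  # e.g., session success rates
control = rng.normal(loc=0.68, scale=0.10, size=250)

ate = treated.mean() - control.mean()                 # average treatment effect
t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)  # Welch's t-test
se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
low, high = ate - 1.96 * se, ate + 1.96 * se          # normal-approximation 95% CI
print(f"ATE = {ate:.3f} (95% CI [{low:.3f}, {high:.3f}], p = {p_value:.4f})")
```

Randomized assignment is what licenses reading the difference in means as a causal effect; observational comparisons of adopters versus non-adopters do not support the same inference.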
As an example of such a large-scale analysis, Eloundou et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib236)) examine the impact of LLMs on the U.S. labor market, particularly the enhanced effects of LLM-powered software, finding that higher-income jobs may face greater exposure. These studies emphasize the importance of robust behavioral experimental design and of scale, in units of both time and users, when evaluating LLMs, as results obtained from small-scale, lab-controlled studies may not generalize to larger, more diverse, real-life user populations. Furthermore, such evaluations allow us to explore cumulative effects, such as changes in users' attitudes, considerations, and even decision-making.

**Extrinsic Evaluation.** Extrinsic evaluations cover behavioral impacts, self-efficacy reports, standardized evaluation, short-term outcomes, and long-term outcomes Yang et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1370)).

**Behavioral impacts:** Behavioral impacts track changes in qualitatively coded participant behaviors before and after exposure to a system. Evaluations use task-based assessments, tracking engagement and task completion. For example, RealHumanEval measures the number of tasks completed, time to task success, acceptance rate, and number of code copies from chat to comprehensively analyze the quality of AI coding tools and their impact on human users Mozannar et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib912)). Standardized evaluations are more objective and draw on pre-defined assessments, for instance, of the effects of AI-generated suggestions on writing style Agarwal et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib155)).

**Self-efficacy evaluation:** Self-efficacy evaluation includes questionnaires about participants' perceptions of a system's usefulness and their own perceived levels of ownership when interacting with the system (Long et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib826)). Together, these extrinsic methods distinguish between performance-based metrics, like tracking behavior or test performance, and perception-based metrics, such as user surveys.

**Short-term and long-term evaluation:** Evaluations can be split into short-term and long-term Yang et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib1370)), with short-term evaluations covering constrained interactive sessions and long-term evaluations covering much longer studies (e.g., a week or more). In short-term evaluations, new AI tools may receive subjectively higher scores due to novelty bias Sadeghi et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib44)), Shin et al. ([2018](https://arxiv.org/html/2605.06901#bib.bib1114)); this is less of an issue in long-term evaluations. Consider, for example, a longitudinal study of an AI chain tool for creating Tweetorials (lengthy Twitter posts connected as a chain) on science communication Long et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib826)). The study finds that after a “familiarization phase,” the perceived utility of the tool rose even higher than during the initial novelty period, suggesting that end-users' ability to exploit the tool grows as they customize prompts and adapt their workflows; the AI tool thus proved more useful in the long term. Drawing from economics and psychology research, other evaluations trace the impacts of AI assistance and the interplay of users' short-term and long-term behaviors and attitudes.
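Before turning to those economics-informed measures, the sketch below makes the behavioral-impact metrics above concrete by aggregating RealHumanEval-style measures from interaction logs; the `Session` schema is our own hypothetical construction, not the benchmark's actual format.

```python
# Sketch: aggregating extrinsic, performance-based metrics from interaction
# logs. The Session schema is hypothetical; real studies log far richer data.
from dataclasses import dataclass

@dataclass
class Session:
    tasks_completed: int
    tasks_attempted: int
    seconds_to_first_success: float
    suggestions_shown: int
    suggestions_accepted: int

def extrinsic_summary(sessions: list[Session]) -> dict[str, float]:
    return {
        "task_completion_rate": sum(s.tasks_completed for s in sessions)
                                / sum(s.tasks_attempted for s in sessions),
        "mean_time_to_success_s": sum(s.seconds_to_first_success for s in sessions)
                                  / len(sessions),
        # Acceptance rate: how often shown AI suggestions are actually kept.
        "acceptance_rate": sum(s.suggestions_accepted for s in sessions)
                           / max(1, sum(s.suggestions_shown for s in sessions)),
    }
```

Perception-based measures, such as the self-efficacy questionnaires above, complement rather than replace such log-derived metrics.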
Among such economics-informed measures, skill training systems have tracked changes in participants' productivity and wages after training Adhvaryu et al. ([2018](https://arxiv.org/html/2605.06901#bib.bib933)), Chioda et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib935)), and other measures examine how health, risk-taking behaviors, and levels of societal trust change over time Barrera-Osorio et al. ([2020](https://arxiv.org/html/2605.06901#bib.bib934)). Long-term evaluation is necessary to ensure that LLMs remain human-centered in the long run. If we only perform short-term evaluations, we risk measuring novelty bias, which can inflate an LLM's apparent helpfulness to humans. By measuring the long-term effects of LLMs, we can accurately assess their helpfulness, along with other human-centered metrics relevant to the use case (e.g., creativity). In summary, well-rounded extrinsic evaluation should integrate objective performance and subjective user experience. But while extrinsic evaluations offer a more holistic assessment of human-centered objectives, they are often more complex and expensive. Successful evaluations often use a mix of quantitative and qualitative methods to assess the quality of a system, yet there remains room for improvement in integrating human-centered evaluations.

## 6 Responsible Human-Centered LLMs

In this chapter, we highlight three broad properties that underpin a responsible deployment of HCLLMs, and we explore the tensions and relationships between these ideals (Figure [6](https://arxiv.org/html/2605.06901#S6.F6)). The first property is *interpretability* (§[6.1](https://arxiv.org/html/2605.06901#S6.SS1)): an HCLLM's input-output transformations should be understood. This property comes first because it complements our next two properties, *steerability* (§[6.2](https://arxiv.org/html/2605.06901#S6.SS2)) and *safety* (§[6.3](https://arxiv.org/html/2605.06901#S6.SS3)). Steerable models can be aligned along a pre-selected dimension, and safe models do not produce undesirable outputs. If we have an interpretable model, we may obtain a more steerable model through feature-level control, and we may obtain a safer model by isolating and removing harmful representations. What makes responsible HCLLMs challenging is that these properties are not only complementary but also in tension. Making a model more steerable to individual user preferences may undermine safety constraints, while overly rigid safety guardrails can limit a model's ability to adapt to legitimate, diverse human needs. And although interpretability supports steerability and safety in many ways, certain alignment and steering methods can make models less interpretable. For example, reward functions used in alignment introduce an additional layer of complexity, and these functions are non-identifiable (Joselowitz et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1508)): multiple distinct reward functions can yield similar policy behavior. As we discuss each of these three properties, interpretability, steerability, and safety, we will cover current approaches in the literature and then provide directions for future research.

Figure 6: We enumerate three properties for responsible HCLLM deployment: interpretability (§[6.1](https://arxiv.org/html/2605.06901#S6.SS1)), steerability (§[6.2](https://arxiv.org/html/2605.06901#S6.SS2)), and safety (§[6.3](https://arxiv.org/html/2605.06901#S6.SS3)). These properties are generally complementary, but tensions between them can make deployment difficult.
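Returning briefly to the reward non-identifiability noted above, a textbook illustration (ours, not the specific construction of Joselowitz et al.) comes from the Bradley–Terry preference model standardly assumed in RLHF reward learning: adding any prompt-dependent offset $c(x)$ to the reward leaves every pairwise preference probability unchanged,

$$p\left(y_w \succ y_l \mid x\right) = \sigma\left(r(x, y_w) - r(x, y_l)\right) = \sigma\left(\tilde{r}(x, y_w) - \tilde{r}(x, y_l)\right), \qquad \tilde{r}(x, y) = r(x, y) + c(x),$$

where $\sigma$ is the logistic sigmoid. Preference data alone therefore cannot pin down the reward, which is one concrete sense in which multiple distinct reward functions yield the same policy behavior.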
### 6.1 Interpretable and Explainable HCLLMs

The first dimension we emphasize is *interpretability and explainability*. Neural networks, as the fundamental building blocks of LLMs, remain largely opaque; the complex interactions between weights and activations make both training dynamics and inference behavior difficult to understand (Gilpin et al., [2018](https://arxiv.org/html/2605.06901#bib.bib144)). Yet understanding these systems is critical for ensuring the alignment of LLMs with human values and objectives. We distinguish between two complementary goals: interpretability, which focuses on understanding *how* LLMs operate in general settings; and explainability, which seeks causal explanations for *why* LLMs produce specific behaviors, decisions, or outcomes. Both are essential for human-centered applications, but they serve different purposes. Interpretability provides a better understanding of LLM internals, which can help address undesired behaviors such as hallucinations, vulnerability to adversarial attacks, and encoded biases. Explainability, by contrast, provides users with comprehensible justifications for individual outputs, informing appropriate trust and enabling contestability.

#### 6.1.1 Current Approaches to Interpretability

##### Three interconnected areas of modern interpretability research.

First, work on understanding internal mechanisms has revealed that transformer components can function as interpretable key-value memories (Geva et al., [2021](https://arxiv.org/html/2605.06901#bib.bib499)) and has begun to uncover how LLMs represent multilingual knowledge (Tang et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1194); Zhang et al., [2024d](https://arxiv.org/html/2605.06901#bib.bib1407)). Second, these mechanistic insights have enabled practical interventions on model behavior, such as inference-time steering methods (Li et al., [2023c](https://arxiv.org/html/2605.06901#bib.bib774); Turner et al., [2023](https://arxiv.org/html/2605.06901#bib.bib373); Zou et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib1502); Wu et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1320)), while model editing and machine unlearning techniques allow for targeted removal of undesirable traits (Meng et al., [2023](https://arxiv.org/html/2605.06901#bib.bib899); Ilharco et al., [2023](https://arxiv.org/html/2605.06901#bib.bib632); Liu et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib812)). Third, interpretability serves as a diagnostic tool for safety, helping researchers understand jailbreaking vulnerabilities (Arditi et al., [2024](https://arxiv.org/html/2605.06901#bib.bib179); Kirch et al., [2024](https://arxiv.org/html/2605.06901#bib.bib706)) and identify adversarial attack vectors (Łucki et al., [2024](https://arxiv.org/html/2605.06901#bib.bib854); Yu et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib1387); Jain et al., [2024](https://arxiv.org/html/2605.06901#bib.bib642)).

##### Interpretability methods for human-centered purposes.

It is important to understand why certain model behaviors arise, such as sycophancy or deception; however, this cannot be done simply by examining model outputs in a post-hoc fashion (Sharma et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1101); Hubinger et al., [2024](https://arxiv.org/html/2605.06901#bib.bib621)). This shortcoming motivates the application of interpretability methods for human-centered purposes.
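The inference-time steering methods cited above typically operate on linear directions in a model's residual stream. The sketch below extracts a behavior direction by difference of means and shifts activations along it during generation; it assumes a Llama-style Hugging Face model, and the layer index, coefficient, and contrast datasets are illustrative choices rather than values from any cited paper.

```python
# Sketch: difference-of-means direction extraction plus a forward hook that
# shifts the residual stream along that direction at generation time.
import torch

@torch.no_grad()
def mean_activation(model, tok, texts, layer: int) -> torch.Tensor:
    """Average last-token hidden state at `layer` over a set of texts."""
    states = []
    for text in texts:
        out = model(**tok(text, return_tensors="pt"), output_hidden_states=True)
        states.append(out.hidden_states[layer][0, -1])
    return torch.stack(states).mean(dim=0)

def steering_hook(direction: torch.Tensor, coeff: float):
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden += coeff * direction  # in-place shift of the residual stream
        return output
    return hook

# Illustrative usage (contrast sets, layer, and coefficient are assumptions):
# direction = mean_activation(model, tok, sycophantic_texts, layer=14) \
#           - mean_activation(model, tok, neutral_texts, layer=14)
# handle = model.model.layers[14].register_forward_hook(
#     steering_hook(direction, coeff=-4.0))  # negative coeff suppresses
# ... model.generate(...) ...; handle.remove()
```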
Building on theories such as the linear representation hypothesis (Park et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib976)), the platonic representation hypothesis (Huh et al., [2024](https://arxiv.org/html/2605.06901#bib.bib622)), and universal feature representations across LLMs (Lan et al., [2024](https://arxiv.org/html/2605.06901#bib.bib746)), interpretability has been used as a tool to understand different model biases, including social biases (Liu et al., [2024d](https://arxiv.org/html/2605.06901#bib.bib1782)), cultural biases (Yu et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1783)), and cultural knowledge (Veselovsky et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1784)). Additionally, recent work has sought to identify models' internal representations of important model behaviors, finding that models encode harmfulness and refusal separately (Zhao et al., [2025c](https://arxiv.org/html/2605.06901#bib.bib1785)) and that three dimensions of sycophancy (sycophantic agreement, sycophantic praise, and genuine praise) are encoded along different linear directions in latent space, so each can be amplified or suppressed without affecting the others (Vennemeyer et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1786)). Another recent application of interpretability to HCLLMs is better modeling of human-AI interactions. For LLMs to act as helpful assistants, they must not only understand the user's query but also develop an understanding of the user's latent traits and needs. A misalignment between a model's representation of the user and the user's true needs can lead to various harmful outcomes, ranging from conversational grounding failures (Shaikh et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib1095)) to sycophancy and deception. For example, to make a model's user representation more transparent, Chen et al. ([2024d](https://arxiv.org/html/2605.06901#bib.bib1787)) design a system that extracts data related to a user's demographic features and a dashboard that displays this representation. Choi et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1788)) similarly extract latent representations of users in LLMs, and such methods have also been applied to predict the behaviors of personalized LLMs (Karny et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1791)).

#### 6.1.2 Current Approaches to Explainability

##### Modern explainability research for LLMs pursues several complementary goals.

Feature attribution methods identify which inputs most influence outputs, natural language rationales provide human-readable justifications, and counterfactual explanations show how minimal input changes would alter predictions (Zhao et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib806)). Unlike interpretability, which seeks a general understanding of internal mechanisms, explainability focuses on justifying individual predictions in terms that users and stakeholders can act upon. This goal has proven challenging, as traditional explainable AI (XAI) techniques such as LIME and SHAP (Ribeiro et al., [2016](https://arxiv.org/html/2605.06901#bib.bib1054)) become computationally impractical at the scale of billions of parameters, while LLM-specific approaches such as chain-of-thought reasoning and post-hoc citation generation often prioritize plausibility over faithfulness Lanham et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib749)), Turpin et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1223)).
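As a minimal instance of the feature-attribution family above, the occlusion sketch below scores each input token by how much deleting it lowers the model's probability of the output being explained; `score_fn` is a stand-in we assume returns that probability, not a real library call.

```python
# Sketch: occlusion-based (leave-one-token-out) feature attribution.
# score_fn(prompt) is assumed to return the model's probability of the
# output being explained; it is a placeholder for a model-specific scorer.
def occlusion_attribution(tokens: list[str], score_fn) -> list[float]:
    base = score_fn(" ".join(tokens))
    scores = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + tokens[i + 1:]  # drop the i-th token
        scores.append(base - score_fn(" ".join(perturbed)))
    return scores  # larger value = token supports the output more strongly
```

Sampling-based refinements of this idea, such as LIME and SHAP, require many such perturbed forward passes per explanation, which is precisely why they become costly at LLM scale.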
For a comprehensive taxonomy of explainability techniques for LLMs, we refer readers to Zhao et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib806)).

##### How explainability methods can be used for human-centered purposes.

Explainability serves as a foundational element for building user trust and enabling accountability in LLM systems. The ability to assign responsibility for model decisions is essential not only for developing transparent systems but also for supporting downstream regulatory efforts, for instance around AI in hiring systems, compensation for content creators, and copyright law (Guha et al., [2024](https://arxiv.org/html/2605.06901#bib.bib525)). These concerns have motivated legislative action: for instance, the EU AI Act, which became enforceable in 2024, establishes explainability as a legal requirement in critical domains (Smuha, [2025](https://arxiv.org/html/2605.06901#bib.bib800)). For end users, trust fundamentally depends on calibration (i.e., whether models can reliably express what they know and don't know). Models often struggle to convey uncertainty, both through log-probabilities and through linguistic hedging (Zhou et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib1473)), although recent work has made progress on both fronts (Tian et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1215); Li et al., [2024j](https://arxiv.org/html/2605.06901#bib.bib778)). Closely related is the problem of citation and attribution. Effective attribution can provide causal explanations for LLM behavior, but current approaches have significant limitations. While RAG systems supply LLMs with relevant context, there is no guarantee that models actually use that context to generate responses (Du et al., [2024](https://arxiv.org/html/2605.06901#bib.bib414); Li et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib770)). Post-hoc citation generation similarly suffers from severe faithfulness issues Liu et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib796)), motivating work on parametric attribution (Khalifa et al., [2024](https://arxiv.org/html/2605.06901#bib.bib695)) and on measuring training data influence more broadly (Park et al., [2023c](https://arxiv.org/html/2605.06901#bib.bib1002); Grosse et al., [2023](https://arxiv.org/html/2605.06901#bib.bib522); Guu et al., [2023](https://arxiv.org/html/2605.06901#bib.bib539)). Chain-of-thought (CoT) reasoning represents a particularly contested approach to explainability. On one hand, CoT outputs provide an accessible window into model reasoning that users can inspect without technical expertise. On the other hand, research has shown that these explanations can systematically misrepresent the true reasons for a model's predictions (Lanham et al., [2023](https://arxiv.org/html/2605.06901#bib.bib749); Turpin et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1223)). This creates a paradox for human-centered design: CoT explanations may increase user trust precisely because they appear plausible, even if they fail to faithfully reflect a model's decision process.
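Returning to calibration: the standard quantitative check is expected calibration error (ECE), which compares a model's stated confidence with its empirical accuracy. A minimal sketch, assuming per-answer confidences and correctness labels are already available:

```python
# Sketch: expected calibration error (ECE) over equal-width confidence bins.
import numpy as np

def expected_calibration_error(conf: np.ndarray, correct: np.ndarray,
                               n_bins: int = 10) -> float:
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = abs(conf[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap  # weight bins by their share of samples
    return ece
```

A well-calibrated model has low ECE: when it reports 90% confidence, it is right roughly 90% of the time.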
Finally, LLMs have shown potential for advancing explainability in other scientific domains. For instance, LLM-powered simulations have enabled HCI designers to explore counterfactual scenarios and reason about design decisions (Park et al., [2022](https://arxiv.org/html/2605.06901#bib.bib973)), while LLM-inspired approaches can extract interpretable biological features from protein language models (Simon and Zou, [2024](https://arxiv.org/html/2605.06901#bib.bib1121)).

#### 6.1.3 Looking Forward

##### Providing understanding for model developers.

The black-box nature of LLMs, particularly closed-source ones, makes it difficult to predict and control how models behave. For example, when models provide unsolicited affirmation to the user, it is unclear what *causes* the model to provide that affirmation. As the range of questions and interactions becomes more complex and open-ended, interpretability becomes a key tool for answering questions like: how can we determine whether a model is personalized? Does the model truly understand a user's intent? Developing an understanding of a model's representation of the user is especially important as people use LLMs for personal questions and even as AI companions. Without an understanding of models' behaviors, model builders risk harming users' well-being (Cheng et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib40)).

##### Uncovering unintended effects of post-training.

Another key application of interpretability for HCLLMs is a better understanding of post-training procedures like preference alignment (Ferrao et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1790); Movva et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1789)). Without more interpretable approaches, post-training can lead to various unintended effects (e.g., sycophancy) that are difficult to monitor or mitigate post hoc. Building on existing approaches that refine our understanding of what preference alignment actually optimizes for, model providers can better control and steer behaviors in desirable directions. For example, representation finetuning and steering (Rimsky et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1792); Wu et al., [2025c](https://arxiv.org/html/2605.06901#bib.bib1793), [2025b](https://arxiv.org/html/2605.06901#bib.bib1794), [2024](https://arxiv.org/html/2605.06901#bib.bib1320)) have been shown to be a promising way to control an LLM's behaviors, and these methods can be applied to elicit behaviors that are aligned with users' long-term development. The key first step towards making models safer and more steerable towards long-term beneficial objectives is to understand how they work.

### 6.2 Steerable HCLLMs

#### 6.2.1 Current Approaches to Steerability

*Steerability* is the second dimension we highlight for HCLLM deployment. Steerability is the degree to which a model can be aligned along a particular dimension (Miehling et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1634)), such as preferences, norms, or user constraints (Chen et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib1631)). In contrast to static notions of alignment that aim to produce a single globally acceptable behavior, steerability emphasizes conditional control, or the ability to modulate LLM outputs to suit particular users. This property is especially important for HCLLMs, which are designed to interact with diverse users embedded in heterogeneous social, cultural, and institutional contexts. Steerability spans multiple dimensions.
First, *personalization* is the goal of having an LLM's outputs adaptively reflect the preferences of individual users or groups of users, along with their prior knowledge, goals, and needs (Tseng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib570)). Personalized models should be able to provide more relevant recommendations (Hu et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib1543)), and they should be calibrated to the user's preferred writing styles (Zhang et al., [2024g](https://arxiv.org/html/2605.06901#bib.bib7)), learning styles (Park et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1410)), and norms around privacy (Shao et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib61); Asthana et al., [2024](https://arxiv.org/html/2605.06901#bib.bib239)) and social behavior (Li et al., [2024g](https://arxiv.org/html/2605.06901#bib.bib1700)). A user's preferences can be derived from explicit feedback, as in pairwise preference datasets, or from implicit feedback like historical interaction data. For a more in-depth discussion of personalization methods, see §[4.4](https://arxiv.org/html/2605.06901#S4.SS4). Related to personalization is *persona alignment*, or role play. Here, the goal is for HCLLMs to adopt consistent identities or roles, like software developers, expert tutors, empathetic counselors, or skeptical reviewers, which remain stable across interactions (Li et al., [2024e](https://arxiv.org/html/2605.06901#bib.bib1633); Samuel et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1632); Shanahan et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1630)). Persona alignment is especially important in multi-agent settings where multiple LLM personas interact and collaborate (Park et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib974); Guo et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1699)). In addition to better personalization and role playing, steerable HCLLMs should be able to adaptively understand low-resource languages, dialects, and sociolinguistic varieties (Ziems et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib671)). We refer to this target as *linguistic alignment*. This form of steerability is critical for equitable access, as models trained predominantly on high-resource, standardized corpora often underperform for marginalized linguistic communities (see §[3.2](https://arxiv.org/html/2605.06901#S3.SS2)). Finally, *cultural alignment* means that models can be steered to reflect the norms, values, and narratives of particular communities or demographic groups (Santurkar et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1074)). Unlike personalization, which targets individuals, cultural alignment operates at the level of shared practices and collective meaning-making. A range of technical mechanisms support steerability. At inference time, models may be steered through in-context learning, prompt engineering, or output filtering (Wies et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1629)). Post-hoc control methods can condition generation on specific attributes or enforce constraints via decoding strategies, as in the sketch below. More structurally, personalized reward models and fine-tuning procedures can encode user- or group-specific objectives into model parameters (Chen et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib290)).
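As one concrete mechanism for the decoding-time control just described, the sketch below biases or bans token groups via a Hugging Face `LogitsProcessor`; the token lists and bias value are illustrative assumptions and would in practice be derived from a user's preferences or a community's norms.

```python
# Sketch: post-hoc attribute control at decoding time with a logits processor.
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class AttributeControl(LogitsProcessor):
    def __init__(self, boost_ids: list[int], ban_ids: list[int], bias: float = 4.0):
        self.boost_ids, self.ban_ids, self.bias = boost_ids, ban_ids, bias

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.boost_ids] += self.bias   # soft nudge toward preferred style
        scores[:, self.ban_ids] = -float("inf")  # hard constraint: never emit
        return scores

# Illustrative usage, with boost/ban token-id lists assumed given:
# out = model.generate(**inputs,
#     logits_processor=LogitsProcessorList([AttributeControl(boost, ban)]))
```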
Steerability is still fundamentally constrained by the representational biases embedded in pre-training and post-training data (Mihalcea et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1646)). If certain identities, linguistic forms, or cultural narratives are underrepresented or stereotyped in the training corpus, then prompt-based steering may have limited expressive range. In this sense, steerability is not just a matter of control at inference time; it is rooted much earlier, in data provenance (§[3.1](https://arxiv.org/html/2605.06901#S3.SS1)) and evaluation (§[5](https://arxiv.org/html/2605.06901#S5)). Steerability starts with measuring where biases arise, localizing their sources in the data pipeline, and redesigning collection and annotation practices accordingly.

#### 6.2.2 Looking Forward

Looking forward, we envision several research directions that extend current notions of steerability. First, we consider continual learning for preference, cultural, and pluralistic alignment. Continual preference learning would allow models to adapt dynamically to evolving user needs by tracking implicit cues in interaction patterns or contextual data (Shaikh et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1647)). Such systems often rely on persistent memory stores like chat histories or retrieval-augmented generation. A central challenge is achieving this adaptivity while preserving user autonomy and privacy (Zhang et al., [2025c](https://arxiv.org/html/2605.06901#bib.bib1644)). In cultural and pluralistic alignment, rather than optimizing toward a static representation of *culture* or *user values*, HCLLMs should accommodate evolving norms and intra-group disagreement, as supported by continual learning. One way to elicit these evolving norms is to facilitate community-centered discussion and debate, following methods like STELA (Bergman et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1743)). Rather than treating communities as homogeneous preference aggregates, pluralistic alignment frameworks should model disagreement as a first-class signal. Pluralistic alignment comes with an array of technical and political challenges that will need to be addressed. On the technical side, post-training can induce mode collapse, in which heterogeneous group preferences and opinions are compressed in a lossy manner, suppressing minority viewpoints and preserving only the majority preferences for a given group (Bisbee et al., [2024](https://arxiv.org/html/2605.06901#bib.bib728); Durmus et al., [2024](https://arxiv.org/html/2605.06901#bib.bib422); Röttger et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib709)). To elicit diverse generations from LLMs, some inference-time methods use iterative prompting (Hayati et al., [2024](https://arxiv.org/html/2605.06901#bib.bib556); Feng et al., [2024](https://arxiv.org/html/2605.06901#bib.bib456)), but these methods are not calibrated to real-world distributions. Other methods modify the prompt with explicit identity terms for diverse user groups (Giulianelli et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1645)), but identity-coded names induce models to draw on distributions of stereotypical representations or out-group perceptions rather than in-group perspectives (Wang et al., [2025a](https://arxiv.org/html/2605.06901#bib.bib729)).
To address mode collapse in a manner that preserves in-group perspectives, it may be necessary to update LLM priors implicitly through distributionally aligned in-context examples, following methods like spectrum tuning (Sorensen et al., [2025](https://arxiv.org/html/2605.06901#bib.bib458)). Achieving localized, domain-specific alignment processes that enable communities and stakeholders to meaningfully shape model behavior is not only a methodological challenge but also a political one (Delgado et al., [2023](https://arxiv.org/html/2605.06901#bib.bib381)). Methodologically, we have the participatory HCI approaches covered in §[2.3.2](https://arxiv.org/html/2605.06901#S2.SS3.SSS2). Politically, however, most model developers lack incentives to share control with communities (Gabriel, [2020](https://arxiv.org/html/2605.06901#bib.bib1744)). Current alignment pipelines are centralized within the small set of companies with the resources to develop LLMs. Similarly, academic institutions and governments can serve to centralize decision-making across the LLM development pipeline (Suresh et al., [2024](https://arxiv.org/html/2605.06901#bib.bib379)). There are a number of less centralized alternatives. For example, Masakhane (Orife et al., [2020](https://arxiv.org/html/2605.06901#bib.bib1731)) is a network of NLP researchers working on NLP for African languages, with community involvement at every stage, from dataset creation and annotation to model training. EleutherAI is a distributed, volunteer-driven research collective that has created open pre-training corpora (Gao et al., [2021](https://arxiv.org/html/2605.06901#bib.bib484)), models (Black et al., [2022](https://arxiv.org/html/2605.06901#bib.bib1753)), and scaling checkpoints (Biderman et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1021)). BigScience was a research effort in which over 1,000 researchers from academia and industry coordinated through HuggingFace and built BLOOM (Workshop et al., [2022](https://arxiv.org/html/2605.06901#bib.bib983)), a 176B-parameter multilingual autoregressive language model, with decision-making steps clearly and publicly documented. These initiatives illustrate that steerability need not be confined to top-down fine-tuning interfaces or proprietary alignment pipelines. Distributed model development allows communities to steer models from the ground up, establishing linguistic resources, value functions, and development practices that ultimately shape downstream behavior. At the same time, decentralization introduces its own tensions, including coordination costs, uneven resource distribution, and challenges of accountability. Open and community-led efforts may broaden participation, but they must still grapple with questions of safety, quality control, and the transparent maintenance of HCLLMs. Having covered interpretability (§[6.1](https://arxiv.org/html/2605.06901#S6.SS1)) and steerability, we turn in the following section to safety (§[6.3](https://arxiv.org/html/2605.06901#S6.SS3)) and to the tensions among these objectives.

### 6.3 Safe HCLLMs

The third dimension we emphasize when building responsible HCLLMs is *safety*. As defined in §[5.2.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3), safety is conceptualized as preventing LLMs from producing undesirable outputs (i.e., those that may be toxic, harmful, discriminatory, or dangerous), even when prompted to do so.
For example, the widespread use of LLMs raises critical concerns about ethical and social risks related to their outputs, including discrimination, hate speech, exclusion, misinformation harms, and malicious uses Zhang et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib836)), Bender et al. ([2021](https://arxiv.org/html/2605.06901#bib.bib235)). At the same time, there are concerns that LLMs can be used as agents of harm, such as through AI-generated propaganda for misinformation purposes Goldstein et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib507)) or by spreading information that can facilitate harmful actions like the manufacturing of weapons Shaikh et al. ([2023b](https://arxiv.org/html/2605.06901#bib.bib1620)). We begin by discussing existing methods for addressing safety concerns at both the model training and interaction layers. Looking forward, we advocate for expanding beyond this current definition of safety, which focuses on preventing harms, to encompass how we can build HCLLMs that also maximize user benefits.

#### 6.3.1 Current Approaches to Safety

First, we discuss current methods for measuring and mitigating safety concerns. *Red-teaming* is a common practice for identifying harmful behavior through adversarial testing prior to deployment. Red-teaming approaches differ across model providers, and details are often not publicly disclosed, as these practices are conducted in industry settings Feffer et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1749)). As Feffer et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1749)) survey, red-teamers typically come from three pools: subject-matter experts Ahmad et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1751)), crowdworkers Ganguli et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib1750)), or automated methods, such as language models themselves Ganguli et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib1750)), Perez et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib1752)). The objectives for red-teaming range from broad mandates to identify any harmful behavior to targeted assessments of specific risks, such as those related to national security. After deployment, model providers may also run bug bounty programs that offer incentives for discovering safety or security vulnerabilities OpenAI ([2025a](https://arxiv.org/html/2605.06901#bib.bib1754)), Anthropic ([2025](https://arxiv.org/html/2605.06901#bib.bib1755)). In addition to red-teaming efforts, model developers make use of benchmarks and other safety evaluations, which we discuss in detail in §[5.2.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3). Mitigations can appear at various stages of the model development pipeline. At the pre-training phase, there is interest in filtering datasets to remove toxic content as a preventative safeguard O'Brien et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1759)), Mendu et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1760)), Stranisci and Hardmeier ([2025](https://arxiv.org/html/2605.06901#bib.bib1761)).
At the same time, other work has argued that filtering toxic data during pre-training can have detrimental downstream effects, and that including such data in pre-training can actually make these behaviors easier to remove through fine-tuning Li et al. ([2025c](https://arxiv.org/html/2605.06901#bib.bib1757)), Maini et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1758)), Longpre et al. ([2024c](https://arxiv.org/html/2605.06901#bib.bib828)). Many efforts also address safety concerns during post-training. Foundational techniques for modern LLMs, such as RLHF, are useful not only for increasing model helpfulness but also for aligning models to be more harmless Ouyang et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib963)). Building on these principles, additional post-training methods can help automate parts of this process. For instance, Constitutional AI allows researchers to pre-determine a set of ethical principles for the model to adhere to Bai et al. ([2022b](https://arxiv.org/html/2605.06901#bib.bib194)). Instruction-tuning methods can also reduce toxicity (see §[4.1.1](https://arxiv.org/html/2605.06901#S4.SS1.SSS1)). Once models are deployed, guardrail models help moderate both user inputs and generated outputs Inan et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1748)), Dong et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib407)), Rebedea et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1756)).

#### 6.3.2 Looking Forward

Like other human-centered objectives, *safety* can be an underspecified, ambiguous, or contested target. As we anticipate the development of more human-centered LLMs, we should be asking whose definitions of safety are prioritized, and how we can design models that not only prevent harm but actively promote user flourishing.

##### Paying heed to long-term harms.

As surveyed above, existing AI safety research tends to focus on immediate harms that users face when interacting with models, such as exposure to toxic speech or the production of misinformation. Of course, these harms carry long-term societal consequences. However, an underexplored class of safety problems involves behavior that appears innocuous in the short term but can compound over repeated usage to become problematic. Recent work in this vein has identified specific model properties, such as sycophancy, that can affect users' psychological states and behaviors Cheng et al. ([2025b](https://arxiv.org/html/2605.06901#bib.bib40)). An open challenge lies in measuring these long-term interaction harms, as they are difficult to capture with standard evaluation practices like benchmarking. One alternative approach, as discussed in §[2](https://arxiv.org/html/2605.06901#S2), is to run controlled experiments to understand the effect of model properties on users Cheng et al. ([2025b](https://arxiv.org/html/2605.06901#bib.bib40)), Kirk et al. ([2025a](https://arxiv.org/html/2605.06901#bib.bib1735)), or to conduct qualitative inquiry Mathur et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1722)). However, this process can be time-intensive and can expose participants to the very harms being studied. This concern raises the question of what alternative valid methods exist, such as those that simulate these harms in silico. Beyond measurement, mitigations also remain largely underexplored, both as interventions in model design and in user interaction paradigms.
Researchers have identified properties that can exacerbate harms (e.g., steering models towards being relationship-seeking in model design Kirk et al. ([2025a](https://arxiv.org/html/2605.06901#bib.bib1735)), or sending emotionally laden messages as users try to exit a platform De Freitas et al. ([2025a](https://arxiv.org/html/2605.06901#bib.bib1736))), but translating these insights into preventative measures remains an important and open area for exploration.

##### Expanding the definition of safety.

A second area of exploration involves expanding *whose* definition of safety is prioritized. Definitions of safety vary considerably across demographic groups, along factors such as ethnicity, age, and gender Rastogi et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1738)), Ali et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1739)), Movva et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1740)), Gabriel ([2020](https://arxiv.org/html/2605.06901#bib.bib1744)), Aroyo et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1741)). These variations are then encoded into models through alignment processes, significantly changing model behavior Ali et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1739)). Thus, instead of assuming a generic definition of safety, there is interest in better capturing and modeling the diversity of conceptualizations that exist. This direction is a natural continuation of the existing focus on pluralistic alignment within human-centered LLM research. How do we capture these differing definitions of safety? Some work has tackled the issue by recruiting diverse sets of annotators to rate safety perceptions Rastogi et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1738)), Aroyo et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1741)). Others advocate for engaging with communities in a more participatory fashion to elicit safety goals, which offers a richer understanding of how different communities understand the potential harms of these technologies Qadri et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1131)), Bergman et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1743)). Despite the benefits of moving away from this “view from nowhere” conception of safety, it is important to remember that communities themselves are not monolithic. Disagreements inevitably arise about what constitutes safe or harmful content and about which model behaviors are desirable or unacceptable Gordon et al. ([2022](https://arxiv.org/html/2605.06901#bib.bib54), [2021](https://arxiv.org/html/2605.06901#bib.bib55)). There is a legitimate concern that implementing democratic methods could inadvertently drown out the voices of minority groups. Yet this is not grounds to dismiss democratic or participatory methods for AI safety as a lost cause. As Zimmermann et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1745)) outline, there are viable paths forward for reconciling these tensions by drawing on practices from political philosophy. As we move towards a more pluralistic definition of safety, we must think normatively about the contexts in which we jointly maximize or balance considerations across groups of people, recognizing that collective decisions may at times conflict with individual desires or goals.

##### Considering not only harms but also benefits.

Finally, much of the discussion so far has focused on mitigating harmful behaviors.
However, we emphasize that avoiding harm is not the same as maximizing user benefit. In pursuit of human-centered LLMs, we must also prioritize building models that bring positive change for users. This raises important questions about what beneficial model behavior looks like and whether our current conceptions of safety align with what is truly beneficial. First, we must challenge the assumption that “safe” models are necessarily best for promoting benefits. As Cai et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1746)) provocatively suggest, perhaps there is a need to design *antagonistic AI* systems that are “actively dismissive, disagreeable, closed-off, critical, flippant, difficult, interrupting.” Much as a student may be productively challenged by their teacher in the learning process, when we design for benefits rather than merely minimizing harms, the desired model behavior changes. A second provocation concerns the scope of benefit: rather than thinking only about the one-to-one benefit of a model for an individual user, what about one-to-many benefits? We can envision designing models for collective or group-level good: for example, models deployed to promote democratic health by finding common ground through deliberation Tessler et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1747)), or models designed to benefit teams by serving as collaborators in group settings. Just as there are differing definitions of what safety means, there are similarly diverse conceptions of benefit, raising parallel questions about whose definition should be prioritized and how we reconcile conflicting views of what constitutes a beneficial outcome.

## 7 Case Study: HCLLMs and the Future of Work

LLMs are not only research technologies that a select group of people examine, use, and probe. They have become a part of the lives of everyday users, supporting tasks ranging from assisting with coding to giving relationship advice Tamkin et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib593)), Chatterji et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib56)), Yang et al. ([2024b](https://arxiv.org/html/2605.06901#bib.bib1771)), Jimenez et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib1770)). For example, as of September 2025, OpenAI reported that its flagship chatbot, ChatGPT, saw over 700 million weekly active users Chatterji et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib56)). We therefore conclude by discussing how our considerations around the multiple facets of HCLLMs apply in real-world applications. For example, are HCLLMs practically feasible? How do HCLLMs affect individuals, and what macro-level effects do they have on society? We focus this chapter on one particular application area: the future of work. Of course, this is not the only area where LLMs are used; these models have a wide scope of applications, including healthcare, education, and political science Thirunavukarasu et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1701)), Adiguzel et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1706)), Ornstein et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib1426)).
Nonetheless, we focus on HCLLMs' impact on labor and the future of work given strong public interest, as evidenced by the many headlines and much speculation, as well as the individual- and societal-level impacts that models will have in this area Shao et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib577)), Acemoglu et al. ([2026](https://arxiv.org/html/2605.06901#bib.bib1554)). The adoption of LLMs in the labor market has led to significant shifts in human productivity and in-demand skills, and possibly even to macroeconomic changes Chen et al. ([2025b](https://arxiv.org/html/2605.06901#bib.bib237)), Eloundou et al. ([2024](https://arxiv.org/html/2605.06901#bib.bib236)). We will now show how model developers can incorporate the human-centered principles from previous sections to define, develop, and deploy HCLLMs within this evolving ecosystem. To define HCLLMs here, we first discuss *who* the stakeholders are and how to account for these differing parties in §[7.1](https://arxiv.org/html/2605.06901#S7.SS1). Then, we cover HCLLM development, and how we ought to train and evaluate models for future-of-work purposes, in §[7.2](https://arxiv.org/html/2605.06901#S7.SS2). Finally, in §[7.3](https://arxiv.org/html/2605.06901#S7.SS3), we conclude with considerations for responsibly deploying HCLLMs, such as the potential for widening inequalities or overreliance. The road map is visualized in Figure [7](https://arxiv.org/html/2605.06901#S7.F7).

Figure 7: We present a case study on HCLLMs and the future of work, covering the three key areas of defining, developing, and deploying HCLLMs. We start by identifying relevant stakeholders ([7.1](https://arxiv.org/html/2605.06901#S7.SS1)), then examine the model capabilities needed to better suit LLMs for workplace settings ([7.2](https://arxiv.org/html/2605.06901#S7.SS2)), and conclude by discussing key societal considerations ([7.3](https://arxiv.org/html/2605.06901#S7.SS3)).

### 7.1 Defining the Stakeholders

To understand how we can design LLMs for the future of work in a human-centered fashion, we must start by understanding the *who*. Who is being impacted by HCLLMs in the workforce? Is this the same group of people for whom these models are being designed? How might different groups of stakeholders, both direct and indirect, be impacted differently? There are many potential stakeholders, including *workers*, who directly interact with LLMs; *employers*, who may be in charge of procuring the technology for their organization; *shareholders*, who are interested in productivity or financial gains; and *customers*, who may see the final artifact or output created by workers using LLMs. A natural group to start with are the *workers* who use these technologies as part of their jobs. Here, a recurring theme is a fundamental mismatch between what people actually want from LLMs and how those technologies are currently being designed. In part, this misalignment can be attributed to organizational constraints, which are more present in work settings than in personal use. For instance, Challapally et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1199)) found that employees frequently resort to using personal accounts to access LLMs (e.g., ChatGPT, Claude) because enterprise deployments fail to meet their needs or feel too restrictive. Yet this workaround behavior is symptomatic of a deeper issue: the way LLMs are currently being used in the workplace is misaligned with worker priorities.
For example, Shao et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib577)) conducted a survey of 1,500 U.S. workers to understand the tasks for which they want AI agents to be used. Critically, they found that the tasks workers *want* to use agents for differ substantially from how tools are currently deployed and where industry funding is going. In tandem with the human-centered reasons for centering workers' perspectives, this finding suggests that current AI development trajectories risk prioritizing displacement over augmentation. Doing so risks repeating dangerous historical patterns in which automation technologies lead to displacement without commensurate productivity gains or wage increases Acemoglu and Restrepo ([2021](https://arxiv.org/html/2605.06901#bib.bib152)). To close this gap, we must ensure that the needs and desires of workers are incorporated into the design of these systems from the outset. Taking the perspectives of *employers* or *shareholders* into account, the question becomes whether introducing LLMs improves the productivity and quality of the work produced. What complicates this question is the “jagged” nature of LLMs' usefulness Dell'Acqua et al. ([2023](https://arxiv.org/html/2605.06901#bib.bib87)). For some tasks, LLMs are particularly performant and can automate existing processes; some roles will see a more symbiotic relationship in which there is augmentation rather than replacement; and for other tasks, LLMs are simply not capable of performing the requisite work at all Mazeika et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib73)). There is a parallel consideration around literacy: whether workers are well equipped to use the technology. Harkening back to the “gulf of envisioning” discussed in Chapter [2](https://arxiv.org/html/2605.06901#S2), it is possible that LLMs could be useful for workers, but that workers do not know how best to specify their intent or are unaware that such capabilities exist. This places a responsibility on employers, organizations, and policymakers to invest in AI literacy programs that equip workers with the conceptual and practical knowledge needed to participate in an AI-augmented workplace, rather than simply assuming adoption will follow deployment Ma et al. ([2025a](https://arxiv.org/html/2605.06901#bib.bib1200)), Challapally et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1199)). Finally, another stakeholder we highlight is *customers*, who serve as another kind of end-user in this setting. For instance, as LLM-based systems increasingly replace human touchpoints in domains such as customer service and healthcare, the impact on customers warrants equal consideration. The evidence here is mixed: while interacting with LLM systems offers speed and accuracy compared to human support, these interactions can also raise frustrations around the lack of empathy and the potential for miscommunication Huang et al. ([2024a](https://arxiv.org/html/2605.06901#bib.bib72)), Li et al. ([2025e](https://arxiv.org/html/2605.06901#bib.bib77)). Furthermore, recent survey data show that customers in the U.S. still prefer talking with a human representative over an LLM system Gutierrez ([2026](https://arxiv.org/html/2605.06901#bib.bib71)). This concern is heightened in contexts that are high-stakes or emotionally laden.
Nonetheless, we also want to highlight that treating the question of AI usage as a binary choice between human-only and AI-only presents an incomplete view of the issue, as there is meaningful middle ground that leverages the strengths of both. For instance, existing work has explored how LLMs can be used to upskill professionals to offer better human support, preserving the human “touch” in interactions while enhancing quality and accessibility.

### 7.2 Developing HCLLMs for the Future of Work

Next, we discuss how we can apply a human-centered lens to training and evaluating LLMs to enable a future of work that jointly benefits the different stakeholders we have discussed. To start, much of the current discourse around LLM systems emphasizes autonomous execution, or systems that complete complex, multi-step tasks with minimal human involvement Shen et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib60)). Yet focusing primarily on this form of autonomous interaction overlooks the potential gains from systems in which humans and LLMs collaborate. For example, researchers such as Wang et al. ([2026](https://arxiv.org/html/2605.06901#bib.bib238)) have raised exactly this concern in the context of AI coding agents, a domain where models are quite performant and that has seen significant adoption, especially among professionals in the field Appel et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib1201)). They argue that the field has moved too quickly toward full automation and that human involvement deserves to be treated as a first-class design consideration rather than a transitional inconvenience. Taking this critique seriously requires asking what it would actually take, technically, to build human-centered LLMs for the workforce.

##### Designing Models as Collaborators.

One prerequisite is that models need to know *how* to collaborate. However, collaboration is not simply a matter of adding a confirmation step before a model takes action. It requires a more nuanced understanding of when to act autonomously, when to defer, and how to communicate uncertainty or request input in ways that feel natural rather than disruptive. As Shen et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib60)) show, building more capable models does not necessarily yield better collaboration, so dedicated effort is needed. In fact, current training paradigms, such as preference tuning over single conversational turns, can even hinder collaboration abilities Wu et al. ([2025a](https://arxiv.org/html/2605.06901#bib.bib67)). Furthermore, successful collaboration does not mean indiscriminately putting a human in the loop. While ostensibly beneficial, this interaction paradigm can lead to unnecessary verification and inefficiencies that ultimately slow down the collaboration process. The best form of collaboration will necessarily vary, depending on the task at hand as well as the skill sets of the human and the LLM involved. For example, sometimes the human may need to audit the LLM's work; at other times the right form of collaboration may be a back-and-forth conversation; and in still others, end-to-end execution by the model may suffice. To start, we must return to the core question: what are the ingredients of a successful collaboration? In human organizations, a common collaboration paradigm is *delegation*. At the moment, agents are capable of decomposing complex tasks into more manageable units of action.
Beyond delegation, many other properties help make a good collaborator but are lacking in existing models, such as proactivity (Lu et al., [2024a](https://arxiv.org/html/2605.06901#bib.bib1198)), transparency (Liao and Vaughan, [2023a](https://arxiv.org/html/2605.06901#bib.bib789)), and social norm adherence (Shaikh et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1647); Ziems et al., [2023a](https://arxiv.org/html/2605.06901#bib.bib1497); Forbes et al., [2020](https://arxiv.org/html/2605.06901#bib.bib74)), all of which present fruitful areas for future inquiry.

##### Improving our Understanding of Workers.

A second technical requirement is a richer understanding of the users these systems are meant to serve. One push in this direction is developing better user simulators that can capture the behavioral patterns and habits of individuals (Lei et al., [2026](https://arxiv.org/html/2605.06901#bib.bib1197); Paglieri et al., [2026](https://arxiv.org/html/2605.06901#bib.bib1196)). As our ability to simulate users improves, a question we ought to return to is *which* users are being simulated. At the moment, many efforts concentrate on generic user behavior or focus on knowledge workers, e.g., developers (Naous et al., [2025](https://arxiv.org/html/2605.06901#bib.bib46); Wang et al., [2026](https://arxiv.org/html/2605.06901#bib.bib238)). This provides a deep but narrow view of the workforce. The population of workers whose jobs will intersect with LLMs is far broader, spanning domains like healthcare, retail, education, logistics, and the skilled trades (Handa et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib1255); Tomlinson et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1203)). Developing better user simulators for these contexts requires going beyond theoretical modeling to gain a deeper understanding of how people actually use these systems in the real world. This is a genuine challenge, because the most valuable behavioral data tends to be held by the companies deploying models and is rarely made available to the broader research community. Finding ways to bridge this gap, whether through privacy-preserving data sharing, partnerships, or the careful design of in-the-wild studies, is a prerequisite for building systems that serve a broader swath of workers.
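As a toy illustration of what persona-conditioned user simulation involves, consider the sketch below. The `chat` stub stands in for whichever LLM API a real simulator would call, and the personas are hand-written assumptions; as argued above, simulators meant to represent the broader workforce would need to be grounded in observed behavioral data rather than invented profiles.

```python
import random

# Illustrative worker personas; real simulators should be derived from
# observed usage data, not hand-authored profiles like these.
PERSONAS = [
    {"role": "ER nurse", "goal": "summarize shift handoff notes",
     "style": "short messages, medical abbreviations, frequent interruptions"},
    {"role": "warehouse supervisor", "goal": "draft a staffing schedule",
     "style": "dictated text with typos, wary of automated suggestions"},
]

def chat(prompt: str) -> str:
    """Placeholder for an LLM call; swap in any chat-completion client.

    The stub just echoes so the sketch runs without external services.
    """
    return f"[simulated worker reply to: {prompt[:60]}...]"

def simulate_turn(persona: dict, assistant_msg: str) -> str:
    """Generate the simulated worker's next message, conditioned on a persona."""
    prompt = (
        f"You are a {persona['role']} whose goal is to {persona['goal']}. "
        f"Writing style: {persona['style']}. "
        f"The assistant just said: \"{assistant_msg}\" "
        "Reply as this worker would, in one or two sentences."
    )
    return chat(prompt)

if __name__ == "__main__":
    persona = random.choice(PERSONAS)
    print(simulate_turn(persona, "Here is a draft summary of your notes."))
```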
##### Evaluating for Ecologically Valid Tasks.

Finally, across chapters we have emphasized the importance of data for the development of, and progress toward, human-centered LLMs. To see progress in this area, we must also have benchmarks that actually reflect the complexity of real work. Many of the standard benchmarks used to evaluate LLM performance consist of synthetic or highly constrained tasks that bear little resemblance to the messy, context-dependent, and often ambiguous nature of professional work. More general-purpose benchmarks, such as GDPval (Patwardhan et al., [2025](https://arxiv.org/html/2605.06901#bib.bib62)), represent a meaningful step toward measuring model performance on more ecologically valid tasks, and domain-specific benchmarks provide an even more precise look at how these models perform in settings such as finance (Fan et al., [2025](https://arxiv.org/html/2605.06901#bib.bib64)), law (Guha et al., [2023](https://arxiv.org/html/2605.06901#bib.bib526); Li et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib1206)), and medicine (Arora et al., [2025](https://arxiv.org/html/2605.06901#bib.bib65)). In tandem, we must also consider what these benchmarks fail to capture that we may still want to evaluate. For instance, how can we better account for how workers actually interact with these systems, or for how performance holds up in the messy, open-ended conditions of real work?
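To illustrate one direction, the sketch below scores an episode of human-model work on both outcome quality and interaction effort. The `Trace` fields and the weighting scheme are assumptions for illustration, not the design of GDPval or any existing benchmark; the point is only that completion-rate metrics alone would miss the rework that such a trace exposes.

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """Interaction log from one worker-model episode on a real task."""
    completed: bool
    quality: float              # 0-1 rubric score from expert grading
    turns: int                  # total conversational turns
    human_corrections: int      # times the worker had to fix model output

def score(trace: Trace, w_quality: float = 0.7, w_effort: float = 0.3) -> float:
    """Blend outcome quality with interaction effort into one score.

    Penalizes episodes that 'succeed' only through heavy human rework,
    which completion-only benchmarks would not surface.
    """
    if not trace.completed:
        return 0.0
    effort_penalty = trace.human_corrections / max(trace.turns, 1)
    return w_quality * trace.quality + w_effort * (1.0 - effort_penalty)

if __name__ == "__main__":
    clean = Trace(completed=True, quality=0.9, turns=6, human_corrections=0)
    messy = Trace(completed=True, quality=0.9, turns=6, human_corrections=4)
    print(f"clean run: {score(clean):.2f}")  # high: little rework needed
    print(f"messy run: {score(messy):.2f}")  # lower despite equal final quality
```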
### 7.3 Responsibly Deploying HCLLMs in the Workforce

Perhaps the most top-of-mind question about the future of work concerns the long-term societal impacts these technologies will have. How will employment be affected by the increasing popularity of LLMs? Which skills will be in demand, and which will become less important? Deploying HCLLMs responsibly means taking these long-term externalities into account now and mitigating potential harms before they occur.

##### The Paradox of Productivity and Work Intensification.

A common perception is that integrating LLMs into workflows will inherently reduce the volume of human effort. Initial stand-alone studies of whether LLMs help workers complete tasks faster show positive evidence: people do complete tasks more quickly with AI assistance (Cui et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib80); Peng et al., [2023b](https://arxiv.org/html/2605.06901#bib.bib86); Shen and Tamkin, [2026](https://arxiv.org/html/2605.06901#bib.bib81); Karny et al., [2024](https://arxiv.org/html/2605.06901#bib.bib83)). In the workforce, however, people are not completing singular tasks in a vacuum: they must juggle many competing demands, coordinate with coworkers, navigate company politics, and so on. When we examine the impact on productivity in this more holistic setting, the effect of LLMs is less clear. Counter to the earlier studies, recent work suggests that rather than freeing up workers, LLMs may actually amplify the intensity of labor (Ranganathan and Ye, [2026](https://arxiv.org/html/2605.06901#bib.bib70); Harvey et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1202)). For example, a recent study conducted by the Harvard Business Review observed 200 employees at a U.S.-based technology company and found that LLM usage promotes "work slippage," in which AI leads to task expansion and increased cognitive load rather than a net reduction in hours (Ranganathan and Ye, [2026](https://arxiv.org/html/2605.06901#bib.bib70)). This phenomenon is not confined to the technology industry, where LLM usage is most prevalent (Harvey et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1202); Johnson et al., [2025](https://arxiv.org/html/2605.06901#bib.bib78); Acemoglu and Restrepo, [2020](https://arxiv.org/html/2605.06901#bib.bib76)). For example, despite the purported benefits of LLMs for educators, using these models often leads to more teacher time spent validating generated outputs, on top of existing workloads (Harvey et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1202)). Instead of saving time, these tools may in fact create new tasks for workers or introduce friction into existing ones.

To address these risks, we propose two approaches. First, we argue that the way "productivity" is measured in the workplace must be updated in light of advances in LLMs. Traditionally, productivity has centered on output per unit time, capturing how efficiently people work (Company, [2025](https://arxiv.org/html/2605.06901#bib.bib1190)). Now that LLMs can produce outputs quickly, efficiency-based metrics alone may no longer be meaningful. Instead, echoing practices in existing work (Shao et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1124); Wang et al., [2025e](https://arxiv.org/html/2605.06901#bib.bib63)), we call for broader evaluations that consider not only whether tasks are completed but also the *quality* of the outcomes. More broadly, updating productivity metrics to match the current state of the world provides a more precise picture of how LLMs are impacting work.
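As a minimal sketch of what such a broadened measure could look like, the snippet below contrasts a traditional tasks-per-hour view with a quality-adjusted view that also charges for downstream rework. The field names, weights, and example numbers are illustrative assumptions, not a validated metric from the studies cited above.

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    hours: float          # time spent, including reviewing model output
    quality: float        # 0-1 expert or rubric score of the final artifact
    rework_hours: float   # downstream time others spent fixing the output

def raw_throughput(records: list[TaskRecord]) -> float:
    """Traditional view: tasks completed per hour of direct work."""
    return len(records) / sum(r.hours for r in records)

def quality_adjusted_productivity(records: list[TaskRecord]) -> float:
    """Broadened view: quality-weighted output per total hour of labor,
    where 'total' includes the hidden rework created downstream."""
    useful_output = sum(r.quality for r in records)
    total_hours = sum(r.hours + r.rework_hours for r in records)
    return useful_output / total_hours

if __name__ == "__main__":
    # LLM-assisted drafts finished quickly but triggering downstream fixes
    assisted = [TaskRecord(hours=1.0, quality=0.6, rework_hours=0.5)] * 10
    # Slower unassisted work with little rework
    manual = [TaskRecord(hours=2.0, quality=0.9, rework_hours=0.1)] * 10
    print(f"assisted: raw={raw_throughput(assisted):.2f}/h, "
          f"adjusted={quality_adjusted_productivity(assisted):.2f}")
    print(f"manual:   raw={raw_throughput(manual):.2f}/h, "
          f"adjusted={quality_adjusted_productivity(manual):.2f}")
```

On these illustrative numbers, the raw metric favors the assisted workflow while the quality-adjusted metric flips the ranking, which is exactly the kind of gap the work-intensification findings above suggest efficiency-only measures can hide.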
Our second approach is to find methods that center workers' voices. Survey efforts such as Shao et al. ([2025](https://arxiv.org/html/2605.06901#bib.bib577)) provide insight into what workers want. Complementing this approach, we argue for additional studies that can capture "thick" descriptions of what users are experiencing, such as interviews or ethnographic approaches (Geertz, [2008](https://arxiv.org/html/2605.06901#bib.bib1129); Anugraha et al., [2026](https://arxiv.org/html/2605.06901#bib.bib1188)). Such methods can shed light on what numbers alone cannot capture, such as hidden forms of labor, additional cognitive load, or coordination costs. Together, these directions point toward a more holistic framework for evaluating AI in the workplace, one that accounts for both outcomes and worker experience.

##### The Risk of Cognitive Deskilling.

A second concern is the potential for deskilling, where reliance on LLMs erodes the foundational expertise of human workers. When models handle the "first draft" (or sometimes the end-to-end execution) of complex tasks, workers may lose the opportunity to engage in the learning processes necessary to develop deep domain mastery. Viewed in isolation, workers may be completing the task faster, but they are also less likely to learn transferable skills or knowledge that can help them in future tasks (Shen and Tamkin, [2026](https://arxiv.org/html/2605.06901#bib.bib81)). Importantly, these effects are heterogeneous across workers. How someone engages with these technologies matters: automating tasks wholesale is likely to accelerate deskilling, whereas more collaborative, iterative interactions in which the worker critically evaluates, corrects, and builds on model outputs may partially preserve or even scaffold skill development (Shen and Tamkin, [2026](https://arxiv.org/html/2605.06901#bib.bib81)). Another moderator is expertise level. For instance, novices may rely on these tools more heavily and more uncritically than experts, which is especially concerning given that AI assistance can impart an illusion of comprehension without conferring genuine expertise in an area (Messeri and Crockett, [2024](https://arxiv.org/html/2605.06901#bib.bib85); Shen and Tamkin, [2026](https://arxiv.org/html/2605.06901#bib.bib81); Macnamara et al., [2024](https://arxiv.org/html/2605.06901#bib.bib84)). These concerns highlight the importance of carefully designing how humans collaborate with LLMs, as discussed in §[7.2](https://arxiv.org/html/2605.06901#S7.SS2). For example, the risk of deskilling can be included as a factor when deciding how to delegate tasks, as sketched below. In this vein, rather than framing LLMs as tools that automate tasks end-to-end, systems should be designed to empower users, supporting human judgment, reflection, and skill development throughout the workflow. Achieving this requires intentional design interventions that keep workers actively engaged with the task, such as mechanisms that promote critical evaluation of model outputs or structures that preserve opportunities for learning and expertise development (Ma et al., [2025b](https://arxiv.org/html/2605.06901#bib.bib1555); Reicherts et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1553)).
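Returning to the delegation sketch from §7.2, a deskilling-aware variant might add an explicit learning-value term, so that tasks rich in learning are biased toward the human even when the model is faster. The scoring rule, weights, and example values below are illustrative assumptions, not an established method.

```python
def delegate_with_learning(model_p: float, human_p: float,
                           learning_value: float, stakes: float,
                           learning_weight: float = 0.3) -> str:
    """Route a subtask while accounting for the worker's skill development.

    `learning_value` (0-1) estimates how much completing the task would
    build durable expertise; high-learning tasks are steered toward the
    human, or a collaborative mode, even when the model scores higher.
    """
    # Effective human 'score' is competence plus the value of practice.
    human_score = human_p + learning_weight * learning_value
    if model_p > human_score and stakes < 0.5:
        return "model"
    if learning_value > 0.6:
        # Preserve the learning opportunity: model assists, human leads.
        return "human_leads_model_assists"
    return "human" if human_score >= model_p else "model_drafts_human_audits"

if __name__ == "__main__":
    # Routine formatting: little to learn, model can take it end-to-end.
    print(delegate_with_learning(0.95, 0.7, learning_value=0.1, stakes=0.1))
    # Diagnostic reasoning: high learning value, keep the human in the lead.
    print(delegate_with_learning(0.8, 0.6, learning_value=0.8, stakes=0.4))
```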
##### Macroeconomic Divides and Digital Accessibility.

Finally, the integration of LLMs also threatens to exacerbate existing economic disparities. Traditional automation has historically widened the wage gap between high-skilled and low-skilled workers (Acemoglu and Restrepo, [2021](https://arxiv.org/html/2605.06901#bib.bib152)), but LLMs introduce a specific "productivity divide" rooted in accessibility. Research into usage patterns suggests that ChatGPT adoption is positively correlated with higher education, high socioeconomic status, and residency in urbanized zip codes (Daepp and Counts, [2024](https://arxiv.org/html/2605.06901#bib.bib357)). If access to state-of-the-art models remains concentrated within privileged demographics, the resulting disparity in AI-augmented productivity could further entrench systemic inequality. Addressing this divide requires interventions at multiple levels. For one, increasingly capable open-source models can help reduce barriers to entry and broaden who is able to benefit from AI assistance. However, access alone is insufficient. As discussed in §[5.2.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2), model performance and usability can vary across social groups, meaning that some populations may benefit less from interacting with LLMs even when they are available (Hofmann et al., [2024b](https://arxiv.org/html/2605.06901#bib.bib1547); Hassan et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1191); Durmus et al., [2023](https://arxiv.org/html/2605.06901#bib.bib420)). Continued efforts to identify and mitigate such disparities in model behavior remain critical. Finally, improving AI literacy will be essential as a new generation of workers grows up with these technologies at their fingertips. Educational initiatives that teach users how to critically evaluate, effectively collaborate with, and responsibly leverage AI systems can help ensure that the productivity gains from LLMs are distributed more equitably (Solyst et al., [2025](https://arxiv.org/html/2605.06901#bib.bib1550); Morales-Navarro et al., [2024](https://arxiv.org/html/2605.06901#bib.bib1549), [2025](https://arxiv.org/html/2605.06901#bib.bib1548); Okolo, [2024](https://arxiv.org/html/2605.06901#bib.bib1551); Cardon et al., [2023](https://arxiv.org/html/2605.06901#bib.bib1552)).

## 8 Conclusion

The development of large language models has reached a critical juncture. It is no longer sufficient to ask only what these models are capable of doing. We must also grapple with who is, and is not, involved in and accounted for in the creation of LLMs; what the impacts of these models are at both the individual and societal level; and which values and principles these technologies uphold and promote. This survey examines how such human-centered principles are inherently intertwined with the design, training, and deployment of LLMs. Ultimately, the trajectory of LLM development must be guided by more than technical benchmarks and capability milestones. The questions of inclusion, impact, and values explored in this survey are not peripheral concerns to be addressed after the fact; they are foundational to what these systems become and who they serve. By centering human-centered principles at every stage of the LLM lifecycle, from design and data curation to training and deployment, researchers and practitioners can work toward models that are not only more capable but also more equitable, accountable, and aligned with the diverse needs of the people they affect. The path forward demands a broader coalition of voices, a more expansive notion of responsibility, and a sustained commitment to ensuring that progress in AI is measured not only by what these models can do, but by the kind of world their development helps to build.

## Acknowledgments

The idea for this manuscript originated in the Fall 2024 offering of Stanford's CS 329X course on HCLLMs, taught by instructor Diyi Yang and teaching assistants Rose E. Wang and Caleb Ziems. Diyi Yang developed the initial structure and outline of the paper. The enrolled students collaborated on a first draft of the manuscript as part of their coursework, with each student pair responsible for drafting a subsection of the survey according to the course assignment structure. The teaching assistants subsequently reviewed, graded, and provided feedback on these drafts. In Winter 2025, a subset of students continued to revise and expand the manuscript under the primary direction of Rose E. Wang and the secondary direction of Caleb Ziems. Caleb Ziems and Dora Zhao largely rewrote and restructured the manuscript, with significant conceptual revisions and new chapters, to produce the current version. This revision phase was supported by additional contributions from Sunny Yu and Advit Deepak. All authors reviewed and approved the final manuscript. The core authors were jointly responsible for deciding on and writing the paper in its final form. The core authors are as follows:

- Diyi Yang designed the overall structure, chapter outline, and course material that initialized this survey. She supervised the project and provided guidance on the direction and scope of the survey at every stage.
- Caleb Ziems restructured the paper from its initial conception, wrote §[1](https://arxiv.org/html/2605.06901#S1) and §[3](https://arxiv.org/html/2605.06901#S3), and co-wrote §[4](https://arxiv.org/html/2605.06901#S4), §[5](https://arxiv.org/html/2605.06901#S5), and §[6](https://arxiv.org/html/2605.06901#S6). He also contributed to structuring and reviewing all student drafts, and co-led the second round of student revisions.
- Dora Zhao restructured the paper from its initial conception, wrote §[2](https://arxiv.org/html/2605.06901#S2) and §[7](https://arxiv.org/html/2605.06901#S7), and co-wrote §[4](https://arxiv.org/html/2605.06901#S4), §[5](https://arxiv.org/html/2605.06901#S5), and §[6](https://arxiv.org/html/2605.06901#S6). Dora also co-designed the figures.

Additionally, the leadership of this work included:

- Rose E. Wang contributed to structuring and reviewing all student drafts. She led the second round of student revisions.
- Matthew Jörke contributed §[2.4](https://arxiv.org/html/2605.06901#S2.SS4) and provided suggestions on §[2](https://arxiv.org/html/2605.06901#S2) more broadly.
- Ahmad Rushdi contributed guidance and final edits.

Student contributions are as follows:

- Anshika Agarwal co-wrote §[5.1.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2) and edited §[5](https://arxiv.org/html/2605.06901#S5).
- Harshvardhan Agarwal helped with the course paper and edited §[5](https://arxiv.org/html/2605.06901#S5).
- Gabriela Aranguiz-Dias co-wrote §[5.1.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2).
- Aditri Bhagirath helped with the course paper and second-round student edits.
- Justine Breuch co-wrote §[5.2.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2).
- Huanxing Chen helped with the course paper.
- Ruishi Chen helped with the course paper.
- Sarah Chen co-wrote the first draft of §[6.1](https://arxiv.org/html/2605.06901#S6.SS1).
- Advit Deepak co-wrote §[6](https://arxiv.org/html/2605.06901#S6).
- Haocheng Fan co-wrote §[5.2](https://arxiv.org/html/2605.06901#S5.SS2).
- William Fang helped with the course paper and second-round student edits.
- Cat Gonzales Fergesen helped with the course paper.
- Daniel Frees co-wrote §[4.1](https://arxiv.org/html/2605.06901#S4.SS1).
- Tian Gao co-wrote §[5.2.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3).
- Ziqing Huang co-wrote §[5.2.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3).
- Vishal Jain co-wrote §[6.3](https://arxiv.org/html/2605.06901#S6.SS3).
- Yucheng Jiang co-wrote §[5.2](https://arxiv.org/html/2605.06901#S5.SS2).
- Kirill Kalinin helped with second-round student edits.
- Su Doga Karaca co-wrote §[5.2](https://arxiv.org/html/2605.06901#S5.SS2) and edited §[5](https://arxiv.org/html/2605.06901#S5).
- Arpandeep Khatua helped with the course paper and second-round student edits.
- Teland La helped with the course paper.
- Isabelle Levent helped with the course paper.
- Miranda Li helped with the course paper and second-round student edits.
- Xinling Li co-wrote §[3.3](https://arxiv.org/html/2605.06901#S3.SS3).
- Yongce Li co-wrote §[3.4](https://arxiv.org/html/2605.06901#S3.SS4).
- Angela Liu helped with the course paper and second-round student edits.
- Minsik Oh co-wrote §[5.1.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2) and edited §[5](https://arxiv.org/html/2605.06901#S5).
- Nathan J. Paek helped with the course paper and second-round student edits.
- Anthony Qin helped with the course paper.
- Emily Redmond co-wrote §[4.3](https://arxiv.org/html/2605.06901#S4.SS3).
- Michael J. Ryan wrote §[4.5](https://arxiv.org/html/2605.06901#S4.SS5) and co-wrote the remainder of §[4](https://arxiv.org/html/2605.06901#S4).
- Aadesh Salecha co-wrote §[5.2.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2).
- Xiaoxian Shen co-wrote §[3.3](https://arxiv.org/html/2605.06901#S3.SS3).
- Pranava Singhal helped with the course paper.
- Shashanka Subrahmanya co-wrote §[5.3](https://arxiv.org/html/2605.06901#S5.SS3).
- Mei Tan co-wrote §[5.1.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1).
- Irawadee Thawornbut helped with the course paper.
- Michelle Vinocour helped with the course paper.
- Xiaoyue Wang co-wrote §[3.4](https://arxiv.org/html/2605.06901#S3.SS4).
- Zheng Wang co-wrote §[6.1](https://arxiv.org/html/2605.06901#S6.SS1).
- Henry Jin Weng helped with the course paper.
- Pawan Wirawarn helped with the course paper.
- Shirley Wu helped with the course paper.
- Sophie Wu co-wrote §[4.2](https://arxiv.org/html/2605.06901#S4.SS2).
- Yichen Xie co-wrote §[4.2](https://arxiv.org/html/2605.06901#S4.SS2).
- Patrick Ye helped with the course paper and second-round student edits.
- Sunny Yu co-wrote §[5.1.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2) and §[6](https://arxiv.org/html/2605.06901#S6).
- Sean Zhang helped with the course paper and second-round student edits.
- Yutong Zhang co-designed Figures [1](https://arxiv.org/html/2605.06901#S1.F1), [2](https://arxiv.org/html/2605.06901#S2.F2), [5](https://arxiv.org/html/2605.06901#S5.F5), and [6](https://arxiv.org/html/2605.06901#S6.F6).
- Cathy Zhou co-wrote §[4.1](https://arxiv.org/html/2605.06901#S4.SS1).
- Yiling Zhao co-wrote §[4.6](https://arxiv.org/html/2605.06901#S4.SS6).

## References

- M. Abbasian, E. Khatibi, I. Azimi, D. Oniani, Z. Shakeri Hossein Abad, A. Thieme, R. Sriram, Z. Yang, Y. Wang, B. Lin, et al. (2024). Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI. NPJ Digital Medicine 7(1), pp. 82.
- S. Abdurahman, M. Atari, F. Karimi-Malekabadi, M. J. Xue, J. Trager, P. S. Park, P. Golazizian, A. Omrani, and M. Dehghani (2024). Perils and opportunities in using large language models in psychological research. PNAS Nexus 3(7), pp. pgae245.
- Building pro-worker artificial intelligence. Technical report, National Bureau of Economic Research.
- D. Acemoglu and P. Restrepo (2020). The wrong kind of AI? Artificial intelligence and the future of labour demand. Cambridge Journal of Regions, Economy and Society 13(1), pp. 25–35.
- D. Acemoglu and P. Restrepo (2021). Tasks, automation, and the rise in US wage inequality. Econometrica 89(5), pp. 1973–2019.
- J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. (2023). GPT-4 technical report. arXiv preprint abs/2303.08774.
- A. Adhvaryu, N. Kala, and A. Nyshadham (2018). The skills to pay the bills: returns to on-the-job soft skills training. Working Paper 24313, National Bureau of Economic Research.
- T. Adiguzel, M. H. Kaya, and F. K. Cansu (2023). Revolutionizing education with AI: exploring the transformative potential of ChatGPT. Contemporary Educational Technology 15(3).
- D. Agarwal, M. Naaman, and A. Vashistha (2024). AI suggestions homogenize writing toward Western styles and diminish cultural nuances. arXiv preprint abs/2409.11360.
- M. Agrawala (2023). Unpredictable black boxes are terrible interfaces. ACM TechTalks.
- O. Ahia, S. Kumar, H. Gonen, J. Kasai, D. R. Mortensen, N. A. Smith, and Y. Tsvetkov (2023). Do all languages cost the same? Tokenization in the era of commercial language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 9904–9923.
- L. Ahmad, S. Agarwal, M. Lampe, and P. Mishkin (2025). OpenAI's approach to external red teaming for AI models and systems. arXiv preprint arXiv:2503.16431.
- A. Ahmadian, C. Cremer, M. Gallé, M. Fadaee, J. Kreutzer, O. Pietquin, A. Üstün, and S. Hooker (2024). Back to basics: revisiting REINFORCE-style optimization for learning from human feedback in LLMs. arXiv preprint abs/2402.14740.
- AI@Meta (2024). Llama 3 model card. https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md.
- A. F. Aji, G. I. Winata, F. Koto, S. Cahyawijaya, A. Romadhony, R. Mahendra, K. Kurniawan, D. Moeljadi, R. E. Prasojo, T. Baldwin, et al. (2022). One country, 700+ languages: NLP challenges for underrepresented languages and dialects in Indonesia. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 7226–7249.
- A. Albalak, Y. Elazar, S. M. Xie, S. Longpre, N. Lambert, X. Wang, N. Muennighoff, et al. (2024). A survey on data selection for language models. arXiv preprint abs/2402.16827.
- D. Ali, D. Zhao, A. Koenecke, and O. Papakyriakopoulos (2025). Operationalizing pluralistic values in large language model alignment reveals trade-offs in safety, inclusivity, and model behavior. arXiv preprint arXiv:2511.14476.
- B. AlKhamissi, M. ElNokrashy, M. Alkhamissi, and M. Diab (2024). Investigating cultural alignment of large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, pp. 12404–12422.
- D. R. Almeida (2024). Synthetic data generation (part 1). https://cookbook.openai.com/examples/sdg1. Accessed: 2025-02-08.
- A. Alvarado Garcia, J. F. Maestre, M. Barcham, M. Iriarte, M. Wong-Villacres, O. A. Lemus, P. Dudani, P. Reynolds-Cuéllar, R. Wang, and T. Cerratto Pargman (2021). Decolonial pathways: our manifesto for a decolonizing agenda in HCI research and design. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1–9.
- T. M. Amabile (1983). The case for a social psychology of creativity. The Journal of Creative Behavior 46(1), pp. 3–15.
- E. Ameisen, J. Lindsey, A. Pearce, W. Gurnee, N. L. Turner, B. Chen, C. Citro, D. Abrahams, S. Carter, B. Hosmer, et al. (2025). Circuit tracing: revealing computational graphs in language models. Transformer Circuits Thread 6. https://transformer-circuits.pub/2025/attribution-graphs/methods.html.
- S. Amershi, D. Weld, M. Vorvoreanu, A. Fourney, B. Nushi, P. Collisson, J. Suh, S. Iqbal, P. N. Bennett, K. Inkpen, et al. (2019). Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, pp. 1–13.
- J. Amidei, P. Piwek, and A. Willis (2018). Rethinking the agreement in human evaluation tasks. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 3318–3329.
- J. Amidei, P. Piwek, and A. Willis (2019). Agreement is overrated: a plea for correlation to assess human evaluation reliability. In Proceedings of the 12th International Conference on Natural Language Generation, Tokyo, Japan, pp. 344–354.
- C. Andukuri, J. Fränken, T. Gerstenberg, and N. D. Goodman (2024). STaR-GATE: teaching language models to ask clarifying questions. arXiv preprint abs/2403.19154.
- Anthropic (2025). https://support.claude.com/en/articles/12119250-model-safety-bug-bounty-program.
- D. Anugraha, V. Padmakumar, and D. Yang (2026). SparkMe: adaptive semi-structured interviewing for qualitative insight discovery. arXiv preprint arXiv:2602.21136.
- R. Appel, P. McCrory, A. Tamkin, M. McCain, T. Neylon, and M. Stern (2025). Anthropic economic index report: uneven geographic and enterprise AI adoption. arXiv preprint arXiv:2511.15080.
- I. Arawjo, C. Swoopes, P. Vaithilingam, M. Wattenberg, and E. L. Glassman (2024). ChainForge: a visual toolkit for prompt engineering and LLM hypothesis testing. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–18.
- A. Arditi, O. B. Obeso, A. Syed, D. Paleka, N. Panickssery, W. Gurnee, and N. Nanda (2024). Refusal in language models is mediated by a single direction. In ICML 2024 Workshop on Mechanistic Interpretability.
- L. Armstrong, A. Liu, S. MacNeil, and D. Metaxa (2024). The silicon ceiling: auditing GPT's race and gender biases in hiring. In Proceedings of the 4th ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, pp. 1–18.
- R. K. Arora, J. Wei, R. S. Hicks, P. Bowman, J. Quiñonero-Candela, F. Tsimpourlas, M. Sharman, M. Shah, A. Vallone, A. Beutel, et al. (2025). HealthBench: evaluating large language models towards improved human health. arXiv preprint arXiv:2505.08775.
- L. Aroyo, A. Taylor, M. Diaz, C. Homan, A. Parrish, G. Serapio-García, V. Prabhakaran, and D. Wang (2023). DICES dataset: diversity in conversational AI evaluation for safety. Advances in Neural Information Processing Systems 36, pp. 53330–53342.
- E. Artemova, V. Blaschke, and B. Plank (2024). Exploring the robustness of task-oriented dialogue systems for colloquial German varieties. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 445–468.
- M. Artetxe, V. Goswami, S. Bhosale, A. Fan, and L. Zettlemoyer (2023). Revisiting machine translation for cross-lingual classification. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pp. 6489–6499.
- A. Askell, Y. Bai, A. Chen, D. Drain, D. Ganguli, T. Henighan, A. Jones, N. Joseph, B. Mann, N. DasSarma, et al. (2021). A general language assistant as a laboratory for alignment. arXiv preprint abs/2112.00861.
- S. Asthana, J. Im, Z. Chen, and N. Banovic (2024). "I know even if you don't tell me": understanding users' privacy preferences regarding AI-based inferences of sensitive information for personalization. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–21.
- J. Auernhammer (2020). Human-centered AI: the role of human-centered design research in the development of AI.
- K. L. Aw, S. Montariol, B. AlKhamissi, M. Schrimpf, and A. Bosselut (2023). Instruction-tuning aligns LLMs to the human brain. arXiv preprint abs/2312.00575.
- E. Awad, S. Dsouza, R. Kim, J. Schulz, J. Henrich, A. Shariff, J. Bonnefon, and I. Rahwan (2018). The moral machine experiment. Nature 563(7729), pp. 59–64.
- A. Awadalla, L. Xue, O. Lo, M. Shu, H. Lee, E. Guha, M. Jordan, S. Shen, M. Awadalla, S. Savarese, C. Xiong, R. Xu, Y. Choi, and L. Schmidt (2024). MINT-1T: scaling open-source multimodal data by 10x: a multimodal dataset with one trillion tokens. arXiv preprint arXiv:2406.11271.
- S. Baack (2024). A critical analysis of the largest source for generative AI training data: Common Crawl. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, pp. 2199–2208.
- R. Baeza-Yates (2018). Bias on the web. Communications of the ACM 61(6), pp. 54–61.
- Y. Bai, A. Jones, K. Ndousse, A. Askell, A. Chen, N. DasSarma, D. Drain, S. Fort, D. Ganguli, T. Henighan, et al. (2022a). Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv preprint abs/2204.05862.
- Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, et al. (2022b). Constitutional AI: harmlessness from AI feedback. arXiv preprint abs/2212.08073.
- Y. Bai, S. Kadavath, S. Kundu, A. Askell, J. Kernion, A. Jones, A. Chen, A. Goldie, A. Mirhoseini, C. McKinnon, et al. (2022c). Constitutional AI: harmlessness from AI feedback. arXiv abs/2212.08073.
- M. Bak and J. Chin (2024). The potential and limitations of large language models in identification of the states of motivations for facilitating health behavior change. Journal of the American Medical Informatics Association, pp. ocae057.
- N. Balepur, V. Padmakumar, F. Yang, S. Feng, R. Rudinger, and J. L. Boyd-Graber (2025). Whose boat does it float? Improving personalization in preference tuning via inferred user personas. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria.
- D. Banerjee, P. Singh, A. Avadhanam, and S. Srivastava (2023). Benchmarking LLM powered chatbots: methods and metrics. arXiv preprint abs/2308.04624.
- G. Bansal, T. Wu, J. Zhou, R. Fok, B. Nushi, E. Kamar, M. T. Ribeiro, and D. Weld (2021). Does the whole exceed its parts? The effect of AI explanations on complementary team performance. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, pp. 1–16.
- P. Barberá (2020). Social media, echo chambers, and political polarization. In Social Media and Democracy: The State of the Field, Prospects for Reform, pp. 34–55.
- S. Bardzell (2010). Feminist HCI: taking stock and outlining an agenda for design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1301–1310.
- S. Barocas, K. Crawford, A. Shapiro, and H. Wallach (2017). The problem with bias: allocative versus representational harms in machine learning. In 9th Annual Conference of the Special Interest Group for Computing, Information and Society, pp. 1.
- F. Barrera-Osorio, A. D. Kugler, and M. I. Silliman (2020). Hard and soft skills in vocational training: experimental evidence from Colombia. Working Paper 27548, National Bureau of Economic Research.
- J. J. Bartko (1966). The intraclass correlation coefficient as a measure of reliability. Psychological Reports 19(1), pp. 3–11.
- T. Beck, H. Schuff, A. Lauscher, and I. Gurevych (2024). Sensitivity, performance, robustness: deconstructing the effect of sociodemographic prompting. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), St. Julian's, Malta, pp. 2589–2615.
- J. Becker, N. Rush, E. Barnes, and D. Rein (2025). Measuring the impact of early-2025 AI on experienced open-source developer productivity. arXiv preprint arXiv:2507.09089.
- A. Belyaeva, J. Cosentino, F. Hormozdiari, K. Eswaran, S. Shetty, G. Corrado, A. Carroll, C. Y. McLean, and N. A. Furlotte (2023). Multimodal LLMs for health grounded in individual-specific data. arXiv preprint abs/2307.09018.
- E. M. Bender and B. Friedman (2018). Data statements for natural language processing: toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 6, pp. 587–604.
- E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell (2021). On the dangers of stochastic parrots: can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, New York.
- R. Benjamin (2023). Race after technology. In Social Theory Re-Wired, pp. 405–415.
- S. Bergman, N. Marchal, J. Mellor, S. Mohamed, I. Gabriel, and W. Isaac (2024). STELA: a community-centred approach to norm elicitation for AI alignment. Scientific Reports 14(1), pp. 6616.
- J. Bergold and S. Thomas (2012). Participatory research methods: a methodological approach in motion. Historical Social Research/Historische Sozialforschung, pp. 191–222.
- F. Bianchi, M. Suzgun, G. Attanasio, P. Rottger, D. Jurafsky, T. Hashimoto, and J. Zou (2024). Safety-tuned LLaMAs: lessons from improving the safety of large language models that follow instructions. In The Twelfth International Conference on Learning Representations.
- S. Biderman, K. Bicheno, and L. Gao (2022). Datasheet for the Pile. arXiv preprint arXiv:2201.07311.
- S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O'Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, et al. (2023). Pythia: a suite for analyzing large language models across training and scaling. In International Conference on Machine Learning, pp. 2397–2430.
- P. Biedma, X. Yi, L. Huang, M. Sun, and X. Xie (2024). Beyond human norms: unveiling unique values of large language models through interdisciplinary approaches. arXiv preprint abs/2404.12744.
- S. Bird (2024). Must NLP be extractive? In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 14915–14929.
- A. Birhane, P. Kalluri, D. Card, W. Agnew, R. Dotan, and M. Bao (2022). The values encoded in machine learning research. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, pp. 173–184.
- A. Birhane and V. U. Prabhu (2021). Large image datasets: a pyrrhic win for computer vision? In 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1536–1546.
- J. Bisbee, J. D. Clinton, C. Dorff, B. Kenkel, and J. M. Larson (2024). Synthetic replacements for human survey data? The perils of large language models. Political Analysis 32(4), pp. 401–416.
- S. Black, S. Biderman, E. Hallahan, Q. Anthony, L. Gao, L. Golding, H. He, C. Leahy, K. McDonell, J. Phang, et al. (2022). GPT-NeoX-20B: an open-source autoregressive language model. In Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models, pp. 95–136.
- A. Blandford, D. Furniss, and S. Makri (2016). Qualitative HCI research: going behind the scenes. Morgan & Claypool Publishers.
- T. Blevins and L. Zettlemoyer (2022). Language contamination helps explain the cross-lingual capabilities of English pretrained models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 3563–3574.
- S. L. Blodgett, S. Barocas, H. Daumé III, and H. Wallach (2020). Language (technology) is power: a critical survey of "bias" in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, pp. 5454–5476.
- S. L. Blodgett, L. Green, and B. O'Connor (2016). Demographic dialectal variation in social media: a case study of African-American English. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, pp. 1119–1130.
- S. L. Blodgett, G. Lopez, A. Olteanu, R. Sim, and H. Wallach (2021). Stereotyping Norwegian salmon: an inventory of pitfalls in fairness benchmark datasets. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, pp. 1004–1015.
- S. L. Blodgett and B. O'Connor (2017). Racial disparity in natural language processing: a case study of social media African-American English. arXiv preprint abs/1707.00061.
- S. L. Blodgett (2021). Sociolinguistically driven approaches for just natural language processing. Doctoral dissertation, University of Massachusetts Amherst.
- R. A. Bolt (1980). "Put-that-there": voice and gesture at the graphics interface. In Proceedings of the 7th Annual Conference on Computer Graphics and Interactive Techniques, pp. 262–270.
- R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, et al. (2021). On the opportunities and risks of foundation models. arXiv preprint abs/2108.07258.
- R. Bommasani, K. Klyman, S. Longpre, S. Kapoor, N. Maslej, B. Xiong, D. Zhang, and P. Liang (2023). The foundation model transparency index. arXiv preprint abs/2310.12941.
- S. R. Bowman and G. Dahl (2021). What will it take to fix benchmarking in natural language understanding? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, pp. 4843–4855.
- S. R. Bowman, J. Hyun, E. Perez, E. Chen, C. Pettit, S. Heiner, K. Lukošiūtė, A. Askell, A. Jones, A. Chen, et al. (2022). Measuring progress on scalable oversight for large language models. arXiv preprint abs/2211.03540.
- B. Brittain (2024a). OpenAI, Microsoft defeat US consumer-privacy lawsuit for now. Reuters. https://www.reuters.com/legal/transactional/openai-microsoft-defeat-us-consumer-privacy-lawsuit-now-2024-05-24/.
- B. Brittain (2024b). US newspapers sue OpenAI for copyright infringement over AI training. Reuters. https://www.reuters.com/legal/us-newspapers-sue-openai-copyright-infringement-over-ai-training-2024-04-30/.
- A. G. Brooks and C. Breazeal (2006). Working with robots and objects: revisiting deictic reference for achieving spatial common ground. In Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, pp. 297–304.
- T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. (2020). Language models are few-shot learners. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020).
- E. Brynjolfsson, D. Li, and L. Raymond (2024). Generative AI at work. arXiv preprint abs/2304.11771.
- S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, H. Nori, H. Palangi, M. T. Ribeiro, and Y. Zhang (2023). Sparks of artificial general intelligence: early experiments with GPT-4. arXiv preprint abs/2303.12712.
- A. Cai, I. Arawjo, and E. L. Glassman (2024). Antagonistic AI. arXiv preprint arXiv:2402.07350.
- A. Caliskan, J. J. Bryson, and A. Narayanan (2016). Semantics derived automatically from language corpora contain human-like biases. Science 356, pp. 183–186.
- T. Capel and M. Brereton (2023). What is human-centered about human-centered AI? A map of the research landscape. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, pp. 359:1–359:23.
- P. Cardon, C. Fleischmann, J. Aritz, M. Logemann, and J. Heidewald (2023). The challenges and opportunities of AI-assisted writing: developing AI literacy for the AI age. Business and Professional Communication Quarterly 86(3), pp. 257–295.
- N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramèr, and C. Zhang (2023). Quantifying memorization across neural language models. In The Eleventh International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda.
- N. Carlini, C. Liu, Ú. Erlingsson, J. Kos, and D. Song (2019). The secret sharer: evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium (USENIX Security 19), pp. 267–284.
- N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al. (2021). Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pp. 2633–2650.
- S. Casper, X. Davies, C. Shi, T. K. Gilbert, J. Scheurer, J. Rando, R. Freedman, T. Korbak, D. Lindner, P. Freire, et al. (2023). Open problems and fundamental limitations of reinforcement learning from human feedback. arXiv preprint abs/2307.15217.
- L. Castricato, N. Lile, R. Rafailov, J. Fränken, and C. Finn (2025). PERSONA: a reproducible testbed for pluralistic alignment. In Proceedings of the 31st International Conference on Computational Linguistics, pp. 11348–11368.
- I. Caswell, T. Breiner, D. van Esch, and A. Bapna (2020). Language ID in the wild: unexpected challenges on the path to a thousand-language web text corpus. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online), pp. 6588–6608.
- A. Celikyilmaz, E. Clark, and J. Gao (2021). Evaluation of text generation: a survey. arXiv preprint abs/2006.14799.
- A. Challapally, C. Pease, R. Raskar, and P. Chari (2025). The GenAI divide: state of AI in business 2025. https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf.
- C. Chan, W. Chen, Y. Su, J. Yu, W. Xue, S. Zhang, J. Fu, and Z. Liu (2023). ChatEval: towards better LLM-based evaluators through multi-agent debate. arXiv preprint abs/2308.07201.
- Y. Chan, G. Pu, A. Shanker, P. Suresh, P. Jenks, J. Heyer, and S. Denton (2024). Balancing cost and effectiveness of synthetic data generation strategies for LLMs. arXiv preprint arXiv:2409.19759.
Work was done while Yung\-Chieh was interning at Scale AI\.Cited by:[§3\.4\.1](https://arxiv.org/html/2605.06901#S3.SS4.SSS1.p1.1)\. - R\. Chang, Y\. Liu, and A\. Guo \(2024a\)WorldScribe: towards context\-aware live visual descriptions\.InProceedings of the 37th Annual ACM Symposium on User Interface Software and Technology,pp\. 1–18\.Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px1.p1.1)\. - Y\. Chang, X\. Wang, J\. Wang, Y\. Wu, L\. Yang, K\. Zhu, H\. Chen, X\. Yi, C\. Wang, Y\. Wang, W\. Ye, Y\. Zhang, Y\. Chang, P\. S\. Yu, Q\. Yang, and X\. Xie \(2023\)A survey on evaluation of large language models\.ArXiv preprintabs/2307\.03109\.External Links:[Link](https://arxiv.org/abs/2307.03109)Cited by:[§5\.2](https://arxiv.org/html/2605.06901#S5.SS2.p1.1)\. - Y\. Chang, X\. Wang, J\. Wang, Y\. Wu, L\. Yang, K\. Zhu, H\. Chen, X\. Yi, C\. Wang, Y\. Wang, W\. Ye, Y\. Zhang, Y\. Chang, P\. S\. Yu, Q\. Yang, and X\. Xie \(2024b\)A survey on evaluation of large language models\.ACM Trans\. Intell\. Syst\. Technol\.15\(3\)\.External Links:ISSN 2157\-6904,[Link](https://doi.org/10.1145/3641289),[Document](https://dx.doi.org/10.1145/3641289)Cited by:[§5](https://arxiv.org/html/2605.06901#S5.p1.1)\. - P\. Chao, A\. Robey, E\. Dobriban, H\. Hassani, G\. J\. Pappas, and E\. Wong \(2023\)Jailbreaking black box large language models in twenty queries\.ArXiv preprintabs/2310\.08419\.External Links:[Link](https://arxiv.org/abs/2310.08419)Cited by:[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px3.p4.1)\. - A\. Chatterji, T\. Cunningham, D\. J\. Deming, Z\. Hitzig, C\. Ong, C\. Y\. Shan, and K\. Wadman \(2025\)How people use chatgpt\.Technical reportNational Bureau of Economic Research\.Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px3.p1.1),[§7](https://arxiv.org/html/2605.06901#S7.p1.1)\. - C\. Chen, X\. Feng, J\. Zhou, J\. Yin, and X\. Zheng \(2023\)Federated large language model: a position paper\.ArXiv preprintabs/2307\.08925\.External Links:[Link](https://arxiv.org/abs/2307.08925)Cited by:[§3\.3\.2](https://arxiv.org/html/2605.06901#S3.SS3.SSS2.p2.1)\. - D\. Chen, Y\. Chen, A\. Rege, and R\. K\. Vinayak \(2024a\)PAL: pluralistic alignment framework for learning from heterogeneous preferences\.InNeurIPS 2024 Workshop on Fine\-Tuning in Modern Machine Learning: Principles and Scalability,External Links:[Link](https://openreview.net/forum?id=wVg2kVQOzq)Cited by:[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p1.1),[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p5.1)\. - G\. H\. Chen, S\. Chen, Z\. Liu, F\. Jiang, and B\. Wang \(2024b\)Humans or LLMs as the judge? a study on judgement bias\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,Y\. Al\-Onaizan, M\. Bansal, and Y\. Chen \(Eds\.\),Miami, Florida, USA,pp\. 8301–8327\.External Links:[Link](https://aclanthology.org/2024.emnlp-main.474/),[Document](https://dx.doi.org/10.18653/v1/2024.emnlp-main.474)Cited by:[§5\.1\.3](https://arxiv.org/html/2605.06901#S5.SS1.SSS3.p2.1)\. - J\. Chen and D\. Yang \(2023\)Unlearn what you want to forget: efficient unlearning for LLMs\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,H\. Bouamor, J\. Pino, and K\. Bali \(Eds\.\),Singapore,pp\. 12041–12052\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.emnlp-main.738),[Link](https://aclanthology.org/2023.emnlp-main.738)Cited by:[§3\.3\.2](https://arxiv.org/html/2605.06901#S3.SS3.SSS2.p2.1)\. - J\. Chen, Y\. 
Zhang, B\. Wang, W\. X\. Zhao, J\. Wen, and W\. Chen \(2024c\)Unveiling the flaws: exploring imperfections in synthetic data and mitigation strategies for large language models\.InFindings of the Association for Computational Linguistics: EMNLP 2024,pp\. 14855–14865\.Cited by:[§3\.4\.1](https://arxiv.org/html/2605.06901#S3.SS4.SSS1.Px2.p2.1),[§4\.1\.3](https://arxiv.org/html/2605.06901#S4.SS1.SSS3.Px2.p1.1)\. - K\. Chen, Z\. He, T\. Shi, and K\. Lerman \(2025a\)STEER\-bench: a benchmark for evaluating the steerability of large language models\.arXiv preprint arXiv:2505\.20645\.Cited by:[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p1.1)\. - M\. Chen, J\. Tworek, H\. Jun, Q\. Yuan, H\. P\. de Oliveira Pinto, J\. Kaplan, H\. Edwards, Y\. Burda, N\. Joseph, G\. Brockman, A\. Ray, R\. Puri, G\. Krueger, M\. Petrov, H\. Khlaaf, G\. Sastry, P\. Mishkin, B\. Chan, S\. Gray, N\. Ryder, M\. Pavlov, A\. Power, L\. Kaiser, M\. Bavarian, C\. Winter, P\. Tillet, F\. P\. Such, D\. Cummings, M\. Plappert, F\. Chantzis, E\. Barnes, A\. Herbert\-Voss, W\. H\. Guss, A\. Nichol, A\. Paino, N\. Tezak, J\. Tang, I\. Babuschkin, S\. Balaji, S\. Jain, W\. Saunders, C\. Hesse, A\. N\. Carr, J\. Leike, J\. Achiam, V\. Misra, E\. Morikawa, A\. Radford, M\. Knight, M\. Brundage, M\. Murati, K\. Mayer, P\. Welinder, B\. McGrew, D\. Amodei, S\. McCandlish, I\. Sutskever, and W\. Zaremba \(2021\)Evaluating large language models trained on code\.External Links:2107\.03374,[Link](https://arxiv.org/abs/2107.03374)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p2.1)\. - V\. Chen, A\. Talwalkar, R\. Brennan, and G\. Neubig \(2025b\)Code with me or for me? how increasing ai automation transforms developer workflows\.arXiv preprint arXiv:2507\.08149\.Cited by:[§7](https://arxiv.org/html/2605.06901#S7.p3.1)\. - Y\. Chen, A\. Wu, T\. DePodesta, C\. Yeh, K\. Li, N\. C\. Marin, O\. Patel, J\. Riecke, S\. Raval, O\. Seow,et al\.\(2024d\)Designing a dashboard for transparency and control of conversational ai\.arXiv preprint arXiv:2406\.07882\.Cited by:[§6\.1\.1](https://arxiv.org/html/2605.06901#S6.SS1.SSS1.Px2.p2.1)\. - M\. Cheng, S\. L\. Blodgett, A\. DeVrio, L\. Egede, and A\. Olteanu \(2025a\)Dehumanizing machines: mitigating anthropomorphic behaviors in text generation systems\.InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),W\. Che, J\. Nabende, E\. Shutova, and M\. T\. Pilehvar \(Eds\.\),Vienna, Austria\.Cited by:[§2\.2\.3](https://arxiv.org/html/2605.06901#S2.SS2.SSS3.p2.1)\. - M\. Cheng, E\. Durmus, and D\. Jurafsky \(2023a\)Marked personas: using natural language prompts to measure stereotypes in language models\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 1504–1532\.External Links:[Link](https://aclanthology.org/2023.acl-long.84/),[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.84)Cited by:[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p2.1),[§4\.4\.1](https://arxiv.org/html/2605.06901#S4.SS4.SSS1.Px1.p1.1),[§4\.5\.2](https://arxiv.org/html/2605.06901#S4.SS5.SSS2.p1.1)\. - M\. Cheng, K\. Gligoric, T\. Piccardi, and D\. Jurafsky \(2024a\)AnthroScore: a computational linguistic measure of anthropomorphism\.InProceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics \(Volume 1: Long Papers\),Y\. Graham and M\. 
Purver \(Eds\.\),St\. Julian’s, Malta,pp\. 807–825\.External Links:[Link](https://aclanthology.org/2024.eacl-long.49)Cited by:[§2\.2\.3](https://arxiv.org/html/2605.06901#S2.SS2.SSS3.p2.1)\. - M\. Cheng, C\. Lee, P\. Khadpe, S\. Yu, D\. Han, and D\. Jurafsky \(2025b\)Sycophantic ai decreases prosocial intentions and promotes dependence\.arXiv preprint arXiv:2510\.01395\.Cited by:[§2\.3\.1](https://arxiv.org/html/2605.06901#S2.SS3.SSS1.p2.1),[§6\.1\.3](https://arxiv.org/html/2605.06901#S6.SS1.SSS3.Px1.p1.1),[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px1.p1.1)\. - M\. Cheng, T\. Piccardi, and D\. Yang \(2023b\)CoMPosT: characterizing and evaluating caricature in LLM simulations\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,H\. Bouamor, J\. Pino, and K\. Bali \(Eds\.\),Singapore,pp\. 10853–10875\.External Links:[Link](https://aclanthology.org/2023.emnlp-main.669/),[Document](https://dx.doi.org/10.18653/v1/2023.emnlp-main.669)Cited by:[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p2.1),[§4\.4\.1](https://arxiv.org/html/2605.06901#S4.SS4.SSS1.Px1.p1.1),[§4\.5\.2](https://arxiv.org/html/2605.06901#S4.SS5.SSS2.p1.1)\. - T\. Cheng, Y\. Wang, W\. He, Q\. Wang, Y\. Cheng, Y\. Zhang, R\. Feng, X\. Zhang,et al\.\(2025c\)FineMedLM\-o1: enhancing medical knowledge reasoning ability of llm from supervised fine\-tuning to test\-time training\.InSecond Conference on Language Modeling,Cited by:[§4\.1](https://arxiv.org/html/2605.06901#S4.SS1.p1.1)\. - Z\. Cheng, Z\. Cheng, J\. He, J\. Sun, K\. Wang, Y\. Lin, Z\. Lian, X\. Peng, and A\. Hauptmann \(2024b\)Emotion\-llama: multimodal emotion recognition and reasoning with instruction tuning\.ArXiv preprintabs/2406\.11161\.External Links:[Link](https://arxiv.org/abs/2406.11161)Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px1.p1.1)\. - C\. Chiang and H\. Lee \(2023\)Can large language models be an alternative to human evaluations?\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 15607–15631\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.870),[Link](https://aclanthology.org/2023.acl-long.870)Cited by:[§5\.1\.3](https://arxiv.org/html/2605.06901#S5.SS1.SSS3.p2.1)\. - W\. Chiang, Z\. Li, Z\. Lin, Y\. Sheng, Z\. Wu, H\. Zhang, L\. Zheng, S\. Zhuang, Y\. Zhuang, J\. E\. Gonzalez, I\. Stoica, and E\. P\. Xing \(2023\)Vicuna: an open\-source chatbot impressing gpt\-4 with 90%\* chatgpt quality\.External Links:[Link](https://lmsys.org/blog/2023-03-30-vicuna/)Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px2.p1.1)\. - W\. Chiang, L\. Zheng, Y\. Sheng, A\. N\. Angelopoulos, T\. Li, D\. Li, B\. Zhu, H\. Zhang, M\. I\. Jordan, J\. E\. Gonzalez, and I\. Stoica \(2024\)Chatbot arena: an open platform for evaluating llms by human preference\.InICML,External Links:[Link](https://openreview.net/forum?id=3MW8GKNyzI)Cited by:[§3\.1\.3](https://arxiv.org/html/2605.06901#S3.SS1.SSS3.p3.1),[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p2.1),[§5\.1\.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1.Px1.p1.1),[§5\.1\.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1.Px3.p1.1)\. - J\. Chien and D\. Danks \(2024\)Beyond behaviorist representational harms: a plan for measurement and mitigation\.InProceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency,pp\. 
933–946\.Cited by:[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p1.1),[§3\.2\.3](https://arxiv.org/html/2605.06901#S3.SS2.SSS3.p1.1)\. - L\. Chioda, D\. Contreras\-Loya, P\. Gertler, and D\. Carney \(2021\)Making entrepreneurs: returns to training youth in hard versus soft business skills\.Working PaperTechnical Report28845,Working Paper Series,National Bureau of Economic Research\.External Links:[Document](https://dx.doi.org/10.3386/w28845),[Link](http://www.nber.org/papers/w28845)Cited by:[§5\.3](https://arxiv.org/html/2605.06901#S5.SS3.p9.1)\. - D\. Choi, V\. Huang, S\. Schwettmann, and J\. Steinhardt \(2025\)Scalably extracting latent representations of users\.Note:[https://transluce\.org/user\-modeling](https://transluce.org/user-modeling)Cited by:[§6\.1\.1](https://arxiv.org/html/2605.06901#S6.SS1.SSS1.Px2.p2.1)\. - A\. Chowdhery, S\. Narang, J\. Devlin, M\. Bosma, G\. Mishra, A\. Roberts, P\. Barham, H\. W\. Chung, C\. Sutton, S\. Gehrmann,et al\.\(2023\)Palm: scaling language modeling with pathways\.Journal of Machine Learning Research24\(240\),pp\. 1–113\.Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p3.1)\. - P\. F\. Christiano, J\. Leike, T\. B\. Brown, M\. Martic, S\. Legg, and D\. Amodei \(2017\)Deep reinforcement learning from human preferences\.InAdvances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4\-9, 2017, Long Beach, CA, USA,I\. Guyon, U\. von Luxburg, S\. Bengio, H\. M\. Wallach, R\. Fergus, S\. V\. N\. Vishwanathan, and R\. Garnett \(Eds\.\),pp\. 4299–4307\.External Links:[Link](https://proceedings.neurips.cc/paper/2017/hash/d5e2c0adad503c91f91df240d0cd4e49-Abstract.html)Cited by:[§4\.2\.1](https://arxiv.org/html/2605.06901#S4.SS2.SSS1.p1.1)\. - Z\. Chu, Z\. Wang, and W\. Zhang \(2024\)Fairness in large language models: a taxonomic survey\.SIGKDD Explor\. Newsl\.26\(1\),pp\. 34–48\.External Links:[Document](https://dx.doi.org/10.1145/3682112.3682117),ISSN 1931\-0145,[Link](https://doi.org/10.1145/3682112.3682117)Cited by:[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p1.1)\. - H\. W\. Chung, L\. Hou, S\. Longpre, B\. Zoph, Y\. Tay, W\. Fedus, Y\. Li, X\. Wang, M\. Dehghani, S\. Brahma,et al\.\(2024\)Scaling instruction\-finetuned language models\.Journal of Machine Learning Research25\(70\),pp\. 1–53\.Cited by:[§3\.1\.2](https://arxiv.org/html/2605.06901#S3.SS1.SSS2.p1.1),[§3\.1\.2](https://arxiv.org/html/2605.06901#S3.SS1.SSS2.p3.1),[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p3.1)\. - J\. Chung, E\. Kamar, and S\. Amershi \(2023\)Increasing diversity while maintaining accuracy: text data generation with large language models and human interventions\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 575–593\.External Links:[Link](https://aclanthology.org/2023.acl-long.34/),[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.34)Cited by:[§3\.4\.1](https://arxiv.org/html/2605.06901#S3.SS4.SSS1.Px1.p2.1)\. - M\. Cinelli, G\. De Francisci Morales, A\. Galeazzi, W\. Quattrociocchi, and M\. Starnini \(2021\)The echo chamber effect on social media\.Proceedings of the National Academy of Sciences118\(9\)\.Cited by:[§4\.5\.2](https://arxiv.org/html/2605.06901#S4.SS5.SSS2.p2.1)\. - T\. 
Claburn \(2023\)GitHub, microsoft, openai fail to wriggle out of copilot copyright lawsuit\.The Register\.External Links:[Link](https://www.theregister.com/2023/05/12/github_microsoft_openai_copilot/)Cited by:[§3\.3\.1](https://arxiv.org/html/2605.06901#S3.SS3.SSS1.p2.1)\. - K\. Clark and C\. D\. Manning \(2016\)Deep reinforcement learning for mention\-ranking coreference models\.InProceedings of the 2016 Conference on Empirical Methods in Natural Language Processing,J\. Su, K\. Duh, and X\. Carreras \(Eds\.\),Austin, Texas,pp\. 2256–2262\.External Links:[Document](https://dx.doi.org/10.18653/v1/D16-1245),[Link](https://aclanthology.org/D16-1245)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p2.1)\. - M\. Cohn, M\. Pushkarna, G\. O\. Olanubi, J\. M\. Moran, D\. Padgett, Z\. Mengesha, and C\. Heldreth \(2024\)Believing anthropomorphism: examining the role of anthropomorphic cues on trust in large language models\.InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems,pp\. 1–15\.Cited by:[§2\.2\.2](https://arxiv.org/html/2605.06901#S2.SS2.SSS2.p2.1)\. - M\. &\. Company \(2025\)What is productivity?\.Note:[https://www\.mckinsey\.com/featured\-insights/mckinsey\-explainers/what\-is\-productivity](https://www.mckinsey.com/featured-insights/mckinsey-explainers/what-is-productivity)Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px1.p2.1)\. - V\. Conitzer, R\. Freedman, J\. Heitzig, W\. H\. Holliday, B\. M\. Jacobs, N\. Lambert, M\. Mossé, E\. Pacuit, S\. Russell, H\. Schoelkopf,et al\.\(2024\)Position: social choice should guide ai alignment in dealing with diverse human feedback\.InForty\-first International Conference on Machine Learning,Cited by:[§3\.2\.3](https://arxiv.org/html/2605.06901#S3.SS2.SSS3.p1.1)\. - A\. Conneau, K\. Khandelwal, N\. Goyal, V\. Chaudhary, G\. Wenzek, F\. Guzmán, E\. Grave, M\. Ott, L\. Zettlemoyer, and V\. Stoyanov \(2020\)Unsupervised cross\-lingual representation learning at scale\.InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics,D\. Jurafsky, J\. Chai, N\. Schluter, and J\. Tetreault \(Eds\.\),Online,pp\. 8440–8451\.External Links:[Document](https://dx.doi.org/10.18653/v1/2020.acl-main.747),[Link](https://aclanthology.org/2020.acl-main.747)Cited by:[§4\.6\.1](https://arxiv.org/html/2605.06901#S4.SS6.SSS1.p1.1)\. - A\. Conneau and G\. Lample \(2019\)Cross\-lingual language model pretraining\.InAdvances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8\-14, 2019, Vancouver, BC, Canada,H\. M\. Wallach, H\. Larochelle, A\. Beygelzimer, F\. d’Alché\-Buc, E\. B\. Fox, and R\. Garnett \(Eds\.\),pp\. 7057–7067\.External Links:[Link](https://proceedings.neurips.cc/paper/2019/hash/c04c19c2c2474dbf5f7ac4372c5b9af1-Abstract.html)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p3.1)\. - K\. Crawford \(2021\)The atlas of ai: power, politics, and the planetary costs of artificial intelligence\.Yale University Press\.Cited by:[§1](https://arxiv.org/html/2605.06901#S1.p3.1)\. - Y\. Cui, H\. Chen, H\. Deng, X\. Huang, X\. Li, J\. Liu, Y\. Liu, Z\. Luo, J\. Wang, W\. Wang,et al\.\(2025a\)Emu3\.5: native multimodal models are world learners\.arXiv preprint arXiv:2510\.26583\.Cited by:[§4\.1\.3](https://arxiv.org/html/2605.06901#S4.SS1.SSS3.Px1.p1.1)\. - Z\. K\. Cui, M\. Demirer, S\. Jaffe, L\. Musolff, S\. Peng, and T\. 
Salz \(2025b\)The effects of generative ai on high\-skilled work: evidence from three field experiments with software developers\.Available at SSRN 4945566\.Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px1.p1.1)\. - M\. G\. Cuna and E\. Shen \(2025\)The hidden curriculum\.External Links:[Link](https://mgcuna.github.io/website/JMP_latest.pdf)Cited by:[§2\.3\.1](https://arxiv.org/html/2605.06901#S2.SS3.SSS1.p2.1)\. - H\. Cyberey, Y\. Ji, and D\. K\. Evans \(2025\)Do prevalent bias metrics capture allocational harms from llms?\.InThe Sixth Workshop on Insights from Negative Results in NLP,pp\. 34–45\.Cited by:[§3\.2\.3](https://arxiv.org/html/2605.06901#S3.SS2.SSS3.p1.1)\. - M\. I\. G\. Daepp and S\. Counts \(2024\)The emerging AI divide in the United States\.Working Paper\.Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px3.p1.1)\. - J\. Dai, X\. Pan, R\. Sun, J\. Ji, X\. Xu, M\. Liu, Y\. Wang, and Y\. Yang \(2023\)Safe rlhf: safe reinforcement learning from human feedback\.Vol\.abs/2310\.12773\.External Links:[Link](https://arxiv.org/abs/2310.12773)Cited by:[§4\.2\.1](https://arxiv.org/html/2605.06901#S4.SS2.SSS1.p4.1)\. - H\. T\. Dang \(2006\)DUC 2005: evaluation of question\-focused summarization systems\.InProceedings of the Workshop on Task\-Focused Summarization and Question Answering,T\. Chua, J\. Goldstein, S\. Teufel, and L\. Vanderwende \(Eds\.\),Sydney, Australia,pp\. 48–55\.External Links:[Link](https://aclanthology.org/W06-0707)Cited by:[§5\.2\.1](https://arxiv.org/html/2605.06901#S5.SS2.SSS1.p2.1)\. - J\. Dang, A\. Ahmadian, K\. Marchisio, J\. Kreutzer, A\. Üstün, and S\. Hooker \(2024\)RLHF can speak many languages: unlocking multilingual preference optimization for llms\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,pp\. 13134–13156\.Cited by:[§4\.6\.1](https://arxiv.org/html/2605.06901#S4.SS6.SSS1.p2.1)\. - D\. Das, K\. D\. Langis, A\. Martin\-Boyle, J\. Kim, M\. Lee, Z\. M\. Kim, S\. A\. Hayati, R\. Owan, B\. Hu, R\. Parkar, R\. Koo, J\. Park, A\. Tyagi, L\. Ferland, S\. Roy, V\. Liu, and D\. Kang \(2024\)Under the surface: tracking the artifactuality of llm\-generated data\.Vol\.abs/2401\.14698\.External Links:[Link](https://arxiv.org/abs/2401.14698)Cited by:[§3\.4\.1](https://arxiv.org/html/2605.06901#S3.SS4.SSS1.Px2.p2.1)\. - Databricks \(2023\)Databricks dolly 15k: an open instruction\-tuned dataset\.Note:[https://github\.com/databricks/dolly](https://github.com/databricks/dolly)Databricks blog and dataset release\. Licensed under CC BY\-SA 3\.0\.Cited by:[§3\.1\.2](https://arxiv.org/html/2605.06901#S3.SS1.SSS2.p2.1)\. - A\. De, S\. S\. Gudipudi, S\. Panchanan, and M\. S\. Desarkar \(2022\)ComplAI: theory of a unified framework for multi\-factor assessment of black\-box supervised machine learning models\.ArXivabs/2212\.14599\.External Links:[Link](https://api.semanticscholar.org/CorpusID:255340443)Cited by:[§5\.2\.1](https://arxiv.org/html/2605.06901#S5.SS2.SSS1.p3.1)\. - J\. De Freitas, Z\. Oguz\-Uguralp, and A\. Kaan\-Uguralp \(2025a\)Emotional manipulation by ai companions\.arXiv preprint arXiv:2508\.19258\.Cited by:[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px1.p1.1)\. - J\. De Freitas, Z\. Oğuz\-Uğuralp, A\. K\. Uğuralp, and S\. Puntoni \(2025b\)AI companions reduce loneliness\.Journal of Consumer Research,pp\. ucaf040\.Cited by:[§2\.2\.3](https://arxiv.org/html/2605.06901#S2.SS2.SSS3.p2.1)\. - F\. Delgado, S\. Yang, M\. Madaio, and Q\. 
Yang \(2023\)The participatory turn in ai design: theoretical foundations and the current state of practice\.InProceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization,pp\. 1–23\.Cited by:[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p3.1)\. - F\. Dell’Acqua, E\. McFowland III, E\. R\. Mollick, H\. Lifshitz\-Assaf, K\. Kellogg, S\. Rajendran, L\. Krayer, F\. Candelon, and K\. R\. Lakhani \(2023\)Navigating the jagged technological frontier: field experimental evidence of the effects of ai on knowledge worker productivity and quality\.Harvard business school technology & operations mgt\. Unit working paper\(24\-013\)\.Cited by:[§7\.1](https://arxiv.org/html/2605.06901#S7.SS1.p3.1)\. - D\. Demszky, D\. Yang, D\. S\. Yeager, C\. J\. Bryan, M\. Clapper, S\. Chandhok, J\. C\. Eichstaedt, C\. Hecht, J\. Jamieson, M\. Johnson, and et al\. \(2023\)Using large language models in psychology\.Nature Reviews Psychology\.External Links:[Link](https://www.nature.com/articles/s44159-023-00241-5#citeas),[Document](https://dx.doi.org/10.1038/s44159-023-00241-5)Cited by:[§5\.2\.1](https://arxiv.org/html/2605.06901#S5.SS2.SSS1.p4.1)\. - E\. Denton, A\. Hanna, R\. Amironesei, A\. Smart, and H\. Nicole \(2021\)On the genealogy of machine learning datasets: a critical history of imagenet\.Big Data & Society8\(2\),pp\. 20539517211035955\.Cited by:[§3\.1](https://arxiv.org/html/2605.06901#S3.SS1.p1.1)\. - A\. Deshpande, V\. Murahari, T\. Rajpurohit, A\. Kalyan, and K\. Narasimhan \(2023\)Toxicity in chatgpt: analyzing persona\-assigned language models\.InFindings of the Association for Computational Linguistics: EMNLP 2023,H\. Bouamor, J\. Pino, and K\. Bali \(Eds\.\),Singapore,pp\. 1236–1270\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.findings-emnlp.88),[Link](https://aclanthology.org/2023.findings-emnlp.88)Cited by:[§4\.5\.2](https://arxiv.org/html/2605.06901#S4.SS5.SSS2.p1.1),[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p4.1)\. - S\. Dev, E\. Sheng, J\. Zhao, A\. Amstutz, J\. Sun, Y\. Hou, M\. Sanseverino, J\. Kim, A\. Nishi, N\. Peng,et al\.\(2022\)On measures of biases and harms in nlp\.InFindings of the association for computational linguistics: AACL\-IJCNLP 2022,pp\. 246–267\.Cited by:[§2\.2\.4](https://arxiv.org/html/2605.06901#S2.SS2.SSS4.p2.1)\. - P\. G\. Devine \(2001\)Implicit prejudice and stereotyping: how automatic are they? introduction to the special section\.\.Journal of Personality and Social Psychology81\(5\),pp\. 757\.Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p1.1)\. - J\. Devlin, M\. Chang, K\. Lee, and K\. Toutanova \(2019\)BERT: pre\-training of deep bidirectional transformers for language understanding\.InProceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 \(Long and Short Papers\),J\. Burstein, C\. Doran, and T\. Solorio \(Eds\.\),Minneapolis, Minnesota,pp\. 4171–4186\.External Links:[Link](https://aclanthology.org/N19-1423/),[Document](https://dx.doi.org/10.18653/v1/N19-1423)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p1.1)\. - J\. Dhamala, T\. Sun, V\. Kumar, S\. Krishna, Y\. Pruksachatkun, K\. Chang, and R\. Gupta \(2021\)Bold: dataset and metrics for measuring biases in open\-ended language generation\.InProceedings of the 2021 ACM conference on fairness, accountability, and transparency,pp\. 
862–872\.Cited by:[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p2.1),[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p3.1)\. - D\. Difallah, E\. Filatova, and P\. Ipeirotis \(2018\)Demographics and dynamics of mechanical turk workers\.InProceedings of the Eleventh ACM International Conference on Web Search and Data Mining,WSDM ’18,New York, NY, USA,pp\. 135–143\.External Links:ISBN 9781450355810,[Link](https://doi.org/10.1145/3159652.3159661),[Document](https://dx.doi.org/10.1145/3159652.3159661)Cited by:[§5\.1\.3](https://arxiv.org/html/2605.06901#S5.SS1.SSS3.p3.1)\. - J\. Dodge, M\. Sap, A\. Marasović, W\. Agnew, G\. Ilharco, D\. Groeneveld, M\. Mitchell, and M\. Gardner \(2021\)Documenting large webtext corpora: a case study on the colossal clean crawled corpus\.InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,M\. Moens, X\. Huang, L\. Specia, and S\. W\. Yih \(Eds\.\),Online and Punta Cana, Dominican Republic,pp\. 1286–1305\.External Links:[Link](https://aclanthology.org/2021.emnlp-main.98/),[Document](https://dx.doi.org/10.18653/v1/2021.emnlp-main.98)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p2.1),[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p4.1)\. - H\. Dong, W\. Xiong, B\. Pang, H\. Wang, H\. Zhao, Y\. Zhou, N\. Jiang, D\. Sahoo, C\. Xiong, and T\. Zhang \(2024a\)RLHF workflow: from reward modeling to online rlhf\.Vol\.abs/2405\.07863\.External Links:[Link](https://arxiv.org/abs/2405.07863)Cited by:[§4\.2\.1](https://arxiv.org/html/2605.06901#S4.SS2.SSS1.p2.1)\. - Y\. Dong, R\. Mu, G\. Jin, Y\. Qi, J\. Hu, X\. Zhao, J\. Meng, W\. Ruan, and X\. Huang \(2024b\)Building guardrails for large language models\.ArXiv preprintabs/2402\.01822\.External Links:[Link](https://arxiv.org/abs/2402.01822)Cited by:[§6\.3\.1](https://arxiv.org/html/2605.06901#S6.SS3.SSS1.p2.1)\. - Z\. Dong, Z\. Zhou, C\. Yang, J\. Shao, and Y\. Qiao \(2024c\)Attacks, defenses and evaluations for LLM conversation safety: a survey\.InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 1: Long Papers\),K\. Duh, H\. Gomez, and S\. Bethard \(Eds\.\),Mexico City, Mexico,pp\. 6734–6747\.External Links:[Link](https://aclanthology.org/2024.naacl-long.375)Cited by:[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px1.p1.1),[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px2.p1.1)\. - P\. Dongre, M\. Behravan, K\. Gupta, M\. Billinghurst, and D\. Gračanin \(2024\)Integrating physiological data with large language models for empathic human\-ai interaction\.Vol\.abs/2404\.15351\.External Links:[Link](https://arxiv.org/abs/2404.15351)Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px1.p1.1)\. - R\. Dotan and S\. Milli \(2020\)Value\-laden disciplinary shifts in machine learning\.InProceedings of the 2020 Conference on Fairness, Accountability, and Transparency,pp\. 294–294\.Cited by:[§3](https://arxiv.org/html/2605.06901#S3.p2.1)\. - B\. D\. Douglas, P\. J\. Ewell, and M\. Brauer \(2023\)Data quality in online human\-subjects research: comparisons between mturk, prolific, cloudresearch, qualtrics, and sona\.PLOS ONE18\(3\),pp\. 1–17\.External Links:[Document](https://dx.doi.org/10.1371/journal.pone.0279720),[Link](https://doi.org/10.1371/journal.pone.0279720)Cited by:[§3\.1\.3](https://arxiv.org/html/2605.06901#S3.SS1.SSS3.p4.1),[§5\.1\.3](https://arxiv.org/html/2605.06901#S5.SS1.SSS3.p3.1)\. - J\. 
Du and H\. Mi \(2021\)Dp\-fp: differentially private forward propagation for large models\.ArXiv preprintabs/2112\.14430\.External Links:[Link](https://arxiv.org/abs/2112.14430)Cited by:[§3\.3\.2](https://arxiv.org/html/2605.06901#S3.SS3.SSS2.p2.1)\. - K\. Du, V\. Snæbjarnarson, N\. Stoehr, J\. White, A\. Schein, and R\. Cotterell \(2024\)Context versus prior knowledge in language models\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 13211–13235\.External Links:[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.714),[Link](https://aclanthology.org/2024.acl-long.714)Cited by:[§6\.1\.2](https://arxiv.org/html/2605.06901#S6.SS1.SSS2.Px2.p2.1)\. - N\. Du, Y\. Huang, A\. M\. Dai, S\. Tong, D\. Lepikhin, Y\. Xu, M\. Krikun, Y\. Zhou, A\. W\. Yu, O\. Firat, B\. Zoph, L\. Fedus, M\. P\. Bosma, Z\. Zhou, T\. Wang, Y\. E\. Wang, K\. Webster, M\. Pellat, K\. Robinson, K\. S\. Meier\-Hellstern, T\. Duke, L\. Dixon, K\. Zhang, Q\. V\. Le, Y\. Wu, Z\. Chen, and C\. Cui \(2022\)GLaM: efficient scaling of language models with mixture\-of\-experts\.InInternational Conference on Machine Learning, ICML 2022, 17\-23 July 2022, Baltimore, Maryland, USA,K\. Chaudhuri, S\. Jegelka, L\. Song, C\. Szepesvári, G\. Niu, and S\. Sabato \(Eds\.\),Proceedings of Machine Learning Research, Vol\.162,pp\. 5547–5569\.External Links:[Link](https://proceedings.mlr.press/v162/du22c.html)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p2.1)\. - M\. H\. Dupré \(2024\)Note:Accessed: 2025\-04\-28External Links:[Link](https://futurism.com/ai-chatbots-teens-self-harm)Cited by:[§2\.2\.3](https://arxiv.org/html/2605.06901#S2.SS2.SSS3.p2.1)\. - E\. Durmus, K\. Nguyen, T\. I\. Liao, N\. Schiefer, A\. Askell, A\. Bakhtin, C\. Chen, Z\. Hatfield\-Dodds, D\. Hernandez, N\. Joseph,et al\.\(2023\)Towards measuring the representation of subjective global opinions in language models\.ArXiv preprintabs/2306\.16388\.External Links:[Link](https://arxiv.org/abs/2306.16388)Cited by:[§2\.2\.4](https://arxiv.org/html/2605.06901#S2.SS2.SSS4.p1.1),[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px3.p2.1)\. - E\. Durmus, K\. Nguyen, T\. Liao, N\. Schiefer, A\. Askell, A\. Bakhtin, C\. Chen, Z\. Hatfield\-Dodds, D\. Hernandez, N\. Joseph, L\. Lovitt, S\. McCandlish, O\. Sikder, A\. Tamkin, J\. Thamkul, J\. Kaplan, J\. Clark, and D\. Ganguli \(2024\)Towards measuring the representation of subjective global opinions in language models\.InFirst Conference on Language Modeling,External Links:[Link](https://openreview.net/forum?id=zl16jLb91v)Cited by:[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p1.1),[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p1.1),[§4\.5](https://arxiv.org/html/2605.06901#S4.SS5.p1.1),[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p2.1)\. - C\. Dwork, M\. Hardt, T\. Pitassi, O\. Reingold, and R\. Zemel \(2012\)Fairness through awareness\.InProceedings of the 3rd innovations in theoretical computer science conference,pp\. 214–226\.Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p4.1)\. - Y\. Elazar, A\. Bhagia, I\. Magnusson, A\. Ravichander, D\. Schwenk, A\. Suhr, E\. P\. Walsh, D\. Groeneveld, L\. Soldaini, S\. Singh,et al\.\(2024\)What’s in my big data?\.InICLR,Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p4.1),[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p5.1)\. - R\. 
Eldan and M\. Russinovich \(2023\)Who’s harry potter? approximate unlearning in llms\.ArXiv preprintabs/2310\.02238\.External Links:[Link](https://arxiv.org/abs/2310.02238)Cited by:[§3\.3\.2](https://arxiv.org/html/2605.06901#S3.SS3.SSS2.p2.1)\. - T\. Eloundou, S\. Manning, P\. Mishkin, and D\. Rock \(2024\)GPTs are gpts: labor market impact potential of llms\.Science384\(6702\),pp\. 1306–1308\.Cited by:[§3\.2\.1](https://arxiv.org/html/2605.06901#S3.SS2.SSS1.p2.1),[§5\.3](https://arxiv.org/html/2605.06901#S5.SS3.p4.1),[§7](https://arxiv.org/html/2605.06901#S7.p3.1)\. - D\. Emelin, R\. Le Bras, J\. D\. Hwang, M\. Forbes, and Y\. Choi \(2021\)Moral stories: situated reasoning about norms, intents, actions, and their consequences\.InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,M\. Moens, X\. Huang, L\. Specia, and S\. W\. Yih \(Eds\.\),Online and Punta Cana, Dominican Republic,pp\. 698–718\.External Links:[Document](https://dx.doi.org/10.18653/v1/2021.emnlp-main.54),[Link](https://aclanthology.org/2021.emnlp-main.54)Cited by:[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p2.1)\. - J\. Engel, K\. Somasundaram, M\. Goesele, A\. Sun, A\. Gamino, A\. Turner, A\. Talattof, A\. Yuan, B\. Souti, B\. Meredith,et al\.\(2023\)Project aria: a new tool for egocentric multi\-modal ai research\.ArXiv preprintabs/2308\.13561\.External Links:[Link](https://arxiv.org/abs/2308.13561)Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px2.p1.1)\. - M\. Eriksson, E\. Purificato, A\. Noroozian, J\. Vinagre, G\. Chaslot, E\. Gomez, and D\. Fernandez\-Llorca \(2025\)Can we trust ai benchmarks? an interdisciplinary review of current issues in ai evaluation\.InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society,Vol\.8,pp\. 850–864\.Cited by:[§5](https://arxiv.org/html/2605.06901#S5.p1.1),[§5](https://arxiv.org/html/2605.06901#S5.p2.1)\. - K\. Ethayarajh and D\. Jurafsky \(2020\)Utility is in the eye of the user: a critique of NLP leaderboards\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),B\. Webber, T\. Cohn, Y\. He, and Y\. Liu \(Eds\.\),Online,pp\. 4846–4853\.External Links:[Document](https://dx.doi.org/10.18653/v1/2020.emnlp-main.393),[Link](https://aclanthology.org/2020.emnlp-main.393)Cited by:[§4\.3\.3](https://arxiv.org/html/2605.06901#S4.SS3.SSS3.p1.1),[§5\.1\.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1.p1.1)\. - K\. Ethayarajh \(2024\)Behavior\-bound machine learning\.Ph\.D\. Thesis,Stanford University\.Cited by:[§3\.1\.3](https://arxiv.org/html/2605.06901#S3.SS1.SSS3.p2.1)\. - J\. Etxaniz, G\. Azkune, A\. Soroa, O\. L\. de Lacalle, and M\. Artetxe \(2024\)Do multilingual language models think better in english?\.InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies \(Volume 2: Short Papers\),pp\. 550–564\.Cited by:[§4\.6\.1](https://arxiv.org/html/2605.06901#S4.SS6.SSS1.p3.1)\. - V\. Eubanks \(2018\)Automating inequality: how high\-tech tools profile, police, and punish the poor\.St\. Martin’s Press\.Cited by:[§3\.2](https://arxiv.org/html/2605.06901#S3.SS2.p1.1),[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p1.1),[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p2.1)\. - T\. Fan, Y\. Yang, Y\. Jiang, Y\. Zhang, Y\. Chen, and C\. 
Huang \(2025\)AI\-trader: benchmarking autonomous agents in real\-time financial markets\.arXiv preprint arXiv:2512\.10971\.Cited by:[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px3.p1.1)\. - M\. Feffer, A\. Sinha, W\. H\. Deng, Z\. C\. Lipton, and H\. Heidari \(2024\)Red\-teaming for generative ai: silver bullet or security theater?\.InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society,Vol\.7,pp\. 421–437\.Cited by:[§6\.3\.1](https://arxiv.org/html/2605.06901#S6.SS3.SSS1.p1.1)\. - S\. Feng, C\. Y\. Park, Y\. Liu, and Y\. Tsvetkov \(2023\)From pretraining data to language models to downstream tasks: tracking the trails of political biases leading to unfair NLP models\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 11737–11762\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.656),[Link](https://aclanthology.org/2023.acl-long.656)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p1.1),[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p3.1),[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p1.1)\. - S\. Feng, T\. Sorensen, Y\. Liu, J\. Fisher, C\. Y\. Park, Y\. Choi, and Y\. Tsvetkov \(2024\)Modular pluralism: pluralistic alignment via multi\-LLM collaboration\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,Y\. Al\-Onaizan, M\. Bansal, and Y\. Chen \(Eds\.\),Miami, Florida, USA,pp\. 4151–4171\.External Links:[Link](https://aclanthology.org/2024.emnlp-main.240)Cited by:[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p1.1),[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p2.1)\. - J\. Ferrao, M\. van der Lende, I\. Lichkovski, and C\. Neo \(2025\)The anatomy of alignment: decomposing preference optimization by steering sparse features\.arXiv preprint arXiv:2509\.12934\.Cited by:[§6\.1\.3](https://arxiv.org/html/2605.06901#S6.SS1.SSS3.Px2.p1.1)\. - E\. Fleisig, G\. Smith, M\. Bossi, I\. Rustagi, X\. Yin, and D\. Klein \(2024\)Linguistic bias in chatgpt: language models reinforce dialect discrimination\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,pp\. 13541–13564\.Cited by:[§3\.2\.1](https://arxiv.org/html/2605.06901#S3.SS2.SSS1.p2.1),[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p2.1)\. - M\. Forbes, J\. D\. Hwang, V\. Shwartz, M\. Sap, and Y\. Choi \(2020\)Social chemistry 101: learning to reason about social and moral norms\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),pp\. 653–670\.Cited by:[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px1.p2.1)\. - G\. Franceschelli and M\. Musolesi \(2022\)Copyright in generative deep learning\.Data & Policy4,pp\. e17\.Cited by:[§3\.3\.1](https://arxiv.org/html/2605.06901#S3.SS3.SSS1.p2.1)\. - E\. Frick, T\. Li, C\. Chen, W\. Chiang, A\. N\. Angelopoulos, J\. Jiao, B\. Zhu, J\. E\. Gonzalez, and I\. Stoica \(2024\)How to evaluate reward models for rlhf\.InThe Thirteenth International Conference on Learning Representations,Cited by:[§5](https://arxiv.org/html/2605.06901#S5.p1.1)\. - B\. Friedman \(1996\)Value\-sensitive design\.interactions3\(6\),pp\. 16–23\.Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px4.p1.1)\. - L\. Fu, G\. Datta, H\. Huang, W\. C\. Panitch, J\. Drake, J\. Ortiz, M\. Mukadam, M\. Lambeta, R\. Calandra, and K\. 
Goldberg \(2024\)A touch, vision, and language dataset for multimodal alignment\.InForty\-first International Conference on Machine Learning,External Links:[Link](https://openreview.net/forum?id=tFEOOH9eH0)Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px1.p1.1)\. - A\. Fügener, J\. Grahl, A\. Gupta, and W\. Ketter \(2022\)Cognitive challenges in human–artificial intelligence collaboration: investigating the path toward productive delegation\.Information systems research33\(2\),pp\. 678–696\.Cited by:[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px1.p2.1)\. - I\. Gabriel \(2020\)Artificial intelligence, values, and alignment\.Minds and machines30\(3\),pp\. 411–437\.Cited by:[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p3.1),[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px2.p1.1)\. - W\. Gan, Z\. Qi, J\. Wu, and J\. C\. Lin \(2023\)Large language models in education: vision and opportunities\.In2023 IEEE international conference on big data \(BigData\),pp\. 4776–4785\.Cited by:[§1](https://arxiv.org/html/2605.06901#S1.p1.1)\. - D\. Ganguli, L\. Lovitt, J\. Kernion, A\. Askell, Y\. Bai, S\. Kadavath, B\. Mann, E\. Perez, N\. Schiefer, K\. Ndousse,et al\.\(2022\)Red teaming language models to reduce harms: methods, scaling behaviors, and lessons learned\.arXiv preprint arXiv:2209\.07858\.Cited by:[§4\.3\.3](https://arxiv.org/html/2605.06901#S4.SS3.SSS3.p2.1),[§6\.3\.1](https://arxiv.org/html/2605.06901#S6.SS3.SSS1.p1.1)\. - G\. Gao, A\. Taymanov, E\. Salinas, P\. Mineiro, and D\. Misra \(2024a\)Aligning llm agents by learning latent preference from user edits\.Vol\.abs/2404\.15269\.External Links:[Link](https://arxiv.org/abs/2404.15269)Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px2.p1.1)\. - L\. Gao, S\. Biderman, S\. Black, L\. Golding, T\. Hoppe, C\. Foster, J\. Phang, H\. He, A\. Thite, N\. Nabeshima, S\. Presser, and C\. Leahy \(2021\)The pile: an 800gb dataset of diverse text for language modeling\.Vol\.abs/2101\.00027\.External Links:[Link](https://arxiv.org/abs/2101.00027)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p5.1),[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p3.1)\. - M\. Gao, X\. Hu, J\. Ruan, X\. Pu, and X\. Wan \(2024b\)LLM\-based nlg evaluation: current status and challenges\.Vol\.abs/2402\.01383\.External Links:[Link](https://arxiv.org/abs/2402.01383)Cited by:[§5\.1\.3](https://arxiv.org/html/2605.06901#S5.SS1.SSS3.p2.1)\. - Y\. Gao, D\. Dligach, T\. Miller, J\. Caskey, B\. Sharma, M\. M\. Churpek, and M\. Afshar \(2023\)Dr\. bench: diagnostic reasoning benchmark for clinical natural language processing\.Journal of biomedical informatics138,pp\. 104286\.Cited by:[§5\.1\.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1.Px4.p1.1)\. - X\. Ge, C\. Xu, D\. Misaki, H\. R\. Markus, and J\. L\. Tsai \(2024\)How culture shapes what people want from ai\.InProceedings of the 2024 CHI conference on human factors in computing systems,pp\. 1–15\.Cited by:[§2\.2\.4](https://arxiv.org/html/2605.06901#S2.SS2.SSS4.p2.1)\. - T\. Gebru, J\. Morgenstern, B\. Vecchione, J\. W\. Vaughan, H\. Wallach, H\. D\. Iii, and K\. Crawford \(2021\)Datasheets for datasets\.Communications of the ACM64\(12\),pp\. 86–92\.Cited by:[§3\.2\.4](https://arxiv.org/html/2605.06901#S3.SS2.SSS4.p1.1)\. - C\. Geertz \(2008\)Thick description: toward an interpretive theory of culture\.InThe cultural geography reader,pp\. 
41–51\.Cited by:[§2\.3\.3](https://arxiv.org/html/2605.06901#S2.SS3.SSS3.p1.1),[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px1.p2.1)\. - S\. Gehman, S\. Gururangan, M\. Sap, Y\. Choi, and N\. A\. Smith \(2020\)RealToxicityPrompts: evaluating neural toxic degeneration in language models\.InFindings of the Association for Computational Linguistics: EMNLP 2020,T\. Cohn, Y\. He, and Y\. Liu \(Eds\.\),Online,pp\. 3356–3369\.External Links:[Document](https://dx.doi.org/10.18653/v1/2020.findings-emnlp.301),[Link](https://aclanthology.org/2020.findings-emnlp.301)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p3.1)\. - S\. Gehrmann, E\. Clark, and T\. Sellam \(2023\)Repairing the cracked foundation: a survey of obstacles in evaluation practices for generated text\.Journal of Artificial Intelligence Research77,pp\. 103–166\.Cited by:[§5\.1\.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2.p3.1)\. - D\. Gergle and D\. S\. Tan \(2014\)Experimental research in hci\.InWays of Knowing in HCI,pp\. 191–227\.Cited by:[§2\.3\.1](https://arxiv.org/html/2605.06901#S2.SS3.SSS1.p1.1)\. - M\. Geva, R\. Schuster, J\. Berant, and O\. Levy \(2021\)Transformer feed\-forward layers are key\-value memories\.InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing,M\. Moens, X\. Huang, L\. Specia, and S\. W\. Yih \(Eds\.\),Online and Punta Cana, Dominican Republic,pp\. 5484–5495\.External Links:[Document](https://dx.doi.org/10.18653/v1/2021.emnlp-main.446),[Link](https://aclanthology.org/2021.emnlp-main.446)Cited by:[§6\.1\.1](https://arxiv.org/html/2605.06901#S6.SS1.SSS1.Px1.p1.1)\. - S\. Ghosh and A\. Caliskan \(2023\)Chatgpt perpetuates gender bias in machine translation and ignores non\-gendered pronouns: findings across bengali and five other low\-resource languages\.InProceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society,pp\. 901–912\.Cited by:[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p2.1)\. - L\. H\. Gilpin, D\. Bau, B\. Z\. Yuan, A\. Bajwa, M\. Specter, and L\. Kagal \(2018\)Explaining explanations: an overview of interpretability of machine learning\.In2018 IEEE 5th International Conference on Data Science and Advanced Analytics \(DSAA\),Vol\.,pp\. 80–89\.External Links:[Document](https://dx.doi.org/10.1109/DSAA.2018.00018)Cited by:[§6\.1](https://arxiv.org/html/2605.06901#S6.SS1.p1.1)\. - M\. Giulianelli, J\. Baan, W\. Aziz, R\. Fern’andez, and B\. Plank \(2023\)What comes next? evaluating uncertainty in neural text generators against human production variability\.InConference on Empirical Methods in Natural Language Processing,External Links:[Link](https://api.semanticscholar.org/CorpusID:258823318)Cited by:[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p2.1)\. - J\. A\. Goldstein, J\. Chao, S\. Grossman, A\. Stamos, and M\. Tomz \(2024\)How persuasive is ai\-generated propaganda?\.PNAS nexus3\(2\),pp\. pgae034\.Cited by:[§6\.3](https://arxiv.org/html/2605.06901#S6.SS3.p1.1)\. - M\. L\. Gordon, M\. S\. Lam, J\. S\. Park, K\. Patel, J\. Hancock, T\. Hashimoto, and M\. S\. Bernstein \(2022\)Jury learning: integrating dissenting voices into machine learning models\.InProceedings of the 2022 CHI Conference on Human Factors in Computing Systems,pp\. 1–19\.Cited by:[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px2.p2.1)\. - M\. L\. Gordon, K\. Zhou, K\. Patel, T\. Hashimoto, and M\. S\. 
Bernstein \(2021\)The disagreement deconvolution: bringing machine learning performance metrics in line with reality\.InProceedings of the 2021 chi conference on human factors in computing systems,pp\. 1–14\.Cited by:[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px2.p2.1)\. - R\. Grosse, J\. Bae, C\. Anil, N\. Elhage, A\. Tamkin, A\. Tajdini, B\. Steiner, D\. Li, E\. Durmus, E\. Perez, E\. Hubinger, K\. Lukošiūtė, K\. Nguyen, N\. Joseph, S\. McCandlish, J\. Kaplan, and S\. R\. Bowman \(2023\)Studying large language model generalization with influence functions\.Vol\.abs/2308\.03296\.External Links:[Link](https://arxiv.org/abs/2308.03296)Cited by:[§6\.1\.2](https://arxiv.org/html/2605.06901#S6.SS1.SSS2.Px2.p2.1)\. - A\. Gudibande, E\. Wallace, C\. Snell, X\. Geng, H\. Liu, P\. Abbeel, S\. Levine, and D\. Song \(2023\)The false promise of imitating proprietary llms\.ArXiv preprintabs/2305\.15717\.External Links:[Link](https://arxiv.org/abs/2305.15717)Cited by:[§4\.1\.2](https://arxiv.org/html/2605.06901#S4.SS1.SSS2.p1.1)\. - T\. Guggenberger, L\. Lämmermann, N\. Urbach, A\. M\. Walter, and P\. Hofmann \(2023\)Task delegation from ai to humans: a principal\-agent perspective\.Cited by:[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px1.p2.1)\. - N\. Guha, C\. M\. Lawrence, L\. A\. Gailmard, K\. T\. Rodolfa, F\. Surani, R\. Bommasani, I\. D\. Raji, M\. Cuéllar, C\. Honigsberg, P\. Liang, and D\. E\. Ho \(2024\)AI regulation has its own alignment problem: the technical and institutional feasibility of disclosure, registration, licensing, and auditing\.George Washington Law Review\.Note:forthcomingExternal Links:[Link](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4634443)Cited by:[§6\.1\.2](https://arxiv.org/html/2605.06901#S6.SS1.SSS2.Px2.p1.1)\. - N\. Guha, J\. Nyarko, D\. E\. Ho, C\. Ré, A\. Chilton, A\. K, A\. Chohlas\-Wood, A\. Peters, B\. Waldon, D\. N\. Rockmore, D\. Zambrano, D\. Talisman, E\. Hoque, F\. Surani, F\. Fagan, G\. Sarfaty, G\. M\. Dickinson, H\. Porat, J\. Hegland, J\. Wu, J\. Nudell, J\. Niklaus, J\. J\. Nay, J\. H\. Choi, K\. Tobia, M\. Hagan, M\. Ma, M\. A\. Livermore, N\. Rasumov\-Rahe, N\. Holzenberger, N\. Kolt, P\. Henderson, S\. Rehaag, S\. Goel, S\. Gao, S\. Williams, S\. Gandhi, T\. Zur, V\. Iyer, and Z\. Li \(2023\)LegalBench: A collaboratively built benchmark for measuring legal reasoning in large language models\.InAdvances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 \- 16, 2023,A\. Oh, T\. Naumann, A\. Globerson, K\. Saenko, M\. Hardt, and S\. Levine \(Eds\.\),External Links:[Link](http://papers.nips.cc/paper%5C_files/paper/2023/hash/89e44582fd28ddfea1ea4dcb0ebbf4b0-Abstract-Datasets%5C_and%5C_Benchmarks.html)Cited by:[§1](https://arxiv.org/html/2605.06901#S1.p1.1),[§5\.1\.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1.Px4.p1.1),[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px3.p1.1)\. - C\. Gulcehre, T\. L\. Paine, S\. Srinivasan, K\. Konyushkova, L\. Weerts, A\. Sharma, A\. Siddhant, A\. Ahern, M\. Wang, C\. Gu, W\. Macherey, A\. Doucet, O\. Firat, and N\. de Freitas \(2023\)Reinforced self\-training \(rest\) for language modeling\.Vol\.abs/2308\.08998\.External Links:[Link](https://arxiv.org/abs/2308.08998)Cited by:[§4\.2\.1](https://arxiv.org/html/2605.06901#S4.SS2.SSS1.p3.1)\. - S\. Gunasekar, Y\. Zhang, J\. Aneja, C\. C\. T\. Mendes, A\. Del Giorno, S\. Gopi, M\. Javaheripi, P\. Kauffmann, G\. de Rosa, O\. 
1–25\.Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px1.p1.1)\. - E\. Miehling, M\. Desmond, K\. N\. Ramamurthy, E\. M\. Daly, K\. R\. Varshney, E\. Farchi, P\. Dognin, J\. Rios, D\. Bouneffouf, M\. Liu,et al\.\(2025\)Evaluating the prompt steerability of large language models\.Cited by:[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p1.1)\. - R\. Mihalcea, O\. Ignat, L\. Bai, A\. Borah, L\. Chiruzzo, Z\. Jin, C\. Kwizera, J\. Nwatu, S\. Poria, and T\. Solorio \(2025\)Why ai is weird and shouldn’t be this way: towards ai for everyone, with everyone, by everyone\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.39,pp\. 28657–28670\.Cited by:[§2\.2\.4](https://arxiv.org/html/2605.06901#S2.SS2.SSS4.p1.1),[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p6.1)\. - S\. Minaee, T\. Mikolov, N\. Nikzad, M\. Chenaghlu, R\. Socher, X\. Amatriain, and J\. Gao \(2024\)Large language models: a survey\.arXiv preprint arXiv:2402\.06196\.Cited by:[§4](https://arxiv.org/html/2605.06901#S4.p2.1)\. - S\. Mishra, D\. Khashabi, C\. Baral, and H\. Hajishirzi \(2022\)Cross\-task generalization via natural language crowdsourcing instructions\.InProceedings of the 60th Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),S\. Muresan, P\. Nakov, and A\. Villavicencio \(Eds\.\),Dublin, Ireland,pp\. 3470–3487\.External Links:[Document](https://dx.doi.org/10.18653/v1/2022.acl-long.244),[Link](https://aclanthology.org/2022.acl-long.244)Cited by:[§3\.1\.2](https://arxiv.org/html/2605.06901#S3.SS1.SSS2.p1.1),[§3\.1\.2](https://arxiv.org/html/2605.06901#S3.SS1.SSS2.p3.1)\. - M\. Mitchell, S\. Wu, A\. Zaldivar, P\. Barnes, L\. Vasserman, B\. Hutchinson, E\. Spitzer, I\. D\. Raji, and T\. Gebru \(2019\)Model cards for model reporting\.InProceedings of the Conference on Fairness, Accountability, and Transparency,FAT\* ’19,New York, NY, USA,pp\. 220–229\.External Links:[Document](https://dx.doi.org/10.1145/3287560.3287596),ISBN 9781450361255,[Link](https://doi.org/10.1145/3287560.3287596)Cited by:[§3\.2\.4](https://arxiv.org/html/2605.06901#S3.SS2.SSS4.p1.1)\. - D\. Mizrahi, A\. B\. L\. Larsen, J\. Allardice, S\. Petryk, Y\. Gorokhov, J\. Li, A\. Fang, J\. Gardner, T\. Gunter, and A\. Dehghan \(2025\)Language models improve when pretraining data matches target tasks\.arXiv preprint arXiv:2507\.12466\.Cited by:[§5](https://arxiv.org/html/2605.06901#S5.p1.1)\. - J\. Mökander, J\. Schuett, H\. R\. Kirk, and L\. Floridi \(2024\)Auditing large language models: a three\-layered approach\.AI and Ethics4\(4\),pp\. 1085–1115\.Cited by:[§3\.1](https://arxiv.org/html/2605.06901#S3.SS1.p1.1)\. - Y\. Moon, H\. Nam, W\. Choi, N\. Kim, S\. Kwak, and T\. Oh \(2024\)SYNAuG: exploiting synthetic data for data imbalance problems\.Note:[https://arxiv\.org/abs/2308\.00994v3](https://arxiv.org/abs/2308.00994v3)External Links:2308\.00994Cited by:[§3\.4\.1](https://arxiv.org/html/2605.06901#S3.SS4.SSS1.p1.1)\. - J\. Moore, T\. Deshpande, and D\. Yang \(2024\)Are large language models consistent over value\-laden questions?\.InFindings of the Association for Computational Linguistics: EMNLP 2024,Y\. Al\-Onaizan, M\. Bansal, and Y\. Chen \(Eds\.\),Miami, Florida, USA,pp\. 15185–15221\.External Links:[Link](https://aclanthology.org/2024.findings-emnlp.891)Cited by:[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p2.1)\. - L\. Morales\-Navarro, Y\. B\. Kafai, L\. Vogelstein, E\. Yu, and D\. 
Metaxa \(2025\)Learning about algorithm auditing in five steps: scaffolding how high school youth can systematically and critically evaluate machine learning applications\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.39,pp\. 29186–29194\.Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px3.p2.1)\. - L\. Morales\-Navarro, Y\. Kafai, V\. Konda, and D\. Metaxa \(2024\)Youth as peer auditors: engaging teenagers with algorithm auditing of machine learning applications\.InProceedings of the 23rd Annual ACM Interaction Design and Children Conference,pp\. 560–573\.Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px3.p2.1)\. - H\. Morrin, L\. Nicholls, M\. Levin, J\. Yiend, U\. Iyengar, F\. DelGuidice, S\. Bhattacharyya, J\. MacCabe, S\. Tognin, R\. Twumasi,et al\.\(2025\)Delusions by design? how everyday ais might be fuelling psychosis \(and what can be done about it\)\.OSF\.Cited by:[§2\.2\.3](https://arxiv.org/html/2605.06901#S2.SS2.SSS3.p2.1)\. - R\. Movva, P\. W\. Koh, and E\. Pierson \(2024\)Annotation alignment: comparing llm and human annotations of conversational safety\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,pp\. 9048–9062\.Cited by:[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px2.p1.1)\. - R\. Movva, S\. Milli, S\. Min, and E\. Pierson \(2025\)What’s in my human feedback? learning interpretable descriptions of preference data\.arXiv preprint arXiv:2510\.26202\.Cited by:[§6\.1\.3](https://arxiv.org/html/2605.06901#S6.SS1.SSS3.Px2.p1.1)\. - H\. Mozaffari \(2013\)An analytical rubric for assessing creativity in creative writing\.Theory and Practice in Language Studies3\(12\)\.External Links:[Document](https://dx.doi.org/10.4304/tpls.3.12.2214-2219)Cited by:[§5\.2\.1](https://arxiv.org/html/2605.06901#S5.SS2.SSS1.p3.1)\. - H\. Mozannar, V\. Chen, M\. Alsobay, S\. Das, S\. Zhao, D\. Wei, M\. Nagireddy, P\. Sattigeri, A\. Talwalkar, and D\. Sontag \(2024\)The realhumaneval: evaluating large language models’ abilities to support programmers\.Vol\.abs/2404\.02806\.External Links:[Link](https://arxiv.org/abs/2404.02806)Cited by:[§5\.1\.3](https://arxiv.org/html/2605.06901#S5.SS1.SSS3.p5.1),[§5\.3](https://arxiv.org/html/2605.06901#S5.SS3.p6.1)\. - N\. Muennighoff, Q\. Liu, A\. Zebaze, Q\. Zheng, B\. Hui, T\. Y\. Zhuo, S\. Singh, X\. Tang, L\. von Werra, and S\. Longpre \(2023a\)OctoPack: instruction tuning code large language models\.Vol\.abs/2308\.07124\.External Links:[Link](https://arxiv.org/abs/2308.07124)Cited by:[§4\.1\.1](https://arxiv.org/html/2605.06901#S4.SS1.SSS1.p3.1)\. - N\. Muennighoff, T\. Wang, L\. Sutawika, A\. Roberts, S\. Biderman, T\. Le Scao, M\. S\. Bari, S\. Shen, Z\. X\. Yong, H\. Schoelkopf, X\. Tang, D\. Radev, A\. F\. Aji, K\. Almubarak, S\. Albanie, Z\. Alyafeai, A\. Webson, E\. Raff, and C\. Raffel \(2023b\)Crosslingual generalization through multitask finetuning\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 15991–16111\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.891),[Link](https://aclanthology.org/2023.acl-long.891)Cited by:[§3\.1\.2](https://arxiv.org/html/2605.06901#S3.SS1.SSS2.p3.1)\. - N\. Muennighoff, Z\. Yang, W\. Shi, X\. L\. Li, L\. Fei\-Fei, H\. Hajishirzi, L\. Zettlemoyer, P\. Liang, E\. Candès, and T\. B\. 
Hashimoto \(2025\)S1: simple test\-time scaling\.InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing,pp\. 20286–20332\.Cited by:[§4\.1](https://arxiv.org/html/2605.06901#S4.SS1.p1.1)\. - D\. F\. Mujtaba and N\. R\. Mahapatra \(2019\)Ethical considerations in ai\-based recruitment\.In2019 IEEE International Symposium on Technology and Society \(ISTAS\),pp\. 1–7\.Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p2.1)\. - M\. Nadeem, A\. Bethke, and S\. Reddy \(2021\)StereoSet: measuring stereotypical bias in pretrained language models\.InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing \(Volume 1: Long Papers\),C\. Zong, F\. Xia, W\. Li, and R\. Navigli \(Eds\.\),Online,pp\. 5356–5371\.External Links:[Document](https://dx.doi.org/10.18653/v1/2021.acl-long.416),[Link](https://aclanthology.org/2021.acl-long.416)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p2.1)\. - N\. Nangia, C\. Vania, R\. Bhalerao, and S\. R\. Bowman \(2020\)CrowS\-pairs: a challenge dataset for measuring social biases in masked language models\.InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing \(EMNLP\),B\. Webber, T\. Cohn, Y\. He, and Y\. Liu \(Eds\.\),Online,pp\. 1953–1967\.External Links:[Document](https://dx.doi.org/10.18653/v1/2020.emnlp-main.154),[Link](https://aclanthology.org/2020.emnlp-main.154)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p2.1)\. - T\. Naous, P\. Laban, W\. Xu, and J\. Neville \(2025\)Flipping the dialogue: training and evaluating user language models\.arXiv preprint arXiv:2510\.06552\.Cited by:[§2\.2\.1](https://arxiv.org/html/2605.06901#S2.SS2.SSS1.p3.1),[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px2.p1.1)\. - T\. Naous, M\. J\. Ryan, A\. Ritter, and W\. Xu \(2024\)Having beer after prayer? measuring cultural bias in large language models\.Association for Computational Linguistics,Bangkok, Thailand\.External Links:[Link](https://aclanthology.org/2024.acl-long.862/),[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.862)Cited by:[§2\.2\.4](https://arxiv.org/html/2605.06901#S2.SS2.SSS4.p1.1),[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p1.1),[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p1.1),[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p2.1),[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p3.1)\. - R\. Navigli, S\. Conia, and B\. Ross \(2023\)Biases in large language models: origins, inventory, and discussion\.ACM Journal of Data and Information Quality15\(2\),pp\. 1–21\.Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p1.1),[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p1.1)\. - S\. Neel and P\. Chang \(2023\)Privacy issues in large language models: a survey\.arXiv preprint arXiv:2312\.06717\.Cited by:[§3\.3\.1](https://arxiv.org/html/2605.06901#S3.SS3.SSS1.p2.1)\. - B\. Nguyen, M\. Yu, Y\. Huang, and M\. Jiang \(2024\)Reference\-based metrics disprove themselves in question generation\.External Links:2403\.12242,[Link](https://arxiv.org/abs/2403.12242)Cited by:[§5\.1\.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2.p4.1)\. - Y\. Nie, Y\. Kong, X\. Dong, J\. M\. Mulvey, H\. V\. Poor, Q\. Wen, and S\. 
Zohren \(2024\)A survey of large language models for financial applications: progress, prospects and challenges\.arXiv preprint arXiv:2406\.11903\.Cited by:[§1](https://arxiv.org/html/2605.06901#S1.p1.1)\. - D\. A\. Norman \(1988\)The design of everyday things\.Revised and Expanded Edition edition,Basic Books,New York, NY\.External Links:ISBN 978\-0465050659Cited by:[§2\.2\.1](https://arxiv.org/html/2605.06901#S2.SS2.SSS1.p1.1)\. - M\. Nye, A\. J\. Andreassen, G\. Gur\-Ari, H\. Michalewski, J\. Austin, D\. Bieber, D\. Dohan, A\. Lewkowycz, M\. Bosma, D\. Luan, C\. Sutton, and A\. Odena \(2021\)Show your work: scratchpads for intermediate computation with language models\.Vol\.abs/2112\.00114\.External Links:[Link](https://arxiv.org/abs/2112.00114)Cited by:[§4\.2\.3](https://arxiv.org/html/2605.06901#S4.SS2.SSS3.Px1.p2.1)\. - K\. O’Brien, S\. Casper, Q\. Anthony, T\. Korbak, R\. Kirk, X\. Davies, I\. Mishra, G\. Irving, Y\. Gal, and S\. Biderman \(2025\)Deep ignorance: filtering pretraining data builds tamper\-resistant safeguards into open\-weight llms\.arXiv preprint arXiv:2508\.06601\.Cited by:[§6\.3\.1](https://arxiv.org/html/2605.06901#S6.SS3.SSS1.p2.1)\. - Z\. Obermeyer, B\. Powers, C\. Vogeli, and S\. Mullainathan \(2019\)Dissecting racial bias in an algorithm used to manage the health of populations\.Science366\(6464\),pp\. 447–453\.Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p2.1)\. - D\. Occhipinti, S\. Tekiroglu, and M\. Guerini \(2024\)PRODIGy: a PROfile\-based DIalogue generation dataset\.InFindings of the Association for Computational Linguistics: NAACL 2024,K\. Duh, H\. Gomez, and S\. Bethard \(Eds\.\),Mexico City, Mexico,pp\. 3500–3514\.External Links:[Link](https://aclanthology.org/2024.findings-naacl.222)Cited by:[§3\.4\.1](https://arxiv.org/html/2605.06901#S3.SS4.SSS1.Px2.p1.1)\. - C\. T\. Okolo \(2024\)Beyond ai hype: a hands\-on workshop series for enhancing ai literacy in middle and high school students\.InProceedings of the 2024 on RESPECT Annual Conference,pp\. 86–93\.Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px3.p2.1)\. - T\. Olmo, A\. Ettinger, A\. Bertsch, B\. Kuehl, D\. Graham, D\. Heineman, D\. Groeneveld, F\. Brahman, F\. Timbers, H\. Ivison, J\. Morrison, J\. Poznanski, K\. Lo, L\. Soldaini, M\. Jordan, M\. Chen, M\. Noukhovitch, N\. Lambert, P\. Walsh, P\. Dasigi, R\. Berry, S\. Malik, S\. Shah, S\. Geng, S\. Arora, S\. Gupta, T\. Anderson, T\. Xiao, T\. Murray, T\. Romero, V\. Graf, A\. Asai, A\. Bhagia, A\. Wettig, A\. Liu, A\. Rangapur, C\. Anastasiades, C\. Huang, D\. Schwenk, H\. Trivedi, I\. Magnusson, J\. Lochner, J\. Liu, L\. J\. V\. Miranda, M\. Sap, M\. Morgan, M\. Schmitz, M\. Guerquin, M\. Wilson, R\. Huff, R\. L\. Bras, R\. Xin, R\. Shao, S\. Skjonsberg, S\. Z\. Shen, S\. S\. Li, T\. Wilde, V\. Pyatkin, W\. Merrill, Y\. Chang, Y\. Gu, Z\. Zeng, A\. Sabharwal, L\. Zettlemoyer, P\. W\. Koh, A\. Farhadi, N\. A\. Smith, and H\. Hajishirzi \(2025\)Olmo 3\.arXiv preprint arXiv:2512\.13961\.External Links:[Document](https://dx.doi.org/10.48550/arXiv.2512.13961),[Link](https://arxiv.org/abs/2512.13961)Cited by:[§4\.1](https://arxiv.org/html/2605.06901#S4.SS1.p1.1)\. - J\. S\. Olson and W\. A\. Kellogg \(2014\)Ways of knowing in hci\.Vol\.2,Springer\.Cited by:[§2\.3](https://arxiv.org/html/2605.06901#S2.SS3.p1.1)\. 
- OpenAI \(2024a\)Learning To Reason With LLMs\.Note:[https://openai\.com/index/learning\-to\-reason\-with\-llms/](https://openai.com/index/learning-to-reason-with-llms/)\[Accessed 09\-02\-2025\]Cited by:[§4\.3\.4](https://arxiv.org/html/2605.06901#S4.SS3.SSS4.p2.1)\. - OpenAI \(2024b\)Memory and new controls for chatgpt\.Note:https://openai\.com/index/memory\-and\-new\-controls\-for\-chatgpt/Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px3.p1.1)\. - OpenAI \(2025a\)External Links:[Link](https://openai.com/gpt-5-bio-bug-bounty/)Cited by:[§6\.3\.1](https://arxiv.org/html/2605.06901#S6.SS3.SSS1.p1.1)\. - OpenAI \(2025b\)Measuring the performance of our models on real\-world tasks\.Note:[https://openai\.com/index/gdpval/](https://openai.com/index/gdpval/)Accessed 2025\-10\-13Cited by:[§5\.1\.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1.Px4.p1.1)\. - I\. Orife, J\. Kreutzer, B\. Sibanda, D\. Whitenack, K\. Siminyu, L\. Martinus, J\. T\. Ali, J\. Abbott, V\. Marivate, S\. Kabongo,et al\.\(2020\)Masakhane–machine translation for africa\.arXiv preprint arXiv:2003\.11529\.Cited by:[§3\.2\.4](https://arxiv.org/html/2605.06901#S3.SS2.SSS4.p2.1),[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p3.1)\. - J\. T\. Ornstein, E\. N\. Blasingame, and J\. S\. Truscott \(2023\)How to train your stochastic parrot: large language models for political texts\.Political Science Research and Methods,pp\. 1–18\.Cited by:[§7](https://arxiv.org/html/2605.06901#S7.p2.1)\. - A\. Oulasvirta \(2008\)Field experiments in hci: promises and challenges\.InFuture interaction design II,pp\. 87–116\.Cited by:[§2\.3\.1](https://arxiv.org/html/2605.06901#S2.SS3.SSS1.p1.1)\. - L\. Ouyang, J\. Wu, X\. Jiang, D\. Almeida, C\. L\. Wainwright, P\. Mishkin, C\. Zhang, S\. Agarwal, K\. Slama, A\. Ray, J\. Schulman, J\. Hilton, F\. Kelton, L\. Miller, M\. Simens, A\. Askell, P\. Welinder, P\. F\. Christiano, J\. Leike, and R\. Lowe \(2022\)Training language models to follow instructions with human feedback\.InAdvances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 \- December 9, 2022,S\. Koyejo, S\. Mohamed, A\. Agarwal, D\. Belgrave, K\. Cho, and A\. Oh \(Eds\.\),External Links:[Link](http://papers.nips.cc/paper%5C_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html)Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px1.p1.1),[§3\.1\.3](https://arxiv.org/html/2605.06901#S3.SS1.SSS3.p2.1),[§3\.3\.2](https://arxiv.org/html/2605.06901#S3.SS3.SSS2.p2.1),[§4\.1\.1](https://arxiv.org/html/2605.06901#S4.SS1.SSS1.p3.1),[§4\.2\.1](https://arxiv.org/html/2605.06901#S4.SS2.SSS1.p1.1),[§4\.2\.3](https://arxiv.org/html/2605.06901#S4.SS2.SSS3.Px1.p2.1),[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.p1.1),[§6\.3\.1](https://arxiv.org/html/2605.06901#S6.SS3.SSS1.p2.1)\. - D\. Paglieri, L\. Cross, W\. A\. Cunningham, J\. Z\. Leibo, and A\. S\. Vezhnevets \(2026\)Persona generators: generating diverse synthetic personas at scale\.arXiv preprint arXiv:2602\.03545\.Cited by:[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px2.p1.1)\. - S\. Palan and C\. Schitter \(2018\)Prolific\. ac—a subject pool for online experiments\.Journal of behavioral and experimental finance17,pp\. 22–27\.Cited by:[§3\.1\.3](https://arxiv.org/html/2605.06901#S3.SS1.SSS3.p4.1)\. - W\. Pang, K\. Q\. Lin, X\. Jian, X\. He, and P\. 
Torr \(2025\)Paper2Poster: towards multimodal poster automation from scientific papers\.arXiv preprint arXiv:2505\.21497\.Cited by:[§4\.1\.3](https://arxiv.org/html/2605.06901#S4.SS1.SSS3.Px1.p1.1)\. - A\. Panickssery, S\. R\. Bowman, and S\. Feng \(2024\)LLM evaluators recognize and favor their own generations\.Vol\.abs/2404\.13076\.External Links:[Link](https://arxiv.org/abs/2404.13076)Cited by:[§5\.1\.3](https://arxiv.org/html/2605.06901#S5.SS1.SSS3.p2.1)\. - A\. Papenmeier, D\. Kern, G\. Englebienne, and C\. Seifert \(2022\)It’s complicated: the relationship between user trust, model accuracy and explanations in ai\.ACM Transactions on Computer\-Human Interaction \(TOCHI\)29\(4\),pp\. 1–33\.Cited by:[§2\.2\.2](https://arxiv.org/html/2605.06901#S2.SS2.SSS2.p2.1)\. - K\. Papineni, S\. Roukos, T\. Ward, and W\. Zhu \(2002\)Bleu: a method for automatic evaluation of machine translation\.InProceedings of the 40th Annual Meeting of the Association for Computational Linguistics,P\. Isabelle, E\. Charniak, and D\. Lin \(Eds\.\),Philadelphia, Pennsylvania, USA,pp\. 311–318\.External Links:[Document](https://dx.doi.org/10.3115/1073083.1073135),[Link](https://aclanthology.org/P02-1040)Cited by:[§5\.1\.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2.p2.1)\. - E\. Park, W\. H\. Deng, V\. Varadarajan, M\. Yan, G\. Kim, M\. Sap, and M\. Eslami \(2025\)Critical or compliant? the double\-edged sword of reasoning in chain\-of\-thought explanations\.arXiv preprint arXiv:2511\.12001\.Cited by:[§4\.1\.2](https://arxiv.org/html/2605.06901#S4.SS1.SSS2.p1.1)\. - J\. S\. Park, J\. O’Brien, C\. J\. Cai, M\. R\. Morris, P\. Liang, and M\. S\. Bernstein \(2023a\)Generative agents: interactive simulacra of human behavior\.InProceedings of the 36th annual acm symposium on user interface software and technology,pp\. 1–22\.Cited by:[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p3.1)\. - J\. S\. Park, L\. Popowski, C\. Cai, M\. R\. Morris, P\. Liang, and M\. S\. Bernstein \(2022\)Social simulacra: creating populated prototypes for social computing systems\.InProceedings of the 35th Annual ACM Symposium on User Interface Software and Technology,pp\. 1–18\.Cited by:[§6\.1\.2](https://arxiv.org/html/2605.06901#S6.SS1.SSS2.Px2.p4.1)\. - K\. Park, Y\. J\. Choe, and V\. Veitch \(2023b\)The linear representation hypothesis and the geometry of large language models\.InCausal Representation Learning Workshop at NeurIPS 2023,External Links:[Link](https://openreview.net/forum?id=T0PoOJg8cK)Cited by:[§6\.1\.1](https://arxiv.org/html/2605.06901#S6.SS1.SSS1.Px2.p1.1)\. - M\. Park, S\. Kim, S\. Lee, S\. Kwon, and K\. Kim \(2024\)Empowering personalized learning through a conversation\-based tutoring system with student modeling\.InExtended Abstracts of the CHI Conference on Human Factors in Computing Systems,pp\. 1–10\.Cited by:[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p2.1)\. - S\. M\. Park, K\. Georgiev, A\. Ilyas, G\. Leclerc, and A\. Madry \(2023c\)TRAK: attributing model behavior at scale\.InInternational Conference on Machine Learning, ICML 2023, 23\-29 July 2023, Honolulu, Hawaii, USA,A\. Krause, E\. Brunskill, K\. Cho, B\. Engelhardt, S\. Sabato, and J\. Scarlett \(Eds\.\),Proceedings of Machine Learning Research, Vol\.202,pp\. 27074–27113\.External Links:[Link](https://proceedings.mlr.press/v202/park23c.html)Cited by:[§6\.1\.2](https://arxiv.org/html/2605.06901#S6.SS1.SSS2.Px2.p2.1)\. - B\. Parmanto, B\. Aryoyudanta, W\. Soekinto, I\. Setiawan, Y\. Wang, H\. Hu, A\. Saptono, and Y\. K\. 
Choi \(2024\)Development of a reliable and accessible caregiving language model \(calm\)\.arXiv preprint arXiv:2403\.06857\.Cited by:[§5\.2](https://arxiv.org/html/2605.06901#S5.SS2.p1.1)\. - A\. Parrish, A\. Chen, N\. Nangia, V\. Padmakumar, J\. Phang, J\. Thompson, P\. M\. Htut, and S\. Bowman \(2022\)BBQ: a hand\-built bias benchmark for question answering\.InFindings of the Association for Computational Linguistics: ACL 2022,S\. Muresan, P\. Nakov, and A\. Villavicencio \(Eds\.\),Dublin, Ireland,pp\. 2086–2105\.External Links:[Document](https://dx.doi.org/10.18653/v1/2022.findings-acl.165),[Link](https://aclanthology.org/2022.findings-acl.165)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p3.1),[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p4.1)\. - P\. Pataranutaporn, S\. Karny, C\. Archiwaranguprok, C\. Albrecht, A\. R\. Liu, and P\. Maes \(2025\)“My boyfriend is ai”’: a computational analysis of human\-ai companionship in reddit’s ai community\.arXiv preprint arXiv:2509\.11391\.Cited by:[§2\.2\.3](https://arxiv.org/html/2605.06901#S2.SS2.SSS3.p2.1)\. - T\. Patwardhan, R\. Dias, E\. Proehl, G\. Kim, M\. Wang, O\. Watkins, S\. P\. Fishman, M\. Aljubeh, P\. Thacker, L\. Fauconnet,et al\.\(2025\)Gdpval: evaluating ai model performance on real\-world economically valuable tasks\.arXiv preprint arXiv:2510\.04374\.Cited by:[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px3.p1.1)\. - A\. Paullada, I\. D\. Raji, E\. M\. Bender, E\. Denton, and A\. Hanna \(2021\)Data and its \(dis\) contents: a survey of dataset development and use in machine learning research\.Patterns2\(11\)\.Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px1.p1.1),[§3](https://arxiv.org/html/2605.06901#S3.p2.1)\. - S\. R\. Pendse, D\. Gergle, R\. Kornfield, J\. Meyerhoff, D\. Mohr, J\. Suh, A\. Wescott, C\. Williams, and J\. Schleider \(2025\)When testing ai tests us: safeguarding mental health on the digital frontlines\.InProceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency,pp\. 1793–1804\.Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px1.p1.1)\. - G\. Penedo, H\. Kydlíček, A\. Lozhkov, M\. Mitchell, C\. A\. Raffel, L\. Von Werra, T\. Wolf,et al\.\(2024\)The fineweb datasets: decanting the web for the finest text data at scale\.Advances in Neural Information Processing Systems37,pp\. 30811–30849\.Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p3.1)\. - G\. Penedo, Q\. Malartic, D\. Hesslow, R\. Cojocaru, A\. Cappelli, H\. Alobeidli, B\. Pannier, E\. Almazrouei, and J\. Launay \(2023\)The refinedweb dataset for falcon llm: outperforming curated corpora with web data, and web data only\.arXiv preprint arXiv:2306\.01116\.External Links:[Link](https://arxiv.org/abs/2306.01116)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p2.1)\. - B\. Peng, C\. Li, P\. He, M\. Galley, and J\. Gao \(2023a\)Instruction tuning with gpt\-4\.ArXiv preprintabs/2304\.03277\.External Links:[Link](https://arxiv.org/abs/2304.03277)Cited by:[§3\.1\.2](https://arxiv.org/html/2605.06901#S3.SS1.SSS2.p3.1)\. - J\. Peng, S\. Cheng, E\. Diau, Y\. Shih, P\. Chen, Y\. Lin, and Y\. Chen \(2024\)A survey of useful llm evaluation\.External Links:2406\.00936,[Link](https://arxiv.org/abs/2406.00936)Cited by:[§5\.2\.1](https://arxiv.org/html/2605.06901#S5.SS2.SSS1.p5.1)\. - S\. Peng, E\. Kalliamvakou, P\. Cihon, and M\. 
Demirer \(2023b\)The impact of ai on developer productivity: evidence from github copilot\.arXiv preprint arXiv:2302\.06590\.Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px1.p1.1)\. - P\. Pengpun, C\. Udomcharoenchaikit, W\. Buaphet, and P\. Limkonchotiwat \(2024\)Seed\-free synthetic data generation framework for instruction\-tuning llms: a case study in thai\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 4: Student Research Workshop\),pp\. 445–464\.Cited by:[§4\.1\.3](https://arxiv.org/html/2605.06901#S4.SS1.SSS3.Px2.p1.1)\. - I\. Pentina, T\. Hancock, and T\. Xie \(2023\)Exploring relationship development with social chatbots: a mixed\-method study of replika\.Computers in Human Behavior140,pp\. 107600\.Cited by:[§2\.2\.3](https://arxiv.org/html/2605.06901#S2.SS2.SSS3.p2.1)\. - E\. Perez, S\. Huang, F\. Song, T\. Cai, R\. Ring, J\. Aslanides, A\. Glaese, N\. McAleese, and G\. Irving \(2022\)Red teaming language models with language models\.InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing,pp\. 3419–3448\.Cited by:[§6\.3\.1](https://arxiv.org/html/2605.06901#S6.SS3.SSS1.p1.1)\. - E\. Perez, S\. Ringer, K\. Lukosiute, K\. Nguyen, E\. Chen, S\. Heiner, C\. Pettit, C\. Olsson, S\. Kundu, S\. Kadavath,et al\.\(2023\)Discovering language model behaviors with model\-written evaluations\.InFindings of the association for computational linguistics: ACL 2023,pp\. 13387–13434\.Cited by:[§4\.3\.2](https://arxiv.org/html/2605.06901#S4.SS3.SSS2.p1.1)\. - F\. Perez and I\. Ribeiro \(2022\)Ignore previous prompt: attack techniques for language models\.Vol\.abs/2211\.09527\.External Links:[Link](https://arxiv.org/abs/2211.09527)Cited by:[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px3.p1.1)\. - D\. Pessach and E\. Shmueli \(2022\)A review on fairness in machine learning\.ACM Computing Surveys \(CSUR\)55\(3\),pp\. 1–44\.Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p2.1)\. - M\. Phutane and A\. Vashistha \(2025\)Disability across cultures: a human\-centered audit of ableism in western and indic llms\.InProceedings of the AAAI/ACM Conference on AI, Ethics, and Society,Vol\.8,pp\. 2000–2014\.Cited by:[§2\.2\.4](https://arxiv.org/html/2605.06901#S2.SS2.SSS4.p2.1)\. - S\. Poddar, Y\. Wan, H\. Ivison, A\. Gupta, and N\. Jaques \(2024\)Personalizing reinforcement learning from human feedback with variational preference learning\.InThe Thirty\-eighth Annual Conference on Neural Information Processing Systems,External Links:[Link](https://openreview.net/forum?id=gRG6SzbW9p)Cited by:[§4\.4\.1](https://arxiv.org/html/2605.06901#S4.SS4.SSS1.Px3.p1.1),[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p1.1)\. - S\. Prabhumoye, M\. Patwary, M\. Shoeybi, and B\. Catanzaro \(2023\)Adding instructions during pretraining: effective way of controlling toxicity in language models\.InProceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics,A\. Vlachos and I\. Augenstein \(Eds\.\),Dubrovnik, Croatia,pp\. 2636–2651\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.eacl-main.193),[Link](https://aclanthology.org/2023.eacl-main.193)Cited by:[§4\.1\.1](https://arxiv.org/html/2605.06901#S4.SS1.SSS1.p2.1),[§4\.1\.2](https://arxiv.org/html/2605.06901#S4.SS1.SSS2.p2.1)\. - A\. Prokofieva, F\. A\. Celikyilmaz, D\. Z\. Hakkani\-Tur, L\. Heck, and M\. 
Slaney \(2019\)Eye gaze for spoken language understanding in multi\-modal conversational interactions\.Google Patents\.Note:US Patent 10,317,992Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px2.p1.1)\. - R\. Qadri, M\. Diaz, D\. Wang, and M\. Madaio \(2025\)The case for" thick evaluations" of cultural representation in ai\.arXiv preprint arXiv:2503\.19075\.Cited by:[§2\.2\.4](https://arxiv.org/html/2605.06901#S2.SS2.SSS4.p2.1),[§2\.3\.3](https://arxiv.org/html/2605.06901#S2.SS3.SSS3.p2.1),[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px2.p1.1)\. - X\. Qi, Y\. Zeng, T\. Xie, P\. Chen, R\. Jia, P\. Mittal, and P\. Henderson \(2023\)Fine\-tuning aligned language models compromises safety, even when users do not intend to\!\.External Links:2310\.03693Cited by:[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px3.p6.1)\. - L\. Qin, Q\. Chen, Y\. Zhou, Z\. Chen, Y\. Li, L\. Liao, M\. Li, W\. Che, and P\. S\. Yu \(2025\)A survey of multilingual large language models\.Patterns6\(1\)\.Cited by:[§4\.6\.1](https://arxiv.org/html/2605.06901#S4.SS6.SSS1.p1.1),[§4\.6\.1](https://arxiv.org/html/2605.06901#S4.SS6.SSS1.p2.1)\. - J\. W\. Rae, S\. Borgeaud, T\. Cai, K\. Millican, J\. Hoffmann, F\. Song, J\. Aslanides, S\. Henderson, R\. Ring, S\. Young,et al\.\(2021\)Scaling language models: methods, analysis & insights from training gopher\.ArXiv preprintabs/2112\.11446\.External Links:[Link](https://arxiv.org/abs/2112.11446)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p2.1)\. - R\. Rafailov, A\. Sharma, E\. Mitchell, C\. D\. Manning, S\. Ermon, and C\. Finn \(2023\)Direct preference optimization: your language model is secretly a reward model\.InAdvances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 \- 16, 2023,A\. Oh, T\. Naumann, A\. Globerson, K\. Saenko, M\. Hardt, and S\. Levine \(Eds\.\),External Links:[Link](http://papers.nips.cc/paper%5C_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html)Cited by:[§4\.2\.2](https://arxiv.org/html/2605.06901#S4.SS2.SSS2.p1.1)\. - C\. Raffel, N\. Shazeer, A\. Roberts, K\. Lee, S\. Narang, M\. Matena, Y\. Zhou, W\. Li, and P\. J\. Liu \(2020\)Exploring the limits of transfer learning with a unified text\-to\-text transformer\.J\. Mach\. Learn\. Res\.21,pp\. 140:1–140:67\.External Links:[Link](http://jmlr.org/papers/v21/20-074.html)Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p1.1),[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p4.1),[§4\.3\.3](https://arxiv.org/html/2605.06901#S4.SS3.SSS3.p4.1)\. - S\. C\. Rainie, T\. Kukutai, M\. Walter, O\. L\. Figueroa\-Rodríguez, J\. Walker, and P\. Axelsson \(2019\)Indigenous data sovereignty\.Cited by:[§4\.5\.2](https://arxiv.org/html/2605.06901#S4.SS5.SSS2.p3.1)\. - I\. D\. Raji, E\. M\. Bender, A\. Paullada, E\. Denton, and A\. Hanna \(2021\)AI and the everything in the whole wide world benchmark\.ArXiv preprintabs/2111\.15366\.External Links:[Link](https://arxiv.org/abs/2111.15366)Cited by:[§2\.3\.1](https://arxiv.org/html/2605.06901#S2.SS3.SSS1.p1.1)\. - I\. D\. Raji, I\. E\. Kumar, A\. Horowitz, and A\. Selbst \(2022\)The fallacy of ai functionality\.In2022 ACM Conference on Fairness, Accountability, and Transparency,FAccT ’22,pp\. 
959–972\.External Links:[Link](http://dx.doi.org/10.1145/3531146.3533158),[Document](https://dx.doi.org/10.1145/3531146.3533158)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p6.1)\. - V\. Ramesh, R\. Zhao, and N\. Goel \(2023\)Decentralised, scalable and privacy\-preserving synthetic data generation\.Vol\.abs/2310\.20062\.External Links:[Link](https://arxiv.org/abs/2310.20062)Cited by:[§3\.4\.1](https://arxiv.org/html/2605.06901#S3.SS4.SSS1.Px2.p2.1)\. - L\. Ranaldi, G\. Pucci, and A\. Freitas \(2024\)Empowering cross\-lingual abilities of instruction\-tuned large language models by translation\-following demonstrations\.InFindings of the Association for Computational Linguistics: ACL 2024,pp\. 7961–7973\.Cited by:[§4\.6\.1](https://arxiv.org/html/2605.06901#S4.SS6.SSS1.p2.1)\. - A\. Ranganathan and X\. M\. Ye \(2026\)AI doesn’t reduce work—it intensifies it\.Harvard Business Review\.External Links:[Link](https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies-it)Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px1.p1.1)\. - C\. Rastogi, T\. H\. Teh, P\. Mishra, R\. Patel, D\. Wang, M\. Diaz, A\. Parrish, A\. M\. Davani, Z\. Ashwood, M\. Paganini,et al\.\(2025\)Whose view of safety? a deep dive dataset for pluralistic alignment of text\-to\-image models\.InThe Thirty\-ninth Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track,Cited by:[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px2.p1.1)\. - N\. Rathi, D\. Jurafsky, and K\. Zhou \(2025\)Humans overrely on overconfident language models, across languages\.InSecond Conference on Language Modeling,Cited by:[§4\.1\.2](https://arxiv.org/html/2605.06901#S4.SS1.SSS2.p1.1)\. - T\. Rebedea, R\. Dinu, M\. N\. Sreedhar, C\. Parisien, and J\. Cohen \(2023\)Nemo guardrails: a toolkit for controllable and safe llm applications with programmable rails\.InProceedings of the 2023 conference on empirical methods in natural language processing: system demonstrations,pp\. 431–445\.Cited by:[§6\.3\.1](https://arxiv.org/html/2605.06901#S6.SS3.SSS1.p2.1)\. - L\. Reicherts, Z\. T\. Zhang, E\. Von Oswald, Y\. Liu, Y\. Rogers, and M\. Hassib \(2025\)AI, help me think—but for myself: assisting people in complex decision\-making by providing different kinds of cognitive support\.InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems,pp\. 1–19\.Cited by:[§7\.3](https://arxiv.org/html/2605.06901#S7.SS3.SSS0.Px2.p2.1)\. - M\. Reid, N\. Savinov, D\. Teplyashin, D\. Lepikhin, T\. Lillicrap, J\. Alayrac, R\. Soricut, A\. Lazaridou, O\. Firat, J\. Schrittwieser,et al\.\(2024\)Gemini 1\.5: unlocking multimodal understanding across millions of tokens of context\.ArXiv preprintabs/2403\.05530\.External Links:[Link](https://arxiv.org/abs/2403.05530)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p4.1)\. - T\. Reinhart \(1980\)Conditions for text coherence\.Poetics Today1\(4\),pp\. 161–180\.Note:Narratology II: The Fictional Text and the Reader \(Summer, 1980\)External Links:[Document](https://dx.doi.org/10.2307/1771893),[Link](https://www.jstor.org/stable/1771893)Cited by:[§5\.2\.1](https://arxiv.org/html/2605.06901#S5.SS2.SSS1.p2.1)\. - M\. T\. Ribeiro, S\. Singh, and C\. Guestrin \(2016\)"Why should I trust you?": explaining the predictions of any classifier\.InProceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13\-17, 2016,B\. Krishnapuram, M\. Shah, A\. J\. 
Smola, C\. C\. Aggarwal, D\. Shen, and R\. Rastogi \(Eds\.\),pp\. 1135–1144\.External Links:[Document](https://dx.doi.org/10.1145/2939672.2939778),[Link](https://doi.org/10.1145/2939672.2939778)Cited by:[§6\.1\.2](https://arxiv.org/html/2605.06901#S6.SS1.SSS2.Px1.p1.1)\. - C\. Richardson, Y\. Zhang, K\. Gillespie, S\. Kar, A\. Singh, Z\. Raeesy, O\. Z\. Khan, and A\. Sethy \(2023\)Integrating summarization and retrieval for enhanced personalization via large language models\.Vol\.abs/2310\.20081\.External Links:[Link](https://arxiv.org/abs/2310.20081)Cited by:[§4\.4\.1](https://arxiv.org/html/2605.06901#S4.SS4.SSS1.Px2.p1.1)\. - N\. Rimsky, N\. Gabrieli, J\. Schulz, M\. Tong, E\. Hubinger, and A\. Turner \(2024\)Steering llama 2 via contrastive activation addition\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),pp\. 15504–15522\.Cited by:[§6\.1\.3](https://arxiv.org/html/2605.06901#S6.SS1.SSS3.Px2.p1.1)\. - R\. G\. Rinderknecht, L\. Doan, and L\. C\. Sayer \(2025\)The daily lives of crowdsourced us respondents: a time use comparison of mturk, prolific, and atus\.Sociological Methodology,pp\. 00811750241312226\.Cited by:[§3\.1\.3](https://arxiv.org/html/2605.06901#S3.SS1.SSS3.p4.1)\. - K\. Roehrick \(2020\)Valence aware dictionary and sentiment reasoner \(vader\)\.Note:R package version 0\.2\.1External Links:[Link](https://cran.r-project.org/package=vader)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p3.1)\. - E\. Rolf, T\. T\. Worledge, B\. Recht, and M\. Jordan \(2021\)Representation matters: assessing the importance of subgroup allocations in training data\.InInternational conference on machine learning,pp\. 9040–9051\.Cited by:[§4\.3\.2](https://arxiv.org/html/2605.06901#S4.SS3.SSS2.p1.1)\. - P\. Röttger, M\. Hinck, V\. Hofmann, K\. Hackenburg, V\. Pyatkin, F\. Brahman, and D\. Hovy \(2025\)IssueBench: millions of realistic prompts for measuring issue bias in llm writing assistance\.arXiv preprint arXiv:2502\.08395\.Cited by:[§5\.1\.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1.Px2.p1.1)\. - P\. Röttger, V\. Hofmann, V\. Pyatkin, M\. Hinck, H\. Kirk, H\. Schütze, and D\. Hovy \(2024a\)Political compass or spinning arrow? towards more meaningful evaluations for values and opinions in large language models\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),pp\. 15295–15311\.Cited by:[§3\.2\.2](https://arxiv.org/html/2605.06901#S3.SS2.SSS2.p1.1),[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p2.1)\. - P\. Röttger, F\. Pernisi, B\. Vidgen, and D\. Hovy \(2024b\)Safetyprompts: a systematic review of open datasets for evaluating and improving large language model safety\.ArXiv preprintabs/2404\.05399\.External Links:[Link](https://arxiv.org/abs/2404.05399)Cited by:[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px1.p2.1),[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.p2.1)\. - P\. K\. Rubenstein, C\. Asawaroengchai, D\. D\. Nguyen, A\. Bapna, Z\. Borsos, F\. de Chaumont Quitry, P\. Chen, D\. E\. Badawy, W\. Han, E\. Kharitonov, H\. Muckenhirn, D\. Padfield, J\. Qin, D\. Rozenberg, T\. Sainath, J\. Schalkwyk, M\. Sharifi, M\. T\. Ramanovich, M\. Tagliasacchi, A\. Tudor, M\. Velimirović, D\. Vincent, J\. Yu, Y\. Wang, V\. Zayats, N\. Zeghidour, Y\. Zhang, Z\. Zhang, L\. Zilka, and C\. 
Frank \(2023\)AudioPaLM: a large language model that can speak and listen\.Vol\.abs/2306\.12925\.External Links:[Link](https://arxiv.org/abs/2306.12925)Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px1.p1.1)\. - R\. Rudinger, J\. Naradowsky, B\. Leonard, and B\. Van Durme \(2018\)Gender bias in coreference resolution\.InProceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 \(Short Papers\),M\. Walker, H\. Ji, and A\. Stent \(Eds\.\),New Orleans, Louisiana,pp\. 8–14\.External Links:[Document](https://dx.doi.org/10.18653/v1/N18-2002),[Link](https://aclanthology.org/N18-2002)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p2.1)\. - M\. J\. Ryan, W\. Held, and D\. Yang \(2024\)Unintended impacts of LLM alignment on global representation\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 16121–16140\.External Links:[Link](https://aclanthology.org/2024.acl-long.853/),[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.853)Cited by:[§2\.2\.4](https://arxiv.org/html/2605.06901#S2.SS2.SSS4.p1.1),[§5\.1\.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2.p5.1)\. - M\. J\. Ryan, O\. Shaikh, A\. Bhagirath, D\. Frees, W\. B\. Held, and D\. Yang \(2025\)Synthesizeme\! inducing persona\-guided prompts for personalized reward models in llms\.InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),pp\. 8045–8078\.Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px3.p1.1),[§3\.2\.4](https://arxiv.org/html/2605.06901#S3.SS2.SSS4.p3.1),[§4\.4\.1](https://arxiv.org/html/2605.06901#S4.SS4.SSS1.Px3.p1.1)\. - N\. Sachdeva, B\. Coleman, W\. Kang, J\. Ni, L\. Hong, E\. H\. Chi, J\. Caverlee, J\. McAuley, and D\. Z\. Cheng \(2024\)How to train data\-efficient llms\.arXiv preprint arXiv:2402\.09668\.Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p3.1)\. - S\. Sadeghi, S\. Gupta, S\. Gramatovici, J\. Lu, H\. Ai, and R\. Zhang \(2022\)Novelty and primacy: a long\-term estimator for online experiments\.Technometrics64\(4\),pp\. 524–534\.External Links:[Document](https://dx.doi.org/10.1080/00401706.2022.2124309),[Link](https://doi.org/10.1080/00401706.2022.2124309),https://doi\.org/10\.1080/00401706\.2022\.2124309Cited by:[§5\.3](https://arxiv.org/html/2605.06901#S5.SS3.p8.1)\. - A\. B\. Sai, A\. K\. Mohankumar, and M\. M\. Khapra \(2022\)A survey of evaluation metrics used for nlg systems\.ACM Comput\. Surv\.55\(2\)\.External Links:ISSN 0360\-0300,[Link](https://doi.org/10.1145/3485766),[Document](https://dx.doi.org/10.1145/3485766)Cited by:[§5\.1\.2](https://arxiv.org/html/2605.06901#S5.SS1.SSS2.p1.1)\. - A\. Salemi, S\. Mysore, M\. Bendersky, and H\. Zamani \(2024\)LaMP: when large language models meet personalization\.InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),L\. Ku, A\. Martins, and V\. Srikumar \(Eds\.\),Bangkok, Thailand,pp\. 7370–7392\.External Links:[Document](https://dx.doi.org/10.18653/v1/2024.acl-long.399),[Link](https://aclanthology.org/2024.acl-long.399)Cited by:[§3\.2\.4](https://arxiv.org/html/2605.06901#S3.SS2.SSS4.p3.1),[§4\.4\.1](https://arxiv.org/html/2605.06901#S4.SS4.SSS1.Px2.p1.1)\. - A\. Salinas, P\. Shah, Y\. Huang, R\. McCormack, and F\. 
Morstatter \(2023\)The unequal opportunities of large language models: examining demographic biases in job recommendations by chatgpt and llama\.InProceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization,pp\. 1–15\.Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px2.p3.1)\. - A\. V\. Sambra, E\. Mansour, S\. Hawke, M\. Zereba, N\. Greco, A\. Ghanem, D\. Zagidulin, A\. Aboulnaga, and T\. Berners\-Lee \(2016\)Solid: a platform for decentralized social applications based on linked data\.MIT CSAIL & Qatar Computing Research Institute, Tech\. Rep\.2016\.Cited by:[§3\.3\.2](https://arxiv.org/html/2605.06901#S3.SS3.SSS2.p1.1)\. - V\. Samuel, H\. P\. Zou, Y\. Zhou, S\. Chaudhari, A\. Kalyan, T\. Rajpurohit, A\. Deshpande, K\. Narasimhan, and V\. Murahari \(2024\)Personagym: evaluating persona agents and llms\.arXiv preprint arXiv:2407\.18416\.Cited by:[§3\.4\.1](https://arxiv.org/html/2605.06901#S3.SS4.SSS1.Px2.p1.1),[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p3.1)\. - P\. A\. Samuelson \(1948\)Consumption theory in terms of revealed preference\.Economica15\(60\),pp\. 243–253\.Cited by:[§4\.4\.2](https://arxiv.org/html/2605.06901#S4.SS4.SSS2.p1.1)\. - S\. Santurkar, E\. Durmus, F\. Ladhak, C\. Lee, P\. Liang, and T\. Hashimoto \(2023\)Whose opinions do language models reflect?\.InInternational Conference on Machine Learning, ICML 2023, 23\-29 July 2023, Honolulu, Hawaii, USA,A\. Krause, E\. Brunskill, K\. Cho, B\. Engelhardt, S\. Sabato, and J\. Scarlett \(Eds\.\),Proceedings of Machine Learning Research, Vol\.202,pp\. 29971–30004\.External Links:[Link](https://proceedings.mlr.press/v202/santurkar23a.html)Cited by:[§3\.1\.3](https://arxiv.org/html/2605.06901#S3.SS1.SSS3.p5.1),[§4\.2\.3](https://arxiv.org/html/2605.06901#S4.SS2.SSS3.p1.1),[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p1.1),[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p4.1)\. - M\. Sap, D\. Card, S\. Gabriel, Y\. Choi, and N\. A\. Smith \(2019\)The risk of racial bias in hate speech detection\.InProceedings of the 57th annual meeting of the association for computational linguistics,pp\. 1668–1678\.Cited by:[§3\.1\.1](https://arxiv.org/html/2605.06901#S3.SS1.SSS1.p2.1),[§3\.2\.3](https://arxiv.org/html/2605.06901#S3.SS2.SSS3.p2.1)\. - D\. Sasu, K\. A\. Yamoah, B\. Quartey, and N\. Schluter \(2025\)Enhancing speech instruction understanding and disambiguation in robotics via speech prosody\.arXiv preprint arXiv:2506\.02057\.Cited by:[§4\.1\.3](https://arxiv.org/html/2605.06901#S4.SS1.SSS3.Px1.p1.1)\. - W\. Saunders, C\. Yeh, J\. Wu, S\. Bills, L\. Ouyang, J\. Ward, and J\. Leike \(2022\)Self\-critiquing models for assisting human evaluators\.Vol\.abs/2206\.05802\.External Links:[Link](https://arxiv.org/abs/2206.05802)Cited by:[§4\.2\.3](https://arxiv.org/html/2605.06901#S4.SS2.SSS3.Px1.p1.1)\. - N\. Schaller, Y\. Ding, A\. Horbach, J\. Meyer, and T\. Jansen \(2024\)Fairness in automated essay scoring: a comparative analysis of algorithms on german learner essays from secondary education\.InProceedings of the 19th workshop on innovative use of nlp for building educational applications \(bea 2024\),pp\. 210–221\.Cited by:[§3\.2\.3](https://arxiv.org/html/2605.06901#S3.SS2.SSS3.p2.1)\. - M\. K\. Scheuerman, A\. Hanna, and R\. Denton \(2021\)Do datasets have politics? disciplinary values in computer vision dataset development\.Proceedings of the ACM on Human\-Computer Interaction5\(CSCW2\),pp\. 
Epps \(2024e\)When llms meets acoustic landmarks: an efficient approach to integrate speech into large language models for depression detection\.ArXiv preprintabs/2402\.13276\.External Links:[Link](https://arxiv.org/abs/2402.13276)Cited by:[§3\.4\.2](https://arxiv.org/html/2605.06901#S3.SS4.SSS2.Px1.p1.1)\. - Y\. Zhang, X\. Chen, B\. Jin, S\. Wang, S\. Ji, W\. Wang, and J\. Han \(2024f\)A comprehensive survey of scientific large language models and their applications in scientific discovery\.InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing,pp\. 8783–8817\.Cited by:[§1](https://arxiv.org/html/2605.06901#S1.p1.1)\. - Y\. Zhang, D\. Zhao, J\. T\. Hancock, R\. Kraut, and D\. Yang \(2025b\)The rise of ai companions: how human\-chatbot relationships influence well\-being\.arXiv preprint arXiv:2506\.12605\.Cited by:[§2\.2\.3](https://arxiv.org/html/2605.06901#S2.SS2.SSS3.p2.1)\. - Z\. Zhang, R\. A\. Rossi, B\. Kveton, Y\. Shao, D\. Yang, H\. Zamani, F\. Dernoncourt, J\. Barrow, T\. Yu, S\. Kim,et al\.\(2024g\)Personalization of large language models: a survey\.Transactions on Machine Learning Research\.Cited by:[§4\.4\.1](https://arxiv.org/html/2605.06901#S4.SS4.SSS1.Px3.p1.1),[§4\.4\.2](https://arxiv.org/html/2605.06901#S4.SS4.SSS2.p2.1),[§4\.4](https://arxiv.org/html/2605.06901#S4.SS4.p1.1),[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p2.1)\. - Z\. Zhang, L\. Lei, L\. Wu, R\. Sun, Y\. Huang, C\. Long, X\. Liu, X\. Lei, J\. Tang, and M\. Huang \(2023c\)Safetybench: evaluating the safety of large language models with multiple choice questions\.ArXiv preprintabs/2309\.07045\.External Links:[Link](https://arxiv.org/abs/2309.07045)Cited by:[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px1.p2.1),[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px2.p1.1)\. - Z\. Zhang, Y\. E\. Zhang, F\. Shi, and T\. Li \(2025c\)Autonomy matters: a study on personalization\-privacy dilemma in llm agents\.arXiv preprint arXiv:2510\.04465\.Cited by:[§6\.2\.2](https://arxiv.org/html/2605.06901#S6.SS2.SSS2.p1.1)\. - D\. Zhao, Q\. Ma, X\. Zhao, C\. Si, C\. Yang, R\. Louie, E\. Reiter, D\. Yang, and T\. Wu \(2025a\)SPHERE: an evaluation card for human\-ai systems\.InFindings of the Association for Computational Linguistics: ACL 2025,pp\. 1340–1365\.Cited by:[§2\.3\.3](https://arxiv.org/html/2605.06901#S2.SS3.SSS3.p2.1)\. - D\. Zhao, D\. Yang, and M\. S\. Bernstein \(2025b\)Knoll: creating a knowledge ecosystem for large language models\.InProceedings of the 38th Annual ACM Symposium on User Interface Software and Technology,pp\. 1–23\.Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px3.p1.1)\. - H\. Zhao, H\. Chen, F\. Yang, N\. Liu, H\. Deng, H\. Cai, S\. Wang, D\. Yin, and M\. Du \(2024a\)Explainability for large language models: a survey\.ACM Trans\. Intell\. Syst\. Technol\.15\(2\)\.External Links:ISSN 2157\-6904,[Link](https://doi.org/10.1145/3639372),[Document](https://dx.doi.org/10.1145/3639372)Cited by:[§6\.1\.2](https://arxiv.org/html/2605.06901#S6.SS1.SSS2.Px1.p1.1)\. - J\. Zhao, J\. Huang, Z\. Wu, D\. Bau, and W\. Shi \(2025c\)Llms encode harmfulness and refusal separately\.arXiv preprint arXiv:2507\.11878\.Cited by:[§4\.4\.2](https://arxiv.org/html/2605.06901#S4.SS4.SSS2.p2.1),[§6\.1\.1](https://arxiv.org/html/2605.06901#S6.SS1.SSS1.Px2.p1.1)\. - J\. Zhao, T\. Wang, M\. Yatskar, V\. Ordonez, and K\. 
Chang \(2017\)Men also like shopping: reducing gender bias amplification using corpus\-level constraints\.InProceedings of the 2017 Conference on Empirical Methods in Natural Language Processing,M\. Palmer, R\. Hwa, and S\. Riedel \(Eds\.\),Copenhagen, Denmark,pp\. 2979–2989\.External Links:[Document](https://dx.doi.org/10.18653/v1/D17-1323),[Link](https://aclanthology.org/D17-1323)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p1.1)\. - J\. Zhao, T\. Wang, M\. Yatskar, V\. Ordonez, and K\. Chang \(2018\)Gender bias in coreference resolution: evaluation and debiasing methods\.InProceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 \(Short Papers\),M\. Walker, H\. Ji, and A\. Stent \(Eds\.\),New Orleans, Louisiana,pp\. 15–20\.External Links:[Document](https://dx.doi.org/10.18653/v1/N18-2003),[Link](https://aclanthology.org/N18-2003)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p2.1)\. - S\. Zhao, J\. Dang, and A\. Grover \(2023a\)Group preference optimization: few\-shot alignment of large language models\.InR0\-FoMo:Robustness of Few\-shot and Zero\-shot Learning in Large Foundation Models,External Links:[Link](https://openreview.net/forum?id=p6hzUHjQn1)Cited by:[§4\.5\.1](https://arxiv.org/html/2605.06901#S4.SS5.SSS1.Px2.p1.1)\. - W\. X\. Zhao, K\. Zhou, J\. Li, T\. Tang, X\. Wang, Y\. Hou, Y\. Min, B\. Zhang, J\. Zhang, Z\. Dong,et al\.\(2023b\)A survey of large language models\.arXiv preprint arXiv:2303\.18223\.Cited by:[§4](https://arxiv.org/html/2605.06901#S4.p2.1)\. - W\. Zhao, H\. Shao, Z\. Xu, S\. Duan, and D\. Zhang \(2024b\)Measuring copyright risks of large language model via partial information probing\.ArXiv preprintabs/2409\.13831\.External Links:[Link](https://arxiv.org/abs/2409.13831)Cited by:[§3\.3\.1](https://arxiv.org/html/2605.06901#S3.SS3.SSS1.p1.1)\. - W\. Zhong, L\. Guo, Q\. Gao, H\. Ye, and Y\. Wang \(2024\)Memorybank: enhancing large language models with long\-term memory\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.38,pp\. 19724–19731\.Cited by:[§2\.1](https://arxiv.org/html/2605.06901#S2.SS1.SSS0.Px3.p1.1)\. - C\. Zhou, P\. Liu, P\. Xu, S\. Iyer, J\. Sun, Y\. Mao, X\. Ma, A\. Efrat, P\. Yu, L\. Yu, S\. Zhang, G\. Ghosh, M\. Lewis, L\. Zettlemoyer, and O\. Levy \(2023a\)LIMA: less is more for alignment\.InAdvances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 \- 16, 2023,A\. Oh, T\. Naumann, A\. Globerson, K\. Saenko, M\. Hardt, and S\. Levine \(Eds\.\),External Links:[Link](http://papers.nips.cc/paper%5C_files/paper/2023/hash/ac662d74829e4407ce1d126477f4a03a-Abstract-Conference.html)Cited by:[§3](https://arxiv.org/html/2605.06901#S3.p1.1)\. - K\. Zhou, D\. Jurafsky, and T\. Hashimoto \(2023b\)Navigating the grey area: how expressions of uncertainty and overconfidence affect language models\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing,H\. Bouamor, J\. Pino, and K\. Bali \(Eds\.\),Singapore,pp\. 5506–5524\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.emnlp-main.335),[Link](https://aclanthology.org/2023.emnlp-main.335)Cited by:[§2\.2\.2](https://arxiv.org/html/2605.06901#S2.SS2.SSS2.p2.1),[§6\.1\.2](https://arxiv.org/html/2605.06901#S6.SS1.SSS2.Px2.p2.1)\. - S\. Zhou, F\. F\. Xu, H\. Zhu, X\. Zhou, R\. Lo, A\. Sridhar, X\. 
Cheng, T\. Ou, Y\. Bisk, D\. Fried,et al\.\(2023c\)WebArena: a realistic web environment for building autonomous agents\.InThe Twelfth International Conference on Learning Representations,Cited by:[§5\.1\.1](https://arxiv.org/html/2605.06901#S5.SS1.SSS1.Px3.p1.1)\. - D\. Zhu, B\. Haddow, P\. Chen, X\. Shen, M\. Zhang, and D\. Klakow \(2024\)Fine\-tuning large language models to translate: will a touch of noisy data in misaligned languages suffice?\.ArXiv preprintabs/2404\.14122\.External Links:[Link](https://arxiv.org/abs/2404.14122)Cited by:[§4\.1\.1](https://arxiv.org/html/2605.06901#S4.SS1.SSS1.p3.1)\. - T\. Y\. Zhuo, Y\. Huang, C\. Chen, and Z\. Xing \(2023\)Red teaming chatgpt via jailbreaking: bias, robustness, reliability and toxicity\.ArXiv preprintabs/2301\.12867\.External Links:[Link](https://arxiv.org/abs/2301.12867)Cited by:[§5\.2\.2](https://arxiv.org/html/2605.06901#S5.SS2.SSS2.Px1.p4.1),[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px1.p2.1)\. - C\. Ziems, J\. Dwivedi\-Yu, Y\. Wang, A\. Halevy, and D\. Yang \(2023a\)NormBank: a knowledge bank of situational social norms\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),A\. Rogers, J\. Boyd\-Graber, and N\. Okazaki \(Eds\.\),Toronto, Canada,pp\. 7756–7776\.External Links:[Document](https://dx.doi.org/10.18653/v1/2023.acl-long.429),[Link](https://aclanthology.org/2023.acl-long.429)Cited by:[§7\.2](https://arxiv.org/html/2605.06901#S7.SS2.SSS0.Px1.p2.1)\. - C\. Ziems, W\. Held, J\. Yang, J\. Dhamala, R\. Gupta, and D\. Yang \(2023b\)Multi\-value: a framework for cross\-dialectal english nlp\.InProceedings of the 61st Annual Meeting of the Association for Computational Linguistics \(Volume 1: Long Papers\),pp\. 744–768\.Cited by:[§3\.2\.1](https://arxiv.org/html/2605.06901#S3.SS2.SSS1.p2.1),[§6\.2\.1](https://arxiv.org/html/2605.06901#S6.SS2.SSS1.p4.1)\. - A\. Zimmermann, A\. Zeppa, S\. Pandey, and K\. Diao \(2025\)Don’t give up on democratizing ai for the wrong reasons\.InThe Thirty\-Ninth Annual Conference on Neural Information Processing Systems Position Paper Track,Cited by:[§6\.3\.2](https://arxiv.org/html/2605.06901#S6.SS3.SSS2.Px2.p2.1)\. - A\. Zou, L\. Phan, S\. Chen, J\. Campbell, P\. Guo, R\. Ren, A\. Pan, X\. Yin, M\. Mazeika, A\. Dombrowski, S\. Goel, N\. Li, M\. J\. Byun, Z\. Wang, A\. Mallen, S\. Basart, S\. Koyejo, D\. Song, M\. Fredrikson, J\. Z\. Kolter, and D\. Hendrycks \(2023a\)Representation engineering: a top\-down approach to ai transparency\.Vol\.abs/2310\.01405\.External Links:[Link](https://arxiv.org/abs/2310.01405)Cited by:[§6\.1\.1](https://arxiv.org/html/2605.06901#S6.SS1.SSS1.Px1.p1.1)\. - A\. Zou, Z\. Wang, N\. Carlini, M\. Nasr, J\. Z\. Kolter, and M\. Fredrikson \(2023b\)Universal and transferable adversarial attacks on aligned language models\.Vol\.abs/2307\.15043\.External Links:[Link](https://arxiv.org/abs/2307.15043)Cited by:[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px1.p2.1),[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px2.p1.1),[§5\.2\.3](https://arxiv.org/html/2605.06901#S5.SS2.SSS3.Px3.p4.1)\.