Beyond Static Personas: Situational Personality Steering for Large Language Models

arXiv cs.CL Papers

Summary

This paper introduces IRiS, a training-free framework for situational personality steering in LLMs that moves beyond static persona modeling by identifying and leveraging situation-dependent persona neurons. The authors show that LLM personality expression varies with the situation and propose neuron-based identification, situation-aware retrieval, and similarity-weighted steering, validated on PersonalityBench and the newly introduced SPBench benchmark.

arXiv:2604.13846v3 Announce Type: replace Abstract: Personalized Large Language Models (LLMs) facilitate more natural, human-like interactions in human-centric applications. However, existing personalization methods are constrained by limited controllability and high resource demands. Furthermore, their reliance on static personality modeling restricts adaptability across varying situations. To address these limitations, we first demonstrate the existence of situation-dependency and consistent situation-behavior patterns within LLM personalities through a multi-perspective analysis of persona neurons. Building on these insights, we propose IRiS, a training-free, neuron-based Identify-Retrieve-Steer framework for advanced situational personality steering. Our approach comprises situational persona neuron identification, situation-aware neuron retrieval, and similarity-weighted steering. We empirically validate our framework on PersonalityBench and our newly introduced SPBench, a comprehensive situational personality benchmark. Experimental results show that our method surpasses best-performing baselines, demonstrating IRiS's generalization and robustness to complex, unseen situations and different model architectures.

# Beyond Static Personas: Situational Personality Steering for Large Language Models

Source: https://arxiv.org/html/2604.13846

Zesheng Wei¹*, Mengxiang Li¹*, Zilei Wang¹†, Yang Deng²
¹University of Science and Technology of China  ²Singapore Management University
\{zswei, mxli02\}@mail.ustc.edu.cn, zlwang@ustc.edu.cn, ydeng@smu.edu.sg
*Equal contribution; Zesheng Wei's work was done during a visit at SMU. †Corresponding author.

###### Abstract

Personalized Large Language Models (LLMs) facilitate more natural, human-like interactions in human-centric applications. However, existing personalization methods are constrained by limited controllability and high resource demands. Furthermore, their reliance on static personality modeling restricts adaptability across varying situations. To address these limitations, we first demonstrate the existence of situation-dependency and consistent situation-behavior patterns within LLM personalities through a multi-perspective analysis of persona neurons. Building on these insights, we propose IRiS, a training-free, neuron-based Identify-Retrieve-Steer framework for advanced situational personality steering. Our approach comprises situational persona neuron identification, situation-aware neuron retrieval, and similarity-weighted steering. We empirically validate our framework on PersonalityBench and our newly introduced SPBench, a comprehensive situational personality benchmark. Experimental results show that our method surpasses best-performing baselines, demonstrating IRiS's generalization and robustness to complex, unseen situations and different model architectures.

## 1 Introduction

> "Behavior is a function of the person and their environment." (Lewin, 2013) — Kurt Lewin

The advancement of Large Language Models (LLMs) has catalyzed a wide range of human-centric applications such as role-playing (Chen et al., 2024a; Wang et al., 2025c,a), personalized assistance (Deng et al., 2024b; Mok et al., 2025), user simulation (Zhang et al., 2024; Wu et al., 2025), and social simulation (Chen et al., 2024b; Zhou et al., 2024; Zhang et al., 2025). These applications require models to express coherent personalities while adapting their behavior across diverse interaction situations, making personality modeling a central challenge for LLM personalization. However, most existing approaches implicitly assume that behavior is determined solely by stable personality traits. This assumption conflicts with a core principle in psychology: behavior is a function of both the person and the environment (Lewin, 2013).
When situational factors are ignored, personalized agents may exhibit superficial consistency yet fail to respond appropriately across varying situations.

Figure 1: PCA of situation-dependent and global persona neurons, with proximate topic names annotated. Comparable distances between distinct personality domains and between topics within the same personality domain highlight the crucial impact of situations.

Existing approaches to endowing LLMs with personality are primarily categorized into training-based and training-free methods. Training-based methods rely on large-scale, high-quality datasets to align models with specific personalized preferences (Li et al., 2025), utilizing techniques such as Supervised Fine-Tuning (SFT) (Wang et al., 2025b; Tan et al., 2024b; Li et al., 2024b) or Direct Preference Optimization (DPO) (Li et al., 2024a). While effective, these methods are computationally expensive (Szep et al., 2025) and difficult to adapt post hoc (Tseng et al., 2024). Training-free methods, including prompt-based personalization (Jiang et al., 2023; Li et al., 2023) and direct internal steering (Deng et al., 2024a; Chen et al., 2025), offer greater flexibility but suffer from instability, limited controllability, and weak theoretical grounding. Although prompt-based methods can incorporate conversational history as situational context, they treat it merely as a "black-box" input. Relying on implicit attention mechanisms for this modulation lacks transparency, often causing unstable personality expression. Crucially, both training-based and training-free paradigms lack a mechanistic framework to explicitly model the underlying persona-situation interactions.

In contrast to the static assumptions underlying existing LLM personalization methods, personality psychology provides a well-established persona-situation interactional account of behavior (Lewin, 2013). While early trait theories assume cross-situational consistency (Newcomb, 1929; Allport, 1937), subsequent work demonstrates that stable traits alone are insufficient to explain behavior across diverse situations (Mischel, 1968; Mischel and Peake, 1982). In particular, the Cognitive-Affective Personality System (CAPS) theory (Mischel and Shoda, 1995) argues that human behavior varies across diverse situations. Empirical studies further demonstrate that different situations selectively activate cognitive and affective units (Mischel et al., 2002), and that personality coherence is expressed through consistent situation-behavior patterns (Ayduk and Gyurak, 2008).
To date, these persona-situation interactional mechanisms have not been systematically investigated or integrated into LLM-based personalization. Motivated by the above psychology studies, we first empirically examine whether LLMs exhibit human-like situational dependency and consistent situation-behavior personality patterns through a multi-perspective analysis of internal neurons. As illustrated in Figure 1, our preliminary results show that the maximum PCA distance between situational topics within a single domain is comparable to the distance observed between distinct personality domains, suggesting that situational variation can induce significant personality-level representational shifts. Building on this observation, we propose a training-free, neuron-based Identify-Retrieve-Steer framework for situational personality steering, named IRiS. The framework first identifies situational persona neurons from a set of historical situations, which serve as instructive priors. Given a novel situation, IRiS estimates its similarity to these historical situations, retrieves the corresponding persona neurons, and applies coefficient-weighted steering to enable precise and situation-aware personalization. Comprehensive evaluations across two personality benchmarks validate the SOTA performance of IRiS, demonstrating effective generalization to unseen situations and robustness within complex situations. Furthermore, extensive experiments verify the adaptability of our approach across diverse model architectures. To summarize, our contributions are as follows:

- We empirically validate human-like situational dependency and consistent situation-behavior personality patterns within LLMs, pioneering the integration of these psychological mechanisms to guide precise personality steering.
- We propose the IRiS framework, a psychology-grounded approach that leverages instructive priors for situation-aware retrieval and steering.
- We conduct extensive experiments and in-depth analyses to verify our framework's effectiveness, offering intuitive insights into situational personality for future research and applications.

Figure 2: Empirical study results: (a) Layer-wise counts of persona neurons across situational topics (topic labels abbreviated by omitting "and"). (b) Variability in neuron proportions across early, middle, and late layers for different topics. (c) Validation of the situation-behavior consistency patterns in LLM personality.

## 2 Preliminary Analysis

The preliminary analysis aims to investigate the situational impact on LLMs' personality manifestation, and to validate the theory of situation-behavior consistency within LLMs.

### 2.1 Backgrounds

##### Personality Model

In this work, we adopt the widely validated Big-Five model (Tupes and Christal, 1992) as the foundational personality framework, which comprises five domains: openness (O), conscientiousness (C), extroversion (E), agreeableness (A), and neuroticism (N). Each domain encompasses opposing aspects (e.g., extraverted versus introverted within the E domain).

##### Neurons in LLMs

Modern LLMs adopt the auto-regressive transformer (Vaswani et al., 2017) architecture, which consists of $L$ stacked transformer blocks. Previous work has shown that knowledge, such as personality, is stored in specific neurons within the feed-forward networks (FFNs) of each block (Dai et al., 2022). Specifically, in layer $l$, given the input $X^{l}$ for a token, we have:

$$\text{FFN}(X^{l}) = \text{act}(X^{l} W^{l}_{1})\, W^{l}_{2} \quad (1)$$

where $X^{l} \in \mathbb{R}^{d}$, $W^{l}_{1} \in \mathbb{R}^{d \times d_{h}}$, $W^{l}_{2} \in \mathbb{R}^{d_{h} \times d}$, and act represents the activation function (e.g., ReLU (Agarap, 2019)). More recent advanced LLMs (Grattafiori et al., 2024; Team et al., 2024; Yang et al., 2025) have replaced the ReLU non-linearity with the GLU (Shazeer, 2020) activation function to achieve better performance:

$$\text{FFN}'(X^{l}) = \left(\text{act}(X^{l} W^{l}_{1}) \odot (X^{l} W^{l}_{3})\right) W^{l}_{2} \quad (2)$$

where $\odot$ denotes element-wise multiplication, and $W^{l}_{3} \in \mathbb{R}^{d \times d_{h}}$ is the gating weight matrix. In layer $l$, the $i$-th neuron can be conceptualized as applying a linear transformation with column $i$ of $W^{l}_{1}$ followed by the non-linear activation. The activation value of a neuron is positively correlated with the expression of corresponding facts. In this context, a neuron is considered activated if its activation value exceeds zero (Nair and Hinton, 2010).
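To make Eq. (2) and the activation criterion concrete, here is a minimal PyTorch sketch (not the paper's code) of a GLU-style FFN block that exposes per-neuron activation values. The tiny dimensions, the SiLU non-linearity, and reading the gated product as each neuron's activation value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Placeholder sizes; real models use e.g. d = 4096 and d_h on the order of 10^4 neurons per layer.
d, d_h = 16, 64

W1 = torch.randn(d, d_h)   # up-projection: column i parameterizes neuron i
W3 = torch.randn(d, d_h)   # gating projection (GLU variant)
W2 = torch.randn(d_h, d)   # down-projection

def ffn_glu(x):
    """FFN'(x) = (act(x W1) * (x W3)) W2, mirroring Eq. (2).
    Returns the block output and the per-neuron activation values."""
    # Assumption: the gated product is taken as the neuron's activation value.
    neuron_acts = F.silu(x @ W1) * (x @ W3)
    return neuron_acts @ W2, neuron_acts

x = torch.randn(1, d)                 # representation of a single token at layer l
_, acts = ffn_glu(x)
activated = acts > 0                  # "activated" neuron: activation value exceeds zero
print(f"{activated.float().mean().item():.2%} of neurons activated for this token")
```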
##### Neuron Identification Dataset & Situational Topic Category

To identify persona neurons within LLMs, we employ the dataset from PersonalityBench (Deng et al., 2024a), denoted as $\mathcal{Q}$, which comprises a diverse set of descriptive personalization prompts and situational questions designed to elicit personality-driven responses. To investigate the influence of varying situations on LLMs' personality, these questions are categorized into $M = 30$ distinct topics, following the taxonomy of UltraChat (Ding et al., 2023). Further details are provided in Appendix A.

### 2.2 Empirical Study

The targets of our analysis are situational persona neurons in LLMs, which are identified by measuring the activation differences of neurons under contrastive personality prompts within situational topics, with detailed explanations in Section 3.1.
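The paper defers the exact identification procedure to its Section 3.1, which is not part of this excerpt; as a hedged illustration of the idea stated above, the NumPy sketch below scores each FFN neuron by the gap in its activation probability between positive-trait and negative-trait prompts of one topic and keeps the highest-scoring neurons. The gap criterion, the top-k cutoff, and the toy data are assumptions, not the paper's specification.

```python
import numpy as np

def activation_probs(acts):
    """acts: array (num_prompts, L, d_h) of neuron activation values collected from the
    model for one situational topic and one pole of a trait (e.g. extraverted).
    Returns, per neuron, the fraction of prompts on which it was activated (> 0)."""
    return (acts > 0).mean(axis=0)                          # shape (L, d_h)

def identify_persona_neurons(acts_pos, acts_neg, top_k=500):
    """Keep neurons whose activation probability differs most between the
    contrastive (positive- vs negative-trait) prompt sets of one topic."""
    gap = activation_probs(acts_pos) - activation_probs(acts_neg)
    flat = np.argsort(np.abs(gap).ravel())[::-1][:top_k]    # top_k is a placeholder cutoff
    return [tuple(np.unravel_index(i, gap.shape)) for i in flat]   # (layer, neuron) pairs

# Toy usage: random arrays stand in for activations collected from the target LLM.
L, d_h = 4, 32
rng = np.random.default_rng(0)
neurons = identify_persona_neurons(rng.normal(size=(8, L, d_h)),
                                   rng.normal(size=(8, L, d_h)),
                                   top_k=10)
```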
#### 2.2.1 Situational Impact on LLMs' Personality

We analyze the activation states and layer-wise distribution of persona neurons to investigate the situational impact on LLMs' personality. Llama-3-8B-Instruct (Grattafiori et al., 2024) and Qwen3-8B (Yang et al., 2025) are adopted as target LLMs for validation. Comprehensive results are provided in Appendix C.

##### Activation-Level Perspective

We collect the activation probabilities of all situational persona neurons across diverse topics. For comparison, we also derive "global persona neurons" by disregarding topic distinctions, establishing a baseline for cross-situational consistency. We then perform Principal Component Analysis (PCA) on the feature vectors $v_{f} \in \mathbb{R}^{L \times d_{h}}$, which are constructed by populating the activation probability values of specific neurons while setting irrelevant neurons' positions to zero. As illustrated in Figure 1 (Qwen) and Figure 5 (Llama), situational persona neurons exhibit significant variation across topics. Notably, the maximum Euclidean distance between topics within a single domain in the PCA space is comparable to the distance between distinct personality domains. Moreover, nearby points in the PCA space correspond to semantically similar topics, confirming that the observed variation is systematic rather than noise. Given the independence of the Big Five personality domains (Goldberg, 2013), this result indicates that distinct situations induce personality-level representational shifts in LLMs.

Figure 3: Overview of the IRiS framework, comprising Identification, Retrieval, and Steering phases for accurate personality steering.

##### Layer-Level Perspective

We further investigate the variations in the count and proportion of situational persona neurons across diverse topics among all layers. For illustrative clarity, we select eight topics and eight layers spanning the early, middle, and late stages within personality domain C. As shown in Figure 2(a), within a specific personality domain, questions involving different situational topics significantly influence the distribution of neurons controlling LLM personality. Similarly, as illustrated in Figure 2(b), the proportion of situational persona neurons per layer exhibits significant disparity, with a maximum difference of 2.23% observed in layer 16.
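To make the two empirical analyses above concrete, the following NumPy/scikit-learn sketch is an assumed re-implementation (not the paper's code): it builds the zero-padded feature vector $v_f$ for one topic, projects topics with PCA as in Figure 1, and computes the per-layer proportion of situational persona neurons underlying Figure 2(b).

```python
import numpy as np
from sklearn.decomposition import PCA

def topic_feature_vector(neurons, act_prob):
    """Build v_f in R^{L x d_h}: copy the activation probabilities of the topic's
    persona neurons and leave every other position at zero, then flatten."""
    v = np.zeros_like(act_prob)                 # act_prob has shape (L, d_h)
    for layer, idx in neurons:
        v[layer, idx] = act_prob[layer, idx]
    return v.ravel()

def pca_of_topics(features_by_topic, n_components=2):
    """Project one flattened feature vector per topic into a low-dimensional space,
    as used for the scatter plot in Figure 1."""
    names = list(features_by_topic)
    coords = PCA(n_components=n_components).fit_transform(
        np.stack([features_by_topic[t] for t in names]))
    return dict(zip(names, coords))

def layer_proportions(neurons, L, d_h):
    """Share of each layer's d_h neurons that are situational persona neurons (Figure 2(b))."""
    counts = np.bincount([layer for layer, _ in neurons], minlength=L)
    return counts / d_h
```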

Similar Articles

Persona-Assigned Large Language Models Exhibit Human-Like Motivated Reasoning

arXiv cs.CL

This paper investigates whether assigning personas to large language models induces human-like motivated reasoning, finding that persona-assigned LLMs show up to 9% reduced veracity discernment and are up to 90% more likely to evaluate scientific evidence in ways congruent with their induced political identity, with prompt-based debiasing largely ineffective.

PersonaVLM: Long-Term Personalized Multimodal LLMs

Hugging Face Daily Papers

PersonaVLM introduces a personalized multimodal LLM framework that enables long-term user adaptation through memory retention, multi-turn reasoning, and response alignment, outperforming GPT-4o by 5.2% on the new Persona-MME benchmark.

Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation

arXiv cs.CL

Researchers from KAIST propose a framework that uses persona-guided LLM agents to synthesize diverse harmful content for stress-testing detection systems, addressing limitations of static benchmarks such as scalability, diversity, and data contamination. Both human and LLM evaluations confirm the synthetic scenarios are harder to detect than existing benchmarks while maintaining linguistic and topical diversity.

SPS: Steering Probability Squeezing for Better Exploration in Reinforcement Learning for Large Language Models

arXiv cs.CL

Researchers propose SPS (Steering Probability Squeezing), a training paradigm combining reinforcement learning with inverse reinforcement learning to address probability squeezing in LLM reasoning training, where probability mass concentrates too narrowly on high-reward trajectories, limiting exploration and multi-sample performance (Pass@k). Experiments on five reasoning benchmarks demonstrate improved exploration and Pass@k metrics.