PersonaArena: Dynamic Simulation for Evaluating and Enhancing Persona-Level Role-Playing in Large Language Models
Summary
PersonaArena is a dynamic simulation framework that uses a large corpus of social content and a multi-agent debating judge to evaluate and improve LLMs' ability to maintain coherent and authentic persona-level role-playing in realistic social scenarios.
Cached at: 05/19/26, 06:38 AM
# PersonaArena: Dynamic Simulation for Evaluating and Enhancing Persona-Level Role-Playing in Large Language Models

Source: [https://arxiv.org/abs/2605.17044](https://arxiv.org/abs/2605.17044) [View PDF](https://arxiv.org/pdf/2605.17044)

> Abstract: Large language models (LLMs) increasingly serve as interactive social agents, yet their ability to maintain coherent and authentic persona-level role-playing remains limited, particularly in realistic social scenarios. Existing research predominantly focuses on character-level settings and relies on static evaluation formats, failing to capture the complexity of everyday social interactions. In this work, we present PersonaArena, a dynamic simulation framework for evaluating and improving persona-level role-playing in LLMs. PersonaArena leverages a large, filtered corpus of user-generated social content to construct a nuanced persona bank, and elicits multi-turn, context-rich interactions within simulated social environments. Our framework features a multi-agent debating judge for holistic and unbiased assessment. Through extensive experiments, we demonstrate that PersonaArena enables rigorous evaluation and enhancement of LLMs' role-playing capabilities, advancing the development of more authentic and socially adept AI agents.

## Submission history

From: Wenlong Shi \[[view email](https://arxiv.org/show-email/193ca9d4/2605.17044)\]

**\[v1\]** Sat, 16 May 2026 15:23:28 UTC (5,612 KB)
Similar Articles
Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents
Introduces Persona Policies (PPol), a plug-and-play control layer that uses LLM-driven evolutionary program search to generate diverse, human-like user personas for evaluating LLM agents. Achieves 33–62% fitness gains over baseline, with human-likeness rated at 80.4%, and improves agent robustness with +17% task success.
Beyond Static Personas: Situational Personality Steering for Large Language Models
This paper introduces IRiS, a training-free framework for situational personality steering in LLMs that moves beyond static persona modeling by identifying and leveraging situation-dependent persona neurons. The approach demonstrates that LLM behavior varies contextually and proposes neuron-based identification, retrieval, and weighted steering methods validated on PersonalityBench and a new SPBench benchmark.
MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation
MCP-Persona is a benchmark evaluating LLM agents on personalized tools interacting with individual accounts and local databases. Experiments reveal significant challenges for state-of-the-art agents in personalized tool use.
Beyond Static Benchmarks: Synthesizing Harmful Content via Persona-based Simulation for Robust Evaluation
Researchers from KAIST propose a framework that uses persona-guided LLM agents to synthesize diverse harmful content for stress-testing detection systems, addressing limitations of static benchmarks such as scalability, diversity, and data contamination. Both human and LLM evaluations confirm the synthetic scenarios are harder to detect than existing benchmarks while maintaining linguistic and topical diversity.
One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents
Introduces PCSP, a single RL policy conditioned on frozen LLM embeddings of persona descriptions, enabling scalable, real-time persona-traceable NPC control in life simulation games. Experiments show zero-shot persona identification and behavioral alignment, with faster inference than LLM baselines.