PersonaArena: Dynamic Simulation for Evaluating and Enhancing Persona-Level Role-Playing in Large Language Models

arXiv cs.AI Papers

Summary

PersonaArena is a dynamic simulation framework that uses a large corpus of social content and a multi-agent debating judge to evaluate and improve LLMs' ability to maintain coherent and authentic persona-level role-playing in realistic social scenarios.


# PersonaArena: Dynamic Simulation for Evaluating and Enhancing Persona-Level Role-Playing in Large Language Models
Source: [https://arxiv.org/abs/2605.17044](https://arxiv.org/abs/2605.17044)
[View PDF](https://arxiv.org/pdf/2605.17044)

> Abstract: Large language models (LLMs) increasingly serve as interactive social agents, yet their ability to maintain coherent and authentic persona-level role-playing remains limited, particularly in realistic social scenarios. Existing research predominantly focuses on character-level settings and relies on static evaluation formats, failing to capture the complexity of everyday social interactions. In this work, we present PersonaArena, a dynamic simulation framework for evaluating and improving persona-level role-playing in LLMs. PersonaArena leverages a large, filtered corpus of user-generated social content to construct a nuanced persona bank, and elicits multi-turn, context-rich interactions within simulated social environments. Our framework features a multi-agent debating judge for holistic and unbiased assessment. Through extensive experiments, we demonstrate that PersonaArena enables rigorous evaluation and enhancement of LLMs' role-playing capabilities, advancing the development of more authentic and socially adept AI agents.

## Submission history

From: Wenlong Shi [[view email](https://arxiv.org/show-email/193ca9d4/2605.17044)] **[v1]** Sat, 16 May 2026 15:23:28 UTC (5,612 KB)
