Tag
Proposes Online Agent-as-a-Judge, an evaluation framework that uses an in-world evaluator agent to actively generate situations for testing interactive social agents, improving coverage and reliability over passive methods.
Presents Agentopia, a framework for long-term life simulation in multi-agent societies, in which 100 LLM-powered agents autonomously pursue personal growth and social relationships over 10 simulated years. The work studies emergent social behaviors and uses life reward training to improve LLM role-playing capabilities.
Introduces PCSP, a single RL policy conditioned on frozen LLM embeddings of persona descriptions, enabling scalable, real-time persona-traceable NPC control in life simulation games. Experiments show zero-shot persona identification and behavioral alignment, with faster inference than LLM baselines.
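As a rough illustration of the conditioning idea in the PCSP summary, the sketch below shows one shared policy that takes a fixed persona embedding as an extra input alongside the observation. Everything here is an assumption made for illustration, not the paper's implementation: `frozen_embed` is a hypothetical stand-in for a frozen LLM text encoder, `PersonaConditionedPolicy` is a hypothetical single linear layer with a softmax, the dimensions are arbitrary, and the RL training loop is omitted entirely.

```python
import math
import random


def frozen_embed(persona: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a frozen LLM encoder.

    Returns a deterministic pseudo-embedding derived from the persona text.
    A real system would call a pretrained text encoder and never update it.
    """
    rng = random.Random(persona)  # same text always yields the same vector
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class PersonaConditionedPolicy:
    """One shared policy for all personas (illustrative linear-softmax model)."""

    def __init__(self, obs_dim: int, emb_dim: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        in_dim = obs_dim + emb_dim
        # One weight row per action; in practice these would be learned with RL.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(n_actions)
        ]

    def action_probs(self, obs: list[float], persona_emb: list[float]) -> list[float]:
        # Conditioning step: concatenate the observation and persona embedding.
        x = list(obs) + list(persona_emb)
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        # Numerically stable softmax over the action logits.
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        return [e / total for e in exps]


policy = PersonaConditionedPolicy(obs_dim=4, emb_dim=8, n_actions=3)
obs = [0.2, -0.1, 0.5, 0.0]

# Embeddings are computed once per persona and can be cached.
shy = frozen_embed("shy bookworm who avoids crowds")
bold = frozen_embed("outgoing party host")

probs_shy = policy.action_probs(obs, shy)
probs_bold = policy.action_probs(obs, bold)
```

Because the persona enters only through a precomputed vector, a new persona description needs no extra policy parameters, and each control step is a small forward pass rather than an LLM call. That is one plausible reading of why the summary reports faster inference than LLM baselines.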