The paper introduces RealUserSim, a framework that grounds LLM-based user simulation in real human behavioral data drawn from 14,000+ authentic conversations, with the aim of bridging the reality gap in agent benchmarking. The authors show that grounded simulation raises behavioral match rates from 24.2% to 45.3%, a gain of 21.1 percentage points, and reveals failure mechanisms that are invisible to cooperative simulators.