Tag
This paper introduces NARRA-Gym, a benchmark and executable evaluation environment for assessing how well large language models sustain interactive narratives, manage memory, and adapt to users over multiple turns.