Tag
This paper diagnoses the low diversity of LLM-generated stories: across models, 88.3% of sampled stories contain one of 11 common words (e.g., Elias, lighthouse). It traces this homogeneity to post-training data and alignment rather than to the words' prevalence in pre-training data.
A hierarchical multi-agent framework generates short dramas from a single sentence, enforcing narrative pacing, ensuring spatial consistency, and applying quality control through iterative refinement and reviewer loops. The work also introduces a new benchmark, Short-Drama-Bench, for evaluation.
This paper introduces NARRA-Gym, a benchmark and executable evaluation environment for assessing how well large language models sustain interactive narratives, manage memory, and adapt to users over multiple turns.
A local MCP server lets LLMs browse Project Gutenberg books offline to improve creative writing.