This paper introduces a unified framework for test-time diverse generation in large language models. The framework categorizes methods by where diversity is injected: at the surface level or at the specification level. The paper then proposes specification-level methods, which first generate diverse intermediate specifications and produce outputs from them. Across five open-ended tasks and four backbone models, these methods achieve better output diversity while maintaining quality.
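The two injection points can be illustrated with a minimal sketch. It assumes a generic generator that maps a prompt and a random source to one output. All names here (`surface_level`, `specification_level`, `toy_generate`, `toy_propose_specs`, `distinct_ratio`) are hypothetical. The "model" is a deliberately contrived stub, not the paper's method, models, or metric, so the printed numbers say nothing about real results. The sketch only shows where randomness enters in each category.

```python
import random
from typing import Callable, List

# Hypothetical signature: a generator maps (prompt, rng) to one output string.
Generate = Callable[[str, random.Random], str]


def surface_level(prompt: str, generate: Generate, n: int, rng: random.Random) -> List[str]:
    """Surface-level: the prompt is fixed; diversity comes only from sampling."""
    return [generate(prompt, rng) for _ in range(n)]


def specification_level(
    prompt: str,
    propose_specs: Callable[[str, int, random.Random], List[str]],
    generate: Generate,
    n: int,
    rng: random.Random,
) -> List[str]:
    """Specification-level: diversify an intermediate spec, then generate from it."""
    specs = propose_specs(prompt, n, rng)  # step 1: n intermediate specifications
    # step 2: one output per specification, conditioned on that specification
    return [generate(f"{prompt}\nSpecification: {s}", rng) for s in specs]


def distinct_ratio(outputs: List[str]) -> float:
    """Fraction of outputs that are unique (a crude diversity proxy)."""
    return len(set(outputs)) / len(outputs)


# --- Toy stubs standing in for an LLM (hypothetical, not the paper's models) ---

def toy_generate(prompt: str, rng: random.Random) -> str:
    if "Specification:" in prompt:
        # Conditioned on a spec: the stub follows it deterministically.
        spec = prompt.split("Specification:", 1)[1].strip()
        return f"story about {spec}"
    # Unconditioned: a deliberately peaked distribution over only two outputs.
    return rng.choices(["story about a dragon", "story about a knight"], weights=[0.9, 0.1])[0]


def toy_propose_specs(prompt: str, n: int, rng: random.Random) -> List[str]:
    pool = ["a dragon", "a robot", "a detective", "a chef", "a ghost"]
    return rng.sample(pool, n)  # n distinct specs; requires n <= len(pool)


rng = random.Random(0)
surface_outputs = surface_level("Write a short story.", toy_generate, 5, rng)
spec_outputs = specification_level(
    "Write a short story.", toy_propose_specs, toy_generate, 5, rng
)
print(distinct_ratio(surface_outputs))  # at most 0.4: the stub has only 2 unconditioned outputs
print(distinct_ratio(spec_outputs))     # 1.0: every output follows a different spec
```

Keeping `propose_specs` separate from `generate` makes the injection point explicit. In a real system the stub proposer would be replaced by something like prompting the model for a list of distinct plans, which is an assumption here rather than a detail taken from the paper.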