Tag
This paper constructs a large dataset of 263,911 long-form stories annotated with TTCW-based creativity metrics and fine-tunes Qwen3 models to generate structured review reports. It finds that non-reasoning fine-tuning outperforms reasoning-supervised fine-tuning, which suffers from parse failures and irrelevant repetition.