One Sentence, One Drama: Personalized Short-Form Drama Generation via Multi-Agent Systems

Summary

A hierarchical multi-agent framework generates short dramas from single sentences by enforcing narrative pacing, ensuring spatial consistency, and implementing quality control through iterative refinement and reviewer loops. It introduces a new benchmark, Short-Drama-Bench, for evaluation.

Existing approaches to digital short-drama production typically rely on one-shot, LLM-generated scripts and loosely coupled pipelines, which fall short on three key requirements of short-drama generation: (1) narrative pacing, where they produce weak hooks, insufficient escalation, and unattractive endings; (2) spatial consistency, where scene layouts drift and character positions are inconsistent across clips; and (3) production-level quality control, where extensive manual review and correction are still needed across the script and visual stages. We present One Sentence, One Drama, a hierarchical multi-agent framework that transforms a user's single-sentence idea into a fully produced short drama through structured intermediate modules and iterative refinement. Our approach is built on three key components: (1) a multi-agent, debate-based story generation module that enforces short-drama pacing and narrative coherence; (2) a 3D-grounded first-frame generation mechanism that establishes a shared spatial reference for consistent character positioning and scene layout across clips; and (3) multi-stage reviewer loops that perform comprehensive error detection and targeted revision across the script, visual, and video generation stages. We also introduce scene-level background music (BGM) matching and scene transition planning to improve the audience's immersive experience. To systematically evaluate this task, we introduce Short-Drama-Bench, a benchmark that extends standard video quality metrics with short-drama-specific criteria. Experimental results demonstrate that our method significantly outperforms existing pipelines in narrative quality, cross-clip consistency, and overall viewing experience.
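
The reviewer loops mentioned above (error detection followed by targeted revision) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Review`, `refine`, `toy_review`, and `toy_revise` are hypothetical, the `[HOOK]` and `[TWIST]` tags are invented placeholders, and a real system would presumably call LLM or vision reviewers rather than string checks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Hypothetical reviewer output: an empty issue list means the draft passed."""
    issues: List[str]


def refine(
    draft: str,
    review: Callable[[str], Review],
    revise: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Alternate review and targeted revision until the draft passes
    or the round budget is exhausted. Returns (draft, rounds_used)."""
    for round_idx in range(max_rounds):
        result = review(draft)
        if not result.issues:
            return draft, round_idx
        draft = revise(draft, result.issues)
    return draft, max_rounds


# Toy stand-ins for reviewer and reviser agents (assumptions for illustration).
REQUIRED_TAGS = {"missing hook": "[HOOK]", "missing twist": "[TWIST]"}


def toy_review(script: str) -> Review:
    # Report every required pacing element that the script lacks.
    return Review([issue for issue, tag in REQUIRED_TAGS.items() if tag not in script])


def toy_revise(script: str, issues: List[str]) -> str:
    # Targeted revision: address only the first reported issue per round.
    return f"{script} {REQUIRED_TAGS[issues[0]]}"


final_script, rounds_used = refine("A chef finds a letter.", toy_review, toy_revise)
```

The same loop shape could be reused at each stage (script, first frame, video) by swapping in a different reviewer and reviser; bounding the number of rounds keeps the cost predictable when a reviewer never fully approves a draft.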

Cached at: 05/22/26, 06:27 AM

Source: https://huggingface.co/papers/2605.22144
