PresentAgent-2: Towards Generalist Multimodal Presentation Agents

Hugging Face Daily Papers 05/12/26, 12:00 AM Papers

Summary

PresentAgent-2 is an agentic framework that generates presentation videos from user queries by conducting research, creating multimodal slides, and producing interactive content across single, discussion, and interaction modes.

Presentation generation is moving beyond static slide creation toward end-to-end presentation video generation with research grounding, multimodal media, and interactive delivery. We introduce PresentAgent-2, an agentic framework for generating presentation videos from user queries. Given an open-ended user query and a selected presentation mode, PresentAgent-2 first summarizes the query into a focused topic and performs deep research over presentation-friendly sources to collect multimodal resources, including relevant text, images, GIFs, and videos. It then constructs presentation slides, generates mode-specific scripts, and composes slides, audio, and dynamic media into a complete presentation video. PresentAgent-2 supports three independent presentation modes within a unified framework: Single Presentation, which generates a single-speaker narrated presentation video; Discussion, which creates a multi-speaker presentation with structured speaker roles, such as for asking guiding questions, explaining concepts, clarifying details, and summarizing key points; and Interaction, which independently supports answering audience questions grounded in the generated slides, scripts, retrieved evidence, and presentation context. To evaluate these capabilities, we build a multimodal presentation benchmark covering single presentation, discussion, and interaction scenarios, with task-specific evaluation criteria for content quality, media relevance, dynamic media use, dialogue naturalness, and interaction grounding. Overall, PresentAgent-2 extends presentation generation from document-dependent slide creation to query-driven, research-grounded presentation video generation with multimodal media, dialogue, and interaction. Code: https://github.com/AIGeeksGroup/PresentAgent-2. Website: https://aigeeksgroup.github.io/PresentAgent-2.
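The stages named in the abstract (topic summarization, research, slide construction, mode-specific scripting, and question answering) can be pictured as a simple pipeline. The following is only an illustrative sketch, not the authors' implementation: every name in it (`Mode`, `Resource`, `generate`, `answer_question`, the speaker role labels, the `placeholder://` locators) is hypothetical, the LLM and retrieval steps are replaced by trivial stand-ins, and video composition is omitted entirely.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    SINGLE = "single"            # one narrator
    DISCUSSION = "discussion"    # several speakers with roles
    INTERACTION = "interaction"  # audience Q&A, see answer_question below


@dataclass
class Resource:
    kind: str  # "text", "image", "gif", or "video"
    ref: str   # placeholder locator, not a real URL


@dataclass
class Presentation:
    topic: str
    mode: Mode
    resources: list[Resource] = field(default_factory=list)
    slides: list[str] = field(default_factory=list)
    script: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line)


def summarize_query(query: str) -> str:
    # Stand-in for the topic summarizer: only normalizes whitespace.
    return " ".join(query.split())


def deep_research(topic: str) -> list[Resource]:
    # Stand-in for retrieval: one placeholder per media type in the abstract.
    return [Resource(k, f"placeholder://{k}") for k in ("text", "image", "gif", "video")]


def build_slides(topic: str, resources: list[Resource]) -> list[str]:
    # One slide title per collected resource.
    return [f"{topic}: {r.kind}" for r in resources]


def write_script(slides: list[str], mode: Mode) -> list[tuple[str, str]]:
    # Assumed role names; the paper's actual roles may differ.
    speakers = ["questioner", "explainer"] if mode is Mode.DISCUSSION else ["presenter"]
    return [(speakers[i % len(speakers)], f"Narration for: {s}") for i, s in enumerate(slides)]


def generate(query: str, mode: Mode) -> Presentation:
    topic = summarize_query(query)
    resources = deep_research(topic)
    slides = build_slides(topic, resources)
    return Presentation(topic, mode, resources, slides, write_script(slides, mode))


def answer_question(p: Presentation, question: str) -> str:
    # Naive grounding: pick the slide sharing the most words with the question.
    words = set(question.lower().split())
    best = max(p.slides, key=lambda s: len(words & set(s.lower().split())))
    return f"Grounded in slide: {best}"
```

Calling `generate("how do transformers work", Mode.DISCUSSION)` returns a `Presentation` whose script alternates between the two assumed speaker roles. In a real system each stub would be backed by a model or retrieval call, and the output would be rendered to video.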

Original Article

Cached at: 05/14/26, 04:17 AM

Paper page - PresentAgent-2: Towards Generalist Multimodal Presentation Agents

Source: https://huggingface.co/papers/2605.11363

Abstract

Presentation generation is moving beyond static slide creation toward end-to-end presentation video generation with research grounding, multimodal media, and interactive delivery. We introduce PresentAgent-2, an agentic framework for generating presentation videos from user queries. Given an open-ended user query and a selected presentation mode, PresentAgent-2 first summarizes the query into a focused topic and performs deep research over presentation-friendly sources to collect multimodal resources, including relevant text, images, GIFs, and videos. It then constructs presentation slides, generates mode-specific scripts, and composes slides, audio, and dynamic media into a complete presentation video. PresentAgent-2 supports three independent presentation modes within a unified framework: Single Presentation, which generates a single-speaker narrated presentation video; Discussion, which creates a multi-speaker presentation with structured speaker roles, such as for asking guiding questions, explaining concepts, clarifying details, and summarizing key points; and Interaction, which independently supports answering audience questions grounded in the generated slides, scripts, retrieved evidence, and presentation context. To evaluate these capabilities, we build a multimodal presentation benchmark covering single presentation, discussion, and interaction scenarios, with task-specific evaluation criteria for content quality, media relevance, dynamic media use, dialogue naturalness, and interaction grounding. Overall, PresentAgent-2 extends presentation generation from document-dependent slide creation to query-driven, research-grounded presentation video generation with multimodal media, dialogue, and interaction. Code: https://github.com/AIGeeksGroup/PresentAgent-2. Website: https://aigeeksgroup.github.io/PresentAgent-2.

Get this paper in your agent:

hf papers read 2605.11363

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Datasets citing this paper: 1

AIGeeksGroup/PresentEval

Similar Articles

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

I built 10 gamified, interactive presentation decks to teach Agentic AI (Stop falling asleep reading whitepapers).

LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching

Macaron-A2UI: A Model for Generative UI in Personal Agents
