PresentAgent-2: Towards Generalist Multimodal Presentation Agents
Summary
PresentAgent-2 is an agentic framework that generates presentation videos from user queries by conducting research, creating multimodal slides, and producing interactive content across single, discussion, and interaction modes.
Cached at: 05/14/26, 04:17 AM
Source: https://huggingface.co/papers/2605.11363
Abstract
Presentation generation is moving beyond static slide creation toward end-to-end presentation video generation with research grounding, multimodal media, and interactive delivery. We introduce PresentAgent-2, an agentic framework for generating presentation videos from user queries. Given an open-ended user query and a selected presentation mode, PresentAgent-2 first summarizes the query into a focused topic and performs deep research over presentation-friendly sources to collect multimodal resources, including relevant text, images, GIFs, and videos. It then constructs presentation slides, generates mode-specific scripts, and composes slides, audio, and dynamic media into a complete presentation video. PresentAgent-2 supports three independent presentation modes within a unified framework: Single Presentation, which generates a single-speaker narrated presentation video; Discussion, which creates a multi-speaker presentation with structured speaker roles, such as asking guiding questions, explaining concepts, clarifying details, and summarizing key points; and Interaction, which independently supports answering audience questions grounded in the generated slides, scripts, retrieved evidence, and presentation context. To evaluate these capabilities, we build a multimodal presentation benchmark covering single presentation, discussion, and interaction scenarios, with task-specific evaluation criteria for content quality, media relevance, dynamic media use, dialogue naturalness, and interaction grounding. Overall, PresentAgent-2 extends presentation generation from document-dependent slide creation to query-driven, research-grounded presentation video generation with multimodal media, dialogue, and interaction. Code: https://github.com/AIGeeksGroup/PresentAgent-2. Website: https://aigeeksgroup.github.io/PresentAgent-2.
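To make the stage order in the abstract concrete, here is a minimal Python sketch of the flow it describes: summarize the query, gather resources, build slides, write a mode-specific script, and answer a question from the slides. It is an illustration only, not the authors' implementation. Every name in it (`Mode`, `Resource`, `Presentation`, `summarize_query`, `research`, `build_slides`, `write_script`, `generate`, `answer`) is hypothetical, the "host" and "expert" speaker roles are invented for the example, and each stage is a stub that returns canned data rather than calling a model, searching the web, or rendering video.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # The three presentation modes named in the abstract.
    SINGLE = "single"
    DISCUSSION = "discussion"
    INTERACTION = "interaction"


@dataclass
class Resource:
    kind: str     # "text", "image", "gif", or "video"
    content: str  # text body or a media reference


@dataclass
class Presentation:
    topic: str
    mode: Mode
    slides: list[str] = field(default_factory=list)
    script: list[tuple[str, str]] = field(default_factory=list)  # (speaker, line)


def summarize_query(query: str) -> str:
    # Stub: a real system would presumably use a language model here.
    return query.strip().rstrip("?.!")


def research(topic: str) -> list[Resource]:
    # Stub: returns canned resources instead of performing any search.
    return [
        Resource("text", f"Background on {topic}"),
        Resource("image", f"diagram-of-{topic}.png"),
    ]


def build_slides(topic: str, resources: list[Resource]) -> list[str]:
    # One slide per collected resource.
    return [f"{topic}: {r.kind} -> {r.content}" for r in resources]


def write_script(slides: list[str], mode: Mode) -> list[tuple[str, str]]:
    if mode is Mode.DISCUSSION:
        roles = ["host", "expert"]  # hypothetical speaker roles
        return [(roles[i % 2], s) for i, s in enumerate(slides)]
    return [("narrator", s) for s in slides]


def generate(query: str, mode: Mode) -> Presentation:
    topic = summarize_query(query)
    slides = build_slides(topic, research(topic))
    return Presentation(topic, mode, slides, write_script(slides, mode))


def answer(presentation: Presentation, question: str) -> str:
    # Toy grounding: return the slide sharing the most words with the question.
    words = set(question.lower().split())
    return max(
        presentation.slides,
        key=lambda s: len(words & set(s.lower().split())),
        default="",
    )
```

Calling `generate("How do transformers work?", Mode.DISCUSSION)` yields a two-slide `Presentation` whose script alternates between the two placeholder roles, and `answer` then picks a slide by simple word overlap. In a real system each stub would be replaced by the corresponding research, generation, or rendering component.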
Datasets citing this paper: 1
AIGeeksGroup/PresentEval
Similar Articles
MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation
MM-WebAgent is a hierarchical agentic framework that generates coherent and visually consistent webpages by coordinating AIGC-based element generation through joint optimization of layout and multimodal content. The paper introduces a benchmark and multi-level evaluation protocol, demonstrating improvements over code-generation and agent-based baselines.
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
Agent S2 is a new compositional framework for computer use agents that achieves state-of-the-art performance on multiple benchmarks by utilizing Mixture-of-Grounding and Proactive Hierarchical Planning.
I built 10 gamified, interactive presentation decks to teach Agentic AI (Stop falling asleep reading whitepapers).
A developer built 10 gamified, interactive slide decks within the AgentSwarms platform to teach Agentic AI concepts like ReAct loops, multi-agent swarms, and production RAG, using active recall instead of passive reading.
LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching
LectūraAgents is a multi-agent framework for adaptive personalized learning that mimics professor-student interactions and generates embodied teaching actions aligned with learner profiles. It introduces a hierarchical architecture, an adaptive embodied teaching mechanism, and a Teaching Action-Speech Alignment algorithm, showing consistent improvements over existing approaches.
Macaron-A2UI: A Model for Generative UI in Personal Agents
Presents Macaron-A2UI, a model for generative UI in personal agents that synthesizes dynamic interfaces with lightweight executable actions, moving beyond text-only chat. The paper introduces a large-scale corpus, the A2UI-Bench benchmark, and trains models up to 754B parameters using LoRA fine-tuning and reinforcement learning, achieving strong results.