@QingQ77: Upload academic paper PDFs or LaTeX source code to automatically generate editable PowerPoint presentations through multi-agent collaboration. https://github.com/CRui5in/paper-ppt-agent… Paper PPT Agent uses three ag…
Summary
Paper PPT Agent is an open-source multi-agent collaboration tool that automatically converts academic paper PDFs or LaTeX source code into editable PowerPoint presentations, featuring content summarization, layout design, and visual quality review capabilities.
Cached at: 05/11/26, 08:39 AM
Upload academic paper PDFs or LaTeX source code to automatically generate editable PowerPoint presentations through multi-agent collaboration. https://github.com/CRui5in/paper-ppt-agent… The Paper PPT Agent utilizes three agents (Strategist, Executor, and Critic) to convert paper PDFs or TeX source code into editable PowerPoint files. After upload, the AI automatically distills content, plans structure, designs layouts, and reviews visual quality. The static Critic detects overflow and overlap issues, triggering automatic repairs, while Visual QA uses multimodal models to inspect rendering results.
CRui5in/paper-ppt-agent
Source: https://github.com/CRui5in/paper-ppt-agent
Paper PPT Agent
An automated tool for generating academic presentation slides based on multi-agent collaboration. Upload paper PDFs or TeX source code, and the AI will handle content distillation, structural planning, layout design, and visual quality review, ultimately outputting editable PowerPoint files.
Core Capabilities
Content Generation
Supports input from paper PDFs and TeX source code; uploading the complete TeX archive is recommended for optimal parsing. The multi-agent pipeline (Strategist → Executor → Critic) collaborates to distill content and generate layouts. It supports bilingual (Chinese/English) and customizable language outputs, with configurable target page count, detail level, and canvas aspect ratio.
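As a rough illustration of the Strategist → Executor → Critic flow, the sketch below wires three stand-in functions into a generate, check, and repair loop. Every name in it (`RunConfig`, `strategist`, `executor`, `critic`, `run_pipeline`, `max_repair_rounds`) is hypothetical and not taken from the project's code. The real agents call LLMs, whereas these stubs are deterministic so the control flow is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunConfig:
    # Hypothetical field names for the options the README lists.
    language: str = "en"
    target_pages: int = 3
    detail_level: str = "normal"
    aspect_ratio: str = "16:9"


def strategist(paper_text: str, cfg: RunConfig) -> list:
    """Stand-in planner: one slide title per paragraph, capped at target_pages."""
    paragraphs = [p.strip() for p in paper_text.split("\n\n") if p.strip()]
    return [p.split(".")[0] for p in paragraphs[: cfg.target_pages]]


def executor(title: str, hint: Optional[str] = None) -> dict:
    """Stand-in layout step: shortens the title when the critic asked for it."""
    text = title[:40] if hint == "shorten" else title
    return {"title": text}


def critic(slide: dict, max_chars: int = 40) -> Optional[str]:
    """Stand-in static check: returns a repair hint, or None if the slide passes."""
    return "shorten" if len(slide["title"]) > max_chars else None


def run_pipeline(paper_text: str, cfg: RunConfig, max_repair_rounds: int = 2) -> list:
    slides = []
    for title in strategist(paper_text, cfg):
        slide = executor(title)
        for _ in range(max_repair_rounds):
            hint = critic(slide)
            if hint is None:
                break
            # Re-run the executor with the critic's repair hint.
            slide = executor(title, hint)
        slides.append(slide)
    return slides
```

Bounding the repair loop with a round limit is a design assumption made here so that a slide the critic keeps rejecting cannot stall the run.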
Visual Quality Assurance
The static analysis Critic automatically detects layout issues such as text overflow, element overlap, and decorative line occlusion, triggering repairs. Visual QA (experimental) calls upon large multimodal models to review rendered images. The repair process automatically archives before-and-after snapshots, supporting round-by-round comparison and full-screen real-time preview.
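The overflow and overlap checks can be pictured as simple rectangle geometry. The following is a minimal sketch under the assumption that each slide element is reduced to an axis-aligned bounding box; the `Box` type and the function names are illustrative and do not reflect the project's actual Critic implementation.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    name: str
    x: float
    y: float
    w: float
    h: float


def overflows(box: Box, canvas_w: float, canvas_h: float) -> bool:
    """True if any part of the box lies outside the canvas."""
    return (
        box.x < 0
        or box.y < 0
        or box.x + box.w > canvas_w
        or box.y + box.h > canvas_h
    )


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles share a region of positive area."""
    return (
        a.x < b.x + b.w
        and b.x < a.x + a.w
        and a.y < b.y + b.h
        and b.y < a.y + a.h
    )


def find_violations(boxes: list, canvas_w: float, canvas_h: float) -> list:
    """Collect overflow issues first, then pairwise overlap issues."""
    issues = [f"overflow:{b.name}" for b in boxes if overflows(b, canvas_w, canvas_h)]
    issues += [
        f"overlap:{a.name}+{b.name}"
        for a, b in combinations(boxes, 2)
        if overlaps(a, b)
    ]
    return issues
```

For example, on a 1280×720 canvas with `Box("title", 40, 30, 1200, 80)`, `Box("body", 40, 100, 600, 400)`, and `Box("figure", 700, 150, 620, 400)`, `find_violations` returns `["overflow:figure", "overlap:title+body"]`. Boxes that only touch along an edge are not reported, because the comparisons are strict.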
Icons and Decorations
Includes a built-in icon library that supports automatic insertion of semantically matching icons. You can retrieve the most suitable candidates from the icon library via RAG semantic search (based on Gemini Embedding). Icon decorations and RAG search can be toggled independently.
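To make the retrieval step concrete, here is a minimal sketch of ranking icons by cosine similarity. It assumes the embeddings have already been computed and uses toy 3-dimensional vectors; it does not call the Gemini Embedding API, and the icon names, vectors, and function names are all made up for illustration.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_icons(query_vec: list, icon_vecs: dict, k: int = 2) -> list:
    """Return the names of the k icons whose vectors are closest to the query."""
    ranked = sorted(
        icon_vecs,
        key=lambda name: cosine(query_vec, icon_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Toy stand-ins for precomputed icon embeddings (hypothetical values).
ICON_VECS = {
    "chart-bar": [0.9, 0.1, 0.0],
    "flask": [0.1, 0.9, 0.1],
    "database": [0.6, 0.2, 0.6],
}
```

With the query vector `[1.0, 0.0, 0.0]`, `top_k_icons` ranks `chart-bar` first and `database` second. A real icon library would likely use a vector index rather than a full sort, but the ranking criterion is the same.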
Feedback and Iteration
After generation, you can specify single or multiple pages for feedback optimization, supporting structural adjustments (adding/deleting pages, inserting pages, reordering). Each iteration automatically saves a version snapshot, supporting version comparison and rollback.
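One way to think about snapshot, comparison, and rollback is an append-only list of deck copies. The sketch below is an assumption about how such a history could be modeled, not the project's storage format; `VersionHistory` and its methods are hypothetical, and slides are represented as plain strings for brevity.

```python
import copy


class VersionHistory:
    """Append-only list of slide-deck snapshots; version 0 is the first generation."""

    def __init__(self, initial_slides: list):
        self._snapshots = [copy.deepcopy(initial_slides)]

    @property
    def current(self) -> list:
        return copy.deepcopy(self._snapshots[-1])

    def commit(self, slides: list) -> int:
        """Store a snapshot after a feedback iteration; returns its version number."""
        self._snapshots.append(copy.deepcopy(slides))
        return len(self._snapshots) - 1

    def diff(self, old: int, new: int) -> list:
        """Indices of pages that differ between two versions, including added or removed pages."""
        a, b = self._snapshots[old], self._snapshots[new]
        longest = max(len(a), len(b))
        return [i for i in range(longest) if i >= len(a) or i >= len(b) or a[i] != b[i]]

    def rollback(self, version: int) -> list:
        """Make an earlier version current again by committing a copy of it."""
        self.commit(self._snapshots[version])
        return self.current
```

Rolling back by committing a copy, rather than truncating the list, keeps every intermediate version available for later comparison.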
Logging and Observability
Real-time agent log streams display events and progress for each stage; token usage is aggregated by model, stage, and time dimension, with support for filtering and detailed views. The Critic event panel displays violations, repair prompts, and archive paths page by page. The results page supports tracing back the complete run configuration.
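The aggregation and filtering described here amount to grouping call records by a chosen dimension. The following sketch assumes each LLM call is logged as a dictionary with `model`, `stage`, `page`, `prompt_tokens`, and `completion_tokens` fields; those field names, the sample records, and the model labels are hypothetical and are not the project's log schema.

```python
from collections import defaultdict


def aggregate_tokens(records: list, key: str) -> dict:
    """Sum prompt and completion tokens per distinct value of `key`."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec["prompt_tokens"] + rec["completion_tokens"]
    return dict(totals)


def filter_records(records: list, **criteria) -> list:
    """Keep only the records whose fields match every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical sample log entries.
RECORDS = [
    {"model": "model-a", "stage": "strategist", "page": None,
     "prompt_tokens": 1200, "completion_tokens": 300},
    {"model": "model-a", "stage": "executor", "page": 1,
     "prompt_tokens": 800, "completion_tokens": 500},
    {"model": "model-b", "stage": "critic", "page": 1,
     "prompt_tokens": 400, "completion_tokens": 100},
]
```

Here `aggregate_tokens(RECORDS, "model")` gives `{"model-a": 2800, "model-b": 500}`, and `filter_records(RECORDS, page=1)` keeps the two page-level calls.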
Environment Requirements
- Python 3.11+
- uv (https://docs.astral.sh/uv/)
- Node.js 18+ and npm
- API Key for at least one model provider:
  - OpenAI
  - DeepSeek
  - Anthropic
  - Gemini
  - Custom BaseURL-compatible interfaces (model quality significantly impacts generation results; GPT-5.5 and Gemini 3.1 Pro are recommended)
- (Optional) Gemini API Key: Used for icon RAG semantic search
Quick Start
Windows:
```powershell
.\start-dev.bat
```
Linux:
```bash
sh start-dev.sh
```
The startup script automatically installs dependencies and launches the frontend and backend services.
Manual Startup (Frontend and Backend started separately):
```powershell
# Backend
uv run python -m uvicorn backend.app:app --host 127.0.0.1 --port 8000 --reload --reload-dir backend --reload-include=*.py
# Frontend
cd frontend && npm run dev -- --host 127.0.0.1 --port 5173 --strictPort
```
Dependencies must be installed before manual startup:
```powershell
uv sync --locked
cd frontend && npm install && cd ..
```
Access after startup:
- Frontend: http://127.0.0.1:5173
- Backend: http://127.0.0.1:8000
Important Update Log
- Critic Log Persistence and Detail Panel: Violations, repair prompts, and archive paths detected by the Critic are persisted in critic_history.json; the frontend supports viewing details page by page.
- Before/After SVG Comparison: Automatically archives pre-repair SVG snapshots, supporting round-by-round comparison and full-screen real-time preview.
- Icon RAG Semantic Search: Semantically retrieves matching candidates from the icon library based on Gemini Embedding; can be toggled independently.
- Icon Decoration Master Switch: Supports generating shape-only slides without using icons.
- Visual QA (Experimental): Renders slides as images and calls large multimodal models to review their layout and contrast.
- Enhanced Static Critic: Added detection for decorative line occlusion and low-contrast text; fixed false positives in multi-line text width estimation.
- Version History Management: Automatically archives snapshots after each feedback iteration, supporting version comparison and rollback.
- Token Log Filtering: Filter LLM call records by model, stage, page number, and task; supports clicking to expand details.
- Generation Cancellation: Supports canceling the current task while the pipeline is running.
- Dedicated DeepSeek Interface: Independent DeepSeek provider support with thinking mode configuration.
- Multi-Agent Pipeline: Three-stage collaboration (Strategist → Executor → Critic), supporting automatic SVG repair and feedback iteration.
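The low-contrast text detection mentioned in the update log can be illustrated with the WCAG 2.x contrast-ratio formula. This is a sketch of one common way to perform such a check; the README does not say which formula or threshold the Critic uses, so the function names and the 4.5 threshold (the WCAG AA level for normal-size text) are assumptions.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical colors) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def is_low_contrast(fg: str, bg: str, threshold: float = 4.5) -> bool:
    # Assumed threshold: WCAG AA for normal-size text.
    return contrast_ratio(fg, bg) < threshold
```

Under this definition, mid-gray `#777777` text on a white background is flagged as low contrast, while the slightly darker `#767676` passes.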
References and Acknowledgments
This project references the following open-source projects for product concepts, process decomposition, and some engineering implementation methods:
- PPTAgent (https://github.com/icip-cas/PPTAgent)
- ppt-master (https://github.com/hugohe3/ppt-master)
License
This project is open-sourced under the MIT License.
Contact
For questions or suggestions, feel free to contact us via:
- GitHub Issues: CRui5in/paper-ppt-agent (https://github.com/CRui5in/paper-ppt-agent/issues)
- Email: [email protected]
Disclaimer
This project is an academic research assistant tool. The presentation content generated is produced by AI models and is for reference only. Users are solely responsible for the accuracy and compliance of the generated content. By using this tool, you agree to bear all risks arising from the use of the generated content.