@QingQ77: Upload academic paper PDFs or LaTeX source code to automatically generate editable PowerPoint presentations through multi-agent collaboration. https://github.com/CRui5in/paper-ppt-agent… Paper PPT Agent uses three ag…

X AI KOLs Timeline 05/10/26, 02:02 PM Tools

Summary

Paper PPT Agent is an open-source multi-agent collaboration tool that automatically converts academic paper PDFs or LaTeX source code into editable PowerPoint presentations, featuring content summarization, layout design, and visual quality review capabilities.

Upload academic paper PDFs or LaTeX source code to automatically generate editable PowerPoint presentations through multi-agent collaboration. https://github.com/CRui5in/paper-ppt-agent… Paper PPT Agent uses three agents (Strategist, Executor, and Critic) working in tandem to convert paper PDFs or TeX source code into editable PowerPoint slides. After upload, the AI summarizes the content, plans the structure, and designs the layouts. It then reviews visual quality in two ways: a static Critic detects overflow and overlap issues and triggers repairs, while Visual QA uses multimodal models to evaluate the rendered output.

Original Article


Cached at: 05/11/26, 08:39 AM


CRui5in/paper-ppt-agent

Source: https://github.com/CRui5in/paper-ppt-agent

Paper PPT Agent

MIT License


An automated tool for generating academic presentation slides based on multi-agent collaboration. Upload paper PDFs or TeX source code, and the AI will handle content distillation, structural planning, layout design, and visual quality review, ultimately outputting editable PowerPoint files.


Core Capabilities

Content Generation

Supports input from paper PDFs and TeX source code; uploading the complete TeX archive is recommended for optimal parsing. The multi-agent pipeline (Strategist → Executor → Critic) collaborates to distill content and generate layouts. It supports bilingual (Chinese/English) and customizable language outputs, with configurable target page count, detail level, and canvas aspect ratio.
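The three-stage flow described above can be sketched in a few lines. This is a minimal illustration only: the names `SlidePlan`, `Slide`, `strategist`, `executor`, `critic`, and `run_pipeline` are hypothetical, and the LLM calls the real project makes are replaced by trivial stand-ins so the control flow stays visible.

```python
from dataclasses import dataclass, field


@dataclass
class SlidePlan:
    """Hypothetical outline entry produced by the Strategist stage."""
    title: str
    bullets: list[str]


@dataclass
class Slide:
    """Hypothetical slide; the real project produces SVG layouts."""
    title: str
    body: str
    violations: list[str] = field(default_factory=list)


def strategist(sections: dict[str, str], target_pages: int) -> list[SlidePlan]:
    # Stand-in for LLM distillation: keep the first sentence of each section,
    # and cap the deck at the configured target page count.
    plans = []
    for name, text in list(sections.items())[:target_pages]:
        first_sentence = text.split(". ")[0].rstrip(".")
        plans.append(SlidePlan(title=name, bullets=[first_sentence]))
    return plans


def executor(plan: SlidePlan) -> Slide:
    # Stand-in for layout generation: join the bullets into a text body.
    return Slide(title=plan.title, body="\n".join(plan.bullets))


def critic(slide: Slide, max_chars: int = 60) -> Slide:
    # Stand-in for the static check: flag lines longer than a character budget.
    slide.violations = [
        f"overflow: {line[:20]}"
        for line in slide.body.splitlines()
        if len(line) > max_chars
    ]
    return slide


def run_pipeline(sections: dict[str, str], target_pages: int) -> list[Slide]:
    # Strategist -> Executor -> Critic, applied once per planned page.
    return [critic(executor(plan)) for plan in strategist(sections, target_pages)]
```

In this sketch a slide with a non-empty `violations` list would be the one handed back for repair.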

Visual Quality Assurance

The static analysis Critic automatically detects layout issues such as text overflow, element overlap, and decorative line occlusion, triggering repairs. Visual QA (experimental) calls upon large multimodal models to review rendered images. The repair process automatically archives before-and-after snapshots, supporting round-by-round comparison and full-screen real-time preview.
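Overflow and overlap detection of the kind described above usually reduces to bounding-box arithmetic. The sketch below assumes each element has already been reduced to an axis-aligned rectangle; the `Box` type and the function names are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box of a slide element."""
    name: str
    x: float
    y: float
    w: float
    h: float


def overflows(box: Box, canvas_w: float, canvas_h: float) -> bool:
    # An element overflows when any edge lies outside the canvas.
    return (
        box.x < 0
        or box.y < 0
        or box.x + box.w > canvas_w
        or box.y + box.h > canvas_h
    )


def overlaps(a: Box, b: Box) -> bool:
    # Two rectangles overlap when their projections intersect on both axes.
    return (
        a.x < b.x + b.w
        and b.x < a.x + a.w
        and a.y < b.y + b.h
        and b.y < a.y + a.h
    )


def check_layout(boxes: list[Box], canvas_w: float, canvas_h: float) -> list[str]:
    # Collect overflow issues first, then pairwise overlap issues.
    issues = [f"overflow:{b.name}" for b in boxes if overflows(b, canvas_w, canvas_h)]
    issues += [
        f"overlap:{a.name}+{b.name}"
        for a, b in combinations(boxes, 2)
        if overlaps(a, b)
    ]
    return issues
```

A list of issue strings like this is one plausible input for a repair prompt; how the project actually formats violations is not stated in the README.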

Icons and Decorations

Includes a built-in icon library that supports automatic insertion of semantically matching icons. You can retrieve the most suitable candidates from the icon library via RAG semantic search (based on Gemini Embedding). Icon decorations and RAG search can be toggled independently.
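Semantic icon retrieval of this kind typically ranks precomputed icon embeddings by cosine similarity against an embedding of the slide text. The sketch below shows only that ranking step, using toy three-dimensional vectors; in the project the vectors would come from Gemini Embedding, and the function names here are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_icons(
    query_vec: list[float],
    icon_vecs: dict[str, list[float]],
    k: int = 3,
) -> list[str]:
    # Rank icon names by similarity to the query, highest first.
    ranked = sorted(
        icon_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy embeddings for illustration only; real embeddings have many more dimensions.
icons = {
    "chart": [1.0, 0.0, 0.0],
    "brain": [0.0, 1.0, 0.0],
    "graph": [0.9, 0.1, 0.0],
}
candidates = top_k_icons([1.0, 0.0, 0.0], icons, k=2)
```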

Feedback and Iteration

After generation, you can specify single or multiple pages for feedback optimization, supporting structural adjustments (adding/deleting pages, inserting pages, reordering). Each iteration automatically saves a version snapshot, supporting version comparison and rollback.
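One simple way to support snapshots, comparison, and rollback is an append-only list of deck versions. The sketch below is an assumption about how such a history could work, not the project's storage format; `DeckHistory` and its methods are hypothetical, and slides are represented as plain strings.

```python
import copy
from itertools import zip_longest


class DeckHistory:
    """Hypothetical append-only history of slide decks."""

    def __init__(self, slides: list[str]):
        self._versions = [copy.deepcopy(slides)]

    @property
    def current(self) -> list[str]:
        return copy.deepcopy(self._versions[-1])

    def commit(self, slides: list[str]) -> int:
        # Store a snapshot after a feedback iteration; return its version index.
        self._versions.append(copy.deepcopy(slides))
        return len(self._versions) - 1

    def diff(self, old: int, new: int) -> list[int]:
        # Zero-based page indices whose content differs, including added or removed pages.
        pairs = zip_longest(self._versions[old], self._versions[new])
        return [i for i, (a, b) in enumerate(pairs) if a != b]

    def rollback(self, version: int) -> int:
        # Roll back by re-committing an old snapshot, so no history is lost.
        return self.commit(self._versions[version])
```

Re-committing on rollback keeps every intermediate version available for later comparison.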

Logging and Observability

Real-time agent log streams display events and progress for each stage; token usage is aggregated by model, stage, and time dimension, with support for filtering and detailed views. The Critic event panel displays violations, repair prompts, and archive paths page by page. The results page supports tracing back the complete run configuration.
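Aggregating token usage by model and stage amounts to a group-by over per-call records. The record fields used below (`model`, `stage`, `page`, `prompt_tokens`, `completion_tokens`) are assumed for illustration; the project's actual log schema is not documented here.

```python
from collections import defaultdict


def aggregate_tokens(records: list[dict], keys: tuple[str, ...] = ("model", "stage")) -> dict:
    # Sum token counts and call counts for each combination of the grouping keys.
    totals = defaultdict(lambda: {"prompt_tokens": 0, "completion_tokens": 0, "calls": 0})
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group]["prompt_tokens"] += rec["prompt_tokens"]
        totals[group]["completion_tokens"] += rec["completion_tokens"]
        totals[group]["calls"] += 1
    return dict(totals)


def filter_records(records: list[dict], **criteria) -> list[dict]:
    # Keep records whose fields equal every given criterion, e.g. stage="critic".
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical call records; model names are placeholders.
records = [
    {"model": "m-a", "stage": "strategist", "page": None, "prompt_tokens": 100, "completion_tokens": 20},
    {"model": "m-a", "stage": "executor", "page": 1, "prompt_tokens": 200, "completion_tokens": 50},
    {"model": "m-a", "stage": "executor", "page": 2, "prompt_tokens": 150, "completion_tokens": 40},
    {"model": "m-b", "stage": "critic", "page": 2, "prompt_tokens": 30, "completion_tokens": 5},
]
by_model_stage = aggregate_tokens(records)
```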

Environment Requirements

Python 3.11+
uv (https://docs.astral.sh/uv/)
Node.js 18+ and npm
API Key for at least one model provider:
- OpenAI
- DeepSeek
- Anthropic
- Gemini
- Custom compatible interfaces configured through a BaseURL
Note: model quality significantly impacts generation results; GPT-5.5 and Gemini 3.1 Pro are recommended.
(Optional) Gemini API Key: Used for icon RAG semantic search

Quick Start

Windows:

```powershell
.\start-dev.bat
```

Linux:

```bash
sh start-dev.sh
```

The startup script will automatically install dependencies and launch the frontend and backend services.

Manual Startup (Frontend and Backend started separately):

```powershell
# Backend
uv run python -m uvicorn backend.app:app --host 127.0.0.1 --port 8000 --reload --reload-dir backend --reload-include=*.py

# Frontend
cd frontend && npm run dev -- --host 127.0.0.1 --port 5173 --strictPort
```

Dependencies must be installed before manual startup:

```powershell
uv sync --locked
cd frontend && npm install && cd ..
```

Access after startup:

Frontend: http://127.0.0.1:5173
Backend: http://127.0.0.1:8000

Important Update Log

Critic Log Persistence and Detail Panel: Violations, repair prompts, and archive paths detected by the Critic are persisted in critic_history.json; the frontend supports viewing details page by page.
Before/After SVG Comparison: Automatically archives pre-repair SVG snapshots, supporting round-by-round comparison and full-screen real-time preview.
Icon RAG Semantic Search: Semantically retrieves matching candidates from the icon library based on Gemini Embedding; can be toggled independently.
Icon Decoration Master Switch: Supports generating shape-only slides without using icons.
Visual QA (Experimental): Renders slides as images and calls large multimodal models to review layout and contrast.
Enhanced Static Critic: Added detection for decorative line occlusion and low-contrast text; fixed false positives in multi-line text width estimation.
Version History Management: Automatically archives snapshots after each feedback iteration, supporting version comparison and rollback.
Token Log Filtering: Filter LLM call records by model, stage, page number, and task; supports clicking to expand details.
Generation Cancellation: Supports canceling the current task while the pipeline is running.
Dedicated DeepSeek Interface: Independent DeepSeek provider support with thinking mode configuration.
Multi-Agent Pipeline: Three-stage collaboration (Strategist → Executor → Critic), supporting automatic SVG repair and feedback iteration.
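The low-contrast text detection mentioned in the update log can be illustrated with the WCAG contrast-ratio formula. This is a swapped-in standard technique, not necessarily the formula or threshold the project's Critic uses; the function names and the 4.5 minimum (the WCAG AA value for normal-size text) are assumptions.

```python
def _linear_channel(value_8bit: int) -> float:
    # Convert an 8-bit sRGB channel to linear light, as in the WCAG luminance definition.
    c = value_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    # Expects a 6-digit hex color such as "#1a2b3c".
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (
        0.2126 * _linear_channel(r)
        + 0.7152 * _linear_channel(g)
        + 0.0722 * _linear_channel(b)
    )


def contrast_ratio(foreground: str, background: str) -> float:
    # Ratio of the lighter to the darker luminance, each offset by 0.05.
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def is_low_contrast(foreground: str, background: str, minimum: float = 4.5) -> bool:
    # Assumed threshold: 4.5 is the WCAG AA minimum for normal-size text.
    return contrast_ratio(foreground, background) < minimum
```

Under this formula black on white scores about 21, the maximum possible ratio, and a mid-gray such as `#777777` on white falls just below 4.5.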

References and Acknowledgments

This project references the following open-source projects for product concepts, process decomposition, and some engineering implementation methods:

PPTAgent (https://github.com/icip-cas/PPTAgent)
ppt-master (https://github.com/hugohe3/ppt-master)

License

This project is open-sourced under the MIT License.

Contact

For questions or suggestions, feel free to contact us via:

GitHub Issues: CRui5in/paper-ppt-agent (https://github.com/CRui5in/paper-ppt-agent/issues)
Email: [email protected]

Disclaimer

This project is an academic research assistant tool. The presentation content generated is produced by AI models and is for reference only. Users are solely responsible for the accuracy and compliance of the generated content. By using this tool, you agree to bear all risks arising from the use of the generated content.