Slide Deck Q&A Quality Assurance App: A Multi-Stage Pipeline for Pedagogical Question Generation
Summary
This paper introduces slidesqaqa, a Flask-based software system that generates pedagogically useful questions from PDF slide decks. It uses a four-stage LLM pipeline to extract text and images, plan questions across the deck, annotate slides, and reconcile outputs, demonstrating high-fidelity question generation on technical lecture slides.
# Introduction
Source: [https://arxiv.org/html/2605.26428](https://arxiv.org/html/2605.26428)
###### Abstract
Generating high\-quality, pedagogically useful questions from lecture slide decks is difficult because important instructional content is distributed across both text and visual elements, and because useful questions must be scaffolded across the flow of a presentation rather than generated slide by slide in isolation\. This paper describes Slide Deck Q&A Quality Assurance \(slidesqaqa\), a Flask\-based software system that extracts text and rendered images from PDF slides and processes them through a four\-stage large language model pipeline comprising window planning, deck synthesis, slide annotation, and reconciliation\. The system reasons jointly about slide modality and pedagogical role, allocates bounded question budgets, and revises draft annotations at the deck level to reduce redundancy and improve coverage\. The final output is a structured JSON annotation containing deck\-level goals, section structure, slide\-level summaries, question sets, and evaluation scores\. Initial experiments on two technical lecture decks indicate that the pipeline can filter non\-instructional slides and produce high\-fidelity, pedagogically coherent questions for visually complex content\.
The working system is at [https://slidesqaqa\-974767694043\.us\-west1\.run\.app](https://slidesqaqa-974767694043.us-west1.run.app/)\.
The software repository is at [https://github\.com/jsalsman/slidesqaqa](https://github.com/jsalsman/slidesqaqa)\.
Keywords: automated question generation, large language models, multimodal learning, document understanding, educational technology, slide decks
Educators frequently rely on presentation slide decks as the primary medium for delivering instructional content, yet creating targeted, high\-fidelity comprehension questions for each slide requires significant manual effort and pedagogical expertise\. Automated approaches to question generation have been explored to alleviate this burden, but they often struggle with the inherently multimodal nature of modern presentations, where information is distributed across text, images, and diagrams, and they are limited by the context windows of the underlying models\. Developing robust automated systems is critical because high\-quality pedagogical questions are known to significantly enhance active learning, student engagement, and long\-term retention\. Text\-only generative models, which form the basis of many early automated tools, typically struggle to interpret the visual information \(such as charts, graphs, or complex diagrams\) that carries much of the semantic weight in contemporary slide decks\.
To address these challenges, we present slidesqaqa, a software system that automates extraction and question generation\. slidesqaqa takes a PDF presentation as input, renders each slide into a visual thumbnail, extracts the corresponding raw text, and coordinates a four\-phase Large Language Model \(LLM\) workflow comprising Window Planning, Deck Synthesis, Slide Annotation, and Reconciliation\. The system outputs a hierarchical JSON structure that details the deck’s topics, categorizes slide roles, defines specific learning goals, and provides bounded, context\-aware comprehension questions\.
The main contributions of this paper are threefold: first, we introduce a windowed deck\-planner approach designed to mitigate LLM context limits while preserving the narrative flow of the presentation; second, we propose static heuristics for balancing question budgets based on the detected modality of each slide; and third, we detail a multi\-pass generative architecture that culminates in a dedicated reconciliation step to ensure consistency and quality\. The remainder of this paper is structured as follows: we first review related work in automated question generation and multimodal analysis, then outline the system’s goals and high\-level architecture, before detailing its concrete implementation using Flask and PyMuPDF and discussing the methods used for inference\. Finally, we discuss future avenues for evaluation and acknowledge current system limitations\.
### Related Work
Traditional automated question generation \(AQG\) has heavily focused on text\-based inputs using NLP techniques to convert declarative sentences into educational questions, establishing foundational metrics for evaluating coverage and pedagogical utility \(Kurdi et al\., [2020](https://arxiv.org/html/2605.26428#bib.bib3); Gorgun and Bulut, [2024](https://arxiv.org/html/2605.26428#bib.bib1)\)\. However, modern pedagogical materials, particularly slide decks, are inherently multimodal\.
The task of extracting information from these visual and textual modalities closely relates to Visual Question Answering \(VQA\) \(Kim et al\., [2025](https://arxiv.org/html/2605.26428#bib.bib2)\)\. The recent emergence of multimodal large language models \(MLLMs\) provides a strong baseline architecture capable of processing interleaved text and image inputs \(Yin et al\., [2024](https://arxiv.org/html/2605.26428#bib.bib5)\)\. Yet, applying standard MLLMs or naive text\-only pipelines to slide decks often leads to poor performance\. Research has shown that Vision\-Language Models \(VLMs\) can struggle with fine\-grained evidence grounding and modality reliance, especially when interpreting complex layouts involving diagrams, charts, and formulas \(Sim et al\., [2025](https://arxiv.org/html/2605.26428#bib.bib4)\)\.
Furthermore, existing systems often process documents in single large chunks or disjointed pages, ignoring structural flow and relying on brittle OCR\. This approach loses narrative coherence and fails to capture layout\-dependent semantics crucial for effective visually rich document retrieval \(Zhang, [2025](https://arxiv.org/html/2605.26428#bib.bib6)\)\. slidesqaqa addresses these gaps by stitching rendered thumbnails into multi\-slide contact sheets using a sliding\-window chunking strategy to retain transitional context, and by implementing a dynamic reconciliation phase to balance question budgets across heterogeneous slide types\.
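The sliding-window chunking mentioned above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name `overlapping_windows` and the default window size and overlap are assumptions, since the paper does not state the values it uses.

```python
from typing import List


def overlapping_windows(total_slides: int, size: int = 8, overlap: int = 2) -> List[List[int]]:
    """Group 1-based slide numbers into windows that share `overlap` slides.

    `size` and `overlap` are illustrative defaults, not values from the paper.
    """
    if total_slides < 0:
        raise ValueError("total_slides must be non-negative")
    if size < 1 or not 0 <= overlap < size:
        raise ValueError("need size >= 1 and 0 <= overlap < size")

    step = size - overlap  # how far each new window advances
    windows: List[List[int]] = []
    start = 1
    while start <= total_slides:
        end = min(start + size - 1, total_slides)
        windows.append(list(range(start, end + 1)))
        if end == total_slides:  # last slide reached; stop before emitting a redundant tail
            break
        start += step
    return windows


if __name__ == "__main__":
    for window in overlapping_windows(10, size=4, overlap=1):
        print(window)
```

Because consecutive windows share slides, a slide near a window boundary is seen together with both its predecessors and its successors, which is what lets a planner reason about transitions and redundancy.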
### Research Questions
This study seeks to answer the following research questions regarding the automated generation of educational questions from multimodal slide decks:
- RQ1: To what extent does a multi\-stage overlapping window pipeline improve the pedagogical scaffolding of generated questions compared to naive per\-slide generation?
- RQ2: How does explicit consideration of slide modality and role affect the fidelity \(groundedness\) and coverage of the resulting question sets?
- RQ3: Can static heuristics combined with a dynamic reconciliation phase effectively balance question budgets across heterogeneous slide types without manual intervention?
## Method
### System Overview
slidesqaqa is a single\-server Python Flask application that coordinates both document processing and model interactions\. The system relies on PyMuPDF for document extraction, pulling structural components and textual content from uploaded presentations\. To generate semantic insights and pedagogical questions, the application integrates with the Google GenAI SDK, which provides the inference capabilities needed to process multimodal inputs\.
The architecture is composed of three main, tightly coupled components\. The Preprocessor is the ingestion layer: it reads the uploaded PDF document, extracts all available native text, and renders high\-resolution PNG images of each slide to capture its visual layout\. The LLM Pipeline Engine then acts as the system’s orchestrator\. It manages four sequential generation phases \(Window Planning, Deck Synthesis, Slide Annotation, and Reconciliation\) while enforcing that all outputs conform to strongly typed JSON schemas\. Finally, the Frontend UI is an embedded HTML and JavaScript page that functions as both a real\-time log viewer, providing transparency into the generation process, and a data visualizer for the final output\.
The typical data flow through the slidesqaqa system \(as illustrated in Figure [1](https://arxiv.org/html/2605.26428#Sx2.F1)\) is linear\. The process begins when a user uploads a PDF document, which triggers the text and image extraction handled by the Preprocessor\. The extracted data is then fed into the Window Planning stage, where initial contextual windows are formed\. This is followed by Synthesis and Heuristics, where the overall deck structure is inferred and question budgets are allocated\. The pipeline then proceeds to Slide Annotation, where the pedagogical questions are generated based on the prior analysis\. The Reconciliation phase subsequently reviews and refines these annotations to ensure consistency and adherence to the allocated budgets\. Finally, the refined data undergoes Final JSON Compilation, which assembles the results into a unified structure that is streamed back to the client via the Frontend UI\.
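As a rough illustration of how reconciliation output might feed back into the pipeline, the sketch below applies a list of per-slide actions to provisional annotations. The action names and the `new_question_budget` field come from the reconciliation prompt in Appendix B and the schema in Appendix A, and the choice of which actions trigger re-annotation follows Figure 1. The function name `apply_reconciliation`, the treatment of `keep` as a no-op, and the clamping of budgets to the 0 to 5 range are assumptions made for this example.

```python
from typing import Dict, List, Tuple

ALLOWED_ACTIONS = {"keep", "reduce", "expand", "zero_out", "rewrite"}
# Per Figure 1, these actions send the slide back through annotation.
RERUN_ACTIONS = {"rewrite", "expand", "reduce"}


def apply_reconciliation(slides: List[Dict], revised_actions: List[Dict]) -> Tuple[List[Dict], List[int]]:
    """Return (updated slide copies, slide numbers that need re-annotation)."""
    by_number = {slide["slide_number"]: dict(slide) for slide in slides}
    rerun: List[int] = []

    for action in revised_actions:
        name = action["action"]
        if name not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown reconciliation action: {name!r}")
        slide = by_number.get(action["slide_number"])
        if slide is None or name == "keep":
            continue  # assumption: 'keep' leaves the slide untouched

        budget = max(0, min(5, int(action["new_question_budget"])))
        if name == "zero_out":
            budget = 0
            slide["questions"] = []
            slide["eligible_for_questions"] = False
        slide["question_budget"] = budget
        if name in RERUN_ACTIONS:
            rerun.append(slide["slide_number"])

    return [by_number[n] for n in sorted(by_number)], sorted(rerun)
```

The caller would then re-run annotation only for the returned slide numbers before compiling the final JSON, which keeps the extra inference cost proportional to the number of adjusted slides.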
### Design and Architecture
The preprocessor relies on the `fitz` library \(PyMuPDF\) to extract native text and render scaled PNG images representing the visual layout of each slide\.
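A minimal sketch of this extraction step is shown below, assuming PyMuPDF is installed. The function names, the returned dictionary layout, and the 1024-pixel target width are illustrative assumptions; the paper does not specify the rendering scale.

```python
from typing import Dict, List


def zoom_for_width(page_width_pt: float, target_width_px: int) -> float:
    """Scale factor that renders a page of the given width (in points) at target_width_px pixels."""
    if page_width_pt <= 0 or target_width_px <= 0:
        raise ValueError("widths must be positive")
    return target_width_px / page_width_pt


def extract_slides(pdf_bytes: bytes, target_width_px: int = 1024) -> List[Dict]:
    """Return one record per PDF page with its native text and a PNG rendering."""
    import fitz  # PyMuPDF; imported here so zoom_for_width works without it

    slides: List[Dict] = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for slide_number, page in enumerate(doc, start=1):
            zoom = zoom_for_width(page.rect.width, target_width_px)
            pixmap = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            slides.append(
                {
                    "slide_number": slide_number,       # 1-based, in PDF page order
                    "text": page.get_text("text"),      # native (non-OCR) text
                    "png": pixmap.tobytes("png"),       # rendered slide image
                }
            )
    return slides
```

Rendering every page to a common pixel width, rather than a fixed zoom, keeps thumbnails comparable across decks with different page sizes.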
\[Figure 1, flowchart: User uploads PDF or provides URL → 1\. PDF Preprocessing \(extract text and render PNGs\) → 2\. Window Planning Phase \(chunk slides into overlapping windows; LLM infers roles and budgets\) → 3\. Deck Synthesis Phase \(merge window plans; infer deck\-level topic and goals; apply static heuristics; zero out exact duplicates and titles\) → 4\. Slide Annotation Phase \(generate questions for eligible slides using slide and deck context\) → 5\. Reconciliation Phase \(evaluate coverage and scaffolding; adjust budgets and actions\) → decision: action needs rewrite/expand/reduce? Yes: re\-run annotation and generate adjusted questions\. No: continue → Compile Final JSON Annotation → Stream JSON to Client\.\]
Figure 1: Conceptual LLM pipeline data flow for the slidesqaqa application\.
The LLM Generation Pipeline conducts four major passes: \(1\) Window Planner: analyzes overlapping windows of slides, creating a contact sheet image to infer titles, summaries, modalities, roles, and a provisional question budget\. \(2\) Deck Synthesis: merges window analyses into one coherent deck plan\. \(3\) Slide Annotator: analyzes individual slides requiring questions based on the deck plan, generating specific questions\. \(4\) Reconciliation: evaluates the provisional full\-deck annotation for redundancies and unbalanced coverage\.
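Figure 1 mentions static heuristics that zero out exact duplicates and title slides after deck synthesis. One plausible reading is sketched below. The function name, the plan dictionary keys (borrowed from Appendix A), and the decision to compare slide text after collapsing whitespace are assumptions; the repository may detect duplicates differently.

```python
from typing import Dict, List


def apply_static_heuristics(plans: List[Dict], slide_texts: Dict[int, str]) -> List[Dict]:
    """Return copies of the slide plans with title and duplicate slides set to zero questions."""
    seen_texts = set()
    result: List[Dict] = []

    for plan in sorted(plans, key=lambda p: p["slide_number"]):
        plan = dict(plan)  # do not mutate the caller's plan
        # Assumption: "exact duplicate" means identical text once whitespace is collapsed.
        text = " ".join(slide_texts.get(plan["slide_number"], "").split())
        is_duplicate = bool(text) and text in seen_texts
        if text:
            seen_texts.add(text)

        if plan.get("role_in_deck") == "title" or is_duplicate:
            plan["question_budget"] = 0
            plan["eligible_for_questions"] = False
            plan["eligibility_reason"] = (
                "exact duplicate of an earlier slide" if is_duplicate else "title slide"
            )
        result.append(plan)
    return result
```

Running cheap deterministic rules like these before annotation avoids spending model calls on slides that would receive no questions anyway.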
The core data structures are strictly typed using Pydantic models \(SlidePlan, SlideAnnotationModel, ReconciliationModel\)\. The backend relies on Flask with a streaming endpoint, and the frontend uses vanilla JavaScript\.
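To give a feel for what a strictly typed slide plan enforces, here is a standard-library stand-in written with `dataclasses` rather than Pydantic, so that it runs without extra dependencies. The class name `SlidePlanSketch` and its field selection are assumptions based on Appendices A and B, not the actual `SlidePlan` model; the label vocabularies are copied from the window planner prompt.

```python
from dataclasses import dataclass, field
from typing import List

MODALITY_TYPES = {"text", "diagram", "table", "chart", "layout-aware", "image-plus-text", "mixed"}
ROLES_IN_DECK = {
    "title", "agenda", "transition", "definition", "example", "mechanism", "comparison",
    "result", "summary", "administrative", "appendix", "review", "reference",
}
QUESTION_TYPES = {
    "fill_blank", "mcq", "open_ended", "short_answer", "diagram_labeling",
    "comparison", "interpretation", "evidence_localization",
}


@dataclass
class SlidePlanSketch:
    """Hypothetical stand-in for the Pydantic SlidePlan model."""

    slide_number: int
    slide_title: str
    modality_type: str
    role_in_deck: str
    question_budget: int
    question_mix: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values outside the controlled vocabularies and ranges.
        if self.slide_number < 1:
            raise ValueError("slide_number is 1-based")
        if self.modality_type not in MODALITY_TYPES:
            raise ValueError(f"unknown modality_type: {self.modality_type!r}")
        if self.role_in_deck not in ROLES_IN_DECK:
            raise ValueError(f"unknown role_in_deck: {self.role_in_deck!r}")
        if not 0 <= self.question_budget <= 5:
            raise ValueError("question_budget must be between 0 and 5")
        unknown = [q for q in self.question_mix if q not in QUESTION_TYPES]
        if unknown:
            raise ValueError(f"unknown question types: {unknown}")
```

In a Pydantic model the same constraints would typically be expressed declaratively through field types and bounds, with validation errors raised when model output is parsed.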
### Experimental Setup
To evaluate the system, we analyzed academic lecture slide decks focusing on complex subjects such as Natural Language Processing and Deep Learning\. Specifically, we used two comprehensive decks: “Self\-Attention and Transformers” \(cs224n\-2024\-lecture08\-transformers; data shown at [https://pastebin\.com/raw/5G0s2r2p](https://pastebin.com/raw/5G0s2r2p)\) and “Neural Constituency Parsing” \(cs288\-sp23\-neural\-parsing; data shown at [https://pastebin\.com/raw/m15KdeVG](https://pastebin.com/raw/m15KdeVG)\)\. These decks were selected because they prominently feature structural slides \(e\.g\., titles, agendas\) as well as highly complex, mechanism\-heavy slides relying on intricate diagrams \(e\.g\., LSTM bidirectional routing, mean\-pooling aggregation, and span classification\)\.
We evaluate the generated question sets across three key metrics derived from the multi\-stage pipeline logs: Coverage \(alignment with slide content and learning goals\), Fidelity \(accuracy and grounding in both visual and textual evidence\), and Scaffolding \(logical progression and pedagogical relevance of the questions in the context of the entire deck\)\. The system automatically evaluates these metrics on a 1\-5 scale during the generation process, as summarized in Table [1](https://arxiv.org/html/2605.26428#Sx2.T1)\.
Table 1: Qualitative summary of experimental results across evaluated slide decks\.
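Given an output annotation shaped like the schema in Appendix A, deck-level averages of the three self-assigned scores could be computed along these lines. The function name `summarize_scores` and the choice of a simple mean are assumptions; the paper does not say how per-slide scores were aggregated.

```python
from statistics import mean
from typing import Dict, List, Optional


def summarize_scores(annotation: Dict) -> Dict[str, Optional[float]]:
    """Average fidelity (per question) and coverage/scaffolding (per slide), skipping nulls."""
    fidelity: List[float] = []
    coverage: List[float] = []
    scaffolding: List[float] = []

    for slide in annotation.get("slides", []):
        for question in slide.get("questions", []):
            if question.get("fidelity_score") is not None:
                fidelity.append(question["fidelity_score"])
        evaluation = slide.get("evaluation") or {}
        # Scores are null when a slide intentionally has no questions.
        if evaluation.get("coverage_score") is not None:
            coverage.append(evaluation["coverage_score"])
        if evaluation.get("scaffolding_score") is not None:
            scaffolding.append(evaluation["scaffolding_score"])

    def average(values: List[float]) -> Optional[float]:
        return round(mean(values), 2) if values else None

    return {
        "fidelity": average(fidelity),
        "coverage": average(coverage),
        "scaffolding": average(scaffolding),
    }
```

Skipping null scores matters here because zero-budget slides would otherwise drag the averages down even though they were excluded on purpose.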
## Results
The multi\-stage pipeline reliably filtered structural slides and concentrated generation efforts on core pedagogical material\. For instance, in the “Self\-Attention and Transformers” deck, the reconciliation phase correctly zero\-budgeted administrative and transition slides, reducing the overall question load while maintaining narrative coherence\. When generating questions, the system achieved high quantitative marks: questions consistently earned Fidelity scores of 5 and Coverage scores of 4 to 5 across both lecture decks\. Scaffolding scores also remained high \(typically 3 to 5\), indicating that the generated items served as solid foundational checks and progressed logically\.
Qualitatively, the slidesqaqa pipeline successfully interpreted complex multimodal diagrams\. For example, in a slide detailing mean\-pooling as a conceptual stepping stone toward attention, the system generated a question asking how “Sentence encoding” is computed, correctly identifying the answer as “taking the element\-wise max or mean of all hidden states” \(Fidelity: 5, Coverage: 5\)\. Similarly, in the “Neural Constituency Parsing” deck, when interpreting a visual diagram of LSTM units, the system correctly generated questions addressing how bidirectional routing works, accurately citing “horizontal arrows pointing left and right between the boxes” as evidence for how context is gathered across the sequence \(Fidelity: 5\)\.
## Discussion
The experimental results strongly support the hypothesis that a multi\-stage, layout\-aware pipeline is necessary for generating high\-quality pedagogical questions from slide decks\. The high fidelity and coverage scores demonstrate the value of the Window Planner and Reconciliation phases\. slidesqaqa’s ability to identify and reduce question budgets on redundant or transitional slides directly addresses a major limitation in existing naive AQG tools that treat every slide equally\.
The primary strength of this system is its handling of multimodal information\. The system identified visual cues, such as “horizontal arrows,” and mapped them to complex architectural concepts like bidirectional context gathering in neural networks\. However, we acknowledge several limitations\. Processing large decks incurs substantial LLM inference latency and API costs due to the multi\-pass architecture\. Furthermore, the reliance on a specific proprietary LLM \(Gemini\) poses potential risks regarding reproducibility and long\-term stability if the underlying API models are updated or deprecated\.
## Conclusion
slidesqaqa demonstrates a sophisticated approach to automated multi\-modal question generation, structuring instructional content extraction into a resilient four\-stage pipeline\. Chunking slides into visual contact sheets and rigorously employing type\-validated LLM generation significantly enhances the quality of structural and pedagogical analysis\. Future work may include comprehensive evaluation against human\-authored question sets, latency optimizations, and exploring fine\-tuned, smaller models for the window planning phases\.
## Disclosures
No external funding was received for this study\. The author declares no competing interests\. The data used in this study consist of lecture slide decks\. The datasets supporting the conclusions of this article are available at [https://pastebin\.com/raw/5G0s2r2p](https://pastebin.com/raw/5G0s2r2p) and [https://pastebin\.com/raw/m15KdeVG](https://pastebin.com/raw/m15KdeVG)\. The author was responsible for conceptualization, methodology, software, investigation, formal analysis, writing all drafts, review, and editing\. During the preparation of this work the author used Google Gemini Pro 3\.1, Google Jules, and OpenAI ChatGPT 5\.4 in order to produce both code and prose\. After using those tools, the author reviewed and edited the content thoroughly and takes full responsibility for the app and publication content\.
## References
- Gorgun, G\., and Bulut, O\. \(2024\)\. Exploring quality criteria and evaluation methods in automated question generation\. *Education and Information Technologies*, 1–30\. Retrieved from [https://doi\.org/10\.1007/s10639\-024\-12771\-3](https://doi.org/10.1007/s10639-024-12771-3)
- Kim, B\. S\., et al\. \(2025\)\. Visual question answering: A survey of methods, datasets, evaluation, and challenges\. *ACM Computing Surveys*\. Retrieved from [https://doi\.org/10\.1145/3728635](https://doi.org/10.1145/3728635)
- Kurdi, G\., Leo, J\., Parsia, B\., Sattler, U\., and Al\-Emari, S\. \(2020\)\. A systematic review of automatic question generation for educational purposes\. *International Journal of Artificial Intelligence in Education*, *30*\(1\), 121–204\. Retrieved from [https://doi\.org/10\.1007/s40593\-019\-00186\-y](https://doi.org/10.1007/s40593-019-00186-y)
- Sim, M\. Y\., et al\. \(2025\)\. Can VLMs actually see and read? A survey on modality collapse in vision\-language models\. In *Findings of the Association for Computational Linguistics \(ACL\)*\. Retrieved from [https://www\.semanticscholar\.org/paper/Can\-VLMs\-Actually\-See\-and\-Read\-A\-Survey\-on\-Modality\-Sim\-Zhang/003dece9fddb6ae24c87e5d20fec8ff4a108f7ae](https://www.semanticscholar.org/paper/Can-VLMs-Actually-See-and-Read-A-Survey-on-Modality-Sim-Zhang/003dece9fddb6ae24c87e5d20fec8ff4a108f7ae)
- Yin, S\., et al\. \(2024\)\. A survey on multimodal large language models\. *National Science Review*, *11*\(1\), nwae403\. Retrieved from [https://doi\.org/10\.1093/nsr/nwae403](https://doi.org/10.1093/nsr/nwae403)
- Zhang, X\. \(2025\)\. Roles of MLLMs in visually rich document retrieval for RAG: A survey\. In *Proceedings of IJCNLP/AACL*\. Retrieved from [https://aclanthology\.org/2025\.ijcnlp\-long\.2/](https://aclanthology.org/2025.ijcnlp-long.2/)
## Appendix A: Output JSON Schema
This appendix describes the structure and descriptions of the final output JSON schema generated by the slidesqaqa application\.
\{
"schema\_version": "Version of this JSON schema\.",
"field\_descriptions": \{
"\.\.\.": "Field descriptions mapping"
\},
"deck\_metadata": \{
"deck\_id": "Stable identifier for this deck\.",
"deck": "Full academic citation for the deck\.",
"deck\_url": "Original source URL for the deck, if known\.",
"source\_file": "Local uploaded PDF filename\.",
"total\_slides": "Total number of PDF pages processed as slides\.",
"processed\_at": "UTC timestamp when this JSON was produced\."
\},
"deck\_analysis": \{
"deck\_topic": "Short description of the overall topic of the deck\.",
"target\_audience": "Estimated audience level; for example undergraduate, graduate, or mixed\.",
"learning\_goals": \["List of deck\-level learning goals inferred from the slides\."\],
"sections": \[
\{
"section\_id": "String",
"start\_slide": "Integer",
"end\_slide": "Integer",
"section\_title": "String",
"section\_summary": "String"
\}
\],
"coverage\_targets": \["Deck\-level content targets such as text, diagram, table, chart, layout\-aware, or image\-plus\-text\."\],
"global\_notes": "Important global caveats, ambiguities, or observations\."
\},
"reconciliation": \{
"revised\_slide\_actions": \[
\{
"slide\_number": "Integer",
"action": "String",
"new\_question\_budget": "Integer",
"reason": "String"
\}
\],
"deck\_reconciliation\_notes": "Global notes about redundancy, balancing, and quality adjustments across the deck\.",
"uncovered\_learning\_goals": \["Deck learning goals that remain weakly covered after reconciliation\."\],
"redundancy\_warnings": \["Warnings about overlapping or repeated question sets across slides\."\]
\},
"slides": \[
\{
"slide\_id": "Stable identifier for a slide within the deck\.",
"slide\_number": "1\-based slide number corresponding to the PDF page order\.",
"slide\_title": "Visible title on the slide if present; otherwise a concise generated title\.",
"modality\_type": "Dominant visual form of the slide; for example text, diagram, table, chart, layout\-aware, image\-plus\-text, or mixed\.",
"role\_in\_deck": "Instructional role of the slide within the deck; for example title, agenda, transition, definition, example, mechanism, result, summary, or appendix\.",
"local\_summary": "One\- or two\-sentence summary of the slide’s main instructional content\.",
"key\_concepts": \["List of key concepts explicitly present on the slide\."\],
"evidence\_regions": \["List of human\-readable descriptions of important visible regions on the slide\."\],
"eligible\_for\_questions": "Whether the slide should receive any comprehension questions\.",
"eligibility\_reason": "Explanation for why the slide should or should not receive questions\.",
"question\_budget": "Recommended number of questions for this slide in deck context\.",
"question\_mix": \["Recommended mix of question types for this slide\."\],
"questions": \[
\{
"question\_id": "Stable identifier for a question within a slide\.",
"question\_type": "Controlled label for the question form or reasoning type\.",
"prompt": "Question text shown to the learner\.",
"options": \["List of answer options for a multiple\-choice item; empty otherwise\."\],
"answer": "Gold answer or bounded reference answer grounded in the slide\.",
"evidence\_span": "Short description of where the answer is visible on the slide\.",
"difficulty": "Relative difficulty label such as low, medium, or high\.",
"purpose": "Instructional purpose such as terminology, relation check, interpretation, or synthesis\.",
"fidelity\_score": "1\-5 judgment of whether the question is answerable from the slide alone\.",
"fidelity\_notes": "Short rationale for the fidelity score\."
\}
\],
"evaluation": \{
"coverage\_score": "1\-5 score for how well the slide’s question bundle covers the slide’s important content; null when the slide intentionally has no questions\.",
"coverage\_notes": "Short rationale for the coverage score\.",
"scaffolding\_score": "1\-5 score for how well the question bundle forms an instructional progression; null when the slide intentionally has no questions\.",
"scaffolding\_notes": "Short rationale for the scaffolding score\."
\}
\}
\]
\}
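The question objects in this schema carry a few structural constraints that the slide annotator prompt in Appendix B states explicitly: the number of questions equals the assigned budget, `mcq` items have exactly four options, other types have an empty options list, and `fidelity_score` is an integer from 1 to 5. A small checker for those rules might look like the following; the function name `question_problems` and the message wording are assumptions, and the real system may rely on Pydantic validation instead.

```python
from typing import Dict, List


def question_problems(questions: List[Dict], budget: int) -> List[str]:
    """Return human-readable violations of the annotator prompt's structural rules."""
    problems: List[str] = []
    if len(questions) != budget:
        problems.append(f"expected {budget} questions, got {len(questions)}")

    for question in questions:
        qid = question.get("question_id", "?")
        options = question.get("options", [])
        if question.get("question_type") == "mcq":
            if len(options) != 4:
                problems.append(f"{qid}: mcq must have exactly 4 options")
        elif options:
            problems.append(f"{qid}: non-mcq question must have an empty options list")

        score = question.get("fidelity_score")
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(score, int) and not isinstance(score, bool)
        if not (is_int and 1 <= score <= 5):
            problems.append(f"{qid}: fidelity_score must be an integer from 1 to 5")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log every violation for a slide before deciding whether to retry generation.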
## Appendix B: LLM Prompts
This appendix contains the exact system prompt strings used in the four\-phase pipeline of the slidesqaqa application\.
WINDOW\_PLANNER\_PROMPT = """
You are analyzing a contiguous window from a larger lecture slide deck\.
For each slide in this window:
\- infer slide\_title
\- write a short local\_summary
\- assign modality\_type
\- assign role\_in\_deck
\- decide whether the slide is eligible for learner\-facing comprehension questions
\- give an eligibility\_reason
\- assign a question\_budget from 0 to 5
\- assign a question\_mix
Important:
\- Use neighboring slides in the window to reason about redundancy and transitions\.
\- It is acceptable to assign zero questions\.
\- Do not force a fixed number of questions\.
\- Favor low budgets for title, agenda, transition, administrative, appendix, and repeated recap slides\.
\- Favor higher budgets for rich mechanism, comparison, result, diagram, chart, table, or synthesis slides\.
\- question\_mix must use only these values:
\["fill\_blank", "mcq", "open\_ended", "short\_answer", "diagram\_labeling", "comparison", "interpretation", "evidence\_localization"\]
\- modality\_type must use only these values:
\["text", "diagram", "table", "chart", "layout\-aware", "image\-plus\-text", "mixed"\]
\- role\_in\_deck must use only these values:
\["title", "agenda", "transition", "definition", "example", "mechanism", "comparison", "result", "summary", "administrative", "appendix", "review", "reference"\]
Return JSON only\. Do not include explanatory prose outside JSON\.
"""
DECK\_SYNTHESIS\_PROMPT = """
You are merging overlapping window\-level analyses of one lecture slide deck into one final deck plan\.
Goals:
1\. Infer the deck topic and likely target audience\.
2\. Infer deck\-level learning goals\.
3\. Produce section boundaries for the full deck\.
4\. Resolve conflicting window\-level slide plans conservatively\.
5\. Return exactly one slide plan object per slide number\.
Important:
\- Preserve zero\-question slides when they are non\-instructional, redundant, or too thin\.
\- Some slides may deserve more than three questions\.
\- Keep question budgets based on instructional importance, self\-containedness, evidence richness, and novelty\.
\- Use only the allowed label vocabularies already present in the window plans\.
\- Sections should be contiguous and ordered\.
Return JSON only\. Do not include explanatory prose outside JSON\.
"""
SLIDE\_ANNOTATOR\_PROMPT = """
You are generating slidesqaqa annotations for one slide within a lecture deck\.
Use both the local slide evidence and the provided deck context\.
Your tasks:
1\. Identify key\_concepts explicitly present on the slide\.
2\. Identify 2 to 6 evidence\_regions as short human\-readable descriptions of important visible regions\.
3\. Generate exactly the assigned question budget in the supplied question mix\.
4\. Every question must be answerable from the slide alone\.
5\. Every answer must be bounded and evidence\-grounded\.
6\. Use deck context only to decide what is educationally important\. Do not answer from hidden lecture knowledge\.
7\. Avoid redundancy with the neighboring slides when possible\.
Question\-writing guidance:
\- On text slides, favor terminology, distinctions, and concise explanation\.
\- On diagram slides, favor component labeling, relationships, flow, and mechanism\.
\- On table/chart slides, favor lookup, comparison, trend, and interpretation\.
\- On layout\-aware slides, favor spatial or grouping\-based reasoning when relevant\.
\- If a question\_type is mcq, include exactly 4 options\.
\- If a question\_type is not mcq, options must be an empty list\.
\- fidelity\_score must be an integer from 1 to 5\.
\- coverage\_score and scaffolding\_score must be integers from 1 to 5\.
Coverage guidance:
\- 1 means poor coverage or repeated tiny facts\.
\- 3 means adequate coverage of the main concept and at least one secondary element\.
\- 5 means strong coverage of the slide’s important visible content\.
Scaffolding guidance:
\- 1 means random or disconnected\.
\- 3 means reasonable progression\.
\- 5 means coherent progression from simpler to deeper understanding\.
Return JSON only\. Do not include explanatory prose outside JSON\.
"""
RECONCILIATION\_PROMPT = """
You are reconciling a provisional slidesqaqa annotation set for a full lecture deck\.
You are given:
\- deck metadata
\- deck analysis
\- all slide plans
\- all provisional slide annotations
Your task is to improve the deck as a whole\.
Goals:
1\. Detect redundant question sets across nearby slides\.
2\. Detect slides that should have fewer questions\.
3\. Detect rich slides that deserve more questions\.
4\. Detect places where learning\-goal coverage is unbalanced\.
5\. Detect weak scaffolding within sections\.
Rules:
\- Do not force similar budgets across all slides\.
\- Preserve zero\-question slides when they are truly non\-instructional or redundant\.
\- Prefer deleting weak or redundant questions rather than inventing extra ones\.
\- Use this action vocabulary only:
\["keep", "reduce", "expand", "zero\_out", "rewrite"\]
\- For each slide, return one action and a new\_question\_budget between 0 and 5\.
"""