Tag
This paper investigates how TMK-based question generation strategies affect dataset quality for procedural and multi-hop reasoning in AI learning systems. It compares three strategies (strict TMK generation, transcript-first generation, and TMK-aware generation) and introduces a grounding validation framework.
This paper introduces HieraRAG, a hierarchical framework for determining the optimal granularity in retrieval-augmented generation (RAG) benchmarks. It generates 5,872 synthetic QA pairs across three dimensions, finds that the optimal granularity varies by dimension, and offers a portable procedure for practitioners.
This paper presents the FETCH classifier, which uses an ensemble of LLMs to generate follow-up questions for automated legal intake, and evaluates the trade-off between question quality and cost. It finds that high-cost models such as GPT-5 are needed to produce effective plain-language questions, and it proposes a rubric for evaluating such questions.
This paper introduces slidesqaqa, a Flask-based system that generates pedagogically useful questions from PDF slide decks. It uses a four-stage LLM pipeline that extracts text and images, plans questions across the deck, annotates slides, and reconciles the outputs. On technical lecture slides, it demonstrates high-fidelity question generation.
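The four stages can be pictured as a simple chain of functions. The Python sketch below is hypothetical and is not slidesqaqa's actual code: the names `Slide`, `Question`, `extract`, `plan`, `annotate`, `reconcile`, `run_pipeline`, and `fake_llm`, the `"index|question"` reply format, and the deduplication rule are all illustrative assumptions. PDF parsing is assumed to happen elsewhere, so `extract` only normalizes per-page dictionaries. The LLM is modeled as a plain prompt-to-text callable, and a deterministic stub stands in for a real model so the sketch runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed LLM interface: a prompt string in, a completion string out.
LLM = Callable[[str], str]


@dataclass
class Slide:
    index: int
    text: str
    images: list[str] = field(default_factory=list)  # e.g. image file names


@dataclass
class Question:
    slide_index: int
    text: str
    source: str  # stage that produced it: "plan" or "annotate"


def extract(pages: list[dict]) -> list[Slide]:
    """Stage 1: normalize pre-parsed page text and images (PDF parsing not shown)."""
    return [Slide(i, p.get("text", "").strip(), list(p.get("images", [])))
            for i, p in enumerate(pages)]


def plan(slides: list[Slide], llm: LLM) -> list[Question]:
    """Stage 2: one deck-level prompt, so questions may span several slides."""
    outline = "\n".join(f"[{s.index}] {s.text}" for s in slides)
    questions = []
    for line in llm(f"PLAN\n{outline}").splitlines():
        idx, sep, text = line.partition("|")  # assumed reply format: "index|question"
        if sep and idx.strip().isdigit():
            questions.append(Question(int(idx), text.strip(), "plan"))
    return questions


def annotate(slides: list[Slide], llm: LLM) -> list[Question]:
    """Stage 3: one prompt per slide; each non-empty reply line is a question."""
    questions = []
    for s in slides:
        reply = llm(f"ANNOTATE\n[{s.index}] {s.text}\nimages: {', '.join(s.images)}")
        questions += [Question(s.index, q.strip(), "annotate")
                      for q in reply.splitlines() if q.strip()]
    return questions


def reconcile(planned: list[Question], annotated: list[Question],
              n_slides: int) -> list[Question]:
    """Stage 4: drop out-of-range slide references and near-duplicate questions."""
    seen, merged = set(), []
    for q in planned + annotated:  # planned questions win when duplicates collide
        key = " ".join(q.text.lower().split())  # case- and whitespace-insensitive
        if 0 <= q.slide_index < n_slides and key not in seen:
            seen.add(key)
            merged.append(q)
    return sorted(merged, key=lambda q: q.slide_index)  # stable sort keeps order


def run_pipeline(pages: list[dict], llm: LLM) -> list[Question]:
    slides = extract(pages)
    return reconcile(plan(slides, llm), annotate(slides, llm), len(slides))


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for offline demonstration only."""
    if prompt.startswith("PLAN"):
        return ("0|How does the loss on slide 1 drive gradient descent?\n"
                "9|Question about a slide that does not exist")
    topic = prompt.splitlines()[1].split("] ", 1)[1]
    return f"Explain {topic}.\nEXPLAIN  {topic}."  # second line is a near-duplicate


demo_pages = [{"text": " Gradient descent ", "images": ["fig1.png"]},
              {"text": "Loss functions"}]

if __name__ == "__main__":
    for q in run_pipeline(demo_pages, fake_llm):
        print(q.slide_index, q.source, q.text)
```

Running a separate deck-level planning pass before the per-slide pass lets cross-slide questions take priority during reconciliation, while the per-slide pass fills in local coverage. Whether the real system orders or merges its stages this way is not stated in the summary.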