Tag
A pre-registered trial in Sierra Leone found that AI-powered Guided Learning significantly improved math scores, with gains of 1.2 to 1.7 years of progress in eight weeks. Teachers reported enhanced professional growth and a shift toward facilitation roles.
This paper introduces Elmes+, an automated framework for constructing fine-grained evaluation rubrics for LLMs in long-tail educational scenarios, and presents the Edu-330 benchmark covering 330 scenarios across 11 subjects. The framework uses a multi-agent engine and a self-evolving module to co-optimize evaluation criteria and test data, revealing multidimensional differences in educational capability among top LLMs.
This paper introduces TeachObs, a human-validated benchmark for multimodal teaching observation consisting of 30 classroom videos annotated with segment-level binary codes and lesson-level expert ratings. It evaluates five frontier LLMs across three tracks, finding that no single model consistently outperforms the others and that model evaluations overrate procedurally clear lessons.
This paper presents a framework that uses domain-specific expert knowledge to ground large language models for providing Just-in-Time adaptive feedback to students based on their written reasoning, achieving over 80% improvement in student performance in a large university course.
This paper presents a modular pipeline for educational analogy generation, decomposing the task into four stages and evaluating 12 LLMs and 7 embedding models. Results show that sub-concept grounding improves explanation quality and retrieval precision, with a novel LLM-as-a-judge evaluation validated against human annotations.
This paper presents a forward-looking perspective on agentic multi-agent AI platforms in higher education, addressing the need for integrated, inclusive systems that support learning, teaching, and institutional operations. It identifies gaps in current fragmented AI tools and proposes directions for scalable, human-aligned multi-agent ecosystems.
This paper details the RETUYT-INCO team's participation in the BEA 2026 Shared Task 2, introducing a meta-prompting approach for rubric-based scoring of German short answers.
This paper introduces MBP-KT, a framework for enhanced knowledge tracing that leverages meta-behavioral patterns to extract global collaborative information from learner interactions, improving performance across various downstream models.
This paper introduces Context-Aligned Contrastive Regression to improve lexical difficulty prediction by addressing cross-lingual alignment and ordinal structure challenges in language learning datasets.
This paper introduces NSMQ Riddles, a novel benchmark using scientific and mathematical riddles from Ghana's National Science and Maths Quiz to evaluate Large Language Models, addressing the underrepresentation of Global South datasets in AI research.
Researchers from Arizona State University present a framework for evaluating adaptive personalization of educational reading materials using theory-grounded simulated learners, incorporating memory models, misconception revision, and Bayesian Knowledge Tracing. Experiments across three subjects show adaptive reading significantly improved outcomes in computer science but had mixed results in chemistry and biology.