Tag
Introduces Bayesian Manifold Curriculum (BMC), an adaptive curriculum learning method for LLMs that leverages the model's latent geometry to allocate training effort across diverse problem types, improving efficiency beyond traditional difficulty-based curricula.
Pythagoras-Prover is a compute-efficient family of Lean theorem provers that achieves strong performance using curriculum supervised fine-tuning and a novel Augmented Lean Formalisation technique. The 4B model surpasses DeepSeek-Prover-V2-671B at pass@32 on MiniF2F-Test, and the 32B model sets a new state-of-the-art among open-source provers.
This paper proposes Representation Curriculum (RC), a training-time intervention that stages feature utilization to reduce over-reliance on exposure-confounded historical signals and improve cold-start generalization in ranking systems. The method is theoretically analyzed and validated on public benchmarks and large-scale eBay search experiments.
This paper investigates sequential fine-tuning of LLaMA-3.1-8B for automated essay scoring using a curriculum aligned with discourse structure, showing improved coherence and performance compared to independent or randomized training.
This paper theoretically studies how transformer-based policies acquire search capabilities from reinforcement learning training dynamics in a stochastic tree environment. It shows that a two-head transformer can implement depth-first search and that this mechanism emerges naturally from sparse reward signals under a depth-wise curriculum.
Introduces the Data-Model Compatibility (DMC) metric to evaluate how well a reasoning dataset aligns with a student model during distillation. Experiments show DMC strongly correlates with distillation performance and that dynamically selecting datasets based on DMC further improves reasoning capabilities.
This paper introduces Micro-Macro Retrieval (M2R), a retrieve-while-generate framework that reduces hallucination in long-form LLM outputs by ensuring key information stays close to generated text. It uses curriculum-based reinforcement learning to train retrieval and grounding skills, and is especially effective in lengthy contexts.
This paper proposes Staged-Competence, a curriculum learning framework for DPO-based safety alignment that organizes preference data by difficulty, improving robustness and data efficiency while preserving general capabilities.
Introduces TELL, an AI-generated text detection system that provides explainable annotations alongside numerical scores, achieving a competitive AUROC of 0.927 while enabling users to judge authorship based on highlighted textual indicators.
This paper empirically studies how the composition of training data (curriculum) affects the skills learned by RL-based memory agents in multi-session question answering. It finds that curriculum composition acts as a fine-grained lever on specialization, with mixed benchmarks yielding the best overall performance and narrow out-of-domain sets transferring targeted temporal reasoning skills.
This paper proposes a plug-and-play module using self-paced curriculum learning to enhance modality balance in multimodal conversational emotion recognition, achieving consistent F1-score improvements on the IEMOCAP and MELD datasets.
RPS is a two-stage LLM post-training method inspired by neuroscience, combining curriculum learning with learning rate decay. Preliminary results show improved program synthesis reliability on Qwen3-8B compared to equal-learning-rate training.
SCRL is a curriculum reinforcement learning framework that uses subproblem-level normalization to improve credit assignment in LLM reasoning, outperforming baselines on mathematical reasoning benchmarks.
Introduces PROWL, a prioritized regret-driven optimization framework that uses an adversarial curriculum to improve diffusion-based world model robustness by focusing on high-error trajectories, achieving better performance on out-of-distribution scenarios in MineRL.
This paper proposes a staged training approach for vision-language models that separates visual perception, visual reasoning, and textual reasoning into distinct stages. The method improves visual reasoning accuracy while reducing reasoning trace length, demonstrating that stronger perception reduces the need for excessive reasoning.
This paper proposes a spatially correlated curriculum learning framework for Physics-Informed Neural Networks (PINNs) that improves training stability and solution accuracy by leveraging spatial correlations among subregions, addressing issues like high-dimensional non-convex loss landscapes and imbalanced multi-objective constraints.
Presents VectraYX-Nano, a 42M-parameter decoder-only language model trained from scratch in Spanish for cybersecurity, featuring curriculum learning, native tool invocation via MCP, and a 170M-token corpus. Empirical findings reveal a loss-versus-register inversion and corpus-density artifacts for tool-use capability.
Researchers release SU-01, a 30B-A3B reasoning model achieving gold-medal-level performance on physics and math Olympiad problems using a unified scaling recipe for proof search.
BACR introduces adaptive token budgeting and curriculum-aware scheduling to prevent LLMs from overthinking easy problems and underthinking hard ones, cutting token use by 34% while boosting accuracy by up to 8.3%.
LiFT is a longitudinal instruction fine-tuning framework that unifies diverse temporal NLP tasks under a shared instruction schema with curriculum-based training. Evaluated across OLMo, LLaMA, and Qwen models, LiFT consistently outperforms base-model in-context learning, especially on out-of-distribution data and rare change events.