BACR introduces adaptive token budgeting and curriculum-aware scheduling to prevent LLMs from overthinking easy problems and underthinking hard ones, cutting token use by 34% while boosting accuracy by up to 8.3%.
LiFT is a longitudinal instruction fine-tuning framework that unifies diverse temporal NLP tasks under a shared instruction schema with curriculum-based training. Evaluated across OLMo, LLaMA, and Qwen models, LiFT consistently outperforms base-model in-context learning, especially on out-of-distribution data and rare change events.
Introduces ToolsRL, a two-stage reinforcement learning framework that teaches multimodal LLMs to use simple visual tools for complex visual reasoning tasks.
CLewR introduces a curriculum learning strategy with restarts for improving machine translation performance in LLMs through preference optimization. The method addresses catastrophic forgetting by iterating the easy-to-hard curriculum multiple times, showing consistent gains across Gemma2, Qwen2.5, and Llama3.1 models.
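The restart idea can be sketched as a data schedule: sort the training pool easy-to-hard by some difficulty score, then replay the full pass several times so that late hard examples do not crowd out what was learned early. This is a minimal illustrative sketch, not CLewR's actual training pipeline; the function and argument names are invented here.

```python
def curriculum_with_restarts(examples, difficulty, n_restarts=3):
    """Build an easy-to-hard training schedule with restarts (illustrative
    sketch of the CLewR idea, not the paper's implementation).

    examples   -- the training pool
    difficulty -- callable mapping an example to a difficulty score
    n_restarts -- how many times the full easy-to-hard pass is replayed
    """
    ordered = sorted(examples, key=difficulty)  # easy first, hard last
    schedule = []
    for _ in range(n_restarts):
        # Each restart replays the whole curriculum from easy to hard,
        # revisiting easy examples to mitigate catastrophic forgetting.
        schedule.extend(ordered)
    return schedule
```

With a toy pool `[3, 1, 2]` scored by identity and two restarts, the schedule is `[1, 2, 3, 1, 2, 3]`.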
This paper proposes the Implicit Curriculum Hypothesis, demonstrating that language model pretraining follows a structured, compositional curriculum where capabilities emerge consistently across architectures and can be predicted from internal representations. The authors validate this through designed tasks spanning retrieval, morphology, coreference, reasoning, and mathematics, finding highly consistent emergence orderings (ρ=0.81) across four model families.
OpenAI achieved a new state-of-the-art 41.2% on the miniF2F formal math olympiad benchmark using a technique called 'statement curriculum learning,' which iteratively trains a neural prover on proofs of increasing difficulty. The approach builds on iterative proof search and retraining over 8 iterations to significantly outperform the previous best of 29.3%.
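The loop behind statement curriculum learning is a form of expert iteration: run proof search over the open statements, collect whatever proofs are found, retrain on them, and repeat, so harder statements become solvable only after easier ones have improved the prover. A minimal sketch follows; `search` and `train` are placeholders standing in for proof search and retraining, not OpenAI's actual API.

```python
def statement_curriculum(prover, statements, search, train, n_iters=8):
    """Expert-iteration sketch of statement curriculum learning.

    prover     -- current prover (opaque object passed to search/train)
    statements -- pool of target statements, mixed difficulty
    search     -- search(prover, statement) -> proof or None
    train      -- train(prover, proofs) -> improved prover
    """
    solved_proofs = []
    open_statements = list(statements)
    for _ in range(n_iters):
        newly_solved = []
        for s in open_statements:
            proof = search(prover, s)       # None when search fails
            if proof is not None:
                solved_proofs.append(proof)
                newly_solved.append(s)
        # Only unsolved statements stay in the pool for the next round.
        open_statements = [s for s in open_statements if s not in newly_solved]
        # Retrain on all proofs found so far before searching again.
        prover = train(prover, solved_proofs)
    return prover, solved_proofs
```

In a toy simulation where the prover is a skill level that search compares against a statement's difficulty and training raises by one, easy statements fall in early iterations and unlock the harder ones later.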
OpenAI researchers introduce VALOR, a variational inference method for option discovery that connects option learning to variational autoencoders, and propose a curriculum learning approach that stabilizes training by dynamically increasing context complexity.
OpenAI presents Hindsight Experience Replay (HER), a technique enabling sample-efficient reinforcement learning from sparse binary rewards without complex reward engineering. It is demonstrated on robotic arm manipulation tasks including pushing, sliding, and pick-and-place, and validated on physical robots.
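The core of HER is goal relabeling in the replay buffer: each stored transition is duplicated with its goal replaced by a state the agent actually reached later in the episode, so even a failed rollout yields transitions with positive reward. Below is a minimal sketch of the "future" relabeling strategy; the data layout and function names are illustrative, not the paper's code.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Hindsight relabeling, 'future' strategy (illustrative sketch).

    episode   -- list of (state, action, goal, achieved_state) tuples
    reward_fn -- reward_fn(achieved, goal), e.g. 0 on success else -1
    k         -- substitute goals sampled per transition
    """
    relabeled = []
    for t, (state, action, goal, achieved) in enumerate(episode):
        # Keep the original transition with its original (sparse) reward.
        relabeled.append((state, action, goal, reward_fn(achieved, goal)))
        # Sample states achieved later in the episode as substitute goals;
        # transitions that reached them now count as successes.
        future = [step[3] for step in episode[t:]]
        for new_goal in random.sample(future, min(k, len(future))):
            relabeled.append((state, action, new_goal,
                              reward_fn(achieved, new_goal)))
    return relabeled
```

With a sparse reward of 0 on goal match and -1 otherwise, relabeling guarantees some zero-reward (successful) transitions even when the original goal was never reached.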
OpenAI proposes Teacher–Student Curriculum Learning (TSCL), a framework where a Teacher algorithm automatically selects subtasks for a Student to learn complex tasks, optimizing based on learning curve slope and preventing forgetting. The approach matches or surpasses hand-crafted curricula on decimal addition and Minecraft navigation tasks, enabling solutions previously impossible with direct training.
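One of TSCL's bandit-style Teachers can be sketched compactly: track an exponential moving average of each subtask's score changes (the learning-curve slope) and pick the task with the largest absolute slope, so the Student practices tasks it is improving on fastest or starting to forget. This is a minimal sketch of that idea; class and parameter names are illustrative.

```python
import random

class TSCLTeacher:
    """Slope-based Teacher (illustrative sketch of TSCL's online variant)."""

    def __init__(self, n_tasks, alpha=0.1, eps=0.1):
        self.slopes = [0.0] * n_tasks   # EMA of per-task score changes
        self.last = [None] * n_tasks    # last observed score per task
        self.alpha = alpha              # EMA smoothing factor
        self.eps = eps                  # exploration probability

    def choose(self):
        if random.random() < self.eps:  # occasionally explore a random task
            return random.randrange(len(self.slopes))
        # Exploit: largest |slope| means fastest learning, or forgetting
        # (a negative slope), both of which merit more practice.
        return max(range(len(self.slopes)), key=lambda i: abs(self.slopes[i]))

    def update(self, task, score):
        if self.last[task] is not None:
            delta = score - self.last[task]
            self.slopes[task] += self.alpha * (delta - self.slopes[task])
        self.last[task] = score
```

After two observations on a task whose score is rising, that task's slope dominates and the Teacher keeps selecting it until its learning curve flattens.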
This paper explores extensions and limitations of the Neural GPU model, demonstrating improvements through curriculum design and scaling, enabling it to learn arithmetic operations on decimal numbers and long expressions while identifying failure modes on symmetric inputs analogous to adversarial examples.