MiniMax published a technical blog post analyzing the systematic vocabulary degradation behind its M2-series models' inability to output certain personal names. The post traces the issue to parameter shifts caused by a gap in data coverage between the pre-training and post-training stages, and proposes remediation through full-scale synthetic data.
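As a loose illustration of the synthetic-coverage idea (not MiniMax's actual pipeline, which is described in the blog post itself), the sketch below generates template-based prompt/response stubs that force under-represented names back into post-training targets; all names, templates, and helper functions here are hypothetical.

```python
# Illustrative sketch only: one way to synthesize coverage data for entities
# (e.g. specific personal names) that post-training data under-represents.
# Names, templates, and this helper are hypothetical, not MiniMax's pipeline.
import random

NAMES = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]   # placeholder entities
TEMPLATES = [
    "Who is {name}?",
    "Write a short biography of {name}.",
    "Summarize the main contributions of {name}.",
]

def make_synthetic_examples(n: int, seed: int = 0) -> list[dict]:
    """Generate prompt/response stubs whose targets contain the missing names,
    so post-training keeps those token pathways active."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        name = rng.choice(NAMES)
        prompt = rng.choice(TEMPLATES).format(name=name)
        # In practice the response would come from a stronger model or
        # retrieval; here it is only a stub containing the target name.
        examples.append({"prompt": prompt, "response": f"{name} is ..."})
    return examples

if __name__ == "__main__":
    for ex in make_synthetic_examples(3):
        print(ex)
```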
This article recommends a UCLA-led online course on Reinforcement Learning for Large Language Models, covering theory, algorithms like PPO and RLHF, and practical coding exercises.
Introducing AwesomeOPD, a curated list of open-source code and papers on On-Policy Distillation (OPD) and Self-Distillation for training LLMs, VLMs, and agents. Resources are categorized and tagged by teacher source, supervision signal, rollout usage, and training stage.
Assistant Professor Ernest K. Ryu at UCLA offers the open course 'Reinforcement Learning for Large Language Models,' which combines theory and practice to analyze key LLM training techniques such as RLHF, PPO, and DPO, together with their supporting resources. The course gives developers and researchers a systematic learning path from foundational algorithms to practical deployment.
This paper introduces ADAPT, an online reweighting framework for LLM data curation that dynamically adjusts sample importance during training via loss weighting, outperforming offline selection and mixing methods in cross-benchmark generalization.
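The paper's exact update rule is not reproduced in this summary; below is a minimal sketch of what online per-sample reweighting via loss weighting can look like, assuming a simple softmax-over-loss scheme and a hypothetical `weighted_step` helper with a PyTorch-style model and batch.

```python
# Minimal sketch of online per-sample loss reweighting (not ADAPT's exact
# algorithm). Weights are a softmax over current per-sample losses, an
# illustrative assumption that up-weights harder samples during training.
import torch

def weighted_step(model, optimizer, batch, temperature: float = 1.0):
    """One training step where each sample's loss is reweighted online."""
    logits = model(batch["input_ids"])                       # (B, T, V)
    per_token = torch.nn.functional.cross_entropy(
        logits.transpose(1, 2), batch["labels"], reduction="none"
    )                                                         # (B, T)
    per_sample = per_token.mean(dim=1)                        # (B,)
    # Detach the weights so the weighting itself is not backpropagated.
    weights = torch.softmax(per_sample.detach() / temperature, dim=0)
    loss = (weights * per_sample).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```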
Cornell researchers propose POP, a self-play framework that lets an LLM generate its own rubrics and training pairs for open-ended tasks, boosting Qwen-2.5-7B on healthcare QA, creative writing and instruction following without human labels.
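POP's actual prompts and scoring are not spelled out in this summary; the following is a rough sketch, assuming a generic `generate` completion call, of a self-play loop in which the model writes its own rubric, drafts answers, and ranks them into label-free preference pairs.

```python
# Illustrative self-play loop in the spirit of the described framework, not
# the paper's exact method. `generate` is a placeholder for any LLM call.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM completion call here")

def make_preference_pair(task: str) -> dict:
    """Have the model grade its own drafts against a self-written rubric."""
    rubric = generate(f"Write a grading rubric for this task:\n{task}")
    answers = [generate(f"Task: {task}\nAnswer:") for _ in range(2)]
    scores = [
        float(generate(f"Rubric:\n{rubric}\nAnswer:\n{a}\nScore 0-10:"))
        for a in answers
    ]
    if scores[0] >= scores[1]:
        chosen, rejected = answers[0], answers[1]
    else:
        chosen, rejected = answers[1], answers[0]
    return {"prompt": task, "chosen": chosen, "rejected": rejected, "rubric": rubric}
```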
Researchers introduce a method to automatically augment commonsense knowledge corpora with negation, creating 2M+ triples that improve LLM negation understanding when used for pre-training.
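As a loose illustration of the idea (not the paper's actual generation and filtering pipeline), the sketch below pairs each commonsense triple's positive verbalization with a negated sentence built from a mismatched tail; the relation template and helper names are hypothetical.

```python
# Rough sketch of negation augmentation for commonsense triples. The real
# pipeline involves more relations and filtering; this is only illustrative.
import random

POS = "{h} is capable of {t}."
NEG = "{h} is not capable of {t}."

def augment_with_negation(triples, seed=0):
    """For each (head, relation, tail) triple, emit the positive sentence plus
    a negated sentence built from a mismatched tail sampled from the corpus."""
    rng = random.Random(seed)
    tails = [t for _, _, t in triples]
    out = []
    for h, r, t in triples:
        out.append(POS.format(h=h, t=t))
        wrong_t = rng.choice([x for x in tails if x != t] or [t])
        out.append(NEG.format(h=h, t=wrong_t))
    return out

if __name__ == "__main__":
    demo = [("a bird", "CapableOf", "flying"), ("a fish", "CapableOf", "swimming")]
    print("\n".join(augment_with_negation(demo)))
```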
Simon Willison's link post points to a dataset or project titled 'pelicans_riding_bicycles', likely tied to his recurring pelican-riding-a-bicycle prompt for probing generative models.
Karpathy's autoresearch repository has sparked a trend of using agents to train AI models and build state-of-the-art agentic systems, while also highlighting current limitations in LLM-driven hypothesis generation.
Researchers from the University of Edinburgh propose a self-play framework that uses Liquid Haskell for formal verification to train LLMs on semantic-equivalence reasoning, releasing the OpInstruct-HSx dataset (28k programs) and achieving a 13.3pp accuracy gain on EquiBench.
Hugging Face and Unsloth are offering free credits and training resources for fine-tuning models on Hugging Face Jobs, letting developers train small language models such as LFM2.5-1.2B-Instruct with 2x faster training and 60% less VRAM usage, driven from coding agents like Claude Code and Codex.
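As a rough sketch of the Unsloth side of such a fine-tune (not the official recipe; the Hugging Face repo id and target module names for LFM2.5-1.2B-Instruct below are assumptions), 4-bit loading plus LoRA adapters is where most of the speed and VRAM savings come from.

```python
# Minimal Unsloth setup sketch: 4-bit loading plus LoRA adapters.
# model_name and target_modules are placeholders; adjust to the actual
# hub repo id and module names of the model you fine-tune.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LFM2.5-1.2B-Instruct",      # placeholder repo id
    max_seq_length=2048,
    load_in_4bit=True,                       # quantized weights cut VRAM sharply
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                    # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # adjust per arch
)
# The resulting `model` can then be handed to trl's SFTTrainer (locally or via
# Hugging Face Jobs) for supervised fine-tuning on an instruction dataset.
```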
DeepLearning.AI launches 'Build with Andrew,' a course that helps non-coders build web applications with AI in under 30 minutes; separately, new research addresses LLM transparency issues, including model honesty and automated scientific research capabilities.