llm-training · Tag · Cards List

#llm-training

@QingQ77: Collecting open-source code and papers on On-Policy Distillation and Self-Distillation for training LLMs/VLMs/Agents, tagged by four dimensions: teacher source, supervision signal, rollout usage, and training stage. https://g…

X AI KOLs Timeline · 4d ago

Introducing AwesomeOPD, a curated list of open-source code and papers related to On-Policy Distillation (OPD) and Self-Distillation used in the training of LLMs, VLMs, and Agents. Resources in this list are meticulously categorized and tagged based on teacher source, supervision signal, rollout usage, and training stage.
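As a rough illustration of the kind of technique this list covers (not any specific repository's implementation), on-policy distillation typically has the student generate its own rollout while a frozen teacher only scores it, and the training signal is a reverse KL between the two token distributions on those student-sampled tokens. A minimal PyTorch sketch, with toy tensors standing in for real model outputs:

    # Minimal on-policy distillation step (illustrative sketch only).
    import torch
    import torch.nn.functional as F

    def opd_loss(student_logits, teacher_logits):
        """Reverse KL(student || teacher), averaged over rollout tokens.

        Both tensors have shape (batch, seq_len, vocab) and are computed on a
        rollout that the *student* sampled; the teacher is frozen.
        """
        s_logp = F.log_softmax(student_logits, dim=-1)
        t_logp = F.log_softmax(teacher_logits, dim=-1)
        # KL(p_s || p_t) = sum_v p_s(v) * (log p_s(v) - log p_t(v))
        kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)
        return kl.mean()

    # Toy shapes stand in for a real student rollout scored by a real teacher.
    student_logits = torch.randn(2, 16, 32000, requires_grad=True)
    teacher_logits = torch.randn(2, 16, 32000)
    loss = opd_loss(student_logits, teacher_logits)
    loss.backward()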

#llm-training

@wsl8297: UCLA's open course on Reinforcement Learning for LLMs takes a 'theory + practice' approach, explaining key AI training techniques from the ground up and helping you systematically build a complete framework spanning from RL fundamentals to LLM training. The comprehensive curriculum comes with complete resources: lecture slides, full videos, and practical exercises are all provided so you can start implementing right away…

X AI KOLs Timeline · 5d ago

Assistant Professor Ernest K. Ryu at UCLA offers the open course 'Reinforcement Learning for Large Language Models,' analyzing key LLM training techniques such as RLHF, PPO, and DPO through a blend of theory and practice, with full supporting materials. The course gives developers and researchers a systematic learning path from foundational algorithms to practical deployment.
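For orientation, one of the techniques the course names, DPO, reduces to a logistic loss over log-probability margins between chosen and rejected responses under the policy and a frozen reference model. A minimal sketch of that loss (not code from the course materials):

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """Direct Preference Optimization loss.

        Each argument is the summed log-probability of a whole response under
        either the trainable policy or the frozen reference model.
        """
        policy_margin = policy_chosen_logps - policy_rejected_logps
        ref_margin = ref_chosen_logps - ref_rejected_logps
        # Push the policy's preference margin above the reference's margin.
        return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

    # Toy per-response log-probabilities in place of real model outputs.
    loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                    torch.tensor([-13.0]), torch.tensor([-14.0]))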

#llm-training

The Extrapolation Cliff in On-Policy Distillation of Near-Deterministic Structured Outputs

Hugging Face Daily Papers · 5d ago

This paper identifies a safety threshold in on-policy distillation with reward extrapolation, beyond which models on structured-output tasks stop preserving the required format. Empirically, operating below this threshold lets a 1.7B student model match an 8B SFT baseline on Amazon Fashion tasks with roughly one-fifth the parameters.

#llm-training

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

arXiv cs.LG · 6d ago

This paper introduces ADAPT, an online reweighting framework for LLM data curation that dynamically adjusts sample importance during training via loss weighting, outperforming offline selection and mixing methods in cross-benchmark generalization.
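The abstract does not spell out ADAPT's exact weighting rule, so the following is only a generic illustration of online loss reweighting (the softmax-over-loss rule and temperature are assumptions, not the paper's algorithm): each batch's per-sample losses are turned into normalized importance weights before backpropagation, so the effective data mixture shifts as training progresses.

    import torch
    import torch.nn.functional as F

    def reweighted_batch_loss(per_sample_losses, temperature=1.0):
        """Generic online reweighting sketch: up-weight samples the model
        currently finds hard (high loss). Assumed rule, not ADAPT's update."""
        with torch.no_grad():  # weights are constants w.r.t. the autograd graph
            weights = F.softmax(per_sample_losses / temperature, dim=0)
        return (weights * per_sample_losses).sum()

    # Inside a real training loop these would come from the model's forward pass.
    per_sample_losses = torch.tensor([0.4, 2.1, 0.9], requires_grad=True)
    loss = reweighted_batch_loss(per_sample_losses)
    loss.backward()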

#llm-training

Bootstrapping Post-training Signals for Open-ended Tasks via Rubric-based Self-play on Pre-training Text

arXiv cs.CL · 2026-04-23

Cornell researchers propose POP, a self-play framework that lets an LLM generate its own rubrics and training pairs for open-ended tasks, boosting Qwen-2.5-7B on healthcare QA, creative writing and instruction following without human labels.
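The mechanics below are a guess at the general shape of rubric-based self-play rather than POP's actual pipeline: the same model writes a rubric for an open-ended prompt, samples several candidate responses, grades them against its own rubric, and the best/worst pair becomes a preference example. Here generate is a hypothetical stand-in for any call to the model being trained.

    def generate(prompt: str) -> str:
        """Hypothetical stand-in for a call to the model being trained."""
        return f"[model output for: {prompt[:40]}...]"

    def rubric_self_play_pair(task_prompt: str, n_candidates: int = 4) -> dict:
        # 1. The model drafts its own grading rubric for the open-ended task.
        rubric = generate(f"Write a grading rubric for this task:\n{task_prompt}")
        # 2. It samples several candidate responses.
        candidates = [generate(task_prompt) for _ in range(n_candidates)]
        # 3. It grades each candidate against its own rubric.
        def score(resp: str) -> int:
            verdict = generate(f"Rubric:\n{rubric}\nResponse:\n{resp}\nScore 1-10:")
            digits = [int(tok) for tok in verdict.split() if tok.isdigit()]
            return digits[0] if digits else 0
        ranked = sorted(candidates, key=score)
        # 4. Worst and best responses become a (rejected, chosen) preference pair
        #    usable for DPO-style post-training without human labels.
        return {"prompt": task_prompt, "chosen": ranked[-1], "rejected": ranked[0]}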

#llm-training

Commonsense Knowledge with Negation: A Resource to Enhance Negation Understanding

arXiv cs.CL · 2026-04-23

Researchers introduce a method to automatically augment commonsense knowledge corpora with negation, creating 2M+ triples that improve LLM negation understanding when used for pre-training.

#llm-training

scosman/pelicans_riding_bicycles

Simon Willison's Blog · 2026-04-21

Simon Willison's link post highlights a dataset or project titled 'pelicans_riding_bicycles', likely used for LLM training or generative AI experimentation.

#llm-training

@omarsar0: Karpathy's autoresearch repo started an impressive trend. Agents can now train AI models to build SoTA agentic systems.…

X AI KOLs Following · 2026-04-21

Karpathy's autoresearch repository has sparked a trend where agents train AI models to build state-of-the-art agentic systems, highlighting current limitations in LLM-driven hypothesis generation.

#llm-training

Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification

arXiv cs.CL · 2026-04-21

Researchers from the University of Edinburgh propose a self-play framework that uses Liquid Haskell for formal verification to train LLMs on semantic equivalence reasoning, releasing the OpInstruct-HSx dataset (28k programs) and achieving a 13.3 percentage-point accuracy gain on EquiBench.

#llm-training

Train AI models with Unsloth and Hugging Face Jobs for FREE

Hugging Face Blog · 2026-02-20

Hugging Face and Unsloth are offering free credits and training resources for fine-tuning AI models with Hugging Face Jobs. Developers can train small language models such as LFM2.5-1.2B-Instruct, with 2x faster training and 60% less VRAM usage via Unsloth, driving the workflow through coding agents like Claude Code and Codex.
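A minimal local sketch of the Unsloth fine-tuning pattern the post builds on; the checkpoint, dataset, and hyperparameters below are illustrative placeholders rather than the blog's exact recipe, the SFTTrainer call follows the pattern used in Unsloth's own notebooks (library versions matter), and the Hugging Face Jobs launch step is omitted.

    from unsloth import FastLanguageModel
    from datasets import load_dataset
    from transformers import TrainingArguments
    from trl import SFTTrainer

    # Load a small instruct model in 4-bit and attach LoRA adapters via Unsloth.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Llama-3.2-1B-Instruct",  # placeholder checkpoint
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model, r=16, lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )

    # Any instruction dataset works; map it to a single "text" column for SFT.
    dataset = load_dataset("yahma/alpaca-cleaned", split="train[:1000]")
    dataset = dataset.map(lambda ex: {
        "text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
    })

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            max_steps=60,
            learning_rate=2e-4,
            output_dir="outputs",
        ),
    )
    trainer.train()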

#llm-training

LLMs Go To Confession, Automated Scientific Research, What Copilot Users Want, Reasoning For Less

The Batch · 2026-01-09

DeepLearning.AI launches 'Build with Andrew,' a course enabling non-coders to build web applications using AI in under 30 minutes, while the issue's research coverage spans LLM transparency and model honesty as well as automated scientific research capabilities.
