MiniMax published a technical blog post analyzing the systematic vocabulary degradation behind its M2-series models' inability to output certain personal names. The post traces the issue to parameter shifts caused by a gap in data coverage between the pre-training and post-training stages, and proposes remediation through full-scale synthetic data.
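As a loose illustration of the synthetic-coverage idea (not MiniMax's actual pipeline, which is described in the blog post itself), the sketch below generates template-based prompt/response stubs that force under-represented names back into post-training targets; all names, templates, and helper functions here are hypothetical.

```python
# Illustrative sketch only: one way to synthesize coverage data for entities
# (e.g. specific personal names) that post-training data under-represents.
# Names, templates, and this helper are hypothetical, not MiniMax's pipeline.
import random

NAMES = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]   # placeholder entities
TEMPLATES = [
    "Who is {name}?",
    "Write a short biography of {name}.",
    "Summarize the main contributions of {name}.",
]

def make_synthetic_examples(n: int, seed: int = 0) -> list[dict]:
    """Generate prompt/response stubs whose targets contain the missing names,
    so post-training keeps those token pathways active."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        name = rng.choice(NAMES)
        prompt = rng.choice(TEMPLATES).format(name=name)
        # In practice the response would come from a stronger model or
        # retrieval; here it is only a stub containing the target name.
        examples.append({"prompt": prompt, "response": f"{name} is ..."})
    return examples

if __name__ == "__main__":
    for ex in make_synthetic_examples(3):
        print(ex)
```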
This article recommends a UCLA-led online course on Reinforcement Learning for Large Language Models, covering theory, algorithms like PPO and RLHF, and practical coding exercises.
Introducing AwesomeOPD, a curated list of open-source code and papers on On-Policy Distillation (OPD) and Self-Distillation for training LLMs, VLMs, and agents. Resources are categorized and tagged by teacher source, supervision signal, rollout usage, and training stage.
Assistant Professor Ernest K. Ryu at UCLA offers the open course 'Reinforcement Learning for Large Language Models,' which combines theory and practice to analyze key LLM training techniques such as RLHF, PPO, and DPO, together with their supporting resources. The course gives developers and researchers a systematic learning path from foundational algorithms to practical deployment.
This paper introduces ADAPT, an online reweighting framework for LLM data curation that dynamically adjusts sample importance during training via loss weighting, outperforming offline selection and mixing methods in cross-benchmark generalization.
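The paper's exact update rule is not reproduced in this summary; below is a minimal sketch of what online per-sample reweighting via loss weighting can look like, assuming a simple softmax-over-loss scheme and a hypothetical `weighted_step` helper with a PyTorch-style model and batch.

```python
# Minimal sketch of online per-sample loss reweighting (not ADAPT's exact
# algorithm). Weights are a softmax over current per-sample losses, an
# illustrative assumption that up-weights harder samples during training.
import torch

def weighted_step(model, optimizer, batch, temperature: float = 1.0):
    """One training step where each sample's loss is reweighted online."""
    logits = model(batch["input_ids"])                       # (B, T, V)
    per_token = torch.nn.functional.cross_entropy(
        logits.transpose(1, 2), batch["labels"], reduction="none"
    )                                                         # (B, T)
    per_sample = per_token.mean(dim=1)                        # (B,)
    # Detach the weights so the weighting itself is not backpropagated.
    weights = torch.softmax(per_sample.detach() / temperature, dim=0)
    loss = (weights * per_sample).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```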
Cornell researchers propose POP, a self-play framework that lets an LLM generate its own rubrics and training pairs for open-ended tasks, boosting Qwen-2.5-7B on healthcare QA, creative writing and instruction following without human labels.
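POP's actual prompts and scoring are not spelled out in this summary; the following is a rough sketch, assuming a generic `generate` completion call, of a self-play loop in which the model writes its own rubric, drafts answers, and ranks them into label-free preference pairs.

```python
# Illustrative self-play loop in the spirit of the described framework, not
# the paper's exact method. `generate` is a placeholder for any LLM call.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM completion call here")

def make_preference_pair(task: str) -> dict:
    """Have the model grade its own drafts against a self-written rubric."""
    rubric = generate(f"Write a grading rubric for this task:\n{task}")
    answers = [generate(f"Task: {task}\nAnswer:") for _ in range(2)]
    scores = [
        float(generate(f"Rubric:\n{rubric}\nAnswer:\n{a}\nScore 0-10:"))
        for a in answers
    ]
    if scores[0] >= scores[1]:
        chosen, rejected = answers[0], answers[1]
    else:
        chosen, rejected = answers[1], answers[0]
    return {"prompt": task, "chosen": chosen, "rejected": rejected, "rubric": rubric}
```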
Researchers introduce a method to automatically augment commonsense knowledge corpora with negation, creating 2M+ triples that improve LLM negation understanding when used for pre-training.
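As a loose illustration of the idea (not the paper's actual generation and filtering pipeline), the sketch below pairs each commonsense triple's positive verbalization with a negated sentence built from a mismatched tail; the relation template and helper names are hypothetical.

```python
# Rough sketch of negation augmentation for commonsense triples. The real
# pipeline involves more relations and filtering; this is only illustrative.
import random

POS = "{h} is capable of {t}."
NEG = "{h} is not capable of {t}."

def augment_with_negation(triples, seed=0):
    """For each (head, relation, tail) triple, emit the positive sentence plus
    a negated sentence built from a mismatched tail sampled from the corpus."""
    rng = random.Random(seed)
    tails = [t for _, _, t in triples]
    out = []
    for h, r, t in triples:
        out.append(POS.format(h=h, t=t))
        wrong_t = rng.choice([x for x in tails if x != t] or [t])
        out.append(NEG.format(h=h, t=wrong_t))
    return out

if __name__ == "__main__":
    demo = [("a bird", "CapableOf", "flying"), ("a fish", "CapableOf", "swimming")]
    print("\n".join(augment_with_negation(demo)))
```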
Simon Willison's link post points to a dataset or project titled 'pelicans_riding_bicycles', likely tied to his recurring pelican-riding-a-bicycle prompt for probing generative models.
Karpathy's autoresearch repository has sparked a trend of using agents to train AI models and build state-of-the-art agentic systems, while also highlighting current limitations in LLM-driven hypothesis generation.
Researchers from the University of Edinburgh propose a self-play framework that uses Liquid Haskell for formal verification to train LLMs on semantic-equivalence reasoning, releasing the OpInstruct-HSx dataset (28k programs) and achieving a 13.3pp accuracy gain on EquiBench.
Hugging Face and Unsloth are offering free credits and training resources for fine-tuning models on Hugging Face Jobs, letting developers train small language models such as LFM2.5-1.2B-Instruct with 2x faster training and 60% less VRAM usage, driven from coding agents like Claude Code and Codex.
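As a rough sketch of the Unsloth side of such a fine-tune (not the official recipe; the Hugging Face repo id and target module names for LFM2.5-1.2B-Instruct below are assumptions), 4-bit loading plus LoRA adapters is where most of the speed and VRAM savings come from.

```python
# Minimal Unsloth setup sketch: 4-bit loading plus LoRA adapters.
# model_name and target_modules are placeholders; adjust to the actual
# hub repo id and module names of the model you fine-tune.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="LFM2.5-1.2B-Instruct",      # placeholder repo id
    max_seq_length=2048,
    load_in_4bit=True,                       # quantized weights cut VRAM sharply
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                    # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # adjust per arch
)
# The resulting `model` can then be handed to trl's SFTTrainer (locally or via
# Hugging Face Jobs) for supervised fine-tuning on an instruction dataset.
```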
DeepLearning.AI launches 'Build with Andrew,' a course that helps non-coders build web applications with AI in under 30 minutes; separately, new research addresses LLM transparency issues, including model honesty and automated scientific research capabilities.