This paper introduces Rotation-Preserving Supervised Fine-Tuning (RPSFT), a method that improves out-of-domain generalization by preserving projected rotations in pretrained singular subspaces during fine-tuning.
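A minimal sketch of the constraint, under one plausible reading (keep the top-k pretrained singular directions fixed so fine-tuning can rescale, but not rotate, the dominant subspace); the cutoff k and the projection rule are illustrative, not the paper's exact algorithm:

```python
import torch

def rpsft_project(W_pre, W_ft, k=8):
    # Decompose the pretrained weight and pin its top-k singular directions.
    U0, S0, V0h = torch.linalg.svd(W_pre, full_matrices=False)
    Uk, Vkh = U0[:, :k], V0h[:k, :]        # pretrained top-k subspaces
    core = Uk.T @ W_ft @ Vkh.T             # fine-tuned update seen in that basis
    # Keep only the diagonal of the core: the subspace may be rescaled,
    # but not rotated, by fine-tuning.
    constrained = Uk @ torch.diag(torch.diagonal(core)) @ Vkh
    residual = W_ft - Uk @ core @ Vkh      # everything outside the top-k block
    return constrained + residual
```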
The paper proposes ActGuide-RL, a method for training agentic policies in LLMs by using human action data as guidance to overcome exploration barriers in reinforcement learning without extensive supervised fine-tuning.
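As a rough illustration of guidance-seeded exploration (the env/policy interfaces, the 30% guidance rate, and the prefix length are assumptions, not the paper's setup):

```python
import random

def guided_rollout(env, policy, human_traces, p_guide=0.3, prefix_len=4):
    # With probability p_guide, replay a short prefix of a human action
    # trace before handing control to the learned policy, so RL can reach
    # states it would rarely discover through random exploration alone.
    traj, state, done = [], env.reset(), False
    if human_traces and random.random() < p_guide:
        for action in random.choice(human_traces)[:prefix_len]:
            state, reward, done, _ = env.step(action)
            traj.append((action, reward))
            if done:
                break
    while not done:
        action = policy(state)
        state, reward, done, _ = env.step(action)
        traj.append((action, reward))
    return traj
```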
This paper introduces YFPO, a neuron-guided preference optimization framework that uses internal activation signals to improve mathematical reasoning in large language models.
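A hedged sketch of what "neuron-guided" preference optimization could look like: a DPO-style loss whose per-example weight comes from an internal activation signal. The weighting rule and `act_signal` are assumptions, not YFPO's actual mechanism:

```python
import torch.nn.functional as F

def neuron_weighted_dpo(logp_w, logp_l, ref_w, ref_l, act_signal, beta=0.1):
    # Standard DPO margin between chosen (w) and rejected (l) completions,
    # relative to a frozen reference model.
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    # Upweight examples where the internal activation signal (e.g. mean
    # activation of reasoning-linked neurons) is strong.
    return (act_signal * -F.logsigmoid(margin)).mean()
```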
This paper proposes LayerTracer, an interpretable framework for layer allocation in continued pre-training, demonstrating that freezing deep layers while training shallow ones outperforms full-parameter fine-tuning. It offers a low-cost, actionable strategy for resource-constrained teams optimizing Large Language Models.
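The reported allocation strategy is easy to reproduce in PyTorch; a sketch assuming a Llama-style transformers model, with the 50% split point chosen purely for illustration:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
n_layers = model.config.num_hidden_layers
cutoff = n_layers // 2  # illustrative split; the paper allocates per task

for idx, layer in enumerate(model.model.layers):
    trainable = idx < cutoff            # shallow layers keep training
    for p in layer.parameters():
        p.requires_grad = trainable     # deep layers are frozen
```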
The article introduces a method for making OPD more lightweight, enabling efficient post-training of Large Language Models.
This paper introduces On-Policy Harness Self-Distillation (OPHSD), a method that internalizes the capabilities of inference-time reasoning harnesses into the base model through self-distillation. The approach improves standalone performance on complex reasoning tasks, allowing the model to retain reasoning scaffolds without permanent external dependencies.
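A sketch of the data-collection half of such self-distillation; `harness` and `judge` are stand-ins for the paper's scaffold and verifier, not its API:

```python
def ophsd_dataset(model, harness, prompts, judge):
    traces = []
    for prompt in prompts:
        # Run the model *inside* its reasoning harness (tools, retries,
        # structured scratchpads) to get a scaffolded rollout.
        answer, trace = harness.run(model, prompt)
        if judge(prompt, answer):      # keep only verified successes
            traces.append({"prompt": prompt, "completion": trace})
    # The bare, unscaffolded model is then fine-tuned on these pairs so it
    # no longer depends on the harness at inference time.
    return traces
```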
The article introduces DataArc-SynData-Toolkit, an open-source framework designed to simplify multi-path, multimodal, and multilingual synthetic data generation. It aims to lower technical barriers and improve usability for training large language models through a unified, configuration-driven pipeline.
This paper introduces Pion, a novel spectrum-preserving optimizer for large language model training that uses orthogonal equivalence transformations to maintain singular values during weight updates, offering stable performance comparable to standard optimizers.
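The spectrum-preserving property rests on a basic fact: multiplying W on both sides by orthogonal matrices leaves its singular values unchanged. A minimal sketch (the skew-symmetric parametrization is an assumption, not Pion's exact update rule):

```python
import torch

def orthogonal_equivalence_step(W, A, B, lr=1e-3):
    # The matrix exponential of a skew-symmetric matrix is orthogonal, so
    # U and V are orthogonal by construction, and U @ W @ V.T has exactly
    # the same singular values as W.
    U = torch.matrix_exp(lr * (A - A.T))
    V = torch.matrix_exp(lr * (B - B.T))
    return U @ W @ V.T
```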
Unsloth, an open-source library for efficient LLM training and inference, has officially joined the PyTorch Ecosystem to enhance accessibility and performance. The announcement highlights new features like Unsloth Studio and optimized kernels for reduced VRAM usage.
This paper proposes Shadow Mask Distillation (SMD) to solve the off-policy bias caused by KV cache compression during reinforcement learning post-training for large language models. It introduces a mechanism that ensures on-policy alignment and improves memory efficiency for long-context reasoning tasks.
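One way to read the on-policy alignment mechanism (heavily hedged; the interface below is hypothetical): recompute training log-probs under the same compression mask that produced the rollout, so the policy being updated matches the one that actually generated the data:

```python
def shadow_masked_logprobs(model, tokens, kv_keep_mask):
    # kv_keep_mask marks which cached positions survived compression during
    # generation; reusing it here keeps the training-time distribution
    # consistent with the rollout-time one (interface is a sketch).
    out = model(tokens, attention_mask=kv_keep_mask)
    return out.logits.log_softmax(dim=-1)
```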
This paper introduces a training-free diagnostic framework to analyze per-token distillation signals for reasoning models, revealing that guidance is more beneficial on incorrect rollouts and depends on student capacity and task context.
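The core diagnostic quantity is cheap to compute; a sketch scoring each token of a student rollout by teacher-to-student KL (variable names are mine):

```python
import torch.nn.functional as F

def per_token_kl(student_logits, teacher_logits):
    # KL(teacher || student) at every position: high values mark tokens
    # where teacher guidance would move the student most.
    s = F.log_softmax(student_logits, dim=-1)
    t = F.log_softmax(teacher_logits, dim=-1)
    return (t.exp() * (t - s)).sum(dim=-1)
```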
This paper introduces G-Zero, a verifier-free framework that enables autonomous large language model self-improvement through co-evolutionary training using intrinsic rewards and hint-based guidance. It aims to overcome the limitations of proxy LLM judges in open-ended tasks by deriving supervision from internal distributional dynamics.
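As a guess at the flavor of signal involved (an assumption, not G-Zero's actual reward): predictive entropy of the model's own token distributions is one verifier-free quantity derivable from internal distributional dynamics:

```python
import torch

def intrinsic_reward(logits):
    # Mean negative token entropy: confident generations score higher.
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    return -entropy.mean()
```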
This paper introduces RLRT, a method that reverses teacher signals in self-distillation to reinforce successful student deviations, enhancing reasoning exploration in large language models.
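A hedged sketch of the sign-flip idea: on verified-correct rollouts, push the student away from the teacher where it already deviated successfully, instead of pulling it back. The exact gating is an assumption:

```python
def rlrt_loss(student_logp, teacher_logp, rollout_correct):
    # Per-token reverse KL between student and teacher over the vocabulary.
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    sign = -1.0 if rollout_correct else 1.0  # reverse the signal on wins
    return sign * kl.mean()
```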
This article analyzes post-training methods for language models through a distributional perspective, comparing how SFT, RL, and on-policy distillation reshape model distributions and impact phenomena like catastrophic forgetting.
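The distributional framing is concrete: SFT minimizes a forward KL to a fixed data distribution, while on-policy distillation minimizes a reverse KL to a teacher evaluated on the student's own samples. A sketch of the two losses (this decomposition is standard; the code specifics are mine):

```python
import torch.nn.functional as F

def sft_loss(student_logits, data_tokens):
    # Forward KL to the data distribution: mode-covering, off-policy.
    return F.cross_entropy(student_logits, data_tokens)

def on_policy_distill_loss(student_logp, teacher_logp):
    # Reverse KL on student samples: mode-seeking, on-policy.
    p = student_logp.exp()
    return (p * (student_logp - teacher_logp)).sum(dim=-1).mean()
```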
The article claims that Stanford has released a free technique for training LLMs to adhere strictly to prompts, a skill Anthropic reportedly pays high salaries for. It urges readers to bookmark the resource before it is removed.
The author details the process of optimizing custom matrix multiplication kernels in Swift to train a Large Language Model on Apple Silicon, aiming to outperform C implementations by leveraging CPU, SIMD, AMX, and GPU capabilities.
MiniMax published a technical blog post providing an in-depth analysis of the systematic vocabulary degradation issue behind its M2 series large models' inability to output specific personal names. It reveals parameter shifts caused by a disconnect in data coverage between pre-training and post-training stages, and proposes an effective solution involving full-scale synthetic data for remediation.
This article recommends a UCLA-led online course on Reinforcement Learning for Large Language Models, covering theory, algorithms like PPO and RLHF, and practical coding exercises.
This paper introduces Entrocraft, a rejection-sampling method for RL that controls entropy schedules to prevent performance saturation in LLMs. It demonstrates improved generalization and training longevity, allowing smaller models to outperform larger baselines.
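A minimal sketch of entropy-gated rejection sampling (the threshold rule and names are assumptions; the paper's schedule is more elaborate):

```python
import torch

def entropy_filter(rollouts, logits, min_entropy):
    # Mean token entropy per rollout; drop rollouts that would collapse
    # policy entropy below the scheduled floor.
    probs = torch.softmax(logits, dim=-1)                          # (B, T, V)
    ent = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean(-1)  # (B,)
    keep = ent >= min_entropy
    return [r for r, k in zip(rollouts, keep.tolist()) if k]
```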
This research investigates how task geometry influences continual post-training in LLMs, identifying 'geometry conflict' as a cause of forgetting and a mechanism for controlling update integration. The authors propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free method that improves retention and performance across various model sizes.
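A toy version of the conflict measure (illustrative; GCWM's Wasserstein-based merge itself is not shown): cosine similarity between two tasks' weight deltas, where strongly negative values flag updates pulling the same parameters in opposing directions:

```python
import torch

def geometry_conflict(delta_a, delta_b):
    # delta_* are flattened weight updates (task weights minus base weights).
    a, b = delta_a.flatten(), delta_b.flatten()
    return torch.dot(a, b) / (a.norm() * b.norm() + 1e-12)
```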