kl-divergence

#kl-divergence

I built a tool to actually test which weights matter before quantizing, instead of guessing (Qwen3.6-27B, 3 builds: Bedrock/Tightrope/Gambit)

Reddit r/LocalLLaMA · 19h ago

A developer built a testing harness that measures KL divergence per weight group during quantization, leading to three custom quantized builds of Qwen3.6-27B (Bedrock, Tightrope, Gambit) with optimized compression. Tool calling is identified as the first capability to degrade under quantization.
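The measurement described above can be sketched roughly as follows. This is not the poster's harness: `mean_token_kl` and the toy logits are hypothetical, and the only assumption is that the harness compares the next-token distribution of a full-precision reference against that of a build in which one weight group has been quantized.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def mean_token_kl(ref_logits, quant_logits):
    """Average per-position KL between reference and quantized outputs."""
    kls = [kl_divergence(softmax(r), softmax(q))
           for r, q in zip(ref_logits, quant_logits)]
    return sum(kls) / len(kls)

# Toy logits for two token positions (made-up values, not real model output).
reference = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
quant_a = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]   # identical outputs
quant_b = [[1.0, 1.5, 0.1], [0.5, 1.0, 1.3]]   # visibly shifted outputs

print(mean_token_kl(reference, quant_a))
print(mean_token_kl(reference, quant_b))
```

Ranking weight groups by a score like this one would suggest which groups tolerate lower precision, under the assumption that output KL tracks the capabilities of interest.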

From Score Approximation to Distribution Approximation in Score-Based Diffusion Models

arXiv cs.LG · yesterday

This paper establishes a rigorous quantitative connection between neural network score function approximation and the resulting distribution approximation in score-based diffusion models, proving that accurate score approximation leads to close distribution approximation in KL divergence, with an explicit bound.

To KL Diverge, or Not to KL Diverge: A Question for Quants

Reddit r/LocalLLaMA · 2026-07-16

Discusses the trade-offs of using Kullback-Leibler divergence to evaluate quantized models ("quants"), framing the choice as a Hamlet-like dilemma.

@TensorTonic: 7 math ideas every ML engineer uses daily and almost nobody has actually derived: 1. Why gradient descent moves in the …

X AI KOLs Timeline · 2026-07-11

This tweet lists 7 fundamental math ideas that ML engineers use daily, with brief explanations emphasizing the underlying derivations, such as why gradient descent moves in the direction of steepest descent and why softmax plus cross-entropy yields a clean gradient.
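The "clean gradient" of softmax plus cross-entropy is the standard result that the derivative of the loss with respect to the logits is the predicted probabilities minus the one-hot target. The sketch below checks that identity numerically; the function names and the example logits are illustrative and do not come from the tweet.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(z, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(z)[target])

def analytic_grad(z, target):
    """Closed form: softmax(z) minus the one-hot target vector."""
    p = softmax(z)
    return [pi - (1.0 if i == target else 0.0) for i, pi in enumerate(p)]

def numeric_grad(z, target, h=1e-6):
    """Central finite differences, used only as a check."""
    grads = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        grads.append((cross_entropy(zp, target) - cross_entropy(zm, target)) / (2 * h))
    return grads

logits = [1.5, -0.3, 0.8]
print(analytic_grad(logits, 0))
print(numeric_grad(logits, 0))
```

The two printed vectors should agree to several decimal places, and the analytic gradient sums to zero because both the probabilities and the one-hot target sum to one.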

@neural_avb: This article actually explains all the components of On Policy Distillation Loss functions (forward vs reverse KL), sup…

X AI KOLs Timeline · 2026-07-10

This article explains components of On Policy Distillation loss functions, including forward vs reverse KL divergence, supervision granularity, privilege types, and privilege advantage estimation. Additional resources from Thinking Machines, Hugging Face, and others are provided.

$\mathbf{\lambda}$-VAE: Variance Equalization for Posterior Collapse

arXiv cs.LG · 2026-07-08

Identifies two coupled causes of posterior collapse in VAEs and introduces λ-VAE, a modification to the reparameterization step that equalizes variance across latent dimensions, reducing collapse and improving information capacity.

Multimodal Continuous Reasoning via Asymmetric Mutual Variational Learning

Hugging Face Daily Papers · 2026-07-01

Proposes Asymmetric Mutual Variational Learning (AMVL) to resolve train-inference mismatch in multimodal continuous reasoning by using bidirectional calibration to prevent answer leakage and improve latent-space stability, achieving significant gains on the BLINK benchmark.

KLip-PPO: A per-sample KL perspective on PPO-Clip

arXiv cs.LG · 2026-06-24

This paper shows that the gradient of the clipped surrogate in Proximal Policy Optimization (PPO) is exactly reproduced by a per-sample Kullback-Leibler penalty with a variable coefficient, revealing structural features of the clipped surrogate and suggesting new design directions.
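For reference, the clipped surrogate that the paper analyzes is the standard PPO-Clip objective. The sketch below shows only that objective and its per-sample gradient with respect to the probability ratio; it does not reproduce the paper's KL-penalty construction, and the function names and the default `eps=0.2` are illustrative choices.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample PPO-Clip objective: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = max(1 - eps, min(ratio, 1 + eps))
    return min(ratio * advantage, clipped * advantage)

def surrogate_grad_wrt_ratio(ratio, advantage, eps=0.2):
    """Derivative of the objective with respect to the ratio r.

    The gradient vanishes once the ratio has already moved past the clip
    boundary in the direction the advantage favors.
    """
    if advantage > 0 and ratio > 1 + eps:
        return 0.0
    if advantage < 0 and ratio < 1 - eps:
        return 0.0
    return advantage

for r, a in [(1.5, 1.0), (0.5, 1.0), (0.5, -1.0), (1.5, -1.0)]:
    print(r, a, clipped_surrogate(r, a), surrogate_grad_wrt_ratio(r, a))
```

The gradient is either the raw advantage or zero, which is the on/off structure that a per-sample penalty with a variable coefficient would have to match.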

Quantized Reasoning Models Think They Need to Think Longer, but They Do Not

arXiv cs.LG · 2026-06-02

This paper reveals that aggressive post-training quantization of reasoning models leads to increased overthinking errors, where models reach correct intermediate answers but fail to finalize them. A simple logit penalty on overthinking markers reduces chain-of-thought length by 12-23% while improving accuracy, especially for quantized models.

KL Zero: KL divergence intuition game

Hacker News Top · 2026-05-30

KL Zero is an interactive browser game where players draw a probability distribution to match a target KL divergence value, helping users intuitively understand the concept of KL divergence in machine learning.

Entropy-KL Divergence-based Token Masking: A Novel Approach for Selective Fine-tuning of Large Language Models

arXiv cs.AI · 2026-05-29

Proposes EKSFT, a selective fine-tuning method for large language models that masks tokens with high entropy or high KL divergence from a reference model, preserving pre-trained distribution while injecting task knowledge. Experiments on mathematical reasoning benchmarks show it outperforms standard SFT and improves subsequent RL fine-tuning.
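A token mask of the kind described could look roughly like the sketch below. It is not the paper's implementation: the thresholds `max_entropy` and `max_kl`, the direction of the KL term (model relative to reference), and the function `loss_mask` are all assumptions made for illustration.

```python
import math

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL(p || q) in nats; assumes q is strictly positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def loss_mask(model_probs, ref_probs, max_entropy, max_kl):
    """Return 1 for tokens kept in the fine-tuning loss and 0 for masked ones.

    A token is masked when the model is too uncertain (high entropy) or has
    drifted too far from the reference model (high KL).
    """
    mask = []
    for p, q in zip(model_probs, ref_probs):
        keep = entropy(p) <= max_entropy and kl(p, q) <= max_kl
        mask.append(1 if keep else 0)
    return mask

# Made-up next-token distributions over a 3-word vocabulary.
model_probs = [[0.9, 0.05, 0.05], [0.34, 0.33, 0.33], [0.8, 0.1, 0.1]]
ref_probs = [[0.85, 0.1, 0.05], [0.3, 0.4, 0.3], [0.05, 0.05, 0.9]]

print(loss_mask(model_probs, ref_probs, max_entropy=1.0, max_kl=0.5))
```

In this toy input the second token is dropped for high entropy and the third for high divergence from the reference, so only the first contributes to the loss.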

Trust Region Q Adjoint Matching

Hugging Face Daily Papers · 2026-05-26

Trust Region Q-Adjoint Matching (TRQAM) addresses instability in off-policy reinforcement learning by adaptively controlling path-space KL divergence through projected dual descent, enabling stable fine-tuning of pretrained flow policies. The method consistently outperforms prior methods on 50 OGBench tasks, achieving a 68% success rate in offline RL compared to the strongest baseline's 46%.

On-Policy Distillation (5 minute read)

TLDR AI · 2026-05-26

This paper introduces on-policy distillation, which trains a student model on its own trajectories with teacher token-level KL supervision to fix train-inference mismatch, unifying forward-KL, reverse-KL, and JSD losses, with reverse-KL favored for smaller students.
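The difference between the forward and reverse losses can be seen on a toy example. The sketch assumes the usual distillation convention, forward KL as KL(teacher || student) and reverse KL as KL(student || teacher); the three distributions are made up and are not taken from the paper.

```python
import math

def kl(p, q):
    """KL(p || q) in nats; assumes q is strictly positive where p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A teacher with two modes, and two candidate students of limited capacity.
teacher = [0.495, 0.01, 0.495]
student_cover = [0.34, 0.32, 0.34]   # spreads mass across everything
student_mode = [0.98, 0.01, 0.01]    # commits to a single teacher mode

for name, student in [("cover", student_cover), ("mode", student_mode)]:
    forward = kl(teacher, student)   # penalizes missing teacher mass
    reverse = kl(student, teacher)   # penalizes mass where the teacher has little
    print(name, round(forward, 3), round(reverse, 3))
```

Forward KL scores the mass-covering student better, while reverse KL scores the mode-seeking student better, which is one common explanation for why reverse KL may suit a student that cannot represent everything the teacher does.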

@maximelabonne: This is so neat! Dynamic Fine-Tuning (DFT) reweights the SFT loss by the model's own token probability, which creates a…

X AI KOLs Following · 2026-05-20

Dynamic Fine-Tuning (DFT) is introduced as a method that reweights the SFT loss using the model's own token probability, creating a feedback loop, and adds forward KL to penalize tokens the base model finds likely but the policy has pushed toward zero probability. The tweet expresses skepticism about SFT papers in practice but praises the attempt.

Measuring Goodhart’s law

OpenAI Blog · 2022-04-13

OpenAI research formally analyzes Goodhart's law through best-of-n sampling, providing efficient estimators for measuring how well proxy objectives track true objectives and quantifying optimization effort via KL divergence.
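For best-of-n sampling, the KL divergence from the base sampling distribution has a well-known closed form, log n - (n - 1)/n, which holds when the proxy scores are continuous so that ties have probability zero. The helper name `best_of_n_kl` is illustrative; the sketch only evaluates the formula and is not code from the post.

```python
import math

def best_of_n_kl(n):
    """KL (in nats) between the best-of-n distribution and the base policy."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.log(n) - (n - 1) / n

for n in (1, 2, 4, 16, 64):
    print(n, round(best_of_n_kl(n), 4))
```

The value grows only logarithmically in n, so doubling the number of samples adds a roughly constant amount of optimization pressure as measured in KL.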
