variance-reduction

#variance-reduction

Variance Reduction for Heavy-Tailed Monetization Metrics in Ranking Experiments via Post-Stratification

arXiv cs.LG · 2026-06-04

Researchers present a practical variance-reduction framework that combines post-stratification with CUPED for heavy-tailed monetization metrics in ranking experiments. Deployed at ShareChat, it achieves equivalent statistical confidence with 45% less traffic. The paper has been accepted at SIGIR 2026.
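
For readers unfamiliar with CUPED (Controlled-experiment Using Pre-Experiment Data), the following is a minimal sketch of the basic adjustment only. It is not the paper's framework: the function name `cuped_adjust`, the synthetic log-normal data, and the coefficients are all illustrative assumptions, and post-stratification is not shown.

```python
import random
import statistics


def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted values y - theta * (x - mean(x)).

    theta is the sample regression coefficient cov(x, y) / var(x).
    Hypothetical helper for illustration, not taken from the paper.
    """
    mean_x = statistics.fmean(covariate)
    mean_y = statistics.fmean(metric)
    var_x = statistics.variance(covariate)
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric)
    ) / (len(metric) - 1)
    theta = cov_xy / var_x
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]


# Synthetic, assumed data: a heavy-tailed pre-experiment covariate and an
# in-experiment metric that is correlated with it.
rng = random.Random(0)
pre = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
post = [0.8 * x + rng.gauss(0.0, 0.5) for x in pre]

adjusted = cuped_adjust(post, pre)
print("variance before:", statistics.variance(post))
print("variance after: ", statistics.variance(adjusted))
```

The adjustment leaves the sample mean unchanged and shrinks the sample variance by the factor 1 - ρ², where ρ is the sample correlation between the metric and the covariate, so a smaller sample can reach the same confidence interval width.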

GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning

arXiv cs.LG · 2026-06-03

GRZO is a zeroth-order optimization method for fine-tuning large language models that reduces variance through group-relative normalization, achieving higher accuracy and better memory efficiency than MeZO.

Zeroth-Order Non-Log-Concave Sampling with Variance Reduction and Applications to Inverse Problems

arXiv cs.LG · 2026-06-01

Proposes a variance-reduced zeroth-order Langevin sampling method for non-log-concave distributions, establishing the first non-asymptotic convergence guarantees, and applies it to inverse problems with score-based generative priors.

Refined Analysis of Entropy-Regularized Actor-Critic

arXiv cs.LG · 2026-05-26

This paper provides a refined theoretical analysis of actor-critic methods with entropy regularization. It shows that an exact critic acts as a strong variance reducer, enabling sample complexity comparable to that of deterministic policy gradient, and that these benefits are preserved with a sufficiently accurate learned critic.

Unified High-Probability Analysis of Stochastic Variance-Reduced Estimation

arXiv cs.LG · 2026-05-18

This paper presents a unified theoretical framework for stochastic variance-reduced estimation, deriving high-probability bounds via a new Freedman inequality and improving oracle complexities for constrained optimization.

Beyond Bounded Variance: Variance-Reduced Normalized Methods for Nonconvex Optimization under Blum-Gladyshev Noise

arXiv cs.LG · 2026-05-18

This paper studies nonconvex stochastic optimization under Blum-Gladyshev noise, where gradient variance grows with distance from initialization. It proves convergence guarantees for normalized SGD with momentum and for a variance-reduced STORM method, achieving minimax-optimal rates under certain conditions.

Heuristic Pathologies and Further Variance Reduction via Uncertainty Propagation in the AIVAT Family of Techniques

arXiv cs.AI · 2026-05-15

This paper identifies vulnerabilities in the AIVAT variance reduction technique when the heuristic value function is not fixed prior to evaluation, and shows how to propagate heuristic uncertainty to further reduce variance, achieving a 43% reduction in the number of samples needed for statistical conclusions.

Path-Coupled Bellman Flows for Distributional Reinforcement Learning

arXiv cs.LG · 2026-05-12

This paper introduces Path-Coupled Bellman Flows (PCBF), a continuous-time distributional reinforcement learning method that uses flow matching to model return distributions without heuristic projections. It addresses boundary mismatch and high-variance issues in previous flow-based approaches by coupling current and successor return flows through shared base noise.

KL for a KL: On-Policy Distillation with Control Variate Baseline

Hugging Face Daily Papers · 2026-05-08

Proposes vOPD, which stabilizes on-policy distillation for LLMs by introducing a control variate baseline from reinforcement learning, achieving performance comparable to expensive full-vocabulary methods at lower computational cost.

Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States

Hugging Face Daily Papers · 2026-05-08

This paper introduces POISE, a method that stabilizes policy optimization in large reasoning models by estimating baselines from the model's own internal states, reducing computational overhead compared to PPO and GRPO.

Variance reduction for policy gradient with action-dependent factorized baselines

OpenAI Blog · 2018-03-20

OpenAI researchers derive a bias-free action-dependent baseline for variance reduction in policy gradient methods, demonstrating improved learning efficiency on high-dimensional control tasks and in multi-agent and partially observed environments.
