orthogonalization

#orthogonalization

Reducing Learner Redundancy in Boosting via Residual Orthogonalization

arXiv cs.LG · 2d ago

This paper proposes SCBoost, a boosting framework that reduces learner redundancy by projecting residuals onto the orthogonal complement of previous predictions and using covariance-regularized weighting, with theoretical guarantees and strong empirical performance.
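The projection step named in this summary can be illustrated with a minimal sketch. It is not SCBoost itself: the summary gives no implementation details, so the function names (`orthonormal_basis`, `project_out`), the toy vectors, and the use of modified Gram-Schmidt are all assumptions made here for illustration. The sketch only shows what it means to remove from a residual every component that lies in the span of earlier learners' predictions.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are (nearly) linearly dependent
            basis.append([wi / norm for wi in w])
    return basis


def project_out(residual, previous_predictions):
    """Return the part of `residual` orthogonal to all previous predictions."""
    r = list(residual)
    for q in orthonormal_basis(previous_predictions):
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


# Hypothetical toy data: two earlier learners' predictions on four samples.
previous_predictions = [[1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
residual = [0.5, -1.0, 2.0, 3.0]

r_perp = project_out(residual, previous_predictions)
for p in previous_predictions:
    # Each inner product is zero up to floating-point rounding.
    print(abs(dot(r_perp, p)) < 1e-9)
```

A learner fitted to `r_perp` rather than to `residual` cannot simply reproduce a linear combination of what earlier learners already predict on these samples, which is one plausible reading of "reducing redundancy".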



Gram Newton-Schulz: A Fast, Hardware-Aware Newton-Schulz Algorithm for Muon

Hacker News Top · 2026-06-09

This blog post presents Gram Newton-Schulz, a hardware-aware optimization of the Newton-Schulz orthogonalization procedure used in the Muon optimizer, achieving significant speedups for training large language models while preserving model quality.



Hallucinations as Orthogonal Noise: Inference-Time Manifold Alignment via Dynamic Contextual Orthogonalization

arXiv cs.CL · 2026-06-03

This paper proposes Dynamic Contextual Orthogonalization (DCO), an inference-time method that reduces hallucinations in large language models by aligning attention head outputs with the context manifold, achieving superior faithfulness on benchmarks with Llama-3 models.



How Much Orthogonalization Does Muon Need?

arXiv cs.LG · 2026-06-02

This paper studies how much orthogonalization the Muon optimizer requires, proposing a five-step cubic Newton-Schulz schedule that reduces computational cost while achieving training quality similar to more expensive methods across GPT-2 Small and hybrid MoE/Mamba models.
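For readers unfamiliar with the iteration, here is a minimal sketch of a cubic Newton-Schulz orthogonalization run for five steps. It uses the classical fixed coefficients, X ← 1.5·X − 0.5·X·Xᵀ·X, after scaling the input by its Frobenius norm. The summary does not state the coefficients or the schedule the paper proposes, so those are assumptions here, as are the function name `newton_schulz_cubic` and the toy matrix.

```python
import math


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def newton_schulz_cubic(G, steps=5):
    """Approximate the orthogonal polar factor of G with a cubic iteration."""
    # Scale so every singular value is at most 1, which the iteration requires.
    fro = math.sqrt(sum(x * x for row in G for x in row))
    X = [[x / fro for x in row] for row in G]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        # Classical coefficients (an assumption; the paper's may differ).
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)]
             for rx, ry in zip(X, XXtX)]
    return X


# Hypothetical 2x2 stand-in for a gradient or momentum matrix.
G = [[2.0, 1.0], [0.0, 1.0]]
Q = newton_schulz_cubic(G, steps=5)

# For this well-conditioned input, Q Q^T is close to the identity.
gram = matmul(Q, transpose(Q))
print(gram)
```

Each step pushes every singular value s toward 1 via s ↦ 1.5s − 0.5s³. Small singular values need more steps to get there, so how closely five steps approach an orthogonal matrix depends on the conditioning of the input.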

