convergence

#convergence

On the Convergence of Stochastic Low-Rank Adaptation

arXiv cs.LG · 4d ago

This paper sharpens the convergence analysis of LoRA, improving the deterministic oracle complexity from exponential to O(epsilon^{-4}), and proposes the stochastic variants LoRA-NSGDM and LoRA-STORM, with improved oracle complexities of O(epsilon^{-8}) and O(epsilon^{-6}), respectively.
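
The two stochastic estimators named above have textbook forms. Normalized SGD with momentum (NSGDM) steps along the momentum direction, with a step length set by the step size alone. STORM (Cutkosky and Orabona) corrects its momentum using the same sample evaluated at the previous iterate. The following is a minimal sketch of those generic updates, not the paper's LoRA-NSGDM or LoRA-STORM. It drops the low-rank factorization entirely and runs on a hypothetical one-dimensional noisy quadratic. The objective, the step sizes, the 1/sqrt(t) decay, and the mixing weight `a` are all assumptions.

```python
import math
import random

# Hypothetical toy problem (not from the paper): per-sample loss
# 0.5 * a * (x - TARGET)**2 + b * x with E[a] = 1 and E[b] = 0,
# so the expected loss is minimized at x = TARGET.
TARGET = 3.0


def draw_sample(rng):
    return rng.uniform(0.5, 1.5), rng.gauss(0.0, 1.0)


def stoch_grad(x, sample):
    a, b = sample
    return a * (x - TARGET) + b


def nsgdm(x0, steps, lr=0.1, beta=0.9, seed=0):
    """Normalized SGD with momentum: only the momentum's direction sets the step."""
    rng = random.Random(seed)
    x, m = x0, 0.0
    for t in range(1, steps + 1):
        g = stoch_grad(x, draw_sample(rng))
        m = beta * m + (1 - beta) * g
        # In 1-D, m / |m| is the sign of the momentum; the length is lr / sqrt(t).
        x -= (lr / math.sqrt(t)) * m / (abs(m) + 1e-12)
    return x


def storm(x0, steps, lr=0.05, a=0.1, seed=0):
    """STORM recursive momentum: reuse each sample at the current and previous iterate."""
    rng = random.Random(seed)
    d = stoch_grad(x0, draw_sample(rng))  # first estimate: a plain stochastic gradient
    x_prev, x = x0, x0 - lr * d
    for t in range(2, steps + 1):
        sample = draw_sample(rng)
        # d_t = g(x_t; xi_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; xi_t))
        d = stoch_grad(x, sample) + (1 - a) * (d - stoch_grad(x_prev, sample))
        x_prev, x = x, x - (lr / math.sqrt(t)) * d
    return x


print(round(nsgdm(0.0, 5000), 2), round(storm(0.0, 5000), 2))
```

In `storm`, the sample's additive noise cancels in the difference `stoch_grad(x, sample) - stoch_grad(x_prev, sample)`. As a result, fresh noise enters the estimate `d` only with weight `a`, which is the variance-reduction idea behind STORM.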

When Does Recurrence Become an Algorithm? Convergence Selection in Weight-Tied Looped Transformers

arXiv cs.LG · 2026-07-24

This paper investigates when weight-tied looped transformers implement actual algorithms, introduces a budget law, and shows that these mechanisms are portable across training budgets, with implications for adaptive computation and interpretability.

The One-Word Census: Answer-Choice Conformity Across 44 Language Models

arXiv cs.CL · 2026-07-15

This paper introduces the One-Word Census, a minimal instrument for measuring answer-choice convergence across 44 language models. It finds extreme conformity (e.g., 94% choose 'oak' for 'tree') and varying degrees of divergence across model families and generations.
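
One simple way to quantify answer-choice conformity combines two numbers: the share of responses giving the most common answer, and the Shannon entropy of the answer distribution. This metric is assumed here for illustration and is not necessarily the census's own. The function name `conformity` and the toy responses are hypothetical and are not the paper's data.

```python
import math
from collections import Counter


def conformity(answers):
    """Return (modal answer, share of responses giving it, entropy in bits).

    A hypothetical conformity summary; the paper's own metrics may differ.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(a.strip().lower() for a in answers)  # normalize case and whitespace
    n = sum(counts.values())
    top, top_count = counts.most_common(1)[0]
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return top, top_count / n, entropy


# Made-up responses: three models answer "oak", one answers "pine".
print(conformity(["Oak", "oak", "oak ", "pine"]))
```

A share near 1 with entropy near 0 indicates near-unanimous convergence on one answer. A flatter answer distribution lowers the share and raises the entropy.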

The Agentic Economy treatise (Website)

TLDR AI · 2026-07-14

A treatise exploring the convergence of intelligence and the economy, available in multiple formats including text, audio, and video.

Understanding Schedule-Free Methods in Nonconvex Optimization: Rate Guarantees and Escaping Saddles

arXiv cs.LG · 2026-07-13

This paper provides worst-case convergence analyses for Schedule-Free gradient descent and stochastic gradient descent in nonconvex optimization, establishing optimal rates and strict-saddle avoidance, thus theoretically justifying their empirical success.
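
The Schedule-Free SGD update (Defazio et al.) keeps three sequences:

- a base SGD sequence `z`;
- a uniform running average `x`, which serves as the output;
- an interpolation `y` between them, where the gradient is evaluated.

The sketch below is a scalar version under assumed settings: learning rate 0.5, interpolation weight `beta` = 0.9, and no warmup. It uses a hypothetical convex quadratic only to show the update structure. It does not reproduce the nonconvex setting the paper analyzes.

```python
def schedule_free_sgd(grad, x0, lr=0.5, beta=0.9, steps=2000):
    """Schedule-Free SGD, scalar version.

    z: base SGD sequence; x: uniform running average of z (returned);
    y: interpolation between z and x where the gradient is evaluated.
    """
    z = x = x0
    for t in range(1, steps + 1):
        y = (1 - beta) * z + beta * x
        z = z - lr * grad(y)
        c = 1.0 / (t + 1)  # averaging weight c_{t+1} = 1 / (t + 1)
        x = (1 - c) * x + c * z
    return x


# Hypothetical objective f(w) = 0.5 * (w - 2)**2, whose gradient is w - 2.
print(schedule_free_sgd(lambda w: w - 2.0, x0=0.0))
```

The learning rate stays constant throughout. The uniform averaging weight 1/(t+1) is what takes the place of a decaying schedule.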

Decentralised Federated Learning over Temporal Networks: The Role of Heterogeneities

arXiv cs.LG · 2026-07-07

This paper analyzes the effect of structural and temporal heterogeneities in decentralized federated learning over temporal networks, showing that ignoring these heterogeneities leads to unrealistically rapid convergence and that real-world networks slow down diffusion.
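
A toy illustration of why temporal structure matters: the sketch assumes pairwise-averaging gossip, a common stand-in for the model-averaging step of decentralized learning. It runs over a temporal network given as a sequence of link snapshots. The function names and the 4-node example are hypothetical. Both orderings use the same four links (a 4-cycle), yet only one reaches consensus within a single pass.

```python
def gossip(values, snapshots, rounds=1):
    """Pairwise-averaging gossip over a temporal network.

    snapshots: list of edge lists; snapshot k holds the (i, j) links active at time k.
    Each active link replaces both endpoint values with their mean, preserving the sum.
    """
    vals = list(values)
    for _ in range(rounds):
        for edges in snapshots:
            for i, j in edges:
                mean = (vals[i] + vals[j]) / 2
                vals[i] = vals[j] = mean
    return vals


def spread(vals):
    return max(vals) - min(vals)


# Hypothetical 4-node example: one node starts with all the "mass".
start = [4.0, 0.0, 0.0, 0.0]
good_order = [[(0, 1)], [(2, 3)], [(1, 2)], [(0, 3)]]
bad_order = [[(2, 3)], [(1, 2)], [(0, 1)], [(0, 3)]]
print(spread(gossip(start, good_order)), spread(gossip(start, bad_order)))  # prints: 0.0 2.0
```

Collapsing the snapshots into one static 4-cycle would hide this difference. This loosely mirrors the summary's point that ignoring temporal heterogeneity can make convergence look faster than it really is.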

@huskydogewoof: My takes and thoughts are as follows (sorry for being verbose, but I hope you will enjoy …

X AI KOLs Timeline · 2026-06-17

The author shares thoughts on making convergence a reliable halting signal for iterative weight-tied models. The post discusses tricks from papers such as DEQ, Huginn, Ouro, and EqR, and highlights the roles of pre-norm and input injection.
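
The halting idea described above can be sketched as a fixed-point iteration. The same weight-tied update is applied repeatedly, the input is re-injected at every step, and iteration stops once successive states stop changing. The 2-D map, the weight matrix `W`, and the tolerance are hypothetical. `W` is chosen with spectral norm below 1, so the tanh update is a contraction and convergence is guaranteed. Real looped transformers make no such guarantee, and pre-norm is not modeled here.

```python
import math

# Hypothetical tied weights with spectral norm < 1, so the update contracts.
W = [[0.3, -0.2], [0.1, 0.4]]


def step(h, x):
    """One weight-tied iteration; the input x is re-injected at every step."""
    return [math.tanh(sum(W[i][j] * h[j] for j in range(2)) + x[i]) for i in range(2)]


def run_until_converged(x, tol=1e-6, max_iters=100):
    """Iterate until the state change falls below tol (the halting signal)."""
    h = [0.0, 0.0]
    for t in range(1, max_iters + 1):
        h_next = step(h, x)
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_next, h)))
        h = h_next
        if delta < tol:
            return h, t, True   # converged: halt early
    return h, max_iters, False  # budget exhausted without converging


h, iters, converged = run_until_converged([0.5, -1.0])
print(iters, converged)
```

The returned flag separates a genuine halt from an exhausted iteration budget. This distinction matters when convergence is used as the stopping signal.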

Exploring Starts Are Not Enough: Counterexamples and a Fix for Monte Carlo Exploring Starts

arXiv cs.LG · 2026-06-16

This paper presents counterexamples showing that Monte Carlo Exploring Starts can converge to suboptimal solutions in tabular reinforcement learning, and provides a modification that guarantees convergence to optimality by scaling learning rates inversely to update frequencies.

AC-ODM: Actor--Critic Online Data Mixing for Sample-Efficient LLM Pretraining

Hugging Face Daily Papers · 2026-06-14

AC-ODM uses reinforcement learning to dynamically optimize pretraining data composition for LLMs, achieving faster convergence and higher downstream accuracy with negligible computational overhead.

Emergence via Phase Transitions: Mechanism Landscapes and Universal Convergence Across Complex Systems

arXiv cs.LG · 2026-06-09

This paper introduces the Hierarchical Emergence Framework (HEF), which explains how diverse systems such as neural networks and biological evolution converge to similar internal representations through phase transitions in mechanism landscapes under physical and informational constraints. The framework is validated empirically with 111 grokking experiments that confirm universal convergence and identify a critical energy threshold.

Towards Serverless Semi-Decentralized Federated Learning with Heterogeneous Optimizers

arXiv cs.LG · 2026-06-08

This paper proposes SSD-FL, a serverless semi-decentralized federated learning methodology. It optimizes cluster formation in heterogeneous environments using effective loss functions and Cheeger-inequality-based iterative clustering, improving convergence and communication efficiency.

Convergence of Steepest Descent and Adam under Non-Uniform Smoothness

arXiv cs.LG · 2026-06-01

This paper generalizes non-uniform smoothness assumptions to objectives whose curvature is affine in the objective value, proving convergence rates for steepest descent and diagonal variants of RMSProp and Adam, with applications to logistic regression and neural networks.
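
For orientation, here is standard Adam (Kingma and Ba) with per-coordinate (diagonal) second-moment scaling, applied to logistic regression, one of the applications the summary mentions. The toy data set and hyperparameters are hypothetical. This is the textbook update, not the specific diagonal variants the paper analyzes.

```python
import math

# Hypothetical, linearly separable toy data: (features, label in {-1, +1}).
DATA = [((1.0, 2.0), 1), ((2.0, 1.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]


def loss_and_grad(w):
    """Mean logistic loss log(1 + exp(-y * <w, x>)) and its gradient."""
    loss, grad = 0.0, [0.0, 0.0]
    for x, y in DATA:
        margin = y * (w[0] * x[0] + w[1] * x[1])
        loss += math.log1p(math.exp(-margin))
        coef = -y / (1.0 + math.exp(margin))  # derivative of the loss w.r.t. <w, x>
        grad[0] += coef * x[0]
        grad[1] += coef * x[1]
    n = len(DATA)
    return loss / n, [g / n for g in grad]


def adam(steps=300, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    w, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        _, g = loss_and_grad(w)
        for i in range(2):  # diagonal preconditioning: each coordinate has its own scale
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)  # bias-corrected first moment
            v_hat = v[i] / (1 - beta2 ** t)  # bias-corrected second moment
            w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


w = adam()
print(loss_and_grad([0.0, 0.0])[0], loss_and_grad(w)[0])
```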

A Unified Framework for Gradient Aggregation in Multi-Objective Optimization

arXiv cs.LG · 2026-06-01

This paper presents a unified theoretical framework for gradient aggregation in multi-objective optimization, establishing convergence rates to Pareto stationarity. The authors introduce a sufficient alignment condition and demonstrate its application to existing and new algorithms, such as capped MGDA.
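
As a point of reference for gradient aggregation, the classic two-objective MGDA step (Désidéri) picks the minimum-norm point in the convex hull of the two gradients, which has a closed form. This sketch implements only that baseline. The paper's sufficient alignment condition and capped MGDA are not reproduced, and the function name `mgda_two` is hypothetical.

```python
def mgda_two(g1, g2):
    """Min-norm convex combination of two gradients (two-objective MGDA).

    Minimizes ||alpha * g1 + (1 - alpha) * g2||^2 over alpha in [0, 1];
    closed form: alpha = clip(<g2 - g1, g2> / ||g1 - g2||^2, 0, 1).
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:  # identical gradients: every alpha gives the same direction
        return list(g1), 1.0
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    alpha = min(1.0, max(0.0, alpha))
    return [alpha * a + (1 - alpha) * b for a, b in zip(g1, g2)], alpha


d, alpha = mgda_two([1.0, 0.0], [0.0, 1.0])
print(d, alpha)  # prints: [0.5, 0.5] 0.5
```

When the returned direction is nonzero, it has a positive inner product with both gradients. A small step against it therefore decreases both objectives to first order. A zero direction signals Pareto stationarity.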

Agentic Patterns

Hacker News Top · 2026-05-25

A comprehensive research guide from Veso detailing the universal architecture patterns that have converged across major AI agent systems (Claude Code, OpenAI Codex, Gemini CLI, etc.), presenting 8 postulates for building production-grade agentic systems.

DynMuon: A Dynamic Spectral Shaping View of Muon

Hugging Face Daily Papers · 2026-05-16

This paper introduces DynMuon, a dynamic spectral shaping optimizer that schedules the update parameter p from positive to mildly negative during training, consistently achieving lower validation loss and requiring 10.6-26.5% fewer steps than the standard Muon optimizer.
