optimization

#optimization

Edge of Stability Selectively Shapes Learning Across the Data Distribution

arXiv cs.LG · 2026-06-04

MIT researchers show that the edge of stability (EoS) in neural network training is not merely a global optimization phenomenon but selectively redistributes learning across subsets of the training distribution, amplifying progress on some data groups while suppressing others. They identify two key conditions governing this allocation: gradient alignment with the top Hessian eigenvector and sustained non-vanishing gradient magnitude.

#optimization

Smart Transportation Without Neurons -- Fair Metro Network Expansion with Tabular Reinforcement Learning

arXiv cs.LG · 2026-06-04

Researchers from the University of Amsterdam propose a tabular reinforcement learning approach to the Metro Network Expansion Problem, showing it achieves comparable performance to Deep RL while reducing training episodes by 18x and carbon emissions by 12x on average. The method also incorporates social equity criteria and is evaluated on real-world metro networks in Xi'an and Amsterdam.

#optimization

Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent

arXiv cs.LG · 2026-06-04

This paper develops a sharp pseudospectral theory for block-triangular Jacobians in coupled gradient descent, proving Kreiss-constant bounds and establishing iteration complexity results. The work exposes non-asymptotic, instance-dependent transient amplification phenomena relevant to bilevel optimization, two-time-scale stochastic approximation, and GAN training.

#optimization

Constraint-Enhanced Physical Search through Correlation Matching

arXiv cs.AI · 2026-06-04

This paper proposes a principle of 'constraint-enhanced physical search' where temporal correlations in exploration are matched to constraint-induced spatial correlations in update dynamics, demonstrated via a tug-of-war bandit model. The authors show that efficient search emerges not from maximal randomness but from matching temporal correlation to the physical update scale that converts feedback into evidence.

#optimization

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

arXiv cs.AI · 2026-06-04

Researchers from Beihang University and Baidu propose 'constraint injection,' a dual verification method for LLM-based optimization modeling that detects spurious or omitted constraints beyond objective equivalence. They develop VRPCoder, an 8B model for translating natural-language vehicle routing problems into Gurobi scripts, achieving 93% average Pass@1 and outperforming Claude Sonnet and prior OR-LLMs by large margins.

#optimization

A survey of inlining heuristics

Lobsters Hottest · 2026-06-04

A survey of inlining heuristics in method JIT compilers, discussing the challenges of when to inline and the trade-offs involved, with examples from Ruby and Python.

#optimization

llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s

Reddit r/LocalLLaMA · 2026-06-03

llama.cpp release b9495 adds optimizations for Qwen3.6/3.5-MTP (Multi-Token Prediction); the post asks users to share their benchmark results, with full command details.

#optimization

KNN early termination in Manticore Search

Hacker News Top · 2026-06-03

Manticore Search introduces early termination for HNSW-based KNN vector search, reducing distance computations by up to 80% for large k values while maintaining precision within 2-4% of full search.

#optimization

@harold_matmul: it was my idea :) Using GEPA is a very natural workflow for creating LLM programs. The iteration speed is very quick, a…

X AI KOLs Following · 2026-06-03

A user expresses thanks for the GEPA tool, highlighting its natural workflow for building LLM programs, its fast iteration, and its ability to bias optimization with data-derived priors.

#optimization

Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems

arXiv cs.AI · 2026-06-03

The paper introduces GAMBLe, a framework that decomposes AI-Driven Research Systems into generator, assessor, discovery mechanism, and budget, revealing how component interactions shape optimization landscapes. Experiments on NP-hard problems show no universally best configuration, emphasizing the need for careful component selection.

#optimization

A Nonmonotone Gradient-Based Algorithm for Symmetric Nonnegative Matrix Factorization and Graph Clustering

arXiv cs.LG · 2026-06-03

Introduces SNMPBB, a nonmonotone gradient-based algorithm for symmetric nonnegative matrix factorization that achieves significant speedups over existing methods, with extensions to graph clustering and low-rank approximations.

#optimization

GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning

arXiv cs.LG · 2026-06-03

GRZO is a zeroth-order optimization method for fine-tuning large language models that reduces variance through group-relative normalization, achieving better accuracy and memory efficiency than MeZO.

#optimization

Spectral Asymptotics of Neural Network Loss Landscapes: An Exact Decomposition of the Curvature Exponent

arXiv cs.LG · 2026-06-03

This paper presents an exact decomposition of the curvature exponent α in neural network loss landscapes, explaining why it varies across layer types. It introduces the spectral alignment decomposition and derives a spectral transfer identity linking curvature, gradient rank decay, and Hessian exponents, validated across architectures and datasets.

#optimization

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Hugging Face Daily Papers · 2026-06-03

AutoLab introduces a benchmark for evaluating long-horizon iterative optimization capabilities of frontier models across diverse domains. Results show that persistence and time awareness are more critical than initial performance, with claude-opus-4.6 demonstrating strong capabilities while many models terminate prematurely.

#optimization

Rotation revisited: Another unidirectional algorithm

The Old New Thing (Raymond Chen) · 2026-06-02

Raymond Chen revisits a unidirectional rotation algorithm for swapping adjacent memory blocks, explaining its recursive approach and performance characteristics.

#optimization

Threshold-Based Exclusive Batching for LLM Inference

arXiv cs.AI · 2026-06-02

This paper analyzes the trade-off between mixed batching and exclusive batching for LLM inference, showing that the optimal choice depends on GPU memory bandwidth. It proposes a threshold-based hybrid scheduler that dynamically switches between the two methods, achieving up to 41.9% higher throughput on bandwidth-constrained GPUs.

#optimization

Position Paper: Post-Solve Robustness in Decision Engines: Feasible Regions and Smoothness Under Perturbations

arXiv cs.AI · 2026-06-02

Position paper arguing for a post-solve robustness layer for MILP decision engines, formalizing feasible neighborhoods and solution smoothness under perturbations, and calling for certified inner approximations and adversarial robustness margins.

#optimization

Balancing Learning Rates Across Layers: Exact Two-Step Dynamics and Optimal Scaling in Linear Neural Networks

arXiv cs.LG · 2026-06-02

This paper derives exact closed-form expressions for gradients and test loss after one and two steps of gradient descent in two-layer and three-layer linear neural networks, characterizing optimal learning rate selection and revealing a distinct early-training regime where unequal layer-wise learning rates are initially optimal.

#optimization

Foundation-Preserving Adaptation via Generalized Rayleigh-Quotient Optimization

arXiv cs.LG · 2026-06-02

Proposes FoLoRA, a forgetting-aware optimization framework for fine-tuning foundation models that balances task utility and forgetting penalty via generalized Rayleigh-quotient optimization, achieving better preservation of non-target capabilities.

#optimization

QBE - Compiler Backend: Version 1.3

Lobsters Hottest · 2026-06-01

QBE 1.3 is a significant release of the compiler backend, with 7k new lines of code. It features a new IL matching algorithm, optimizations that raise CoreMark performance from 40% to over 63% of gcc -O2, Windows ABI support, and position-independent code generation.
