MIT researchers show that the edge of stability (EoS) in neural network training is not merely a global optimization phenomenon but selectively redistributes learning across subsets of the training distribution, amplifying progress on some data groups while suppressing others. They identify two key conditions governing this allocation: gradient alignment with the top Hessian eigenvector and sustained non-vanishing gradient magnitude.
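The two conditions are easier to see on a toy problem. The sketch below runs power iteration on a hypothetical quadratic loss L(w) = 0.5 * w^T H w to estimate the sharpness (top Hessian eigenvalue) and the top eigenvector. It then measures how well the gradient aligns with that eigenvector and compares the sharpness with the 2/η stability threshold for gradient descent. The Hessian, the learning rate, and all function names are illustrative assumptions. This is not the authors' measurement code.

```python
import math

def matvec(H, v):
    """Dense matrix-vector product (stands in for a Hessian-vector product)."""
    return [sum(H[i][j] * v[j] for j in range(len(v))) for i in range(len(H))]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def top_eigenpair(H, iters=200):
    """Power iteration: estimate the top Hessian eigenvalue (sharpness) and its eigenvector."""
    v = [1.0] * len(H)
    for _ in range(iters):
        w = matvec(H, v)
        n = norm(w)
        v = [x / n for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(H, v)))  # Rayleigh quotient; v has unit norm
    return lam, v

def alignment(g, v):
    """|cos| of the angle between the gradient g and the eigenvector v."""
    return abs(sum(a * b for a, b in zip(g, v))) / (norm(g) * norm(v))

# Hypothetical toy setup: positive-definite Hessian of a quadratic loss.
H = [[4.0, 0.0], [0.0, 1.0]]
w = [0.5, 2.0]
g = matvec(H, w)        # gradient of 0.5 * w^T H w is H w
lam, v = top_eigenpair(H)
eta = 0.6               # hypothetical learning rate; 2 / eta is about 3.33
print(lam > 2 / eta, round(alignment(g, v), 4))  # True 0.7071
```

On a real network, `matvec` would be replaced by an autodiff Hessian-vector product. Power iteration finds the eigenvalue largest in magnitude, so it matches the sharpness only when that eigenvalue is positive.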
Researchers from the University of Amsterdam propose a tabular reinforcement learning approach to the Metro Network Expansion Problem. They show that it performs comparably to deep RL while requiring 18x fewer training episodes and emitting 12x less carbon on average. The method also incorporates social equity criteria and is evaluated on real-world metro networks in Xi'an and Amsterdam.
This paper develops a sharp pseudospectral theory for block-triangular Jacobians in coupled gradient descent, proving Kreiss-constant bounds and establishing iteration complexity results. The work exposes non-asymptotic, instance-dependent transient amplification phenomena relevant to bilevel optimization, two-time-scale stochastic approximation, and GAN training.
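A small numerical sketch shows the kind of transient amplification that Kreiss-constant bounds quantify. The iteration matrix below is a hypothetical 2x2 lower block-triangular Jacobian whose diagonal entries 0.9 and 0.8 are its eigenvalues, so the spectral radius is 0.9 and the iteration contracts asymptotically. The large off-diagonal coupling term still drives the infinity norm of J^k up to about 13.7 at k = 6 before it decays. The matrix entries and helper names are assumptions chosen for illustration, not taken from the paper.

```python
def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]

def inf_norm(A):
    """Induced infinity norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def power_norms(J, steps):
    """Return [||J^1||, ||J^2||, ..., ||J^steps||] in the infinity norm."""
    P = [row[:] for row in J]
    norms = [inf_norm(P)]
    for _ in range(steps - 1):
        P = matmul(P, J)  # P becomes J^(k+1)
        norms.append(inf_norm(P))
    return norms

# Hypothetical coupled-GD Jacobian: both diagonal blocks contract, the coupling c is large.
a, b, c = 0.9, 0.8, 5.0
J = [[a, 0.0], [c, b]]
norms = power_norms(J, 100)
peak_k = norms.index(max(norms)) + 1
print(peak_k, round(max(norms), 3), norms[-1])
```

For this matrix, the lower-left entry of J^k equals c * (a^k - b^k) / (a - b). The transient peak therefore scales with the coupling c even though neither diagonal block is expansive on its own.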
This paper proposes a principle of 'constraint-enhanced physical search' where temporal correlations in exploration are matched to constraint-induced spatial correlations in update dynamics, demonstrated via a tug-of-war bandit model. The authors show that efficient search emerges not from maximal randomness but from matching temporal correlation to the physical update scale that converts feedback into evidence.
Researchers from Beihang University and Baidu propose 'constraint injection,' a dual verification method for LLM-based optimization modeling that detects spurious or omitted constraints beyond objective equivalence. They develop VRPCoder, an 8B model for translating natural-language vehicle routing problems into Gurobi scripts, achieving 93% average Pass@1 and outperforming Claude Sonnet and prior OR-LLMs by large margins.
A survey of inlining heuristics in method JIT compilers, discussing the challenges of when to inline and the trade-offs involved, with examples from Ruby and Python.
llama.cpp release b9495 adds optimizations for Qwen3.6/3.5-MTP (Multi-Token Prediction), and the maintainers ask users to share their benchmark results together with the full commands they used.
Manticore Search introduces early termination for HNSW-based KNN vector search, reducing distance computations by up to 80% for large k values while maintaining precision within 2-4% of full search.
A user expresses thanks for the GEPA tool and highlights its natural workflow for LLM programs, its fast iteration, and its ability to bias optimization with data-derived priors.
The paper introduces GAMBLe, a framework that decomposes AI-Driven Research Systems into four components (a generator, an assessor, a discovery mechanism, and a budget) and shows how interactions between these components shape the optimization landscape. Experiments on NP-hard problems show that no single configuration is best across the board, which underscores the need for careful component selection.
Introduces SNMPBB, a nonmonotone gradient-based algorithm for symmetric nonnegative matrix factorization that achieves significant speedups over existing methods, with extensions to graph clustering and low-rank approximations.
GRZO is a new zeroth-order optimization method for fine-tuning large language models. It reduces gradient-estimate variance through group-relative normalization and achieves better accuracy and memory efficiency than MeZO.
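As background for the comparison, here is a minimal sketch of the two-point (SPSA) estimator that MeZO uses. MeZO perturbs the parameters in place along a random direction z, evaluates the loss at +eps and -eps, and regenerates z from a stored random seed rather than keeping it in memory. The sketch does not implement GRZO's group-relative normalization, whose details the summary does not give. The pure-Python parameter list, the toy loss, and the function names are assumptions for illustration.

```python
import random

def perturb(params, seed, scale):
    """In place: params += scale * z, where z ~ N(0, I) is regenerated from seed (never stored)."""
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] += scale * rng.gauss(0.0, 1.0)

def mezo_step(params, loss_fn, lr, eps, seed):
    """One SPSA step: two forward passes, no backprop, O(1) extra memory beyond the seed."""
    perturb(params, seed, +eps)
    loss_plus = loss_fn(params)
    perturb(params, seed, -2 * eps)
    loss_minus = loss_fn(params)
    perturb(params, seed, +eps)  # restore the original parameters (up to rounding)
    projected_grad = (loss_plus - loss_minus) / (2 * eps)  # estimate of z . grad
    rng = random.Random(seed)    # same seed, so the same z is replayed for the update
    for i in range(len(params)):
        params[i] -= lr * projected_grad * rng.gauss(0.0, 1.0)
    return projected_grad

def quadratic_loss(p):
    """Hypothetical toy objective standing in for an LLM loss."""
    return sum(x * x for x in p)

params = [1.0] * 5
start = quadratic_loss(params)
for step in range(500):
    mezo_step(params, quadratic_loss, lr=0.01, eps=1e-3, seed=step)
print(start, quadratic_loss(params))
```

The seed trick keeps memory close to inference-only, but the single random direction per step makes the estimate noisy. That variance is what methods like GRZO aim to reduce.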
This paper presents an exact decomposition of the curvature exponent α in neural network loss landscapes, explaining why it varies across layer types. It introduces the spectral alignment decomposition and derives a spectral transfer identity linking curvature, gradient rank decay, and Hessian exponents, validated across architectures and datasets.
AutoLab introduces a benchmark for evaluating long-horizon iterative optimization capabilities of frontier models across diverse domains. Results show that persistence and time awareness are more critical than initial performance, with claude-opus-4.6 demonstrating strong capabilities while many models terminate prematurely.
Raymond Chen revisits a unidirectional rotation algorithm for swapping adjacent memory blocks, explaining its recursive approach and performance characteristics.
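For readers unfamiliar with block rotation, one classic in-place technique is the Gries-Mills block-swap rotation. It repeatedly swaps the shorter block with an equal-length piece of the longer one, which puts one block in its final position, and then continues on the smaller remaining problem. Whether this matches the specific "unidirectional" variant Chen discusses is an assumption. The sketch writes the tail recursion as a loop, and the function names are hypothetical.

```python
def swap_ranges(arr, i, j, n):
    """Swap arr[i:i+n] with arr[j:j+n]; the two ranges must not overlap."""
    for k in range(n):
        arr[i + k], arr[j + k] = arr[j + k], arr[i + k]

def rotate(arr, lo, mid, hi):
    """Turn arr[lo:mid] + arr[mid:hi] (blocks A, B) into B + A in place with O(1) extra space."""
    while lo < mid < hi:
        a, b = mid - lo, hi - mid
        if a <= b:
            # Swap A with the last a elements of B; A is now in its final place at the end.
            swap_ranges(arr, lo, hi - a, a)
            hi -= a
        else:
            # Swap the first b elements of A with B; B is now in its final place at the front.
            swap_ranges(arr, lo, mid, b)
            lo += b

data = list("ABCDEFG")
rotate(data, 0, 3, 7)
print("".join(data))  # DEFGABC
```

Each iteration fixes at least one element in its final position, so the total number of element swaps is linear in the length of the range.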
This paper analyzes the trade-off between mixed batching and exclusive batching for LLM inference, showing that the optimal choice depends on GPU memory bandwidth. It proposes a threshold-based hybrid scheduler that dynamically switches between the two methods, achieving up to 41.9% higher throughput on bandwidth-constrained GPUs.
Position paper arguing for a post-solve robustness layer for MILP decision engines, formalizing feasible neighborhoods and solution smoothness under perturbations, and calling for certified inner approximations and adversarial robustness margins.
This paper derives exact closed-form expressions for gradients and test loss after one and two steps of gradient descent in two-layer and three-layer linear neural networks, characterizing optimal learning rate selection and revealing a distinct early-training regime where unequal layer-wise learning rates are initially optimal.
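To show what "closed-form after one step" means, the sketch below uses the simplest case: a scalar (width-1) two-layer linear network f(x) = w2 * w1 * x trained with squared loss. The loss depends on the data only through the moments Sxx, Sxy, and Syy. This makes both the gradients and the loss after one gradient step with separate layer-wise learning rates eta1 and eta2 exact formulas. The paper treats matrix-valued and three-layer networks. This scalar reduction, the data, and the function names are assumptions for illustration only.

```python
def moments(xs, ys):
    """Second moments Sxx, Sxy, Syy of the data; the loss depends only on these."""
    n = len(xs)
    sxx = sum(x * x for x in xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    syy = sum(y * y for y in ys) / n
    return sxx, sxy, syy

def loss(w1, w2, m):
    """L = 1/(2n) * sum (w2*w1*x - y)^2 = 0.5 * (p^2 Sxx - 2 p Sxy + Syy), with p = w1*w2."""
    sxx, sxy, syy = m
    p = w1 * w2  # end-to-end weight of the linear network
    return 0.5 * (p * p * sxx - 2 * p * sxy + syy)

def grads(w1, w2, m):
    """Exact gradients via the chain rule: dL/dp = p*Sxx - Sxy, dp/dw1 = w2, dp/dw2 = w1."""
    sxx, sxy, _ = m
    dl_dp = w1 * w2 * sxx - sxy
    return w2 * dl_dp, w1 * dl_dp

def loss_after_one_step(w1, w2, eta1, eta2, m):
    """Test loss after one GD step with layer-wise learning rates eta1, eta2."""
    g1, g2 = grads(w1, w2, m)
    return loss(w1 - eta1 * g1, w2 - eta2 * g2, m)

# Hypothetical data and initialization; grid-search the pair of layer-wise learning rates.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.1, 5.9]
m = moments(xs, ys)
w1, w2 = 0.5, 0.3
grid = [0.01 * k for k in range(31)]
best = min(((e1, e2) for e1 in grid for e2 in grid),
           key=lambda e: loss_after_one_step(w1, w2, e[0], e[1], m))
print(best, loss_after_one_step(w1, w2, best[0], best[1], m))
```

Because the two layers receive gradients scaled by each other's weights (w2 and w1), an imbalanced initialization already makes the two layers respond differently to the same learning rate.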
Proposes FoLoRA, a forgetting-aware optimization framework for fine-tuning foundation models that balances task utility and forgetting penalty via generalized Rayleigh-quotient optimization, achieving better preservation of non-target capabilities.
QBE 1.3 is a significant release of the compiler backend, adding 7k new lines of code. It features a new IL matching algorithm, optimizations targeting the CoreMark benchmark (raising performance from 40% to over 63% of gcc -O2), Windows ABI support, and position-independent code generation.