optimization

#optimization

Qwen 3.5 122B MoE OC on a single 3090 at 35 t/s — full local stack breakdown

Reddit r/openclaw ↗ · 2026-06-05

Detailed breakdown of running Qwen 3.5 122B MoE on a single RTX 3090 at 35 t/s using a custom llama.cpp fork (ik_llama.cpp) with fused MoE operations and expert offloading to CPU RAM, significantly outperforming stock llama.cpp MTP.

Dominant-Layer ZO: A Single Layer Dominates Zeroth-Order Fine-Tuning of LLMs

arXiv cs.LG ↗ · 2026-06-05 Cached

This paper reveals that zeroth-order fine-tuning of LLMs is dominated by a single decoding layer, which can be identified by activation outliers, and fine-tuning only that layer matches or exceeds full-model fine-tuning with up to 4.52x speedup.

Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization

arXiv cs.LG ↗ · 2026-06-05 Cached

This paper proves sharp dimension-free first-order lower bounds for finding epsilon-stationary points in higher-order smooth nonconvex optimization, resolving open problems for Hessian-Lipschitz and third-order smooth cases.

DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum

arXiv cs.LG ↗ · 2026-06-05 Cached

DP-MacAdam combines adaptive clipping and adaptive momentum to improve differentially private SGD, achieving better model utility without manual tuning of the clipping threshold.
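
For context, the fixed-threshold baseline that adaptive clipping aims to replace can be sketched as follows. This is a minimal illustration of a standard DP-SGD gradient aggregation step, not DP-MacAdam itself; the function names, the toy gradients, and the values of `clip_norm` and `noise_multiplier` are all assumptions chosen for the example.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_mean_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, clip_norm)
        total = [t + c for t, c in zip(total, clipped)]
    # Noise standard deviation is proportional to the clipping threshold.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]


# Toy per-example gradients (hypothetical values).
grads = [[3.0, 4.0], [0.0, 0.5]]
private_grad = dp_mean_gradient(
    grads, clip_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)
)
```

Here `clip_norm` is the hand-set threshold: too small and useful gradient signal is discarded, too large and the added noise grows with it, which is the tuning burden the summary refers to.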

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

arXiv cs.LG ↗ · 2026-06-05 Cached

This paper shows that discrete Gradient Descent with large step sizes restores symmetry in multi-pathway Deep Linear Networks, countering the symmetry-breaking predicted by Gradient Flow, and leads to signal re-balancing across pathways. The authors theoretically prove that balanced solutions are flatter (less sharp) than sparse ones, and large learning rates drive the network toward stable, balanced configurations.

@sydneyrunkle: let's assume agent = model + harness unfortunately, good models are getting really expensive! so you need a great harne…

X AI KOLs Following ↗ · 2026-06-04

A guide on optimizing AI agent performance by improving the harness component to compensate for expensive model costs, focusing on hill climbing techniques.

@vivekgalatage: Memory organization with Algorithmica is one resource that keeps shining. https://en.algorithmica.org/hpc/cpu-cache/

X AI KOLs Timeline ↗ · 2026-06-04 Cached

A recommendation for the Algorithmica resource on CPU cache memory organization, which provides detailed experimental analysis and optimization techniques for in-memory algorithms.

Edge of Stability Selectively Shapes Learning Across the Data Distribution

arXiv cs.LG ↗ · 2026-06-04 Cached

MIT researchers show that the edge of stability (EoS) in neural network training is not merely a global optimization phenomenon but selectively redistributes learning across subsets of the training distribution, amplifying progress on some data groups while suppressing others. They identify two key conditions governing this allocation: gradient alignment with the top Hessian eigenvector and sustained non-vanishing gradient magnitude.
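
As background, the stability threshold behind the term "edge of stability" can be illustrated on a one-dimensional quadratic. This is a textbook sketch, not the paper's analysis: for f(x) = 0.5 * sharpness * x^2, gradient descent contracts only when the learning rate is below 2 / sharpness. The function name and the numeric values are assumptions for illustration.

```python
def gradient_descent_on_quadratic(sharpness, lr, steps=100, x0=1.0):
    """Run GD on f(x) = 0.5 * sharpness * x**2 and return the final |x|."""
    x = x0
    for _ in range(steps):
        # The gradient of f at x is sharpness * x.
        x -= lr * sharpness * x
    return abs(x)


sharpness = 10.0
threshold = 2.0 / sharpness  # classical stability limit for this quadratic

# Just below the threshold the iterates shrink; just above, they blow up.
stable = gradient_descent_on_quadratic(sharpness, lr=0.95 * threshold)
unstable = gradient_descent_on_quadratic(sharpness, lr=1.05 * threshold)
```

Each step multiplies x by (1 - lr * sharpness), so the iterates oscillate in sign and either decay or grow depending on which side of the threshold the learning rate sits.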

Smart Transportation Without Neurons -- Fair Metro Network Expansion with Tabular Reinforcement Learning

arXiv cs.LG ↗ · 2026-06-04 Cached

Researchers from the University of Amsterdam propose a tabular reinforcement learning approach to the Metro Network Expansion Problem, showing it achieves comparable performance to Deep RL while reducing training episodes by 18x and carbon emissions by 12x on average. The method also incorporates social equity criteria and is evaluated on real-world metro networks in Xi'an and Amsterdam.

Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent

arXiv cs.LG ↗ · 2026-06-04 Cached

This paper develops a sharp pseudospectral theory for block-triangular Jacobians in coupled gradient descent, proving Kreiss-constant bounds and establishing iteration complexity results. The work exposes non-asymptotic, instance-dependent transient amplification phenomena relevant to bilevel optimization, two-time-scale stochastic approximation, and GAN training.

Constraint-Enhanced Physical Search through Correlation Matching

arXiv cs.AI ↗ · 2026-06-04 Cached

This paper proposes a principle of 'constraint-enhanced physical search' where temporal correlations in exploration are matched to constraint-induced spatial correlations in update dynamics, demonstrated via a tug-of-war bandit model. The authors show that efficient search emerges not from maximal randomness but from matching temporal correlation to the physical update scale that converts feedback into evidence.

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

arXiv cs.AI ↗ · 2026-06-04 Cached

Researchers from Beihang University and Baidu propose 'constraint injection,' a dual verification method for LLM-based optimization modeling that detects spurious or omitted constraints beyond objective equivalence. They develop VRPCoder, an 8B model for translating natural-language vehicle routing problems into Gurobi scripts, achieving 93% average Pass@1 and outperforming Claude Sonnet and prior OR-LLMs by large margins.

A survey of inlining heuristics

Lobsters Hottest ↗ · 2026-06-04 Cached

A survey of inlining heuristics in method JIT compilers, discussing the challenges of when to inline and the trade-offs involved, with examples from Ruby and Python.

llama.cpp - Qwen3.6/3.5-MTP - Share your benchmarks t/s

Reddit r/LocalLLaMA ↗ · 2026-06-03

llama.cpp release b9495 adds optimizations for Qwen3.6/3.5-MTP (Multi-Token Prediction); the post asks users to share their benchmark results along with the full commands they used.

KNN early termination in Manticore Search

Hacker News Top ↗ · 2026-06-03 Cached

Manticore Search introduces early termination for HNSW-based KNN vector search, reducing distance computations by up to 80% for large k values while maintaining precision within 2-4% of full search.

@harold_matmul: it was my idea :) Using GEPA is a very natural workflow for creating LLM programs. The iteration speed is very quick, a…

X AI KOLs Following ↗ · 2026-06-03 Cached

A user expresses appreciation for the GEPA tool, highlighting its natural workflow for building LLM programs, its fast iteration speed, and its ability to bias optimization with data-derived priors.

Don't Gamble, GAMBLe: An Analytical Framework for AI-Driven Research Systems

arXiv cs.AI ↗ · 2026-06-03 Cached

The paper introduces GAMBLe, a framework that decomposes AI-Driven Research Systems into generator, assessor, discovery mechanism, and budget, revealing how component interactions shape optimization landscapes. Experiments on NP-hard problems show no universally best configuration, emphasizing the need for careful component selection.

A Nonmonotone Gradient-Based Algorithm for Symmetric Nonnegative Matrix Factorization and Graph Clustering

arXiv cs.LG ↗ · 2026-06-03 Cached

Introduces SNMPBB, a nonmonotone gradient-based algorithm for symmetric nonnegative matrix factorization that achieves significant speedups over existing methods, with extensions to graph clustering and low-rank approximations.

GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning

arXiv cs.LG ↗ · 2026-06-03 Cached

GRZO is a novel zeroth-order optimization method for fine-tuning large language models that reduces variance by using group-relative normalization, achieving better accuracy and memory efficiency compared to MeZO.
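
For reference, the MeZO-style baseline mentioned above can be sketched as a two-point zeroth-order step that estimates a directional derivative from two loss evaluations and regenerates the random perturbation from a seed instead of storing it. This is a sketch of that baseline on a toy quadratic, not of GRZO's group-relative normalization; the toy loss, the function names, and the hyperparameters are assumptions for illustration.

```python
import random


def loss(theta, target):
    """Toy objective: squared distance between theta and target."""
    return sum((t - y) ** 2 for t, y in zip(theta, target))


def perturb(theta, scale, seed):
    """Add scale * z to theta in place, with z ~ N(0, I) regenerated from seed."""
    rng = random.Random(seed)
    for i in range(len(theta)):
        theta[i] += scale * rng.gauss(0.0, 1.0)


def zo_step(theta, target, lr, eps, seed):
    """One two-point zeroth-order update using only loss evaluations."""
    perturb(theta, +eps, seed)
    loss_plus = loss(theta, target)
    perturb(theta, -2 * eps, seed)
    loss_minus = loss(theta, target)
    perturb(theta, +eps, seed)  # restore the original parameters
    # Finite-difference estimate of the gradient projected onto z.
    projected_grad = (loss_plus - loss_minus) / (2 * eps)
    # Move along the same z, regenerated from the seed.
    perturb(theta, -lr * projected_grad, seed)


def run_zo(steps=500, lr=0.05, eps=1e-3):
    """Optimize the toy loss and return (initial_loss, final_loss)."""
    target = [1.0, -2.0, 0.5, 3.0, -1.0]
    theta = [0.0] * len(target)
    initial = loss(theta, target)
    for step in range(steps):
        zo_step(theta, target, lr, eps, seed=step)
    return initial, loss(theta, target)


initial_loss, final_loss = run_zo()
```

Because the perturbation is rebuilt from the seed, the step needs no gradient buffers or stored noise vectors, which is where the memory savings of this family of methods come from. The single scalar `projected_grad` is also the quantity whose variance methods like the one summarized above try to reduce.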

Spectral Asymptotics of Neural Network Loss Landscapes: An Exact Decomposition of the Curvature Exponent

arXiv cs.LG ↗ · 2026-06-03 Cached

This paper presents an exact decomposition of the curvature exponent α in neural network loss landscapes, explaining why it varies across layer types. It introduces the spectral alignment decomposition and derives a spectral transfer identity linking curvature, gradient rank decay, and Hessian exponents, validated across architectures and datasets.
