optimization

#optimization

Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization

arXiv cs.LG ↗ · 2026-05-19 Cached

This paper identifies a geometric mismatch in the Dion low-rank spectral optimizer and proposes Orth-Dion, which replaces column normalization with QR orthogonalization. The change closes the convergence gap to full-rank methods such as Muon at the same communication cost, and is validated on large-scale language model pre-training.
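
The difference between the two operations named in this summary can be seen on a toy matrix. The sketch below is only an illustration, not the paper's algorithm: the matrix values and the helper names `column_normalize` and `gram_schmidt` are assumptions, and the Q factor is computed with modified Gram-Schmidt on linearly independent columns.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def column_normalize(cols):
    """Scale each column to unit length; angles between columns are unchanged."""
    out = []
    for c in cols:
        norm = math.sqrt(dot(c, c))
        out.append([x / norm for x in c])
    return out

def gram_schmidt(cols):
    """Return the Q factor of a thin QR decomposition (modified Gram-Schmidt).

    Assumes the columns are linearly independent.
    """
    q = []
    for c in cols:
        v = list(c)
        for u in q:
            proj = dot(u, v)
            v = [a - proj * b for a, b in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        q.append([x / norm for x in v])
    return q

# A toy 3x2 matrix stored as a list of columns (hypothetical values).
columns = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

normalized = column_normalize(columns)
orthogonal = gram_schmidt(columns)

# Inner product between the two resulting columns.
overlap_normalized = dot(normalized[0], normalized[1])
overlap_orthogonal = dot(orthogonal[0], orthogonal[1])
print(overlap_normalized, overlap_orthogonal)
```

Column normalization leaves the two columns correlated, while the QR factor has orthonormal columns.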

SignMuon: Communication-Efficient Distributed Muon Optimization

arXiv cs.LG ↗ · 2026-05-19 Cached

SignMuon is a 1-bit, matrix-aware optimizer for distributed training that combines signSGD's majority-vote sign aggregation with Muon's polar-step framework, achieving 32x bandwidth reduction over float32 while maintaining strong convergence and performance on benchmarks like CIFAR-10/ResNet-50 and nanoGPT.
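
The majority-vote sign aggregation borrowed from signSGD can be sketched in a few lines. This is a minimal illustration under stated assumptions: the gradients are made-up values, `majority_vote` is a hypothetical helper, ties are broken toward +1, and the Muon polar step is not implemented here.

```python
def sign(x):
    # Convention for this sketch: zero maps to +1.
    return 1 if x >= 0 else -1

def majority_vote(worker_grads):
    """Each worker contributes only the sign of every gradient entry (1 bit each);
    the aggregate is the sign of the per-entry vote total."""
    signs = [[sign(g) for g in grad] for grad in worker_grads]
    totals = [sum(col) for col in zip(*signs)]
    return [sign(t) for t in totals]

# Three workers, three gradient entries each (hypothetical values).
worker_grads = [
    [0.3, -1.2, 0.05],
    [0.1, 0.4, -0.7],
    [-0.2, -0.9, -0.1],
]
aggregated = majority_vote(worker_grads)

# One sign bit per entry instead of a 32-bit float.
compression = 32 // 1
print(aggregated, compression)
```

Sending one bit per entry instead of a float32 value is where the 32x bandwidth figure comes from.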

Mirror Descent-Type Algorithms for the Variational Inequality Problem with Functional Constraints

arXiv cs.LG ↗ · 2026-05-19 Cached

This paper proposes mirror descent-type algorithms for solving variational inequality problems with functional constraints, proving optimal convergence rates for problems with bounded monotone operators and Lipschitz convex constraints. A modification is introduced to improve efficiency for many constraints.

Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management

arXiv cs.AI ↗ · 2026-05-19 Cached

This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game, identifying four inference-time levers and introducing the concept of agent bullwhip. It shows that a reasoning model can exceed human performance, and proposes GRPO-based post-training to improve reliability.

LoRA and Weight Decay (2023)

Hacker News Top ↗ · 2026-05-18 Cached

This blog post explores how LoRA's interaction with weight decay leads to a different optimization objective than full fine-tuning, where weights are regularized towards the initial model rather than zero. It explains the implications for practitioners.
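
A scalar toy example makes the claim concrete. It is a sketch under assumptions, not the blog post's code: the effective LoRA weight is written as `w0 + b * a`, all numbers are hypothetical, and the task-loss gradient is deliberately set to zero so that only the weight-decay term acts.

```python
lr, wd, steps = 0.1, 0.1, 1000

w0 = 2.0         # frozen pretrained weight (hypothetical scalar)
w_full = 2.4     # full fine-tuning: the weight itself is trained and decayed
a, b = 0.5, 0.8  # LoRA factors; the effective weight is w0 + b * a

for _ in range(steps):
    # Decay-only update: the task-loss gradient is zero in this sketch.
    w_full -= lr * wd * w_full
    a -= lr * wd * a
    b -= lr * wd * b

w_lora = w0 + b * a
print(w_full, w_lora)
```

Under decay alone, the fully fine-tuned weight shrinks toward zero, while the LoRA product `b * a` shrinks toward zero and the effective weight returns to `w0`.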

Rewriting model inference with CUDA kernels: the bottleneck was not just GEMM [P]

Reddit r/MachineLearning ↗ · 2026-05-18

The author describes building FlashRT, a CUDA-first inference runtime that rewrites model inference paths with C++/CUDA kernels to address bottlenecks beyond GEMM in small-batch and realtime workloads, achieving significant latency improvements on Jetson Thor and RTX 5090. The post also covers lessons on precision (FP8 was helpful, FP4 gave mixed results) and the need to bypass generic runtimes for realtime inference.

Every AI prompt costs money — and that changes everything

Reddit r/AI_Agents ↗ · 2026-05-18

The article argues that the real challenge in AI isn't just building smarter models but making them cost-efficient at scale, highlighting the importance of reducing token usage, improving speed, and optimizing infrastructure.

FediMeteo, HAProxy, and the art of not wasting snac threads

Lobsters Hottest ↗ · 2026-05-18 Cached

The author describes using HAProxy caching to reduce unnecessary load on snac threads in the FediMeteo service, following previous similar optimizations with nginx. The approach aims to keep the lightweight ActivityPub server efficient by having the reverse proxy absorb repeated public requests.

On the Stability of Growth in Structural Plasticity

arXiv cs.LG ↗ · 2026-05-18 Cached

This paper investigates the asymmetry between pruning and growth in structural plasticity for neural networks, showing that newborn units receive weaker gradient signals than incumbent units, and proposes interventions to improve the integration of newborn units.

$\phi$-Balancing for Mixture-of-Experts Training

arXiv cs.LG ↗ · 2026-05-18 Cached

This paper proposes φ-balancing, a principled framework for load balancing in Mixture-of-Experts models that directly targets population-level expert balance using convex duality and mirror descent, achieving more stable expert utilization and outperforming prior methods on reasoning and code generation benchmarks.

Optimized Three-Dimensional Photovoltaic Structures with LLM guided Tree Search

arXiv cs.CL ↗ · 2026-05-18 Cached

This paper presents a case study using an LLM-driven tree search algorithm (ERA) combined with a coding agent (AntiGravity) to autonomously generate high-efficiency three-dimensional photovoltaic structures, overcoming limitations of flat solar panels at mid-latitudes. The workflow includes iterative patching to eliminate reward hacking and discovers improved designs under various constraints.

Benchmarking the new b9200 update: Optimizing Qwen 3.6 27B mtp for Hermes Agent on a single RTX 3090

Reddit r/LocalLLaMA ↗ · 2026-05-18

Benchmarking the b9200 update of llama.cpp with optimized flags for Qwen 3.6 27B MTP on a single RTX 3090 shows significant performance gains, especially in prompt processing speed, for agentic workflows.

ROCm 7.13 nightly adds strix halo optimizations

Reddit r/LocalLLaMA ↗ · 2026-05-17

AMD's ROCm 7.13 tech preview adds optimizations for Strix Halo (Ryzen AI Max 300) and open-sources the ROCprof Trace Decoder.

llama: avoid copying logits during prompt decode in MTP by am17an · Pull Request #23198 · ggml-org/llama.cpp

Reddit r/LocalLLaMA ↗ · 2026-05-17 Cached

This pull request optimizes llama.cpp by avoiding unnecessary copying of logits during prompt decode in multi-token prediction, improving inference performance.

KV Cache Is Becoming the Memory Hierarchy of Inference

Hacker News Top ↗ · 2026-05-17 Cached

The article discusses how the KV cache is evolving into a memory hierarchy for LLM inference, optimizing memory management during decoding.

When can the C++ compiler devirtualize a call?

Hacker News Top ↗ · 2026-05-17 Cached

The article explores when C++ compilers can devirtualize virtual function calls, covering cases such as known dynamic types and the final keyword, with comparisons across GCC, Clang, MSVC, and ICC.

Understanding Singleflight in Go

Hacker News Top ↗ · 2026-05-16 Cached

The article explains the singleflight pattern in Go, which eliminates redundant concurrent calls to an expensive operation by ensuring that only one call per key is in flight at a time and sharing its result among all waiting callers.
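
The pattern can be sketched outside Go as well. The following is a Python analogue written for illustration, not the API of Go's singleflight package: the `SingleFlight` class, the `slow_lookup` function, and the key `"user:42"` are all hypothetical names, and the demo relies on the eight threads starting well within the 0.2 s the leader spends in the call.

```python
import threading
import time

class SingleFlight:
    """Deduplicate concurrent calls that share a key."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> record of the call currently in flight

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "result": None, "error": None}
                self._calls[key] = call
        if leader:
            try:
                call["result"] = fn()
            except Exception as exc:
                call["error"] = exc
            finally:
                # Forget the key first so later callers start a fresh call,
                # then wake everyone waiting on this one.
                with self._lock:
                    del self._calls[key]
                call["done"].set()
        else:
            call["done"].wait()
        if call["error"] is not None:
            raise call["error"]
        return call["result"]

executions = []  # one entry per real execution of the expensive function
results = []

def slow_lookup():
    executions.append(1)
    time.sleep(0.2)  # stands in for an expensive operation
    return "value"

group = SingleFlight()

def worker():
    results.append(group.do("user:42", slow_lookup))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(executions), results)
```

Callers that arrive after the in-flight call has finished trigger a new execution; the pattern deduplicates concurrent work and is not a cache.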

The Fil-C Optimized Calling Convention

Hacker News Top ↗ · 2026-05-16 Cached

The Fil-C optimized calling convention ensures memory safety for C programs even under adversarial misuse, while maintaining efficiency by omitting safety checks in the common case. The article explains the generic and register-passing optimizations, which handle type violations via panics or well-defined behavior.

@gdb: codex for improving computational complexity

X AI KOLs Following ↗ · 2026-05-16 Cached

A Codex skill that analyzes codebases to identify performance hotspots such as loops, repeated lookups, and N+1 patterns.

How to Write to SSDs

Lobsters Hottest ↗ · 2026-05-16 Cached

This paper proposes out-of-place write optimizations for database systems to fully leverage SSD performance, achieving 1.65-2.24x throughput improvement and 6.2-9.8x reduction in flash writes on OLTP benchmarks.
