optimization

#optimization

From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression

arXiv cs.LG · 2026-05-26

This paper derives batch scaling laws for sketched linear regression under power-law spectra, analyzing one-pass and multi-pass mini-batch SGD. It provides explicit risk decompositions showing how batch size affects bias, variance, and fluctuation terms, and establishes that without-replacement sampling yields lower noise than with-replacement.
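
The with- versus without-replacement comparison has a simple classical counterpart for the variance of a mini-batch mean. The sketch below is not the paper's analysis: it uses a made-up toy population and enumerates every possible batch to check the textbook finite-population result, where sampling without replacement shrinks the variance by a factor of (n - b) / (n - 1).

```python
from itertools import combinations, product
from statistics import fmean, pvariance


def batch_mean_variance(population, batch_size, replace):
    """Exact variance of the mini-batch mean, found by enumerating
    every equally likely batch of the given size."""
    if replace:
        batches = product(population, repeat=batch_size)
    else:
        batches = combinations(population, batch_size)
    means = [fmean(batch) for batch in batches]
    return pvariance(means)


# Toy per-sample values (hypothetical, chosen only for illustration).
population = [1.0, 2.0, 4.0, 7.0, 11.0]
batch_size = 3

var_with = batch_mean_variance(population, batch_size, replace=True)
var_without = batch_mean_variance(population, batch_size, replace=False)

# Classical formulas: sigma^2 / b with replacement, and the same value
# times (n - b) / (n - 1) without replacement.
n = len(population)
sigma2 = pvariance(population)
expected_with = sigma2 / batch_size
expected_without = expected_with * (n - batch_size) / (n - 1)

print(var_with, var_without)
```

The enumeration is only feasible for tiny populations; it is there to make the comparison exact rather than statistical.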

A lift for input-convex neural network training

arXiv cs.LG · 2026-05-26

Proposes a 'lift' method for training input-convex neural networks (ICNNs) that uses an unconstrained hypernetwork to emit non-negative inter-layer weights, softening the loss landscape and escaping gradient attenuation, achieving lower test loss than projected gradient descent and softplus reparametrization.

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

arXiv cs.AI · 2026-05-26

This paper analyzes tradeoffs between latency, reliability, and cost in LLM-enabled agentic workflows, introducing performance models and deriving optimal resource allocation policies like water-filling token allocation.

@dosco: i'm seeing a lot of industry papers that are karpathy's auto research loop (not cited) or a codex optimization goal for…

X AI KOLs Timeline · 2026-05-26

A critical observation about recent industry AI papers lacking novelty, citing examples like SkillOpt that treat natural-language skills as trainable external parameters.

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Hugging Face Daily Papers · 2026-05-26

This paper systematically studies scale vectors in LLM normalization layers, showing they optimize training through a self-amplifying preconditioning effect, and proposes three lightweight improvements that enhance performance and scaling behavior with negligible overhead.

Wrote a custom C++ engine for MiniCPM-V 4.6 on Orange Pi AIPro (Ascend 310B) to bypass framework overhead

Reddit r/LocalLLaMA · 2026-05-25

The author wrote a custom C++ inference engine for MiniCPM-V 4.6 on the Orange Pi AIPro (Ascend 310B NPU), with optimized AscendC kernels for matmul and causal-conv1d. It reaches 5.90 tokens/s, a 2x speedup over the stock framework.

Anytime Training with Schedule-Free Spectral Optimization

arXiv cs.LG · 2026-05-25

This paper introduces SF-NorMuon, a schedule-free spectral optimizer that matches or exceeds tuned AdamW on language models up to 772M parameters, with theoretical guarantees for stationarity and long-horizon stability.

Solving the Aircraft Disassembly Scheduling Problem

arXiv cs.AI · 2026-05-25

This paper presents the aircraft disassembly scheduling problem, a large-scale combinatorial optimization task involving thousands of tasks, precedence relations, balance constraints, and limited space. It proposes a Constraint Programming model and a MIP model tested on real operational instances with up to 1450 tasks.

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

Hugging Face Daily Papers · 2026-05-25

DVAO adaptively weights objectives based on reward variance to improve multi-reward RL training stability and multi-objective performance.

@Italianclownz: Converted Qwen 3.6 35b a3b to ROCmfp4 and this is flying. Used the mtp version bc this ROCmfp4 can also incorporate the…

X AI KOLs Timeline · 2026-05-24

The author converted the MTP version of Qwen 3.6 35b a3b to the ROCmfp4 format and reports that it runs fast on AMD hardware.

@davideciffa: If you have an Nvidia RTX 4090 --ddtree-budget 36 is the best configuration that buys you 2.5x speed up during decoding…

X AI KOLs Timeline · 2026-05-24

A tweet recommending --ddtree-budget 36 for Nvidia RTX 4090, claiming 2.5x speedup during decoding for Qwen3.6_27B.

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models

Hugging Face Daily Papers · 2026-05-24

This paper studies reward hacking in reinforcement learning for language models through the geometry of updates, identifying optimization drift as a key factor. It proposes trusted-direction projection to constrain gradients within a clean reference subspace, delaying shortcut exploitation and preserving task performance.

@no_stp_on_snek: @antirez Turbo3 BEATS fp8 by +5% decode tok/s at 32K context still tinkering but i've been cooking TQ+ in your kitchen

X AI KOLs Following · 2026-05-23

The author reports that Turbo3 decodes about 5% more tokens per second than fp8 at 32K context, and notes that the work on TQ+ is still in progress.

@LigengZhu: Excited to share the KDA: Kernel Design Agents that powers HAN Lab Kernel Mafia top ranking #1~3 kernels at Kernel Cont…

X AI KOLs Timeline · 2026-05-23

KDA is an agent-driven kernel design framework that helped HAN Lab achieve top rankings in the MLSys FlashInfer Kernel Contest by minimizing human involvement. The agent leverages Humanize, KernelWiki, and profiler skills to produce state-of-the-art kernels.

@no_stp_on_snek: Always start with uncompressed k and compressed V and go more aggressively from there. Model families have different se…

X AI KOLs Following · 2026-05-23

A tip on KV-cache compression for transformer models: start with uncompressed keys and compressed values, then adjust based on model family sensitivity; try asymmetric before symmetric compression.

@L1vsun: i spent 3 months building the most optimized claude code setup possible it was running worse than day one 23 plugins, 8…

X AI KOLs Timeline · 2026-05-23

A developer reports that three months of over-optimizing a Claude Code setup, with 23 plugins and multiple frameworks, left it performing worse than on day one. Deleting almost everything dramatically improved results, which suggests that a minimal setup often works best.

@techNmak: This math sits underneath every AI model being trained right now. Gradient. Jacobian. Hessian. Three words that look in…

X AI KOLs Timeline · 2026-05-23

Explains the mathematical concepts of gradient, Jacobian, and Hessian as fundamental tools in AI model training, describing how they measure change and their roles in optimization.
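
For readers who want to see the three objects side by side, here is a minimal sketch that approximates them with central finite differences in plain Python. It is not taken from the thread; the functions `loss` and `F`, the evaluation point, and the step sizes are all assumptions chosen so the analytic answers are easy to check by hand.

```python
def gradient(f, x, h=1e-5):
    """Central-difference gradient of a scalar function f at x (list of floats)."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def jacobian(F, x, h=1e-5):
    """Jacobian of a vector function F: row i, column j holds dF_i/dx_j."""
    cols = []
    for j in range(len(x)):
        up = x[:j] + [x[j] + h] + x[j + 1:]
        down = x[:j] + [x[j] - h] + x[j + 1:]
        cols.append([(a - b) / (2 * h) for a, b in zip(F(up), F(down))])
    # cols[j] is column j, so transpose to get rows.
    return [list(row) for row in zip(*cols)]


def hessian(f, x, h=1e-4):
    """Hessian of f, computed as the Jacobian of its gradient."""
    return jacobian(lambda p: gradient(f, p, h), x, h)


def loss(p):
    # Hypothetical scalar function: x^2 * y + 3 * y^2
    x, y = p
    return x * x * y + 3.0 * y * y


def F(p):
    # Hypothetical vector function: (x * y, x + y)
    x, y = p
    return [x * y, x + y]


point = [1.0, 2.0]
g = gradient(loss, point)   # analytic: [2xy, x^2 + 6y] = [4, 13]
J = jacobian(F, point)      # analytic: [[y, x], [1, 1]] = [[2, 1], [1, 1]]
H = hessian(loss, point)    # analytic: [[2y, 2x], [2x, 6]] = [[4, 2], [2, 6]]
print(g, J, H)
```

Finite differences are used here only because they need no libraries; training frameworks compute these quantities with automatic differentiation instead.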

Making Deep Learning Go Brrrr from First Principles

Hacker News Top · 2026-05-23

A comprehensive blog post explaining how to optimize deep learning performance by understanding three key components: compute, memory bandwidth, and overhead, using first principles to identify the performance regime and focus on effective optimizations.

That one time I used Go panics for flow control

Lobsters Hottest · 2026-05-23

A Go engineer recounts an incident where an in-memory datastore became overloaded due to slow sorting, and they implemented context cancellation inside sort functions by using panics and recover for non-local flow control, similar to how encoding/json handles errors.
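
The post itself is about Go, where `panic` and `recover` provide the non-local exit. As a rough analogue only, and not the author's code, the same shape can be sketched in Python by raising an exception from inside the comparison function and catching it around the sort. The names `SortCancelled`, `cancellable_sorted`, and `make_budget` are hypothetical, and the call-count budget stands in for a real cancellation signal such as a context or deadline.

```python
from functools import cmp_to_key


class SortCancelled(Exception):
    """Raised from inside the comparator to unwind out of the sort."""


def cancellable_sorted(items, compare, should_cancel):
    """Sort a copy of items, aborting as soon as should_cancel() is true.

    Returns (sorted_list, True) on success or (None, False) if cancelled.
    """
    def checked(a, b):
        if should_cancel():
            raise SortCancelled()      # plays the role of panic()
        return compare(a, b)

    try:
        return sorted(items, key=cmp_to_key(checked)), True
    except SortCancelled:              # plays the role of recover()
        return None, False


def make_budget(max_checks):
    """Hypothetical cancellation signal: trips after max_checks polls."""
    calls = {"n": 0}

    def should_cancel():
        calls["n"] += 1
        return calls["n"] > max_checks

    return should_cancel


def by_value(a, b):
    return (a > b) - (a < b)


data = [5, 3, 8, 1, 9, 2]

done, ok = cancellable_sorted(data, by_value, make_budget(1000))
aborted, ok2 = cancellable_sorted(data, by_value, make_budget(2))
print(done, ok, aborted, ok2)
```

Using `sorted` on a copy rather than `list.sort` in place matters here: an exception raised mid-sort can leave an in-place list partially reordered, whereas the input `data` stays untouched in this version.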

@charliermarsh: Are you allowed to make things hundreds of times faster? Does anyone know?

X AI KOLs Following · 2026-05-22

Charlie Marsh (creator of Ruff) rhetorically asks if it's allowed to make things hundreds of times faster, likely referencing a major optimization in a software development tool.
