optimization

#optimization

Convergence of Steepest Descent and Adam under Non-Uniform Smoothness

arXiv cs.LG ↗ · 2026-06-01 Cached

This paper generalizes non-uniform smoothness assumptions to objectives whose curvature is affine in the objective value, proving convergence rates for steepest descent and diagonal variants of RMSProp and Adam, with applications to logistic regression and neural networks.

UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling

arXiv cs.AI ↗ · 2026-06-01 Cached

Proposes UniScale, an online framework that unifies model routing and test-time scaling via contextual bandit optimization for better quality-cost trade-offs in LLM inference.

A Unified Framework for Gradient Aggregation in Multi-Objective Optimization

arXiv cs.LG ↗ · 2026-06-01 Cached

This paper presents a unified theoretical framework for gradient aggregation in multi-objective optimization, establishing convergence rates to Pareto stationarity. The authors introduce a sufficient alignment condition and demonstrate its application to existing and new algorithms, such as capped MGDA.

The Most Dangerous Procurement Agent Is the One That Works Perfectly

Reddit r/artificial ↗ · 2026-05-31 Cached

An analysis of the dangers of AI agents in procurement that execute their tasks perfectly but optimize for the wrong metrics, leading to systemic failures that are harder to detect than hallucinations. The article warns that over-optimization for proxies like cost or delivery time can collapse suppliers or violate sustainability regulations, and that human intuition is missing from these systems.

Flash Attention for llama.cpp on RDNA3: 47% less KV VRAM than Vulkan f16 K, KLD almost lossless on F16 K / q4_0 V. Part 1.

Reddit r/LocalLLaMA ↗ · 2026-05-31

A new packed16 K technique for llama.cpp on RDNA3 GPUs reduces KV cache VRAM by 47% compared to Vulkan fp16, using int8 packing and native dot4 instructions to maintain fp16-quality K values with minimal KLD loss.

Try this tool to reduce Claude costs by changing Effort/Thinking parameters based on prompt complexity

Reddit r/openclaw ↗ · 2026-05-31

A GitHub tool that reduces Claude API costs by dynamically adjusting effort/thinking parameters based on prompt complexity.

Bias Compounds, Variance Washes Out

Hacker News Top ↗ · 2026-05-29 Cached

This article demonstrates that using stochastic rounding for BF16 optimizer state can match FP32 performance because unbiased errors cancel over time, whereas round-to-nearest stalls due to compounding bias. An experiment with an MLP shows BF16+SR achieves similar loss to FP32 while using less memory.
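The bias-versus-variance effect described above can be reproduced in a few lines. The sketch below is not the article's MLP experiment: it is a toy accumulation, assuming a BF16 value is emulated by keeping the top 16 bits of a float32, and the helper names (`bf16_round_nearest`, `bf16_round_stochastic`, `accumulate`) are hypothetical. It adds an update of 1e-4 to a stored value of 1.0 ten thousand times, so the exact answer is 2.0.

```python
import random
import struct


def _f32_bits(x: float) -> int:
    """Bit pattern of x after rounding to float32."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def _bits_f32(b: int) -> float:
    """Float value of a 32-bit float32 bit pattern."""
    return struct.unpack(">f", struct.pack(">I", b & 0xFFFFFFFF))[0]


def bf16_round_nearest(x: float) -> float:
    """Round to BF16 (top 16 bits of float32) with round-to-nearest-even.

    Assumption: finite inputs only; NaN and infinity are not handled.
    """
    b = _f32_bits(x)
    b += 0x7FFF + ((b >> 16) & 1)
    return _bits_f32(b & 0xFFFF0000)


def bf16_round_stochastic(x: float, rng: random.Random) -> float:
    """Round to BF16, rounding away from zero with probability equal to
    the discarded fraction, so the rounding error has zero mean."""
    b = _f32_bits(x)
    b += rng.randrange(0x10000)  # uniform integer in [0, 65535]
    return _bits_f32(b & 0xFFFF0000)


def accumulate(round_fn, steps: int = 10_000, delta: float = 1e-4) -> float:
    """Add delta to a BF16-stored value `steps` times, rounding each time."""
    w = 1.0
    for _ in range(steps):
        w = round_fn(w + delta)
    return w


rng = random.Random(0)
w_rn = accumulate(bf16_round_nearest)
w_sr = accumulate(lambda x: bf16_round_stochastic(x, rng))
print("round-to-nearest:", w_rn)
print("stochastic      :", w_sr)
```

Near 1.0 the BF16 spacing is 2^-7 (about 0.0078), so an update of 1e-4 is always rounded away under round-to-nearest and the value never leaves 1.0. Under stochastic rounding each step moves up by one spacing with a small probability, and the accumulated value lands close to 2.0, with some run-to-run noise.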

Unveiling Multi-regime Patterns in SciML: Distinct Failure Modes and Regime-specific Optimization

arXiv cs.LG ↗ · 2026-05-29 Cached

This paper identifies a consistent three-regime structure in scientific machine learning models, showing that optimization effectiveness is regime-specific and can challenge conventional loss-landscape interpretations. It proposes a regime-aware diagnostic framework validated across PINNs, neural operators, and neural ODEs.

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

arXiv cs.CL ↗ · 2026-05-29 Cached

This paper proposes DynSess, a unified session-level evaluation and optimization framework for role-playing agents, addressing the limitation of turn-level metrics by scoring complete dialogue sessions and using session-level rewards to train more coherent character models.

UX (Humans) vs. AX (Agents)

Reddit r/AI_Agents ↗ · 2026-05-29

Compares UX for humans to AX for AI agents, introducing OpenIngress, a tool that provides accessibility scores and fixes to optimize web interfaces for agent interactions.

Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

Hugging Face Blog ↗ · 2026-05-29 Cached

A beginner-friendly guide to using PyTorch's torch.profiler for profiling and optimizing neural network operations, starting with matrix multiplication and bias addition. It explains how to read profiler traces and understand CPU/GPU interactions.

Generating Robust Portfolios of Optimization Models using Large Language Models

arXiv cs.AI ↗ · 2026-05-27 Cached

Proposes a method to generate portfolios of optimization models using LLMs, with theoretical guarantees and empirical validation.

Developing a Totally Unimodular Linear Program for Optimal Conformance Checking: When and Why It Complements A*

arXiv cs.AI ↗ · 2026-05-27 Cached

This paper introduces a totally unimodular linear programming reformulation for alignment-based conformance checking, which complements A* search by providing speedups for long traces with deviations. The approach achieves 38.6% average runtime savings with 96% selection accuracy.

The Labyrinth and the Thread: Rethinking Regularizations in Sequential Knowledge Editing for Large Language Models

arXiv cs.CL ↗ · 2026-05-27 Cached

This paper investigates the mechanisms underlying sequential knowledge editing in LLMs, showing that many regularization strategies are unnecessary and that stability emerges naturally from properly accounting for accumulated editing constraints.

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

arXiv cs.AI ↗ · 2026-05-27 Cached

UnityMAS-O introduces a general RL optimization framework for LLM-based multi-agent systems, treating entire workflows as optimization units with role-level credit assignment and configurable parameter sharing, demonstrating significant gains on QA and code generation tasks.

In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective

arXiv cs.CL ↗ · 2026-05-27 Cached

This paper studies retrieval-augmented generation as an in-context optimization process, showing that linear self-attention can implement gradient descent on a unified RAG objective. It proposes a lightweight method for frozen RAG LLMs that predicts context-conditioned updates, improving performance across multiple QA benchmarks.

@pallavishekhar_: Math Behind Gradient Descent Read here: https://outcomeschool.com/blog/math-behind-gradient-descent…

X AI KOLs Timeline ↗ · 2026-05-26 Cached

This blog post explains the math behind gradient descent, the fundamental optimization algorithm used to train machine learning models, with a step-by-step numeric example and intuition.
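As a rough illustration of the kind of step-by-step numeric example mentioned above, here is a minimal sketch of gradient descent on a one-dimensional quadratic. The objective f(x) = (x - 3)^2, the starting point, and the learning rate are assumptions chosen for this listing, not values taken from the linked post.

```python
def f(x: float) -> float:
    """Illustrative objective (an assumption): minimum at x = 3."""
    return (x - 3.0) ** 2


def grad_f(x: float) -> float:
    """Derivative of f: d/dx (x - 3)^2 = 2 (x - 3)."""
    return 2.0 * (x - 3.0)


def gradient_descent(x0: float, lr: float, steps: int) -> list[float]:
    """Apply x <- x - lr * grad_f(x) and return every iterate."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * grad_f(xs[-1]))
    return xs


xs = gradient_descent(x0=0.0, lr=0.1, steps=50)
for k in (0, 1, 2, 3, 50):
    print(f"step {k:2d}: x = {xs[k]:.5f}, f(x) = {f(xs[k]):.6f}")
```

With these values each update is x <- 0.8 x + 0.6, so the distance to the minimizer shrinks by a factor of 0.8 per step. A learning rate above 1.0 would make the same update diverge on this function, which is the usual motivation for tuning the step size.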

Accelerating std::copy_if using SIMD

Lobsters Hottest ↗ · 2026-05-26 Cached

Blog post analyzing and implementing a SIMD-accelerated version of std::copy_if using AVX-512 instructions on AMD Zen 4, with performance analysis and comparisons to compiler auto-vectorization.

@Xudong07452910: This SkillOpt paper is quite interesting—it actually addresses a very important point: AI agents in the future won't just rely on humans writing prompts; they can train their own 'job descriptions'. Currently, many skills/prompts are written one-off, and when real tasks pile up, various edge cases start to fail...

X AI KOLs Timeline ↗ · 2026-05-26 Cached

SkillOpt introduces a systematic controllable text-space optimizer that enables AI agents to train and improve their own skills (like 'work instructions') through iterative edits and validation, outperforming human-crafted and one-shot prompts across multiple benchmarks and models.

From One-Pass SGD to Data Reuse: Mini-Batch Scaling Laws in Sketched Linear Regression

arXiv cs.LG ↗ · 2026-05-26 Cached

This paper derives batch scaling laws for sketched linear regression under power-law spectra, analyzing one-pass and multi-pass mini-batch SGD. It provides explicit risk decompositions showing how batch size affects bias, variance, and fluctuation terms, and establishes that without-replacement sampling yields lower noise than with-replacement.
