Tag
This paper generalizes non-uniform smoothness assumptions to objectives whose curvature is affine in the objective value, proving convergence rates for steepest descent and diagonal variants of RMSProp and Adam, with applications to logistic regression and neural networks.
Proposes UniScale, an online framework that unifies model routing and test-time scaling via contextual bandit optimization for better quality-cost trade-offs in LLM inference.
This paper presents a unified theoretical framework for gradient aggregation in multi-objective optimization, establishing convergence rates to Pareto stationarity. The authors introduce a sufficient alignment condition and demonstrate its application to existing and new algorithms, such as capped MGDA.
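As a rough illustration of what gradient aggregation means here, the sketch below computes the classic two-objective MGDA direction: the minimum-norm point in the convex hull of two gradients, which has a closed form. It is not the paper's capped MGDA variant, whose details the summary does not give; the function names `dot` and `min_norm_direction` and the toy gradients are assumptions made for this example.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def min_norm_direction(g1, g2):
    """Return (d, alpha) with d = alpha*g1 + (1-alpha)*g2 of minimum norm, alpha in [0, 1].

    Hypothetical helper for illustration: the standard two-gradient MGDA
    closed form, not the capped variant mentioned in the summary.
    """
    diff = [a - b for a, b in zip(g1, g2)]          # g1 - g2
    denom = dot(diff, diff)
    if denom == 0.0:                                # identical gradients: any alpha works
        return list(g1), 1.0
    alpha = dot(g2, [-x for x in diff]) / denom     # g2 . (g2 - g1) / ||g1 - g2||^2
    alpha = min(1.0, max(0.0, alpha))               # project onto [0, 1]
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return d, alpha

# Toy example: two orthogonal objective gradients.
d, alpha = min_norm_direction([1.0, 0.0], [0.0, 1.0])
print(d, alpha)  # [0.5, 0.5] 0.5
```

When the returned `d` is nonzero, stepping along `-d` does not increase either objective to first order, because `d` has a non-negative inner product with both gradients. A zero `d` corresponds to Pareto stationarity.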
An analysis of the dangers of AI agents in procurement that execute their tasks perfectly but optimize for the wrong metrics, leading to systemic failures that are harder to detect than hallucinations. The article warns that over-optimization for proxies like cost or delivery time can collapse suppliers or violate sustainability regulations, and that human intuition is missing from these systems.
A new packed16 K technique for llama.cpp on RDNA3 GPUs reduces KV cache VRAM by 47% compared to Vulkan fp16, using int8 packing and native dot4 instructions to keep K values at fp16 quality with minimal KL-divergence (KLD) degradation.
A GitHub tool that reduces Claude API costs by dynamically adjusting effort/thinking parameters based on prompt complexity.
This article demonstrates that using stochastic rounding for BF16 optimizer state can match FP32 performance because unbiased errors cancel over time, whereas round-to-nearest stalls due to compounding bias. An experiment with an MLP shows BF16+SR achieves similar loss to FP32 while using less memory.
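The stalling effect can be reproduced without any ML framework. The sketch below emulates BF16 by keeping the top 16 bits of a float32 bit pattern and repeatedly adds a small step to a value stored in that format. It is a toy accumulator, not the article's MLP experiment; the helper names, the step size of 1e-4, and the seed are assumptions chosen for illustration.

```python
import random
import struct

def _f32_bits(x):
    """Bit pattern of x after conversion to IEEE float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def _bits_f32(b):
    """float32 value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def bf16_nearest(x):
    """Round to BF16 with round-to-nearest-even (NaN/inf edge cases ignored)."""
    b = _f32_bits(x)
    b += 0x7FFF + ((b >> 16) & 1)      # bias so truncation rounds to nearest, ties to even
    return _bits_f32(b & 0xFFFF0000)   # keep sign, exponent, top 7 mantissa bits

def bf16_stochastic(x, rng):
    """Round to BF16 stochastically: round away from zero with probability
    equal to the discarded fraction, so the result is unbiased in expectation."""
    b = _f32_bits(x) + rng.getrandbits(16)
    return _bits_f32(b & 0xFFFF0000)

def accumulate(round_fn, steps=1000, step=1e-4):
    """Add `step` to a BF16-stored value `steps` times, rounding after each add."""
    w = 1.0
    for _ in range(steps):
        w = round_fn(w + step)
    return w

rng = random.Random(0)                 # fixed seed, chosen arbitrarily
w_rn = accumulate(bf16_nearest)        # stays at 1.0: each step is below half a BF16 ulp
w_sr = accumulate(lambda x: bf16_stochastic(x, rng))
print("round-to-nearest:", w_rn)
print("stochastic:      ", w_sr, "(exact sum would be about 1.1)")
```

Near 1.0 the BF16 spacing is 2^-7 (about 0.0078), so a step of 1e-4 is always rounded away under round-to-nearest. Stochastic rounding instead moves up by one full spacing on roughly 1.3% of steps, which tracks the exact sum on average at the cost of some noise.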
This paper identifies a consistent three-regime structure in scientific machine learning models, showing that optimization effectiveness is regime-specific and can challenge conventional loss-landscape interpretations. It proposes a regime-aware diagnostic framework validated across PINNs, neural operators, and neural ODEs.
This paper proposes DynSess, a unified session-level evaluation and optimization framework for role-playing agents, addressing the limitation of turn-level metrics by scoring complete dialogue sessions and using session-level rewards to train more coherent character models.
Compares user experience (UX) for humans with agent experience (AX) for AI agents, introducing OpenIngress, a tool that provides accessibility scores and fixes to optimize web interfaces for agent interactions.
A beginner-friendly guide to using PyTorch's torch.profiler for profiling and optimizing neural network operations, starting with matrix multiplication and bias addition. It explains how to read profiler traces and understand CPU/GPU interactions.
Proposes a method to generate portfolios of optimization models using LLMs, with theoretical guarantees and empirical validation.
This paper introduces a totally unimodular linear programming reformulation for alignment-based conformance checking, which complements A* search by providing speedups for long traces with deviations. The approach achieves 38.6% average runtime savings with 96% selection accuracy.
This paper investigates the mechanisms underlying sequential knowledge editing in LLMs, showing that many regularization strategies are unnecessary and that stability emerges naturally from properly accounting for accumulated editing constraints.
UnityMAS-O introduces a general RL optimization framework for LLM-based multi-agent systems, treating entire workflows as optimization units with role-level credit assignment and configurable parameter sharing, demonstrating significant gains on QA and code generation tasks.
This paper studies retrieval-augmented generation as an in-context optimization process, showing that linear self-attention can implement gradient descent on a unified RAG objective. It proposes a lightweight method for frozen RAG LLMs that predicts context-conditioned updates, improving performance across multiple QA benchmarks.
This blog post explains the math behind gradient descent, the fundamental optimization algorithm used to train machine learning models, with a step-by-step numeric example and intuition.
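For readers who want a concrete version of the update rule w ← w − lr · ∇f(w), here is a minimal sketch on a one-dimensional quadratic. The objective f(w) = (w − 3)², the learning rate, and the starting point are assumptions chosen for this example, not the numbers used in the blog post.

```python
def gradient_descent(grad, w0, lr, steps):
    """Run plain gradient descent and return every iterate, starting with w0."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - lr * grad(w)   # step against the gradient
        history.append(w)
    return history

# Illustrative objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
grad = lambda w: 2.0 * (w - 3.0)

history = gradient_descent(grad, w0=0.0, lr=0.1, steps=50)
print(history[:3])   # first iterates: 0.0, then about 0.6 and 1.08
print(history[-1])   # close to the minimizer w = 3
```

With these values each step multiplies the distance to the minimizer by 0.8, so the iterates approach 3 geometrically. A learning rate above 1.0 would make this particular quadratic diverge.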
Blog post analyzing and implementing a SIMD-accelerated version of std::copy_if using AVX-512 instructions on AMD Zen 4, with performance analysis and comparisons to compiler auto-vectorization.
SkillOpt introduces a systematic controllable text-space optimizer that enables AI agents to train and improve their own skills (like 'work instructions') through iterative edits and validation, outperforming human-crafted and one-shot prompts across multiple benchmarks and models.
This paper derives batch scaling laws for sketched linear regression under power-law spectra, analyzing one-pass and multi-pass mini-batch SGD. It provides explicit risk decompositions showing how batch size affects bias, variance, and fluctuation terms, and establishes that without-replacement sampling yields lower noise than with-replacement.
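The with- versus without-replacement claim has a simple finite-population analogue that can be checked by exhaustive enumeration. The sketch below is not the paper's sketched-regression setting: it treats four numbers as stand-ins for per-example gradient values (an assumption for illustration) and compares the variance of the mini-batch mean under the two sampling schemes.

```python
from itertools import permutations, product
from statistics import mean, pvariance  # pvariance: population variance

# Toy "per-example gradients"; values and batch size are illustrative assumptions.
population = [1.0, 2.0, 3.0, 4.0]
b = 2

# Every equally likely ordered batch under each sampling scheme.
with_repl = [mean(batch) for batch in product(population, repeat=b)]
without_repl = [mean(batch) for batch in permutations(population, b)]

var_with = pvariance(with_repl)        # sigma^2 / b
var_without = pvariance(without_repl)  # sigma^2 / b * (N - b) / (N - 1)

print("with replacement:   ", var_with)
print("without replacement:", var_without)
```

Both schemes give an unbiased batch mean, but the without-replacement variance is smaller by the finite-population factor (N − b)/(N − 1), which is 2/3 here.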