EnergyLens is an end-to-end framework for predictive energy-aware optimization of multi-GPU LLM inference, validated on Llama3 and Qwen3-MoE, achieving mean absolute percentage errors (MAPE) between 9.25% and 13.19% and revealing significant energy variation across configurations.
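For reference, the MAPE metric quoted above can be sketched in a few lines; the energy values below are illustrative toy numbers, not measurements from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(
        abs(a - p) / abs(a) for a, p in zip(actual, predicted)
    ) / len(actual)

# Toy example: measured vs. predicted per-request energy (joules).
actual = [120.0, 95.0, 210.0]
predicted = [110.0, 100.0, 230.0]
print(round(mape(actual, predicted), 2))
```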
This paper develops a principled scaling theory for Mixture-of-Experts (MoE) architectures, introducing the Maximally Scale-Stable Parameterization (MSSP) that ensures stable training and hyperparameter transfer across width, depth, expert width, and number of experts, validated by experiments.
Proposes PPOW, a reinforcement learning framework for optimizing draft models in speculative decoding using window-level objectives and adaptive windowing, achieving significant speedups across multiple benchmarks.
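For context, the verification step that speculative decoding builds on can be sketched as below. This is a generic greedy-acceptance illustration of how a draft window is checked against the target model, not PPOW's RL objective or adaptive windowing; token IDs are arbitrary.

```python
def accept_prefix(draft_tokens, target_tokens):
    """Greedy speculative decoding check: accept the longest prefix of
    the draft window that matches the target model's tokens; the first
    mismatch is replaced by the target's own token."""
    accepted = []
    for d, t in zip(draft_tokens, target_tokens):
        if d == t:
            accepted.append(d)
        else:
            accepted.append(t)  # correction from the target model
            break
    return accepted

# Draft proposes a window of 5 tokens; the target agrees on the first 3,
# so 4 tokens are emitted for the cost of one target forward pass.
print(accept_prefix([4, 8, 15, 16, 23], [4, 8, 15, 42, 7]))
```

The longer the accepted prefix per window, the larger the speedup, which is why optimizing the draft model at the window level matters.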
A user shares a fix for performance bottlenecks when running AI models on AMD GPUs under Windows 11: disabling memory compression via the PowerShell command 'Disable-MMAgent -mc'.
Introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework for combining multiple task-specific experts into a single model, achieving state-of-the-art performance on vision and language benchmarks.
This paper identifies 'staleness amplification' in bilevel optimization under delayed feedback and proposes IGT-OMD, which uses Implicit Gradient Transport to achieve sublinear regret and improve decision loss on benchmarks like Warcraft shortest-path and LQR.
Fast-Slow Training (FST) interleaves context optimization (via GEPA) with model weight updates via RL, achieving 3× sample efficiency over RL alone on math, code, and physics reasoning while preserving plasticity and enabling continual learning.
The article argues that the primary AI risk may not be superintelligence but rather systems that optimize flawed, incomplete representations of reality, leading to institutional drift, automated misclassification, and invisible governance failures.
TanStack Devtools migrated to the Oxc parser and magic-string, achieving a 3.56× speedup, with per-file transform time dropping from 1.65 ms to 0.46 ms.
Crustimate is a tool that helps optimize your LinkedIn profile to be discovered by AI-powered recruiters.
DeepSpeed is an open-source deep learning optimization library from Microsoft that enables efficient distributed training and inference of large-scale models with features like ZeRO, 3D parallelism, and Mixture-of-Experts.
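A minimal sketch of the ZeRO feature mentioned above, written as the configuration dict that DeepSpeed accepts; the batch-size and overlap values are illustrative, not tuned recommendations.

```python
# Minimal DeepSpeed config enabling ZeRO stage 2 partitioning and
# fp16 mixed precision; numeric values here are illustrative.
ds_config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,            # partition optimizer states and gradients
        "overlap_comm": True,  # overlap gradient reduction with backward
    },
}

# Typically passed as: deepspeed.initialize(model=..., config=ds_config)
print(ds_config["zero_optimization"]["stage"])
```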
The article discusses using Google's OR-Tools CP-SAT solver to optimize maintenance scheduling for cloud infrastructure at Akamai, addressing complex constraints like capacity and concurrency.
The article discusses Partial Static Single Information (SSI) form, an extension to SSA in compilers that captures path-dependent type information. It proposes a practical shortcut for implementing Partial SSI during SSA construction in dynamic languages, specifically referencing an implementation in Ruby's ZJIT.
This paper challenges the geometric justification for the Muon optimizer, arguing that precise structure is less important than step-size optimality. It introduces Freon and Kaon optimizers to demonstrate that random or inverted spectra can perform as well as Muon.
This paper introduces SODA, a generalization of Optimistic Dual Averaging that unifies various modern optimizers like Muon and Lion. It proposes a practical wrapper that improves performance across different scales without requiring additional hyperparameter tuning for weight decay.
The article introduces Newton's Lantern, a reinforcement learning framework for fine-tuning warm-start models to solve the AC power flow problem more efficiently, particularly near voltage collapse.
This paper introduces ReVision, a method to reduce token usage in computer-use agents by removing redundant visual patches from consecutive screenshots. It demonstrates that this efficiency gain allows agents to process longer trajectories and improve performance on benchmarks like OSWorld.
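The core idea of dropping redundant patches between consecutive screenshots can be sketched as below. This is a generic patch-diff illustration under the assumption of a simple exact-match comparison, not ReVision's actual method; the 4×4 "frames" stand in for screenshots.

```python
def changed_patches(prev, curr, patch=2):
    """Return (row, col) indices of patches that differ between two
    frames, given as 2D grids of pixel values. Unchanged patches can
    then be dropped from the token stream of the later frame."""
    rows, cols = len(prev), len(prev[0])
    changed = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            block_prev = [row[c:c + patch] for row in prev[r:r + patch]]
            block_curr = [row[c:c + patch] for row in curr[r:r + patch]]
            if block_prev != block_curr:
                changed.append((r // patch, c // patch))
    return changed

prev = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 3, 3],
        [2, 2, 3, 3]]
curr = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 9, 3],
        [2, 2, 3, 3]]
print(changed_patches(prev, curr))
```

Here only one of the four patches changed, so three quarters of the second frame's visual tokens could be skipped.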
SPIN is a planning wrapper that ensures structurally valid DAG plans and uses prefix-based execution control to reduce task steps and tool calls in industrial LLM agent systems, improving plan validity and efficiency.
FlowCompile is a compiler for structured LLM workflows that performs compile-time exploration of configurations to balance accuracy and latency, achieving up to 6.4× speedup without retraining.
F-GRPO proposes a factorized group-relative policy optimization framework that unifies candidate generation and ranking in a single autoregressive LLM, addressing credit assignment issues and improving top-ranked performance across sequential recommendation and multi-hop QA benchmarks.