#optimization · Cards List

EnergyLens: Predictive Energy-Aware Exploration for Multi-GPU LLM Inference Optimization

arXiv cs.LG · 7h ago Cached

EnergyLens is an end-to-end framework for predictive energy-aware optimization of multi-GPU LLM inference, validated on Llama3 and Qwen3-MoE, achieving mean absolute percentage errors between 9.25% and 13.19% and revealing significant energy variation across configurations.
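The MAPE figure quoted above is straightforward to compute; a minimal Python sketch with made-up energy readings (illustrative values, not the paper's data):

```python
def mape(actual, predicted):
    # Mean absolute percentage error, in percent
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical per-configuration energy measurements (joules), not from the paper
actual = [120.0, 95.0, 150.0]
predicted = [110.0, 100.0, 140.0]
print(round(mape(actual, predicted), 2))  # 6.75
```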


How to Scale Mixture-of-Experts: From muP to the Maximally Scale-Stable Parameterization

arXiv cs.LG · 7h ago Cached

This paper develops a principled scaling theory for Mixture-of-Experts (MoE) architectures, introducing the Maximally Scale-Stable Parameterization (MSSP) that ensures stable training and hyperparameter transfer across width, depth, expert width, and number of experts, validated by experiments.


Performance-Driven Policy Optimization for Speculative Decoding with Adaptive Windowing

arXiv cs.CL · 7h ago Cached

Proposes PPOW, a reinforcement learning framework for optimizing draft models in speculative decoding using window-level objectives and adaptive windowing, achieving significant speedups across multiple benchmarks.


If you're using Windows, disable memory compression to stop bottlenecks!

Reddit r/LocalLLaMA · 23h ago

A user shares a fix for performance bottlenecks when running AI models on AMD GPUs under Windows 11: disabling memory compression with the PowerShell cmdlet 'Disable-MMAgent -mc' (short for '-MemoryCompression'), run from an elevated prompt.


Bayesian Model Merging

arXiv cs.LG · yesterday Cached

Introduces Bayesian Model Merging (BMM), a plug-and-play bi-level optimization framework for combining multiple task-specific experts into a single model, achieving state-of-the-art performance on vision and language benchmarks.
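For context, the simplest baseline such methods improve on is plain parameter averaging across experts; a minimal sketch (hypothetical weight dictionaries, not BMM's actual bi-level procedure):

```python
# Naive parameter-averaging baseline for model merging. BMM's bi-level
# optimization learns the combination; here coefficients are just uniform.
def merge_experts(expert_weights, coeffs=None):
    n = len(expert_weights)
    coeffs = coeffs or [1.0 / n] * n
    keys = expert_weights[0].keys()
    return {k: sum(c * w[k] for c, w in zip(coeffs, expert_weights)) for k in keys}

# Two toy "experts" with scalar parameters
merged = merge_experts([{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}])
print(merged)  # {'w': 2.0, 'b': 1.0}
```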


IGT-OMD: Implicit Gradient Transport for Decision-Focused Learning under Delayed Feedback

arXiv cs.LG · yesterday Cached

This paper identifies 'staleness amplification' in bilevel optimization under delayed feedback and proposes IGT-OMD, which uses Implicit Gradient Transport to achieve sublinear regret and improve decision loss on benchmarks like Warcraft shortest-path and LQR.


@LakshyAAAgrawal: Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization.…

X AI KOLs Following · yesterday

Fast-Slow Training (FST) interleaves context optimization (via GEPA) with model weight updates via RL, achieving 3× sample efficiency over RL alone on math, code, and physics reasoning while preserving plasticity and enabling continual learning.


The biggest AI risk may not be superintelligence — but optimized misunderstanding

Reddit r/artificial · yesterday

The article argues that the primary AI risk may not be superintelligence but rather systems that optimize flawed, incomplete representations of reality, leading to institutional drift, automated misclassification, and invisible governance failures.


@tan_stack: TanStack Devtools just migrated to @OxcProject parser + magic-string! The results: Per-file transform: 1.65 ms → 0.46 m…

X AI KOLs Following · yesterday Cached

TanStack Devtools migrated to OxcProject parser and magic-string, achieving a 3.56× speedup with per-file transform dropping from 1.65 ms to 0.46 ms.


Crustimate

Product Hunt · yesterday

Crustimate is a tool that helps optimize your LinkedIn profile to be discovered by AI-powered recruiters.


@_vmlops: MICROSOFT RESEARCHERS BUILT THIS TO TRAIN 530B PARAMETER MODELS Deepspeed is a deep learning optimization library that …

X AI KOLs Timeline · yesterday Cached

DeepSpeed is an open-source deep learning optimization library from Microsoft that enables efficient distributed training and inference of large-scale models with features like ZeRO, 3D parallelism, and Mixture-of-Experts.


Using OR-Tools CP-SAT for Scheduling Problems

Hacker News Top · yesterday Cached

The article discusses using Google's OR-Tools CP-SAT solver to optimize maintenance scheduling for cloud infrastructure at Akamai, addressing complex constraints like capacity and concurrency.


Partial static single information form

Lobsters Hottest · 2d ago Cached

The article discusses Partial Static Single Information (SSI) form, an extension to SSA in compilers that captures path-dependent type information. It proposes a practical shortcut for implementing Partial SSI during SSA construction in dynamic languages, specifically referencing an implementation in Ruby's ZJIT.


Muon is Not That Special: Random or Inverted Spectra Work Just as Well

arXiv cs.LG · 2d ago Cached

This paper challenges the geometric justification for the Muon optimizer, arguing that precise structure is less important than step-size optimality. It introduces Freon and Kaon optimizers to demonstrate that random or inverted spectra can perform as well as Muon.


Optimistic Dual Averaging Unifies Modern Optimizers

arXiv cs.LG · 2d ago Cached

This paper introduces SODA, a generalization of Optimistic Dual Averaging that unifies various modern optimizers like Muon and Lion. It proposes a practical wrapper that improves performance across different scales without requiring additional hyperparameter tuning for weight decay.


Newton's Lantern: A Reinforcement Learning Framework for Finetuning AC Power Flow Warm Start Models

arXiv cs.LG · 2d ago Cached

The article introduces Newton's Lantern, a reinforcement learning framework for finetuning warm start models to solve the AC power flow problem more efficiently, particularly near voltage collapse.


ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction

arXiv cs.CL · 2d ago Cached

This paper introduces ReVision, a method to reduce token usage in computer-use agents by removing redundant visual patches from consecutive screenshots. It demonstrates that this efficiency gain allows agents to process longer trajectories and improve performance on benchmarks like OSWorld.
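The core idea, dropping patches that are unchanged between consecutive screenshots, can be sketched in a few lines of plain Python (toy grids, patch size, and an exact-match test; not ReVision's actual implementation):

```python
# Toy sketch of temporal visual redundancy reduction: re-encode only the
# patches that changed between two consecutive "screenshots".
def changed_patches(prev, curr, patch=2):
    kept = []
    for r in range(0, len(prev), patch):
        for c in range(0, len(prev[0]), patch):
            a = [prev[r + i][c + j] for i in range(patch) for j in range(patch)]
            b = [curr[r + i][c + j] for i in range(patch) for j in range(patch)]
            if a != b:
                kept.append((r // patch, c // patch))
    return kept

prev = [[0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[0][1] = 9  # one pixel changed -> only one patch needs re-encoding
print(changed_patches(prev, curr))  # [(0, 0)]
```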


SPIN: Structural LLM Planning via Iterative Navigation for Industrial Tasks

Hugging Face Daily Papers · 2d ago Cached

SPIN is a planning wrapper that ensures structurally valid DAG plans and uses prefix-based execution control to reduce task steps and tool calls in industrial LLM agent systems, improving plan validity and efficiency.


FlowCompile: An Optimizing Compiler for Structured LLM Workflows

Hugging Face Daily Papers · 2d ago Cached

FlowCompile is a compiler for structured LLM workflows that performs compile-time exploration of configurations to balance accuracy and latency, achieving up to 6.4x speedup without retraining.


F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

Hugging Face Daily Papers · 2d ago Cached

F-GRPO proposes a factorized group-relative policy optimization framework that unifies candidate generation and ranking in a single autoregressive LLM, addressing credit assignment issues and improving top-ranked performance across sequential recommendation and multi-hop QA benchmarks.
