optimization

#optimization

Not All MTP Assistants Are Created Equal

Reddit r/LocalLLaMA · 2026-06-12

A detailed technical exploration of MTP (Multi-Token Prediction) speculative decoding in llama.cpp with Gemma 4 models, showing that the choice and quantization of the assistant model significantly affect speedups, and that assistants sharing the same name do not necessarily perform equally.
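
In speculative decoding, the assistant (draft) proposes several tokens cheaply and the full model verifies them, so the speedup depends on how many drafted tokens are accepted per round. The following is a minimal greedy sketch of that draft-then-verify loop, assuming toy next-token functions in place of real models; `speculative_step` and the toy models are hypothetical names, and this is not llama.cpp's implementation.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy
# next token. Real draft/target models return logits instead.
Model = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """One draft-then-verify round of greedy speculative decoding.

    Returns the tokens appended this round (always at least one)."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2. The target checks each proposed position. A real engine scores
    #    all k positions in one batched forward pass.
    accepted: List[int] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the target contributes one bonus token.
    accepted.append(target(prefix + accepted))
    return accepted


if __name__ == "__main__":
    # Toy target: always predicts last token + 1 (mod 10).
    target = lambda p: (p[-1] + 1) % 10
    # Toy draft: agrees with the target except right after token 4.
    draft = lambda p: 0 if p[-1] == 4 else (p[-1] + 1) % 10
    print(speculative_step([0], draft, target, 4))  # [1, 2, 3, 4, 5]: 4 accepted + bonus
    print(speculative_step([3], draft, target, 4))  # [4, 5]: draft's 0 rejected
```

Because every accepted token equals what the target would have produced greedily, the output matches the target alone; a better-matched assistant only changes how many tokens each round yields.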

EAGLE3 has landed in llama.cpp

Reddit r/LocalLLaMA · 2026-06-12

EAGLE3, a speculative decoding method, has been integrated into llama.cpp, enabling faster inference.

The Hidden Power of Scaling Factor in LoRA Optimization

arXiv cs.AI · 2026-06-12

This paper reveals that the scaling factor α in LoRA optimization is more influential than the learning rate, and proposes LoRA-α, a framework that improves performance and simplifies hyperparameter search by restoring α to its principled regime.
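
In standard LoRA, α enters the forward pass only through the ratio α/r that multiplies the low-rank update, h = W₀x + (α/r)·BAx. Here is a minimal pure-Python sketch of that standard forward pass for illustration; it is not the paper's LoRA-α framework, and the matrix values are made up.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w0: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Standard LoRA forward pass: h = W0 x + (alpha / r) * B (A x).

    w0: frozen d_out x d_in weight; a: r x d_in; b: d_out x r."""
    r = len(a)            # LoRA rank = number of rows of A
    scale = alpha / r     # the scaling factor applied to the update
    base = matvec(w0, x)
    delta = matvec(b, matvec(a, x))
    return [h + scale * d for h, d in zip(base, delta)]


if __name__ == "__main__":
    w0 = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
    a = [[1.0, 1.0]]                # rank r = 1
    b = [[1.0], [0.0]]
    x = [1.0, 2.0]
    print(lora_forward(w0, a, b, x, alpha=1.0))  # [4.0, 2.0]
    print(lora_forward(w0, a, b, x, alpha=2.0))  # [7.0, 2.0]: update doubled
```

Doubling α doubles the contribution of the adapter while leaving the frozen path untouched, which is why α behaves like a step-size knob on the update.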

Arbor: Tree Search as a Cognition Layer for Autonomous Agents

arXiv cs.AI · 2026-06-12

Arbor introduces structured tree search as a cognition layer for autonomous agents, enabling multi-day, full-stack LLM inference optimization with up to 193% throughput-latency improvement over vendor baselines through a checks-and-balances multi-agent architecture.

NaturalFlow: Reducing Disruptive Pauses for Natural Speech Flow in Simultaneous Speech-to-Speech Translation

arXiv cs.CL · 2026-06-12

This paper introduces NaturalFlow, a fluency-aware optimization framework that reduces disruptive pauses in simultaneous speech-to-speech translation by leveraging model-internal signals, achieving a balance between low latency and natural speech flow.

Finding Optimal Tokenizers

Hacker News Top · 2026-06-11

This blog post presents an algorithm using integer linear programming to compute optimal tokenizers for language models, drawing parallels to solving the Traveling Salesman Problem. It notes that while the result is theoretically interesting, practical tokenizers are already near-optimal and the method may not generalize well.

Mirror Descent Beyond Euclidean Stability: An Exponential Separation in Initialization Sensitivity

arXiv cs.LG · 2026-06-11

This paper reveals that Mirror Descent with non-quadratic regularizers can be exponentially more sensitive to initialization than Gradient Descent, even under well-conditioned settings, which has implications for reproducibility in RL and LLM post-training.
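
Both update rules are standard: with the squared-Euclidean regularizer, mirror descent reduces to gradient descent (an additive step), while with the negative-entropy regularizer on the probability simplex it becomes the multiplicative exponentiated-gradient update. The sketch below contrasts the two for illustration only; it does not reproduce the paper's construction or its exponential-separation result, and the GD step here is unconstrained (no projection).

```python
import math
from typing import Callable, List

Vec = List[float]


def gd_step(x: Vec, grad: Vec, lr: float) -> Vec:
    # Squared-Euclidean mirror map: plain additive gradient step.
    return [xi - lr * gi for xi, gi in zip(x, grad)]


def entropic_md_step(x: Vec, grad: Vec, lr: float) -> Vec:
    # Negative-entropy mirror map on the simplex (exponentiated gradient):
    # multiply each coordinate by exp(-lr * g_i), then renormalize.
    w = [xi * math.exp(-lr * gi) for xi, gi in zip(x, grad)]
    z = sum(w)
    return [wi / z for wi in w]


def run(step: Callable[[Vec, Vec, float], Vec], x0: Vec, grad: Vec, lr: float, steps: int) -> Vec:
    # Apply the same constant gradient for a fixed number of steps.
    x = list(x0)
    for _ in range(steps):
        x = step(x, grad, lr)
    return x


if __name__ == "__main__":
    g = [1.0, 0.0]  # constant gradient of a linear loss
    for x0 in ([0.5, 0.5], [0.9, 0.1]):
        x = run(entropic_md_step, x0, g, lr=0.1, steps=50)
        # Closed form: log(x1/x2) = log(x0_1/x0_2) - lr * steps * (g1 - g2)
        print(x0, math.log(x[0] / x[1]))
```

In the multiplicative form the initialization enters the iterate as a factor, x_T ∝ x_0 · exp(-η Σ_t g_t), rather than as an additive offset.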

SwiftCTS: Fast Cross-Design Prediction and Pareto Optimization of Clock Tree Metrics via Few-Shot Calibration

arXiv cs.LG · 2026-06-11

SwiftCTS is a physics-informed surrogate framework that uses gradient-boosted ensembles and few-shot calibration to rapidly predict and Pareto-optimize clock tree metrics (power, wirelength, timing skew) across unseen designs, achieving high accuracy with minimal training data.

Compatibility-Aware Dynamic Fine-Tuning for Large Language Models

arXiv cs.CL · 2026-06-11

This paper introduces Compatibility-Aware Dynamic Fine-Tuning (CADFT), an extension of Dynamic Fine-Tuning that controls sample-level optimization variance in LLM supervised fine-tuning, improving stability and generalization.

Inverse Rubric Optimization: A testbed for agent science

Hacker News Top · 2026-06-11

Fulcrum Research introduces Inverse Rubric Optimization (IRO), a testbed for studying long-horizon agent behavior where agents must optimize the preferences of a black-box judge. The approach enables smooth scaling and rich behavior analysis, with experiments showing frontier models like Fable 5 and Opus 4.6 have different scaling characteristics.

@gregpr07: Browser Use Beta just achieved SOTA on our hardest internal web agent benchmark. Fable is genuinely amazing for optimiz…

X AI KOLs Following · 2026-06-11

Browser Use Beta achieved state-of-the-art results on a difficult internal web agent benchmark, using Fable for optimization and analysis.

Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation

Hugging Face Daily Papers · 2026-06-11

This paper analyzes on-policy distillation (OPD), finding that OPD updates are sparse, distributed across layers and FFN-heavy, and retain geometric properties distinct from dense parameter rewriting. The sparse structure is operationally useful, but sparsity-inducing SGD underperforms AdamW due to heterogeneous gradient scales.

Remove padding and multiple D2D copies for MTP by gaugarg-nv · Pull Request #24086 · ggml-org/llama.cpp

Reddit r/LocalLLaMA · 2026-06-10

A pull request for llama.cpp that removes padding and multiple device-to-device copies for Multi-Token Prediction (MTP), improving performance on GPU.

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization

arXiv cs.LG · 2026-06-10

This paper proposes trainable smooth-rotation transforms with quantile-robust scaling and gradient-based optimization to improve post-training quantization of LLMs, achieving significant error reduction on LLaMA-3.2-1B under W4A4 quantization.
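
Rotation-based post-training quantization generally relies on two facts: an orthogonal R leaves a linear layer's output unchanged, since (xR)·(wR) = x·w, and rotating can spread an outlier channel across many channels so that a shared quantization scale wastes less range. The toy sketch below demonstrates both facts with a fixed normalized Hadamard rotation and symmetric 4-bit round-to-nearest quantization; it does not implement the paper's trainable smooth rotations or learned channel scales, and the vectors are made up.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]

# Normalized 4x4 Hadamard matrix: orthogonal and symmetric (H4 @ H4 = I).
H4: Matrix = [[0.5 * s for s in row] for row in (
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
)]


def rotate(x: Vector, r: Matrix) -> Vector:
    # Row vector times matrix: (x R)_j = sum_i x_i * R[i][j].
    return [sum(x[i] * r[i][j] for i in range(len(x))) for j in range(len(r[0]))]


def dot(a: Vector, b: Vector) -> float:
    return sum(u * v for u, v in zip(a, b))


def quantize_dequantize(x: Vector, bits: int = 4) -> Vector:
    # Symmetric per-tensor round-to-nearest: one scale set by max |x|.
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    scale = max(abs(v) for v in x) / qmax
    return [round(v / scale) * scale for v in x]


def sq_error(a: Vector, b: Vector) -> float:
    # Squared L2 error; an orthogonal R preserves it, so the error measured
    # in the rotated space equals the error after rotating back.
    return sum((u - v) ** 2 for u, v in zip(a, b))


if __name__ == "__main__":
    x = [8.0, 0.3, -0.2, 0.1]   # activation with one outlier channel
    w = [0.5, -1.0, 2.0, 0.25]  # one weight column (illustrative values)
    xr, wr = rotate(x, H4), rotate(w, H4)
    print(dot(x, w), dot(xr, wr))  # equal up to float rounding
    print(sq_error(x, quantize_dequantize(x)))    # error without rotation
    print(sq_error(xr, quantize_dequantize(xr)))  # smaller error after rotation
```

In this example the outlier forces a scale of 8/7 without rotation, zeroing the three small channels; after rotation all values lie between 3.7 and 4.2 and fit the 4-bit grid more tightly.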

Sim2Schedule: A Simulator-Guided LLM Framework for Autonomous Open-Pit Mine Scheduling

arXiv cs.AI · 2026-06-10

This paper introduces Sim2Schedule, a simulator-guided LLM framework for autonomous open-pit mine scheduling that reaches 94-99% of the optimal net present value (NPV) computed by MILP, with computation time that scales linearly, and that operates zero-shot without fine-tuning.

An Efficient Method for the Optimal Control of Microgrids Under Uncertainties using Local Reduction

Hugging Face Daily Papers · 2026-06-10

This paper proposes and compares two mathematical formulations for robust microgrid sizing and power scheduling under uncertainties, using a local reduction algorithm that achieves high feasibility rates in Monte Carlo simulations.

Optimality of Sequential Filtering Under Independent Cost and Selectivity Models

arXiv cs.LG · 2026-06-09

This paper formalizes the problem of ordering filters in sequential filtering pipelines under independent cost and selectivity models, proving that ordering by increasing ratio of cost to rejection probability is optimal. Monte Carlo simulations demonstrate that this ordering dominates common heuristics both in expectation and across the full distribution of outcomes.
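
Under the independence model, an item reaches filter k only if it passed every earlier filter, so the expected cost per item is Σ_k c_k · Π_{j<k} p_j, where c is a filter's cost and p its pass probability (1 minus its rejection probability). Here is a minimal sketch, with made-up filter values, that sorts by cost divided by rejection probability as the summary states and cross-checks the result against brute force on a tiny example; the paper's proof is not reproduced.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Each filter is (cost, pass_probability); rejection = 1 - pass_probability.
Filter = Tuple[float, float]


def expected_cost(order: Sequence[Filter]) -> float:
    # Probability of reaching a filter is the product of earlier pass
    # probabilities (independence assumption).
    total, reach = 0.0, 1.0
    for cost, p_pass in order:
        total += reach * cost
        reach *= p_pass
    return total


def rank_order(filters: Sequence[Filter]) -> List[Filter]:
    # Ascending cost / rejection probability; a filter that never rejects
    # (p_pass == 1) only adds cost, so it goes last.
    def key(f: Filter) -> float:
        cost, p_pass = f
        reject = 1.0 - p_pass
        return cost / reject if reject > 0 else float("inf")
    return sorted(filters, key=key)


if __name__ == "__main__":
    filters = [(5.0, 0.2), (1.0, 0.9), (2.0, 0.5)]  # illustrative values
    ranked = rank_order(filters)
    brute = min(permutations(filters), key=expected_cost)
    print(ranked, expected_cost(ranked))
    print(list(brute), expected_cost(brute))  # same minimum on this example
```

Note that the cheapest filter, (1.0, 0.9), ends up last: it rejects so rarely that its cost per rejected item is the highest.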

ggml-webgpu: Improve prefill speeds for k-quants + refactor matmul for Q4/Q5/Q8 and k-quants by yomaytk · Pull Request #24225 · ggml-org/llama.cpp

Reddit r/LocalLLaMA · 2026-06-09

Improves prefill speeds for k-quants and refactors matrix multiplication for Q4/Q5/Q8 and k-quants in llama.cpp's WebGPU backend.

@TheTuringPost: AutoScientists – a research lab made of agents @Harvard researchers connected agents into a self-organizing scientific …

X AI KOLs Timeline · 2026-06-09

Harvard researchers present AutoScientists, a multi-agent system that forms self-organizing scientific teams without a central coordinator, achieving strong results on BioML-Bench and optimization tasks.

Value Numbering

Hacker News Top · 2026-06-08

The article explains value numbering, a compiler optimization technique that identifies identical computations to avoid redundancy, building on Static Single Assignment (SSA) form and using hash-consing for efficient comparison.
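
The idea described here, hash-consing each expression as (operator, operand value numbers) over SSA names, can be sketched as local value numbering on one straight-line block. The tuple IR, the `copy` opcode, and `value_number` are hypothetical choices for illustration; the article's own implementation may differ.

```python
import itertools
from typing import Dict, List, Tuple

Instr = Tuple[str, str, str, str]  # (dest, op, lhs, rhs): hypothetical IR
COMMUTATIVE = {"add", "mul"}


def value_number(block: List[Instr]) -> List[Instr]:
    """Local value numbering over one straight-line SSA block.

    An instruction whose (op, operand value numbers) key was already seen
    is rewritten into a copy of the earlier result."""
    fresh = itertools.count()
    vn: Dict[str, int] = {}                     # name -> value number
    seen: Dict[Tuple[str, int, int], str] = {}  # hash-consed key -> holder
    out: List[Instr] = []

    def num(operand: str) -> int:
        # Block inputs and constants get a fresh number on first use.
        if operand not in vn:
            vn[operand] = next(fresh)
        return vn[operand]

    for dest, op, lhs, rhs in block:
        a, b = num(lhs), num(rhs)
        if op in COMMUTATIVE and b < a:
            a, b = b, a  # canonical order so a+b and b+a share one key
        key = (op, a, b)
        if key in seen:
            # Redundant: reuse the earlier name (safe in SSA, never redefined).
            holder = seen[key]
            vn[dest] = vn[holder]
            out.append((dest, "copy", holder, ""))
        else:
            vn[dest] = next(fresh)
            seen[key] = dest
            out.append((dest, op, lhs, rhs))
    return out


if __name__ == "__main__":
    block = [
        ("t1", "add", "a", "b"),
        ("t2", "add", "b", "a"),   # same value as t1 (commutative)
        ("t3", "mul", "t1", "c"),
        ("t4", "mul", "t2", "c"),  # same value as t3, via t2 == t1
    ]
    for instr in value_number(block):
        print(instr)
```

Because t2 receives t1's value number, the later `mul` instructions also collide, so redundancy is detected transitively rather than only by textual match.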
