optimization

#optimization

@ying11231: Impressive performance on TPU.

X AI KOLs Timeline ↗ · 2026-06-17 Cached

A blog post from LMSYS Org details optimizing Ling-2.6-1T, a 1 trillion parameter hybrid MoE model, on TPU v7x using SGLang-JAX, achieving efficient inference by hiding MoE data movement behind computation with a single Pallas kernel.


@jerryjliu0: We made Claude better and faster at understanding PDFs The trick isn’t just creating the fastest free document parser o…

X AI KOLs Following ↗ · 2026-06-17 Cached

LlamaIndex improved their LiteParse PDF parsing skill for Claude agents, making it 37% cheaper and more accurate by optimizing agent behavior through evaluation traces.


Computed goto for efficient dispatch tables (2012)

Hacker News Top ↗ · 2026-06-17 Cached

Explains how GCC's computed goto extension (labels as values) can speed up bytecode VM dispatch compared with a traditional switch statement, with a simple example.


MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

arXiv cs.LG ↗ · 2026-06-17 Cached

Proposes MGUP, a momentum-gradient alignment update policy for selective intra-layer parameter updates in stochastic optimization, which integrates with optimizers like AdamW, Lion, and Muon, and provides theoretical convergence guarantees along with superior performance on large-scale model training tasks.


Counterfactual Optimization of Baseball Pitch Sequences and Estimation of Its Impact on Season-Level Statistics

arXiv cs.LG ↗ · 2026-06-17 Cached

This paper uses a Transformer-based model on MLB Statcast data to counterfactually optimize baseball pitch sequences, finding that optimizing both final and setup pitches can improve season-level statistics like K/9 by over 1.0.


Rethinking Groups in Critic-Free RLVR

arXiv cs.LG ↗ · 2026-06-17 Cached

This paper rethinks the role of grouping in critic-free reinforcement learning for LLMs and proposes negative token filtering to enable stable training with a single rollout per prompt, achieving comparable or better performance on reasoning and agentic tasks.


Skill-Constrained Model Predictive Control for Resilient Manufacturing Supply Chains

arXiv cs.AI ↗ · 2026-06-17 Cached

This paper presents a skill-constrained model predictive control approach for resilient manufacturing supply chains, where training decisions affect future certified capacity. The controller solves a finite-horizon mixed-integer program and is evaluated on synthetic scenarios, showing that predictive control helps when bottlenecks are forecastable but is not universally superior.


I didn't know it was possible to compile llamacpp to run cuda + vulkan at the same time..

Reddit r/LocalLLaMA ↗ · 2026-06-16

The author discovered that compiling llama.cpp with both CUDA and Vulkan backends simultaneously is possible, yielding a ~10% improvement in tokens/sec for decoding. They plan to run further benchmarks to assess the benefits.


Making ast.walk 220x Faster

Hacker News Top ↗ · 2026-06-16 Cached

The Reflex team made Python's ast.walk 220x faster for their AI code generation linter by removing generator overhead, inlining functions, and implementing a Rust binding.


@umichkim: AI for Science is moving from “writing text” to “writing and testing scientific code.” A new Nature paper introduces ER…

X AI KOLs Timeline ↗ · 2026-06-16 Cached

A new Nature paper introduces ERA, an AI system that iteratively writes, runs, scores, and improves scientific code through tree search, moving AI for science from text generation to code testing.


The time the x86 emulator team found code so bad that they fixed it during emulation

Lobsters Hottest ↗ · 2026-06-16 Cached

A story from a Windows x86 emulator team about encountering a program with a fully unrolled 64KB initialization loop (65,536 instructions) and adding a special optimization to replace it with a tight loop.


Large Language Models as Optimizers: A Survey of Direct vs. Tool-Augmented Approaches and Their Performance Frontiers

arXiv cs.AI ↗ · 2026-06-16 Cached

This survey categorizes LLM-based optimization into three paradigms—direct, tool-augmented, and tool-creating—and reviews their performance frontiers and limitations.


Spokes: Optimizing for Diverse Pretraining Data Selection

arXiv cs.CL ↗ · 2026-06-16 Cached

This paper introduces Spokes, a probabilistic diversification framework using the G-Vendi score to optimize diversity in pretraining data selection, achieving significant improvements in downstream task performance on FineWeb and DCLM by jointly optimizing quality and diversity.


When to use what Schatten-p norm in deep learning?

arXiv cs.LG ↗ · 2026-06-16 Cached

This paper provides guidance on the appropriate use of different Schatten-p norms in deep learning, analyzing their theoretical properties and practical implications for model regularization and optimization.


Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

arXiv cs.LG ↗ · 2026-06-16 Cached

This paper introduces AdaNAGED, a method that combines zero-order optimization, parameter-free adaptation, and non-Euclidean update geometry for memory-efficient fine-tuning of large language models, with theoretical convergence guarantees and validation on the OPT-1.3B model.


α-Fair Insurance Pricing: A Fairness Continuum

arXiv cs.LG ↗ · 2026-06-16 Cached

This paper proposes an α-Fair Individual Solvent Premium (α-FISP) framework for insurance pricing that balances actuarial fairness and solidarity fairness while ensuring solvency, using constrained optimization to yield a continuum of pricing solutions.


DFlash and Spec V2 Decoding (14 minute read)

TLDR AI ↗ · 2026-06-16 Cached

Z Lab, SGLang, and Modal release DFlash, a new speculative decoding model for Qwen 3.5 397B-A17B that uses block diffusion and KV injection to achieve over 4x throughput improvement over baseline and 1.5x over native MTP.


@songhan_mit: Explore our continued efforts on KV cache compression:

X AI KOLs Following ↗ · 2026-06-15 Cached

A tweet from Song Han highlights continued work on KV cache compression, featuring a blog by Weian Mao that discusses system-level aspects often overlooked in papers.


This is amazing. Token speed doubled + kv cache now need low vram - qwen 27b

Reddit r/LocalLLaMA ↗ · 2026-06-15

A new KV cache optimization called kvflash doubles generation speed and reduces VRAM usage for Qwen 3.6-27B on a single RTX 3090 while maintaining accuracy.


Clojure is almost as fast as C (with some help)

Lobsters Hottest ↗ · 2026-06-15 Cached

This article details how Clojure, with the JVM's Vector API and careful optimization, achieved frame rates within 20% of C for a 3D stress test, demonstrating that a dynamic language can approach low-level performance on hot loops.
