optimization

#optimization

@no_stp_on_snek: Always start with uncompressed k and compressed V and go more aggressively from there. Model families have different se…

X AI KOLs Following ↗ · 2026-05-23 Cached

A tip on KV-cache compression for transformer models: start with uncompressed keys and compressed values, then adjust based on model family sensitivity; try asymmetric before symmetric compression.
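As a rough illustration of the tip, the sketch below compares asymmetric compression (keys left in full precision, values quantized) with symmetric compression (both quantized) on random data. Everything in it is an assumption made for illustration: the per-vector absmax int8 scheme, the helper names, and the toy single-head attention are not taken from the tweet.

```python
import math
import random

def quantize_int8(row):
    """Absmax int8 quantization of one vector; returns (codes, scale)."""
    scale = max(abs(x) for x in row) / 127.0 or 1.0  # fall back to 1.0 for an all-zero row
    return [max(-127, min(127, round(x / scale))) for x in row], scale

def dequantize(codes, scale):
    return [c * scale for c in codes]

def roundtrip(rows):
    """Quantize then dequantize every row, simulating a compressed cache."""
    return [dequantize(*quantize_int8(r)) for r in rows]

def attend(q, keys, values):
    """Single-head softmax attention for one query vector."""
    logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]

def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

random.seed(0)
seq_len, head_dim = 32, 16  # toy sizes, chosen arbitrarily
q = [random.gauss(0, 1) for _ in range(head_dim)]
k = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]
v = [[random.gauss(0, 1) for _ in range(head_dim)] for _ in range(seq_len)]

reference = attend(q, k, v)
asym = attend(q, k, roundtrip(v))            # K uncompressed, V compressed
sym = attend(q, roundtrip(k), roundtrip(v))  # K and V both compressed

asym_err = max_abs_diff(asym, reference)
sym_err = max_abs_diff(sym, reference)
print(f"K full precision, V int8: max error {asym_err:.5f}")
print(f"K int8, V int8:           max error {sym_err:.5f}")
```

On a real model the comparison would be made on downstream quality rather than on one attention output, and the point of the tip is that the acceptable setting differs by model family.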

@L1vsun: i spent 3 months building the most optimized claude code setup possible it was running worse than day one 23 plugins, 8…

X AI KOLs Timeline ↗ · 2026-05-23 Cached

A developer reports that three months of over-optimizing a Claude Code setup, with 23 plugins and multiple frameworks, left it running worse than on day one; deleting almost everything dramatically improved results, suggesting that a minimal setup often works best.

@techNmak: This math sits underneath every AI model being trained right now. Gradient. Jacobian. Hessian. Three words that look in…

X AI KOLs Timeline ↗ · 2026-05-23 Cached

Explains the mathematical concepts of gradient, Jacobian, and Hessian as fundamental tools in AI model training, describing how they measure change and their roles in optimization.
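For readers who want to see the three objects side by side, here is a minimal numerical sketch using central finite differences. The example functions `f` and `g` and the step sizes are arbitrary choices made for illustration, not taken from the thread; the gradient is computed as the single row of the Jacobian of a scalar function, and the Hessian as the Jacobian of the gradient.

```python
def jacobian(g, x, h=1e-5):
    """J[i][j] approximates d g_i / d x_j (rows are outputs, columns are inputs)."""
    m = len(g(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        up = list(x)
        up[j] += h
        down = list(x)
        down[j] -= h
        g_up, g_down = g(up), g(down)
        for i in range(m):
            J[i][j] = (g_up[i] - g_down[i]) / (2 * h)
    return J

def gradient(f, x, h=1e-5):
    """Gradient of a scalar function: the one row of its Jacobian."""
    return jacobian(lambda p: [f(p)], x, h)[0]

def hessian(f, x, h=1e-4):
    """Hessian of a scalar function: the Jacobian of its gradient."""
    return jacobian(lambda p: gradient(f, p, h), x, h)

def f(p):
    # Scalar example: f(x, y) = x^2 * y + y^3
    x, y = p
    return x * x * y + y ** 3

def g(p):
    # Vector example: g(x, y) = (x * y, x + y)
    x, y = p
    return [x * y, x + y]

point = [1.0, 2.0]
grad_f = gradient(f, point)   # analytically (2xy, x^2 + 3y^2)
jac_g = jacobian(g, point)    # analytically [[y, x], [1, 1]]
hess_f = hessian(f, point)    # analytically [[2y, 2x], [2x, 6y]]
print("gradient:", [round(v, 3) for v in grad_f])
print("Jacobian:", [[round(v, 3) for v in row] for row in jac_g])
print("Hessian: ", [[round(v, 3) for v in row] for row in hess_f])
```

Training frameworks obtain these quantities by automatic differentiation rather than finite differences; the numerical version is used here only because it needs no dependencies.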

Making Deep Learning Go Brrrr from First Principles

Hacker News Top ↗ · 2026-05-23 Cached

A comprehensive blog post explaining how to optimize deep learning performance by understanding three key components: compute, memory bandwidth, and overhead, using first principles to identify the performance regime and focus on effective optimizations.
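The three-regime idea can be sketched as a back-of-the-envelope calculation: estimate how long an operation should take if limited only by compute, and how long if limited only by memory traffic, then compare both with the measured time. The hardware peaks, the measured timings, and the `overhead_factor` threshold below are all hypothetical values picked for illustration; none of them come from the blog post.

```python
def classify_regime(flops, bytes_moved, peak_flops, peak_bandwidth,
                    measured_s, overhead_factor=2.0):
    """Label an operation as compute-, memory-, or overhead-bound.

    overhead_factor is an assumed threshold: if the measured time is more
    than this multiple of the ideal time, the gap is blamed on overhead.
    """
    compute_s = flops / peak_flops           # time if only arithmetic mattered
    memory_s = bytes_moved / peak_bandwidth  # time if only data movement mattered
    ideal_s = max(compute_s, memory_s)
    if measured_s > overhead_factor * ideal_s:
        return "overhead"
    return "compute" if compute_s >= memory_s else "memory"

# Hypothetical accelerator: 100 TFLOP/s peak compute, 1 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
PEAK_BANDWIDTH = 1e12

# float32 matmul of two n x n matrices: 2n^3 FLOPs, three n x n tensors touched.
n = 4096
matmul = classify_regime(2 * n**3, 3 * 4 * n**2, PEAK_FLOPS, PEAK_BANDWIDTH,
                         measured_s=1.5e-3)  # hypothetical timing

# float32 elementwise add of m elements: m FLOPs, two reads and one write.
m = 100_000_000
add = classify_regime(m, 3 * 4 * m, PEAK_FLOPS, PEAK_BANDWIDTH,
                      measured_s=1.3e-3)  # hypothetical timing

# The same add on only 1000 elements, where launch cost dominates.
tiny = classify_regime(1000, 3 * 4 * 1000, PEAK_FLOPS, PEAK_BANDWIDTH,
                       measured_s=1e-5)  # hypothetical timing

print(matmul, add, tiny)
```

The label then suggests where to look first: more efficient kernels for the compute case, fusion or fewer memory passes for the memory case, and fewer or larger launches for the overhead case.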

That one time I used Go panics for flow control

Lobsters Hottest ↗ · 2026-05-23 Cached

A Go engineer recounts an incident where an in-memory datastore became overloaded due to slow sorting, and they implemented context cancellation inside sort functions by using panics and recover for non-local flow control, similar to how encoding/json handles errors.
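The pattern translates loosely to other languages. Below is a Python analog, not the author's Go code: the comparator has no way to return an error to the sort routine, so it raises an exception that unwinds out of `sorted`, and the caller catches it at the boundary. A `threading.Event` stands in for Go's context cancellation, and `SortCancelled`, `cancellable_sort`, and `check_every` are hypothetical names introduced for this sketch.

```python
import functools
import threading

class SortCancelled(Exception):
    """Raised inside the comparator to abandon a sort in progress."""

def cancellable_sort(items, cancel_event, check_every=1024):
    """Sort items, aborting if cancel_event is set. Returns (result, completed)."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        # Poll for cancellation only occasionally to keep the comparator cheap.
        if calls % check_every == 0 and cancel_event.is_set():
            raise SortCancelled()
        return (a > b) - (a < b)

    try:
        return sorted(items, key=functools.cmp_to_key(compare)), True
    except SortCancelled:
        # The exception stops here, much as recover() would contain a panic.
        return None, False

event = threading.Event()
done, completed = cancellable_sort([3, 1, 2], event)

event.set()  # simulate a cancelled request
aborted, aborted_completed = cancellable_sort(list(range(100, 0, -1)), event, check_every=1)
print(done, completed, aborted, aborted_completed)
```

In Python, exceptions are the ordinary mechanism for this, so the example is less contentious than the Go original; the shared idea is that the non-local exit is caught before it can escape to unrelated callers.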

@charliermarsh: Are you allowed to make things hundreds of times faster? Does anyone know?

X AI KOLs Following ↗ · 2026-05-22 Cached

Charlie Marsh (creator of Ruff) rhetorically asks whether it is allowed to make things hundreds of times faster, likely referencing a major optimization in a software development tool.

Qwen3.6-35B-A3B Q4 262k context on 8GB 3070 Ti = +30tps

Reddit r/LocalLLaMA ↗ · 2026-05-22

The author shares detailed tuning tips for running the Qwen3.6-35B-A3B MoE model on an 8GB RTX 3070 Ti with up to 262k context using llama.cpp, achieving 30+ tps, and notes a 25% speed boost when switching from Windows to Ubuntu Server.

@RisingSayak: I realized that what I cannot profile, I cannot optimize. This is why I embarked on a little project in Diffusers, to t…

X AI KOLs Following ↗ · 2026-05-22 Cached

Sayak Paul describes a project to profile and optimize Diffusers pipelines using torch.compile, and announces a tutorial series by Ari G. on the topic.

Network-Based Interventions for HIV Prevention via Cascade-Aware Suppression of Transmission

arXiv cs.AI ↗ · 2026-05-22 Cached

This paper introduces CAST, a polynomial-time approximation algorithm for strategically allocating HIV treatment resources to virally unsuppressed individuals in a transmission network to minimize new infections, outperforming existing baselines on real-world networks.

Models Can Model, But Can't Bind: Structured Grounding in Text-to-Optimization

arXiv cs.LG ↗ · 2026-05-22 Cached

This paper introduces Text2Opt-Bench, a scalable benchmark for text-to-optimization, and identifies that LLMs struggle with 'binding' (grounding problem data) rather than 'modeling' (choosing optimization structure). The authors propose BIND, a simple inference-time method that externalizes numeric data, significantly improving accuracy across models.

DualOptim+: Bridging Shared and Decoupled Optimizer States for Better Machine Unlearning in Large Language Models

arXiv cs.LG ↗ · 2026-05-22 Cached

Introduces DualOptim+, an optimization framework for LLM unlearning that uses shared base states and decoupled delta states to balance forgetting and retaining objectives, with a quantized variant for reduced memory.

Tool-Augmented Agent for Closed-loop Optimization, Simulation, and Modeling Orchestration

arXiv cs.AI ↗ · 2026-05-22 Cached

The paper introduces COSMO-Agent, a tool-augmented reinforcement learning framework that trains LLMs to perform closed-loop CAD-CAE optimization, iteratively generating parametric geometries and running simulations until constraints are satisfied, with a multi-constraint reward and a new industry-aligned dataset.

@maximelabonne: Turns out you never really needed µP, you just needed to scale the embedding learning rate by model width I'm no nanoGP…

X AI KOLs Following ↗ · 2026-05-21 Cached

A tweet suggests that scaling the embedding learning rate by model width can replace µP (Maximal Update Parametrization), referencing the Muon optimizer for hidden layers and Adam for the rest.

@ManningBooks: Prompt engineering gets messy fast. What starts as a simple instruction can turn into endless tweaking, context adjustm…

X AI KOLs Following ↗ · 2026-05-21 Cached

Manning Books announces a new early access book 'Building LLM Applications with DSPy', teaching how to use the DSPy framework to optimize LLM prompts with Python. The book is 50% off through June 3rd.

@charliermarsh: /goal for finding silly one-line optimizations that speed up your parser by 20-30%

X AI KOLs Following ↗ · 2026-05-21 Cached

Charlie Marsh shares a personal goal of finding simple one-line optimizations that can speed up a parser by 20-30%.

@Xudong07452910: https://x.com/Xudong07452910/status/2057386528859381870

X AI KOLs Timeline ↗ · 2026-05-21 Cached

A configuration guide for Claude Code beginners, introducing 8 key environment variables to optimize performance, reduce costs, and improve the experience.

What should AI's goal be? I think it should be protecting human agency.

Reddit r/ArtificialInteligence ↗ · 2026-05-21

This article argues that AI's primary goal should be protecting human agency, framing agency as the foundational substrate for values, preferences, and alignment. It explores how degradation of agency undermines meaningful evaluation and action, and proposes that legitimacy in AI systems must come from demonstrable protection of agency at the local level.

@morganlinton: I asked Teknium, who is probably one of the smartest agent devs in the world, what he did recently to speed up tool cal…

X AI KOLs Following ↗ · 2026-05-21 Cached

Teknium shares recent performance improvements for tool calling in AI agents, including deferring imports, cutting 47% of per-conversation function calls, and deferring compression feasibility checks, with links to working code on GitHub.
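Of the changes listed, deferring imports is the easiest to show in isolation. The sketch below is one generic way to do it in Python and is not the code linked from the tweet: `LazyModule` is a hypothetical wrapper that postpones the real import until an attribute is first used, with the standard-library `colorsys` module standing in for a heavy dependency.

```python
import importlib

class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails, so _name and
        # _module (set in __init__) never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

# No import cost is paid on this line.
colors = LazyModule("colorsys")
loaded_before = colors._module is not None

# The first real use triggers the import.
hsv = colors.rgb_to_hsv(1.0, 0.0, 0.0)
loaded_after = colors._module is not None
print(loaded_before, loaded_after, hsv)
```

A plain `import` statement placed inside the function that needs the module achieves the same effect with less machinery; the wrapper is mainly useful when many call sites share one module-level name.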

Build 9254 fixes my TG regression and adds PDL for NVIDIA GPUs

Reddit r/LocalLLaMA ↗ · 2026-05-20

Build 9254 of llama.cpp fixes a token generation regression and adds Programmatic Dependent Launch (PDL) support for NVIDIA GPUs, yielding up to 10% speedup in token generation on newer hardware.

40+tok/s - optimized recipe for Qwen 3.5 122B Int4 on a single DGX Spark with vLLM

Reddit r/LocalLLaMA ↗ · 2026-05-20

A user shares an optimized recipe for running Qwen 3.5 122B Int4 on a single DGX Spark with vLLM, achieving over 40 tokens per second, and invites others to try it and optimize it further.
