Tag
Charlie Marsh (creator of Ruff) rhetorically asks if it's allowed to make things hundreds of times faster, likely referencing a major optimization in a software development tool.
The author shares detailed tuning tips for running the Qwen3.6-35B-A3B MoE model on an 8GB RTX 3070 Ti with up to 262k context using llama.cpp, achieving 30+ tps, and notes a 25% speed boost when switching from Windows to Ubuntu Server.
Sayak Paul describes a project to profile and optimize Diffusers pipelines using torch.compile, and announces a tutorial series by Ari G. on the topic.
This paper introduces CAST, a polynomial-time approximation algorithm for strategically allocating HIV treatment resources to virally unsuppressed individuals in a transmission network to minimize new infections, outperforming existing baselines on real-world networks.
This paper introduces Text2Opt-Bench, a scalable benchmark for text-to-optimization, and identifies that LLMs struggle with 'binding' (grounding problem data) rather than 'modeling' (choosing optimization structure). The authors propose BIND, a simple inference-time method that externalizes numeric data, significantly improving accuracy across models.
Introduces DualOptim+, an optimization framework for LLM unlearning that uses shared base states and decoupled delta states to balance forgetting and retaining objectives, with a quantized variant for reduced memory.
The paper introduces COSMO-Agent, a tool-augmented reinforcement learning framework that trains LLMs to perform closed-loop CAD-CAE optimization, iteratively generating parametric geometries and running simulations until constraints are satisfied, with a multi-constraint reward and a new industry-aligned dataset.
A tweet suggests that scaling the embedding learning rate by model width can remove the need for µP (Maximal Update Parametrization), referencing a setup that uses the Muon optimizer for hidden layers and Adam for the rest.
Manning Books announces a new early access book 'Building LLM Applications with DSPy', teaching how to use the DSPy framework to optimize LLM prompts with Python. The book is 50% off through June 3rd.
Charlie Marsh shares a personal goal of finding simple one-line optimizations that can speed up a parser by 20-30%.
A configuration guide for Claude Code beginners, introducing 8 key environment variables to optimize performance, reduce costs, and improve the experience.
This article argues that AI's primary goal should be protecting human agency, framing agency as the foundational substrate for values, preferences, and alignment. It explores how degradation of agency undermines meaningful evaluation and action, and proposes that legitimacy in AI systems must come from demonstrable protection of agency at the local level.
Teknium shares recent performance improvements for tool calling in AI agents, including deferring imports, cutting 47% of per-conversation function calls, and deferring compression feasibility checks, with links to working code on GitHub.
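As a rough illustration of the "deferring imports" idea mentioned in this summary, the sketch below moves an import from module level into the function that needs it, so the cost is paid only when that code path runs. The names `render_report` and `is_loaded` are hypothetical and are not taken from the linked code; this shows the general Python pattern, not the author's implementation.

```python
import sys


def render_report(rows):
    """Format rows as CSV text (hypothetical helper for illustration)."""
    # Deferred import: csv and io are loaded on the first call to this
    # function rather than when the enclosing module is imported.
    import csv
    import io

    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def is_loaded(module_name):
    """Return True if the named module has already been imported."""
    return module_name in sys.modules
```

After the first call, Python caches the module in `sys.modules`, so later calls only pay for a dictionary lookup. The pattern tends to help most when the deferred module is expensive to import and rarely used.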
Build 9254 of llama.cpp fixes a token generation regression and adds Programmatic Dependent Launch (PDL) support for NVIDIA GPUs, yielding up to 10% speedup in token generation on newer hardware.
A user shares an optimized recipe for running Qwen 3.5 122B Int4 on a single DGX Spark with vLLM, achieving over 40 tokens per second, and invites others to try it and optimize it further.
A technical deep-dive into shrinking a Zig ELF binary from 2180K to under 500 bytes by stripping debug info, switching to ReleaseSmall, and using a freestanding target.
An article explaining strong convexity and L-smoothness in optimization, which together bound a function between two quadratics (the "quadratic sandwich"), and their role in gradient descent performance.
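For reference, the two conditions in this summary can be written out as follows. This is the standard textbook formulation rather than text from the article itself; the symbols $\mu$ (strong-convexity constant), $L$ (smoothness constant), and $f^\star$ (the minimum value of $f$) are notation assumed here.

```latex
% mu-strong convexity gives the lower quadratic, L-smoothness the upper one,
% for all x and y:
f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{2}\,\|y - x\|^2
\;\le\; f(y) \;\le\;
f(x) + \nabla f(x)^\top (y - x) + \frac{L}{2}\,\|y - x\|^2

% Gradient descent with step size 1/L then shrinks the optimality gap
% by a constant factor at every iteration:
x_{k+1} = x_k - \tfrac{1}{L}\,\nabla f(x_k)
\quad\Longrightarrow\quad
f(x_{k+1}) - f^\star \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)\bigl(f(x_k) - f^\star\bigr)
```

The ratio $L/\mu$ is the condition number: the closer it is to 1, the tighter the sandwich and the faster this linear rate.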
microsandbox replaced its slow user-space FUSE filesystem with a kernel-mounted EROFS disk image, achieving a 47× geometric mean speedup across filesystem operations and eliminating the VM/host round-trip bottleneck.
An article discussing a technique to convert an integer to a decimal string in under two nanoseconds, focusing on performance optimization.
Introduces QuantFPFlow, a reinforcement learning framework that uses quantum amplitude estimation to achieve a quadratic speedup in estimating the Fokker-Planck partition function for continuous control, improving exploration and avoiding local optima.