Tag
The team at Vercel has significantly optimized the performance of their homepage, using techniques like WebGPU shaders and scrutinizing every frame, and they plan to share the lessons learned.
A guide on avoiding rate limits and reducing costs when using the GLM 5.2 model, covering prompt batching, caching, free model alternatives, effort levels, context window management, and self-hosting.
This article details a performance improvement in libffi, where caching argument placement as a flat list of moves (a 'plan') eliminates redundant reclassification on every function call, offering significant speedups without resorting to JIT compilation.
A speculative discussion questioning why LLMs are not trained to think in an optimized internal language rather than natural language, and whether that could improve efficiency.
A detailed tutorial on implementing CUDA Graphs in Tokn, an LLM inference server, covering FastAPI server setup, engine initialization, and CUDA Graph capture to optimize the decode phase.
A developer created a unified installer that combines existing token-saving tools like OpenSpec, RTK, and ccusage for Copilot and Claude Code, with a command-line interface that shows real token consumption savings.
Introduces RACL, a reasoning-agent control layer that improves metaheuristic optimization by learning to control internal search behavior from operational memory, showing cost improvements in vehicle routing tests.
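This paper introduces ORAgentBench, an execution-based benchmark for evaluating LLM agents on end-to-end operations research tasks, comprising 107 human-reviewed tasks. Experiments show that the best current agent passes only 35.51% of the tasks, revealing major shortcomings in reliable decision making.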
This paper models a question-answering forum staffed by expert knowledge workers, studying optimal scheduling to maximize system capacity and stability.
Introduces Evolving Programmatic Bottlenecks (EPB), a framework for interpreting neural combinatorial optimization policies by distilling black-box models into human-readable program portfolios using LLM-guided evolution.
This draft book chapter provides an infographic and detailed analysis of the cost of C++ operations in CPU clock cycles on modern CPUs, covering multiplication, division, and RTTI, with latency tables for various architectures.
A guide on building minimal NixOS ISOs and reducing their size, with comparisons to Alpine Linux and step-by-step optimization techniques.
This paper proposes a POMDP framework for multi-objective decision making in lithium production, addressing geological, demand, and pricing uncertainties to optimize mine opening and extraction method selection. The approach outperforms human-inspired heuristics by dynamically adapting to shifting price regimes through belief state planning.
A blog post from LMSYS Org details optimizing Ling-2.6-1T, a 1 trillion parameter hybrid MoE model, on TPU v7x using SGLang-JAX, achieving efficient inference by hiding MoE data movement behind computation with a single Pallas kernel.
LlamaIndex improved their LiteParse PDF parsing skill for Claude agents, making it 37% cheaper and more accurate by optimizing agent behavior through evaluation traces.
Explains how GCC's computed goto extension can speed up bytecode VM dispatch compared with a traditional switch statement, with a simple example.
Proposes MGUP, a momentum-gradient alignment update policy for selective intra-layer parameter updates in stochastic optimization, which integrates with optimizers like AdamW, Lion, and Muon, and provides theoretical convergence guarantees along with superior performance on large-scale model training tasks.
This paper uses a Transformer-based model on MLB Statcast data to counterfactually optimize baseball pitch sequences, finding that optimizing both final and setup pitches can improve season-level statistics like K/9 by over 1.0.
This paper rethinks the role of grouping in critic-free reinforcement learning for LLMs and proposes negative token filtering to enable stable training with a single rollout per prompt, achieving comparable or better performance on reasoning and agentic tasks.
This paper presents a skill-constrained model predictive control approach for resilient manufacturing supply chains, where training decisions affect future certified capacity. The controller solves a finite-horizon mixed-integer program and is evaluated on synthetic scenarios, showing that predictive control helps when bottlenecks are forecastable but is not universally superior.