The paper introduces RiVER, a reinforcement learning method that improves LLMs' coding performance on problems without known gold solutions by ranking programs on hidden test cases and providing graded feedback.
A user details their setup running Qwen 27B with llama.cpp on an RTX PRO 6000 Blackwell for local coding agents, compares its performance to Claude models, and asks for help resolving frequent crashes and malformed responses.
This paper derives a scaling law for sketched linear contrastive learning under a Gaussian latent-variable model, analyzes how risk decomposes into approximation, optimization, and statistical terms, and provides theoretical guidance for balancing model size, data, and compute in contrastive learning.
This paper presents CASOP, a framework for context-aware synthesis and evaluation of optimization pipelines for warehouse order fulfillment, enabling automatic construction of valid algorithmic pipelines from a modular repository.
This paper provides optimal high-probability bounds for stochastic gradient descent under Markovian noise for smooth objectives satisfying the Polyak-Łojasiewicz (PL) condition, closing gaps between expectation and high-probability guarantees and extending to heavy-tailed settings with matching lower bounds.
This paper proposes an agentic aggregator framework for coordinating electric bus fleet operations, integrating optimization-based scheduling with supervisory AI agents to handle disturbances, tariff adaptation, and value allocation, revealing trade-offs between operational efficiency and profit-oriented pricing.
BunnyxStudio spent three weeks removing SwiftData from Hive, which significantly improved startup speed: a library of 66,000 images is usable almost instantly.
The LFM2.5 230M model achieves 1,400 tokens per second in the browser using custom WebGPU kernels, demonstrating efficient local inference.
This article discusses how traditional primary key designs can isolate tables, and introduces structured primary keys as an alternative approach to improve SQL query performance and maintain relational integrity.
The author describes a technique for speeding up AI agents by moving stable context out of the prompt, which reduces token usage and latency.
The article discusses how LLM code style choices affect token consumption and costs, offering optimizations such as using Web API standards and simpler indentation to reduce output tokens.
This paper presents Agentic-LTPO, a nested bilevel optimization framework that uses agentic AI to adapt physical layer configurations under dynamic operator policies, achieving 57.2% long-term performance improvement in cell-free MIMO beamforming.
This article revisits techniques for creating extremely small ELF executables on Linux, exploring how to reduce size to 45 bytes by abusing header fields and overlapping structures while maintaining ELF specification conformance.
A discussion of what AI evaluations should focus on, asking whether practitioners are optimizing prompts, context, or the entire harness, and noting a shift toward holistic optimization.
An update on running a non-quantized DeepSeek-v4-Flash model at 11 tok/s on a single DGX Spark, using SGLang for inference and a custom mega-kernel, with progress toward GLM-5.2.
HALO is an open-source desktop app that uses reinforcement learning from model-based (RLM) techniques to debug and optimize AI agent traces locally, providing analysis and actionable recommendations.
The author measured token waste in AI coding agents, found that 42% of it was avoidable, and built a tool to catch it. The tool works with Claude Code, Cursor, and Codex.
The article describes libdeflate's new level 13, a deliberately slow DEFLATE compression level that achieves marginally better compression (0.134% on Silesia) at the cost of being 56x slower than level 12, designed for scenarios where data is compressed once and decompressed many times.
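The compress-once, decompress-many trade-off in the libdeflate item can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and all inputs in the usage example are hypothetical, and it assumes the 0.134% figure means the level 13 output is 0.134% smaller than the level 12 output, which the summary does not state explicitly.

```python
def level13_tradeoff(level12_seconds, level12_bytes, downloads,
                     slowdown=56.0, size_gain=0.00134):
    """Estimate the cost and benefit of level 13 relative to level 12.

    slowdown and size_gain default to the figures quoted in the summary
    (56x slower, 0.134% smaller); the other inputs are the caller's own
    measurements and are purely hypothetical here.
    Returns (extra_compression_seconds, total_bytes_saved).
    """
    # One-time cost: level 13 takes slowdown times as long as level 12.
    extra_seconds = level12_seconds * (slowdown - 1.0)
    # Recurring benefit: every transferred copy is slightly smaller.
    bytes_saved_per_copy = level12_bytes * size_gain
    return extra_seconds, bytes_saved_per_copy * downloads


# Hypothetical workload: 10 s to compress at level 12, 1 MB output,
# served 1,000 times.
extra, saved = level13_tradeoff(10.0, 1_000_000, 1_000)
print(f"extra compression time: {extra:.0f} s, bytes saved: {saved:.0f}")
```

Under these assumptions the extra cost is paid once while the saving scales with the number of downloads, which is why such a level only makes sense for widely distributed artifacts.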
The article explains how the author achieved p99 zero-millisecond perceived latency for autocomplete on 240 million domain names by prefetching suggestions on keyDown and caching, with a fast API built on Tranco and CZDS data.
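The prefetch-and-cache idea in the autocomplete item can be sketched roughly as follows. This is a synchronous Python simulation, not the author's code: `PrefixCache`, `fetch_suggestions`, and the sample domain list are hypothetical, and `on_key_down` stands in for a browser keyDown handler that would issue the request asynchronously before the input value changes.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the remote autocomplete API.
DOMAINS = ["example.com", "example.org", "exampletest.net", "openai.com"]


def fetch_suggestions(prefix: str) -> List[str]:
    """Return up to three domains starting with prefix (simulated backend)."""
    return [d for d in DOMAINS if d.startswith(prefix)][:3]


class PrefixCache:
    """Cache suggestions per prefix and prefetch them on key-down."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, List[str]] = {}
        self.backend_calls = 0  # counts real fetches, for illustration

    def _load(self, prefix: str) -> List[str]:
        # Only hit the backend when the prefix has not been seen before.
        if prefix not in self._cache:
            self.backend_calls += 1
            self._cache[prefix] = self._fetch(prefix)
        return self._cache[prefix]

    def on_key_down(self, current_text: str, key: str) -> None:
        """Prefetch for the text as it will look once the key lands."""
        self._load(current_text + key)

    def suggestions(self, text: str) -> List[str]:
        """Called after the input changed; normally served from cache."""
        return self._load(text)


cache = PrefixCache(fetch_suggestions)
cache.on_key_down("", "e")      # key pressed: prefetch for "e"
print(cache.suggestions("e"))   # input updated: served from the cache
print(cache.backend_calls)      # only one backend call was made
```

Because the fetch starts when the key goes down rather than when the input event fires, the result is often already cached by the time the UI asks for it, which is the source of the near-zero perceived latency the item describes.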
The team at Vercel has significantly optimized the performance of their homepage, using techniques like WebGPU shaders and scrutinizing every frame, and they plan to share the lessons learned.