batching

#batching

BatchDAG: LLM-Planned Execution Graphs for Scalable Ad-Hoc Analysis Over Enterprise Data

arXiv cs.AI · 7h ago

BatchDAG introduces a system where an LLM generates typed directed acyclic graphs of operations for scalable ad-hoc analysis over enterprise data, achieving up to 47x reduction in LLM calls and sub-60-second query times over 50,000+ meetings.

Defining new Jax types with hijax

Hacker News Top · 2026-07-12

This documentation introduces hijax types, a new feature in JAX that allows defining custom types with their own invariants, tangent types, batching, and sharding behavior, illustrated with an example of quantized arrays.

@injaneity: https://x.com/injaneity/status/2075659478096376158

X AI KOLs Timeline · 2026-07-10

This article explains how batching and parallel operations improve latency and efficiency in AI computer use systems, highlighting open-source implementations like pi-computer-use and cua-driver that achieved significant performance gains before similar features appeared in Codex.

@athleticKoder: A 1600-word note on how llm inference work: Covering: 1. Attention - the only place tokens interact 2. KV caching - why…

X AI KOLs Timeline · 2026-07-02

A detailed thread explaining key concepts of LLM inference: attention, KV caching, chunked prefill, and batching techniques, including continuous batching used in vLLM and SGLang.

An open handbook on LLM inference at scale (GPU internals, KV cache, batching, vLLM/SGLang/TensorRT-LLM) [P]

Reddit r/MachineLearning · 2026-06-20

An open, in-progress handbook explaining LLM inference internals including GPU memory hierarchy, KV cache, batching, and popular inference engines like vLLM and TensorRT-LLM.

Why does your agent still give different answers at temperature 0?

Reddit r/AI_Agents · 2026-06-04

Setting temperature to 0 does not guarantee deterministic tool calls in agents: under batched inference, the order of floating-point reductions can shift with load, which can flip a token and lead the agent to take a different action.
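The underlying effect can be illustrated without any model. The sketch below is not taken from the linked post; it only shows that floating-point addition is not associative, so summing the same values in a different grouping (as a reduction might when batch composition changes) can give a slightly different result. The function names are hypothetical.

```python
import math


def sum_left_to_right(values):
    """Accumulate strictly in order: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Split the (non-empty) list in half and add the two partial sums."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


values = [0.1, 0.2, 0.3]
sequential = sum_left_to_right(values)  # groups as (0.1 + 0.2) + 0.3
pairwise = sum_pairwise(values)         # groups as 0.1 + (0.2 + 0.3)

print(sequential == pairwise)              # False: the last bits differ
print(math.isclose(sequential, pairwise))  # True: the gap is tiny
```

A gap this small is harmless in most arithmetic, but when two candidate tokens have nearly equal logits it can be enough to change which one is the maximum.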

@LangChain: https://x.com/LangChain/status/2061864647884464430

X AI KOLs Following · 2026-06-02

A study by LangChain and Harvey explores methods to reduce the cost of verifying legal agent outputs by batching criteria evaluations and using open models, achieving order-of-magnitude cost savings while maintaining near-frontier performance.

@Greptime: On Prometheus remote write, the bottleneck wasn't network or memtable — it was the Region Worker holding &mut while dec…

X AI KOLs Following · 2026-06-02

GreptimeDB v1.0 introduces Pending Rows Batcher, a three-stage pipeline that moves CPU-intensive work off the Datanode's critical section, improving Prometheus remote write throughput from 1.20M to 2.17M points/sec and reducing Datanode CPU usage by 20%.

Threshold-Based Exclusive Batching for LLM Inference

arXiv cs.AI · 2026-06-02

This paper analyzes the trade-off between mixed batching and exclusive batching for LLM inference, showing that the optimal choice depends on GPU memory bandwidth. It proposes a threshold-based hybrid scheduler that dynamically switches between the two methods, achieving up to 41.9% higher throughput on bandwidth-constrained GPUs.

@adrgrondin: Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs …

X AI KOLs Following · 2026-05-20

A demonstration of subagents running locally and simultaneously on a MacBook Pro M5, using Codex CLI and LM Studio with Qwen 3.6 and MLX batching to review code and find bugs.

@lmstudio: Batching for vision models is now available in Beta with our latest MLX engine update The updated engine also brings ma…

X AI KOLs Following · 2026-05-14

LM Studio announces a beta update to its MLX engine, introducing batching for vision models and improved caching for faster inference.
