batching

Why does your agent still give different answers at temperature 0?

Reddit r/AI_Agents · 23h ago

Setting temperature to 0 does not guarantee deterministic tool calls in agents. Batched inference changes the order of floating-point reductions as server load varies, and the resulting tiny numerical differences can flip a near-tied token, so the agent takes a different action.
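
The mechanism in this summary rests on floating-point addition not being associative. The sketch below is a toy illustration only, not the post's code: the contribution values, the two action names, and the tie-breaking rule are assumptions chosen to make the effect visible. It sums the same three numbers in two orders and shows that a greedy (temperature 0) pick between two near-tied logits can change.

```python
def left_to_right(xs):
    """Accumulate in the given order, as one reduction schedule might."""
    total = 0.0
    for x in xs:
        total += x
    return total


def right_to_left(xs):
    """Accumulate in reverse order, as a different schedule might."""
    total = 0.0
    for x in reversed(xs):
        total += x
    return total


def greedy_pick(logits):
    """Temperature-0 decoding: take the highest logit (first entry wins ties)."""
    return max(logits, key=logits.get)


# Hypothetical partial sums that make up the logit of the "call_tool" token.
contributions = [0.1, 0.2, 0.3]

logit_a = left_to_right(contributions)   # 0.6000000000000001
logit_b = right_to_left(contributions)   # 0.6

# Hypothetical competing token whose logit is fixed at 0.6.
pick_a = greedy_pick({"answer_directly": 0.6, "call_tool": logit_a})
pick_b = greedy_pick({"answer_directly": 0.6, "call_tool": logit_b})

print(logit_a, logit_b)  # 0.6000000000000001 0.6
print(pick_a, pick_b)    # call_tool answer_directly
```

Real inference kernels reduce over thousands of terms in lower precision, so the differences are larger than in this toy case, but the principle is the same: the inputs are identical and only the summation order differs.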

@LangChain: https://x.com/LangChain/status/2061864647884464430

X AI KOLs Following · 2d ago

A study by LangChain and Harvey explores methods to reduce the cost of verifying legal agent outputs by batching criteria evaluations and using open models, achieving order-of-magnitude cost savings while maintaining near-frontier performance.
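
One way to read "batching criteria evaluations" is to ask a judge model about several rubric criteria per call instead of one call per criterion. The sketch below is a minimal illustration under that assumption, not the study's method: `verify`, `chunk`, and the keyword-matching `stub_judge` are hypothetical names, and the stub stands in for a real model call.

```python
from typing import Callable, Dict, List, Tuple

Judge = Callable[[str, List[str]], Dict[str, bool]]


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def verify(output: str, criteria: List[str], judge: Judge,
           batch_size: int) -> Tuple[Dict[str, bool], int]:
    """Evaluate all criteria, sending `batch_size` criteria per judge call.

    Returns the per-criterion verdicts and the number of judge calls made.
    """
    verdicts: Dict[str, bool] = {}
    calls = 0
    for group in chunk(criteria, batch_size):
        verdicts.update(judge(output, group))
        calls += 1
    return verdicts, calls


def stub_judge(output: str, criteria: List[str]) -> Dict[str, bool]:
    """Hypothetical stand-in for a model call: a criterion passes if its
    keyword appears in the output."""
    return {c: c in output for c in criteria}


criteria = [f"term{i}" for i in range(10)]
draft = "term0 term3 term7"

one_by_one, calls_single = verify(draft, criteria, stub_judge, batch_size=1)
batched, calls_batched = verify(draft, criteria, stub_judge, batch_size=4)

print(calls_single, calls_batched)  # 10 3
```

With a real judge the verdicts may differ between batch sizes, since a model can attend less carefully to each criterion in a longer prompt; the stub here only demonstrates the reduction in call count.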

@Greptime: On Prometheus remote write, the bottleneck wasn't network or memtable — it was the Region Worker holding &mut while dec…

X AI KOLs Following · 2d ago

GreptimeDB v1.0 introduces Pending Rows Batcher, a three-stage pipeline that moves CPU-intensive work off the Datanode's critical section, improving Prometheus remote write throughput from 1.20M to 2.17M points/sec and reducing Datanode CPU usage by 20%.

Threshold-Based Exclusive Batching for LLM Inference

arXiv cs.AI · 2d ago

This paper analyzes the trade-off between mixed batching and exclusive batching for LLM inference, showing that the optimal choice depends on GPU memory bandwidth. It proposes a threshold-based hybrid scheduler that dynamically switches between the two methods, achieving up to 41.9% higher throughput on bandwidth-constrained GPUs.

@adrgrondin: Subagents running locally and simultaneously on MacBook Pro M5 with Codex CLI + @lmstudio to review code and find bugs …

X AI KOLs Following · 2026-05-20

Demonstrates running subagents locally on a MacBook Pro M5 using Codex CLI and LM Studio with Qwen 3.6 and MLX batching for code review and bug detection.

@lmstudio: Batching for vision models is now available in Beta with our latest MLX engine update The updated engine also brings ma…

X AI KOLs Following · 2026-05-14

LM Studio announces a beta update to its MLX engine, introducing batching for vision models and improved caching for faster inference.
