Why does your agent still give different answers at temperature 0?

Reddit r/AI_Agents 06/04/26, 01:45 AM News

temperature-0 deterministic agent tool-calls batching floating-point reproducibility

Summary

Setting temperature to 0 does not guarantee deterministic tool calls in agents due to batched inference causing floating-point reduction order shifts, leading to token flips and different actions under load.

Something that trips up a lot of agent setups nowadays: people set temperature to 0, decide the agent is now deterministic and build caching and retries on that assumption. Then the same input produces a different tool call on Tuesday and nobody can say why. Temperature 0 makes sampling greedy. It does not make the whole stack deterministic. The explanation I've seen that actually holds up is batched inference: when your request shares a batch with others, floating point reduction order shifts with the batch composition, and on close calls that flips the top token. Under concurrent load you don't control your batch, so you don't control that. For agents this hits harder than for simple chats, because a flipped token in a tool name or an argument isn't a slightly different sentence, it's a different action. Is anyone actually getting reproducible tool calls under load, or do you just design assuming you can't?

Original Article

Similar Articles

Same agent, same prompt, different runs. Which output do you ship?

Reddit r/AI_Agents

The author observes that running the same task with Claude Code across different sessions yields varying decision patterns, making it hard to choose outputs that are safe to ship, and highlights the lack of tooling for evaluating agent decision profiles.

@jianxliao: How do we make agents deterministic?

X AI KOLs Following

A tweet by @jianxliao raises the question of how to make AI agents deterministic, sparking discussion on reliability and safety.

Your agent gets dumber the longer a session runs

Reddit r/AI_Agents

The article discusses how AI agent performance degrades over long sessions due to context window clutter from raw history, tool outputs, and repeated reasoning, and suggests solutions like summarizing old turns and trimming tool outputs to extend useful run length.

Same agent, same task, wildly different costs per session?

Reddit r/AI_Agents

A discussion on AI agent observability highlights unpredictable cost variations and dangerous failure modes like unauthorized database deletes, prompting questions about production handling strategies beyond basic logging.

Your coding agent didn't get worse. You just never measured the first version.

Reddit r/AI_Agents

The article argues that perceived degradation in coding agents is often due to untracked changes in agent instances and configuration rather than the underlying model itself, highlighting a critical lack of baseline measurement in current AI agent workflows.

Similar Articles

Same agent, same prompt, different runs. Which output do you ship?

@jianxliao: How do we make agents deterministic?

Your agent gets dumber the longer a session runs

Same agent, same task, wildly different costs per session?

Your coding agent didn't get worse. You just never measured the first version.

Submit Feedback