Few: two instances of the same model don't make the same diff

Reddit r/AI_Agents News

Summary

An observation that two instances of the same AI model given the same task can behave differently along the way (e.g., one refactors a shared utility while the other leaves it alone), which shows the limits of reviewing agent work by final output alone.

Same task, same model, two agent instances, two fresh checkouts. Expecting damn near identical work, right? Right? Instead, one instance refactored a shared util nobody asked it to touch, and the other left it alone. Same prompt, same weights, different behavior. We only caught it because we diffed the session traces, not the output. The output looked fine. The output always looks fine, and that's the problem with reviewing agent work by reading the final diff: you see what it produced, not what it did to get there, and not the things it quietly changed along the way.
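
For anyone wanting to try the same kind of comparison, here is a minimal sketch of diffing two session traces by the files each run modified. The trace format is an assumption made up for illustration (a list of event dicts with hypothetical "action" and "path" keys); the post does not say what its traces look like, and real agent tools log differently.

```python
from typing import Iterable

# Hypothetical event schema, e.g. {"action": "edit", "path": "src/app.py"}.
# These action names are assumptions, not any real tool's log format.
WRITE_ACTIONS = {"edit", "create", "delete"}


def touched_files(trace: Iterable[dict]) -> set[str]:
    """Collect every path a session modified, ignoring reads."""
    return {
        event["path"]
        for event in trace
        if event.get("action") in WRITE_ACTIONS and "path" in event
    }


def trace_divergence(trace_a: Iterable[dict], trace_b: Iterable[dict]) -> dict[str, list[str]]:
    """Split modified paths into those touched by only one run or by both."""
    a, b = touched_files(trace_a), touched_files(trace_b)
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "both": sorted(a & b),
    }
```

Anything that lands in "only_a" or "only_b" is a file one run changed and the other did not, which is where an unrequested refactor like the shared util would show up.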

Similar Articles

Same model, same prompt, 4 different agents

Reddit r/LocalLLaMA

Explores how different agent architectures yield varying outputs from the same underlying model and prompt, highlighting the impact of agent design on LLM behavior.

Same agent, same prompt, different runs. Which output do you ship?

Reddit r/AI_Agents

The author observes that running the same task with Claude Code across different sessions yields varying decision patterns, making it hard to choose outputs that are safe to ship, and highlights the lack of tooling for evaluating agent decision profiles.