@rohanpaul_ai: New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science star…

X AI KOLs Following 06/08/26, 08:57 PM Papers

anthropic ai-agents biology coding reproducibility data-retrieval ebola

Summary

Anthropic research reveals that AI agents struggle with biology databases, producing highly variable answers for the same query (e.g., Ebola sequence counts ranging from 5 to 106 vs. expected 266), but adding a repeatable retrieval tool significantly improves consistency and accuracy.

Original Article

New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science starts.

Strong AI agents could give very different answers to the exact same biology data request, even when nothing changed in the prompt.

In one Ebola sequence task, Claude Sonnet 4 returned 106 sequences in one run, 15 in another, and 5 in a third, while the expected answer was 266.
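
To put those counts in proportion, here is a minimal Python sketch using only the four numbers quoted above. It assumes, generously, that every returned sequence belongs to the expected set, so each figure is an upper bound on coverage, not a measured recall. The names `EXPECTED`, `RUN_COUNTS`, and `coverage_upper_bound` are made up for this illustration.

```python
# Numbers taken from the post: three runs of the same request, 266 expected.
EXPECTED = 266
RUN_COUNTS = [106, 15, 5]


def coverage_upper_bound(returned: int, expected: int) -> float:
    """Best-case share of the expected set a run could have covered.

    Assumption: every returned sequence is a correct one. The post only
    gives counts, so the true coverage may be lower.
    """
    return min(returned, expected) / expected


coverages = [coverage_upper_bound(n, EXPECTED) for n in RUN_COUNTS]
spread = max(RUN_COUNTS) - min(RUN_COUNTS)

for n, c in zip(RUN_COUNTS, coverages):
    print(f"{n:>3} of {EXPECTED} sequences: at most {c:.1%} of the expected set")
print(f"spread between best and worst run: {spread} sequences")
```

Even under that generous assumption, the best run covers well under half of the expected sequences, and the gap between runs is larger than most of the runs themselves.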

Those missing sequences did not just make the dataset messy; they changed the scientific story built on top of it.

One bad retrieval made the outbreak look like it traced back to 1922, instead of the manually curated result pointing to early 2014.

The biology databases were too hard to use reliably through current AI tools.

The agents often understood what they were being asked, but their answers varied a lot because they had to fight through scattered databases, hidden website rules, and fragile scripts.

The key finding is that adding a repeatable retrieval tool made agents far more accurate and much more consistent.
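
The post does not describe how the retrieval tool works, so the following is not Anthropic's tool. It is a hedged Python sketch of one way "repeatable" could be approximated: canonicalize the query, snapshot the first result, and always return records deduplicated and in a fixed order. `RepeatableRetriever`, `noisy_fetch`, the query fields, and the `SEQ0000`-style IDs are all hypothetical, and the stand-in fetcher replaces any real database call.

```python
import hashlib
import json
import random
from typing import Callable, Dict, List

# Hypothetical fetcher type: takes a query dict, returns record IDs.
Fetcher = Callable[[dict], List[str]]


class RepeatableRetriever:
    """Wraps a fetcher so the same query always yields the same answer."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._snapshots: Dict[str, List[str]] = {}

    @staticmethod
    def query_key(query: dict) -> str:
        # Canonical JSON: key order in the dict no longer changes the key.
        canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def retrieve(self, query: dict) -> List[str]:
        key = self.query_key(query)
        if key not in self._snapshots:
            # Deduplicate and sort once, then freeze the result as a snapshot.
            self._snapshots[key] = sorted(set(self._fetch(query)))
        return list(self._snapshots[key])


def noisy_fetch(query: dict) -> List[str]:
    """Stand-in source: complete data, but shuffled and with duplicates."""
    ids = [f"SEQ{i:04d}" for i in range(266)]  # placeholder IDs, not real
    noisy = ids + ids[:20]
    random.shuffle(noisy)
    return noisy


retriever = RepeatableRetriever(noisy_fetch)
first = retriever.retrieve({"virus": "ebola", "dataset": "example"})
second = retriever.retrieve({"dataset": "example", "virus": "ebola"})
print(len(first), first == second)
```

A wrapper like this only buys consistency. Accuracy still depends on the underlying fetcher returning the complete set, which is the part the post says agents struggle with on their own.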

Anthropic (@AnthropicAI): New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they’re designed for different traffic.

How do we build infrastructure agents can use?
