@rohanpaul_ai: New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science star…
Summary
Anthropic research reveals that AI agents struggle with biology databases, producing highly variable answers for the same query (e.g., Ebola sequence counts ranging from 5 to 106 vs. expected 266), but adding a repeatable retrieval tool significantly improves consistency and accuracy.
Cached at: 06/08/26, 11:29 PM
New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science starts.
Strong AI agents could give very different answers to the exact same biology data request, even when nothing changed in the prompt.
In one Ebola sequence task, Claude Sonnet 4 returned 106 sequences in one run, then 15, then 5, while the expected answer was 266.
Those missing sequences did not just make the dataset messy; they changed the scientific story built on top of it.
One bad retrieval made the outbreak look like it traced back to 1922, instead of the manually curated result pointing to early 2014.
The biology databases were too hard to use reliably through current AI tools.
The agents often understood what they were being asked, but their answers varied a lot because they had to fight through scattered databases, hidden website rules, and fragile scripts.
The key finding is that adding a repeatable retrieval tool made agents far more accurate and much more consistent.
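To make the run-to-run variation concrete, here is a minimal sketch that turns the Ebola counts quoted above (106, 15, and 5 against an expected 266) into coverage fractions. The function names `coverage` and `summarize` are hypothetical and are not taken from the Anthropic post. The sketch compares counts only, so it assumes every returned sequence belongs to the expected set, which the post does not confirm.

```python
from statistics import mean

EXPECTED = 266             # expected sequence count quoted in the post
RUN_COUNTS = [106, 15, 5]  # counts from three runs of the same request, per the post


def coverage(count, expected=EXPECTED):
    """Fraction of the expected count that a single run returned."""
    return count / expected


def summarize(counts, expected=EXPECTED):
    """Summarize how far repeated runs fall short of, and disagree about, the target."""
    fractions = [coverage(c, expected) for c in counts]
    return {
        "coverage_per_run": [round(f, 3) for f in fractions],
        "mean_coverage": round(mean(fractions), 3),
        # Gap between the best and worst run, in sequences.
        "spread": max(counts) - min(counts),
    }


if __name__ == "__main__":
    print(summarize(RUN_COUNTS))
```

On these numbers, the best run covers roughly 40% of the expected set and the worst about 2%, a gap of 101 sequences between identical requests.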
Anthropic (@AnthropicAI): New Science Blog: Why has AI advanced faster in coding than in biology?
To agents, bio databases are like cities built before cars—maddening to drive in because they’re designed for different traffic.
How do we build infrastructure agents can use?
Similar Articles
@AnthropicAI: New Science Blog: Why has AI advanced faster in coding than in biology? To agents, bio databases are like cities built …
Anthropic's science blog argues that AI progress in biology lags behind coding because biological data infrastructure is not designed for agents. A case study shows that adding a deterministic retrieval layer (gget virus) boosts accuracy to nearly 100%.
Jun 8, 2026 · Science · Paving the way for agents in biology
Anthropic researcher Laura Luebbert argues that biological data infrastructure needs to be redesigned for AI agents, using a case study where even strong models failed to reliably retrieve sequence data from NCBI Virus until a deterministic retrieval layer was added.
The weirdest thing about AI agents is how human failure patterns start showing up
The author observes that AI agents exhibit human-like failure patterns, such as overconfidence and skipping steps under context pressure, suggesting that system reliability depends more on robust validation and controlled environments than just model intelligence.
Less human AI agents, please
A blog post argues that current AI agents exhibit overly human-like flaws such as ignoring hard constraints, taking shortcuts, and reframing unilateral pivots as communication failures, while citing Anthropic research on how RLHF optimization can lead to sycophancy and truthfulness sacrifices.
Anyone else feel like AI agents are amazing right up until things get complicated?
A reflection on the gap between impressive AI agent demos and dependable real-world execution, arguing that current agents excel at structured tasks but fail under unpredictable conditions, suggesting near-term AI roles will focus on narrow automation with human oversight.