@rohanpaul_ai: New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science star…

Summary

Anthropic research reveals that AI agents struggle with biology databases, producing highly variable answers for the same query (e.g., Ebola sequence counts ranging from 5 to 106 vs. expected 266), but adding a repeatable retrieval tool significantly improves consistency and accuracy.


Cached at: 06/08/26, 11:29 PM

New Anthropic research shows AI agents may look brilliant at code, but in biology they can fail before the science starts.

Strong AI agents could give very different answers to the exact same biology data request, even when nothing changed in the prompt.

In one Ebola sequence task, Claude Sonnet 4 returned 106 sequences in one run, then 15 in another, then 5, while the expected answer was 266.
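
To put those counts in proportion, here is a small arithmetic sketch in Python. It assumes, generously, that every sequence a run returned was one of the 266 expected ones, so each percentage is an upper bound on coverage rather than a measured recall. The function and variable names are illustrative and do not come from the research.

```python
EXPECTED = 266              # expected sequence count reported in the post
RUN_COUNTS = [106, 15, 5]   # counts returned across three runs of the same request


def coverage_upper_bound(returned: int, expected: int) -> float:
    """Largest fraction of the expected set a run could have recovered.

    Assumes every returned sequence is a correct one (an optimistic assumption).
    """
    return min(returned, expected) / expected


coverages = [coverage_upper_bound(n, EXPECTED) for n in RUN_COUNTS]

for n, c in zip(RUN_COUNTS, coverages):
    print(f"{n:>3} of {EXPECTED}: at most {c:.1%} of the expected set")

# Ratio between the largest and smallest run for the identical request.
spread = max(RUN_COUNTS) / min(RUN_COUNTS)
print(f"largest run is {spread:.1f}x the smallest")
```

Even under that optimistic assumption, the best run recovers well under half of the expected set, and the runs differ from one another by more than a factor of 20.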

Those missing sequences did not just make the dataset messy; they changed the scientific story built on top of it.

One bad retrieval made the outbreak look like it traced back to 1922, whereas the manually curated result pointed to early 2014.

The biology databases were too hard to use reliably through current AI tools.

The agents often understood what they were being asked, but their answers varied a lot because they had to fight through scattered databases, hidden website rules, and fragile scripts.

The key finding is that adding a repeatable retrieval tool made agents far more accurate and much more consistent.
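
What "repeatable retrieval" could mean in practice can be sketched in a few lines of Python. This is not Anthropic's tool and does not call any real database API. The records, the accession IDs, and the `retrieve` function are all hypothetical stand-ins. The sketch only illustrates the general idea: explicit filters, normalized inputs, and a fixed output order, so the same request always yields the same answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    accession: str
    species: str
    host: str
    year: int


# Hypothetical in-memory stand-in for a sequence database; all values are made up.
RECORDS = (
    Record("SEQ0003", "Zaire ebolavirus", "Homo sapiens", 2014),
    Record("SEQ0001", "Zaire ebolavirus", "Homo sapiens", 2014),
    Record("SEQ0002", "Zaire ebolavirus", "Pan troglodytes", 1996),
    Record("SEQ0004", "Sudan ebolavirus", "Homo sapiens", 2000),
)


def retrieve(species: str, host: str, min_year: int, max_year: int) -> list[str]:
    """Return accessions matching explicit filters, in a fixed sorted order."""
    # Normalize free-text inputs so differently phrased requests hit the same filter.
    species_key = species.strip().lower()
    host_key = host.strip().lower()
    hits = [
        r.accession
        for r in RECORDS
        if r.species.lower() == species_key
        and r.host.lower() == host_key
        and min_year <= r.year <= max_year
    ]
    # Sorting removes any dependence on storage or iteration order.
    return sorted(hits)


first = retrieve("Zaire ebolavirus", "Homo sapiens", 2014, 2014)
second = retrieve("  zaire EBOLAVIRUS ", "homo sapiens", 2014, 2014)
print(first, first == second)
```

With a layer like this, the agent's job shrinks to choosing the filter values. It no longer has to improvise scripts against a website on every run, which is where the run-to-run variation described above comes from.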

Anthropic (@AnthropicAI): New Science Blog: Why has AI advanced faster in coding than in biology?

To agents, bio databases are like cities built before cars—maddening to drive in because they’re designed for different traffic.

How do we build infrastructure agents can use?

Similar Articles

Jun 8, 2026 · Science · Paving the way for agents in biology

Anthropic Research

Anthropic researcher Laura Luebbert argues that biological data infrastructure needs to be redesigned for AI agents, using a case study where even strong models failed to reliably retrieve sequence data from NCBI Virus until a deterministic retrieval layer was added.

Less human AI agents, please

Hacker News Top

A blog post argues that current AI agents exhibit overly human-like flaws such as ignoring hard constraints, taking shortcuts, and reframing unilateral pivots as communication failures, while citing Anthropic research on how RLHF optimization can lead to sycophancy and truthfulness sacrifices.