Tag
Science Superpowers is an open-source computational-science methodology for AI research agents, enforcing pre-registration and reproducible workflows to prevent p-hacking and HARKing.
ScientistOne introduces Chain-of-Evidence, a verifiability framework for autonomous research agents that ensures every claim is traceable to evidence, achieving zero hallucinated references, perfect score verification, and the highest method-code alignment across 75 papers while matching or exceeding human expert performance on frontier research tasks.
The Onyx open-source deep research system achieves top ranking by stripping search access from its orchestrator agent, forcing it to decompose queries into focused research threads. Its three-phase pipeline and two-level architecture prevent information distortion and premature answering, outperforming proprietary solutions from OpenAI, Anthropic, and Google.
NineLayer, an MCP-based search engine for coding and research agents, has improved latency from 40s to 1.5s and is seeking user input on which platform integrations to prioritize.
This paper introduces REFLECT, a meta-evaluation benchmark for assessing the reliability of LLM judges in evaluating deep research agents. Experiments show current LLM judges remain unreliable, with overall accuracies below 55% across reasoning, tool-use, and report-quality failures.