StatefulDiscovery: Evidence-Calibrated Claim Formation in Open-Ended Scientific Discovery
Summary
Introduces StatefulDiscovery, a framework for open-ended scientific discovery that uses externalized investigation state to calibrate evidence and claims, outperforming baselines in producing well-supported high-value claims.
# StatefulDiscovery: Evidence-Calibrated Claim Formation in Open-Ended Scientific Discovery

Source: [https://arxiv.org/abs/2606.11851](https://arxiv.org/abs/2606.11851) [View PDF](https://arxiv.org/pdf/2606.11851)

> Abstract: Open-ended scientific discovery asks agents to move beyond executing analyses for predefined questions. Across multiple rounds of exploration, a discovery agent must decide which phenomena warrant investigation while avoiding overinterpretation, where emerging claims exceed the evidential scope of the analyses supporting them. This creates an evidence-calibration problem: the exploration trajectory must be coupled with claim status so that evidence can guide both what to investigate next and what can be claimed. We introduce StatefulDiscovery, a discovery framework that externalizes investigation state and uses it to coordinate frontier selection, evidence acquisition, and claim adjudication. We evaluate StatefulDiscovery across 40 real-data discovery tasks. Compared with several baselines, StatefulDiscovery produces more claims overall judged to be both well-supported and high-value. Ablations indicate that structured hypotheses, local adjudication, and frontier control contribute to performance. Together, these results suggest that explicit discovery state can couple exploration with evidence-calibrated claim formation.

## Submission history

From: Jiayao Chen \[[view email](https://arxiv.org/show-email/b7af9ecb/2606.11851)\]

**\[v1\]** Wed, 10 Jun 2026 09:28:28 UTC (2,709 KB)
Similar Articles
Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics
This paper introduces Formal Conjectures, an evolving benchmark of 2615 mathematical statements formalized in Lean 4, including open research conjectures for proof discovery and solved problems for auto-formalization, designed to evaluate automated reasoning systems with zero contamination.
LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs
LLM-AutoSciLab is a closed-loop framework that uses LLMs to iteratively generate hypotheses, select informative experiments, and refine mechanisms, achieving superior accuracy and sample efficiency on physics and biology benchmarks over prior static methods.
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
ScientistOne introduces Chain-of-Evidence, a verifiability framework for autonomous research agents that ensures every claim is traceable to evidence, achieving zero hallucinated references, perfect score verification, and the highest method-code alignment across 75 papers while matching or exceeding human expert performance on frontier research tasks.
When Should an AI Scientist Stop? Verifiable Experiment Steering and Refusal for Autonomous Discovery
This paper introduces Cartograph, a verification layer for AI scientists that couples subspace experiment steering, ambiguity resolution, and library inadequacy detection. The framework outperforms baselines in autonomous discovery testbeds and retrospectively flags inconclusive claims in the A-Lab materials system.
Declarative Data Services: Structured Agentic Discovery for Composing Data Systems
This paper proposes Declarative Data Services (DDS), an architecture for structured agentic discovery of data-system compositions from declarative user intent. It decomposes the global search into bounded sub-searches and shows convergence on a trading-backend workload where unbounded discovery fails.