CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Hugging Face Daily Papers, 05/28/26, 12:00 AM

Summary

CausaLab is a scalable environment for evaluating LLM agents on interactive causal discovery, assessing both predictive accuracy and faithful recovery of underlying causal mechanisms. Experiments reveal a gap between prediction and mechanism recovery, highlighting limits in current LLM agents as experimental causal reasoners.

We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both whether an agent can solve a problem using causal evidence and whether its answer is grounded in a faithfully recovered causal mechanism. Each episode places an agent in a synthetic laboratory: it receives prior measurement records, intervenes on a manipulator crystal, and predicts the resonance frequency of a held-out reactor crystal governed by the same mechanism. The hidden data-generating process is a randomly sampled structural causal model (SCM), so success requires recovering both a causal graph and structural equations rather than recalling prior knowledge. Experiments show a persistent gap between prediction and mechanism recovery: in the purely observational 6-node setting, GPT-5.2-high reaches 92% task accuracy but only 0.471 all-edge F1. Mixed observation-intervention strategies improve structural fidelity, while pure intervention remains difficult even for strong agents. We identify premature stopping as a major weakness and show that consistency verification mitigates it. CausaLab therefore separates predictive success from causal understanding and exposes current LLM agents' limits as experimental causal reasoners.
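The page does not say which family of SCMs CausaLab samples. To make "a randomly sampled structural causal model" concrete, here is a minimal sketch assuming linear-Gaussian structural equations over a random DAG. The class `LinearSCM`, the function `random_scm`, and the parameters `edge_prob` and `noise_std` are hypothetical illustrations, not CausaLab's API. An intervention is modeled as a do() operation that clamps a variable and ignores its parents, which is the standard way interventions act on an SCM.

```python
import random
from dataclasses import dataclass, field


# Hypothetical sketch, not CausaLab's implementation.
@dataclass
class LinearSCM:
    """Linear-Gaussian SCM over nodes 0..n-1, indexed in topological order."""
    n: int
    # weights[(i, j)] is the coefficient of parent i in the equation for child j (i < j).
    weights: dict = field(default_factory=dict)
    noise_std: float = 0.1

    def parents(self, j):
        return [i for (i, k) in self.weights if k == j]

    def sample(self, rng, interventions=None):
        """Draw one record; `interventions` maps node -> value, i.e. do(X_node = value)."""
        interventions = interventions or {}
        values = []
        for j in range(self.n):
            if j in interventions:
                # do(X_j = v): incoming edges are cut and the value is clamped.
                values.append(interventions[j])
                continue
            # Parents have lower indices, so their values are already computed.
            mean = sum(self.weights[(i, j)] * values[i] for i in self.parents(j))
            values.append(mean + rng.gauss(0.0, self.noise_std))
        return values


def random_scm(n, edge_prob, rng):
    """Sample a random DAG (edges only from lower to higher index) with random weights."""
    weights = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob:
                # Keep |w| in [0.5, 2.0] so every edge has a detectable effect.
                weights[(i, j)] = rng.choice([-1, 1]) * rng.uniform(0.5, 2.0)
    return LinearSCM(n=n, weights=weights)
```

For example, with `rng = random.Random(0)` and `scm = random_scm(6, 0.4, rng)`, `scm.sample(rng)` yields an observational record, while `scm.sample(rng, {0: 2.0})` corresponds to setting a manipulated variable and observing downstream effects. A real environment would also shuffle variable labels so the index order does not leak the causal order.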

Original Article

Cached at: 05/29/26, 07:00 AM

Paper page - CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Source: https://huggingface.co/papers/2605.26029 (published on May 28)

Submitted by https://huggingface.co/shizhuo2 (Dylan) on May 29

Abstract

CausaLab evaluates LLM agents on causal discovery by requiring both accurate predictions and faithful recovery of underlying causal mechanisms through synthetic experimental scenarios.
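The abstract reports structural fidelity as "all-edge F1" but this page does not define the metric. A minimal sketch, assuming it is the F1 score over directed edges of the agent's recovered graph compared with the hidden SCM's graph, where a reversed edge counts as both a false positive and a false negative. The function name `edge_f1` is hypothetical, and the paper may score skeletons or reversed edges differently.

```python
# Hypothetical sketch of an edge-level F1; the paper's exact definition may differ.
def edge_f1(true_edges, predicted_edges):
    """F1 over directed edges (parent, child); direction must match to count."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    if not true_set and not pred_set:
        # Both graphs are empty: treat as perfect agreement.
        return 1.0
    tp = len(true_set & pred_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(true_set)
    return 2 * precision * recall / (precision + recall)
```

For instance, if the true graph is {(0, 1), (1, 2), (0, 2)} and an agent proposes {(0, 1), (2, 1)}, precision is 1/2, recall is 1/3, and F1 is about 0.4. A score like this can stay low even when the agent's numeric prediction is correct, which is the kind of gap the abstract describes.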

Get this paper in your agent:

hf papers read 2605.26029

Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Similar Articles

"Excuse me, may I say something..." CoLabScience, A Proactive AI Assistant for Biomedical Discovery and LLM-Expert Collaborations

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence

LLM Explainability with Counterfactual Chains and Causal Graphs

Counterfactual Evaluation Reveals Hidden Capability Profiles in Clinical LLMs and Agents
