Prediction Bottlenecks Don't Discover Causal Structure (But Here's What They Actually Do)

Hugging Face Daily Papers 05/09/26, 12:00 AM Papers

Summary

This paper challenges the claim that prediction bottlenecks in models like Mamba recover causal structure, demonstrating through a new benchmark that gains are largely due to confounds and robustness artifacts rather than true causal discovery.

A Mamba state-space model trained only for next-step prediction appears to recover Granger-causal structure through a simple readout S = |W_{out} W_{in}|, with early experiments suggesting the phenomenon generalized across architectures and benefited from interventional data at p < 10^{-5}. We package the protocol used to test that claim -- standardized synthetic generators (VAR/Lorenz/CauseMe-style), three intervention semantics (do(X=c), soft-noise, random-forcing), edge-provenance cards on three real datasets, and size-matched control arms -- as a reusable falsification benchmark, and walk the claim through it in five stages. The method-level claim does not survive: (i) a plain linear bottleneck does as well or better; (ii) tuned Lasso beats the bottleneck on synthetic CauseMe-style benchmarks, and on Lorenz-96 (the only real benchmark with unambiguous ground truth) classical PCMCI and Granger lead a tight cluster in which the bottleneck trails; (iii) the headline intervention advantage is roughly 60% a sample-size confound, and the residual disappears under standard do(X=c) interventions, surviving only under a non-standard random-forcing scheme; (iv) even that residual reproduces, with a larger effect, in classical bivariate Granger -- the effect is method-agnostic. What survives is a narrow characterization result; the benchmark is the lasting artifact, and each stage above is one of its control arms.
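The sample-size confound in point (iii) is a simple decomposition: compare the gain from adding interventional data against the gain from adding the same number of purely observational samples. A minimal sketch of that arithmetic follows. The function name `decompose_gain`, its arguments, and the example scores are hypothetical and chosen only to illustrate the calculation; they are not figures from the paper.

```python
def decompose_gain(score_obs_small, score_obs_matched, score_with_interventions):
    """Split an apparent interventional gain into two parts.

    score_obs_small          : score with N observational samples
    score_obs_matched        : score with N + M observational samples
                               (the size-matched control arm)
    score_with_interventions : score with N observational plus
                               M interventional samples
    All names and inputs are illustrative assumptions.
    """
    total_gain = score_with_interventions - score_obs_small
    if total_gain == 0:
        raise ValueError("no gain to decompose")
    # Gain explained purely by having more data, with no interventions.
    size_gain = score_obs_matched - score_obs_small
    share = size_gain / total_gain
    return {
        "total_gain": total_gain,
        "sample_size_share": share,
        "residual_share": 1.0 - share,
    }


# Hypothetical scores, for illustration only.
result = decompose_gain(0.70, 0.76, 0.80)
print(result)
```

Only the residual share can be credited to the interventions themselves, and the abstract reports that even this residual disappears under standard do(X=c) interventions.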

Original Article

Cached at: 05/13/26, 04:13 AM

Source: https://huggingface.co/papers/2605.09169

This paper falsifies the claim that next-step prediction bottlenecks, especially Mamba/SSM weight projections, recover causal structure. It shows instead that their apparent gains come mostly from low-rank regression, sample-size confounds, intervention-semantics artifacts, and target-corruption robustness. The main durable contribution is a reusable falsification benchmark.

➡️ Key Highlights of the Prediction-as-Causal-Discovery Falsification Framework:

🧪 Reusable Five-Stage Falsification Benchmark: Introduces a control-heavy benchmark spanning VAR, Lorenz-96, and CauseMe-style generators, real datasets with edge-provenance cards, matched-capacity architectures, size-matched observational controls, and multiple intervention semantics, all to stress-test claims that prediction models implicitly recover causal graphs.

🧩 Weight-Projection Causality Does Not Survive Controls: Tests the extraction rule S = |W_{out} W_{in}| for bottleneck predictors and shows that linear bottlenecks match or beat Mamba SSMs, tuned Lasso dominates on synthetic graph recovery, and classical PCMCI/Granger-style methods outperform the bottleneck on clean Lorenz-96 ground truth.
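To make the extraction rule concrete, here is a minimal NumPy sketch of the readout S = |W_{out} W_{in}| applied to a plain linear bottleneck. It assumes a toy VAR(1) process and obtains the low-rank factors by truncating the SVD of an ordinary least-squares fit, which is a simple stand-in and not necessarily how the paper trains its bottleneck. The helper names (`simulate_var1`, `fit_linear_bottleneck`, `readout`) and the example matrix are hypothetical.

```python
import numpy as np


def simulate_var1(A, T, seed=0):
    """Simulate x_{t+1} = A x_t + unit Gaussian noise (toy generator)."""
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    X = np.zeros((T, d))
    for t in range(1, T):
        X[t] = A @ X[t - 1] + rng.standard_normal(d)
    return X


def fit_linear_bottleneck(X, rank):
    """Fit x_{t+1} ~ W_out W_in x_t with an inner dimension of `rank`.

    Assumption: the factors come from a truncated SVD of the OLS
    transition matrix, used here only as a simple low-rank fit.
    """
    past, future = X[:-1], X[1:]
    B, *_ = np.linalg.lstsq(past, future, rcond=None)  # future ~ past @ B
    A_hat = B.T                                        # x_{t+1} ~ A_hat @ x_t
    U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
    W_out = U[:, :rank] * s[:rank]  # shape (d, rank)
    W_in = Vt[:rank]                # shape (rank, d)
    return W_out, W_in


def readout(W_out, W_in):
    """S[i, j] scores the influence of variable j on variable i."""
    return np.abs(W_out @ W_in)


# Toy chain 0 -> 1 -> 2 -> 3 with self-loops (hypothetical example).
A_true = np.array([
    [0.5, 0.0, 0.0, 0.0],
    [0.4, 0.5, 0.0, 0.0],
    [0.0, 0.4, 0.5, 0.0],
    [0.0, 0.0, 0.4, 0.5],
])
X = simulate_var1(A_true, T=5000, seed=0)
W_out, W_in = fit_linear_bottleneck(X, rank=4)
print(np.round(readout(W_out, W_in), 2))
```

With the rank equal to the number of variables, the readout reduces to the absolute OLS coefficients, which illustrates why a low-rank regression baseline is a natural control for this extraction rule.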

🧠 Intervention Gains Are Confounds, Not Causal Extraction: Demonstrates that the reported interventional advantage mostly comes from extra sample size and a non-standard per-step random-forcing intervention. Under proper do(X_i = c) interventions the effect nearly vanishes, while the residual appears even more strongly in classical bivariate Granger, indicating method-agnostic target-corruption robustness rather than learned causal discovery.
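The difference between the three intervention semantics is easier to see in code. The sketch below applies each one to a toy VAR(1) process. The paper's actual generators are not reproduced here: the function name `simulate_with_intervention`, its parameters, and in particular the readings of "soft-noise" (extra noise added on top of the target's normal mechanism) and "random-forcing" (the target overwritten at every step by a fresh random draw) are assumptions made for illustration.

```python
import numpy as np


def simulate_with_intervention(A, T, seed=0, intervention=None,
                               target=0, c=2.0, scale=2.0):
    """Simulate x_t = A x_{t-1} + noise with an optional intervention.

    intervention:
      None             observational data
      "do"             hard intervention, target clamped to constant c
      "soft-noise"     target keeps its mechanism plus extra noise
                       (assumed reading)
      "random-forcing" target replaced each step by a new random value,
                       cutting it off from its parents (assumed reading)
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    X = np.zeros((T, d))
    prev = np.zeros(d)
    for t in range(T):
        x = A @ prev + rng.standard_normal(d)
        if intervention == "do":
            x[target] = c
        elif intervention == "soft-noise":
            x[target] += scale * rng.standard_normal()
        elif intervention == "random-forcing":
            x[target] = scale * rng.standard_normal()
        elif intervention is not None:
            raise ValueError(f"unknown intervention: {intervention}")
        X[t] = x
        prev = x
    return X


# Toy chain 0 -> 1 -> 2 (hypothetical example).
A = np.array([
    [0.5, 0.0, 0.0],
    [0.4, 0.5, 0.0],
    [0.0, 0.4, 0.5],
])
for kind in (None, "do", "soft-noise", "random-forcing"):
    X = simulate_with_intervention(A, T=2000, seed=1, intervention=kind)
    print(kind, "target variance:", round(float(X[:, 0].var()), 2))
```

Under the hard do intervention the target carries no variance at all, whereas random forcing injects a high-variance, parent-independent signal at every step. That contrast is one plausible reason the two schemes behave so differently in the reported results.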

Similar Articles

The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction

Score-Based Causal Discovery of Latent Variable Causal Models

A Geometric View of Counterfactual Behavior: Interaction of Boundary Proximity and Local Support

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Getting good predictions without data cleaning (Why "Garbage In, Garbage Out" is sometimes a trap)
