PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models
Summary
The paper presents Prometheus, a framework that uses large language models to extract local causal claims from text and organizes them into navigable causal atlases, enabling deep causal research across diverse domains.
# Automating Deep Causal Research Integrating Text, Data, and Scientific Models
Draft under revision\.
Source: [https://arxiv.org/html/2605.12835](https://arxiv.org/html/2605.12835)
## Prometheus: Automating Deep Causal Research Integrating Text, Data, and Scientific Models
Sridhar Mahadevan, Adobe Research and University of Massachusetts, Amherst \(smahadev@adobe\.com, mahadeva@umass\.edu\)
###### Abstract
Large language models can extract local causal claims from text, but those claims become more useful when organized as persistent, navigable world models rather than as flat summaries\. We introduce Prometheus, a framework that turns retrieved literature, filings, reviews, reports, agent traces, source data, code, simulations, and scientific models into *causal atlases*: sheaf\-like families of local causal predictive\-state models over an explicit cover of a research substrate\. Each local region contains causal episodes, structured claim tables, predictive tests, support statistics, and provenance; restriction maps compare overlapping regions; gluing diagnostics expose agreement, drift, contradiction, and underdetermination\. The resulting Topos World Model is not a single universal graph\. It is a research instrument for navigating what a corpus says, where it says it, how strongly it is supported, and where local claims fail to assemble into a coherent global view\. We describe the Prometheus pipeline, formalize causal episodes and local predictive\-state sheaves, present the Claims Atlas interface, and propose an evaluation program centered on coverage, drift visibility, provenance, support aggregation, expert navigation time, and rerun consistency\. Three literature\-atlas case studies \(ocean\-temperature impacts on marine populations, GLP\-1 weight\-loss evidence, and resveratrol/red\-wine health\-benefit claims\) illustrate deep causal research from text with explicit locality, evidence, persistent state, and gluing tension\.
Four grounded\-counterfactual case studies show a stronger mode: a Nature Climate Change microplastics forcing paper, an Indus Valley hydrology paper with VIC\-derived figure data and model code, the canonical Sachs protein\-signaling study with single\-cell perturbation data, and a Nature singing\-mouse study with MAPseq projection matrices\. When a paper ships source data, simulation outputs, or code, Prometheus can evaluate a counterfactual against that scientific substrate and then rebuild the sheaf world model around the measured intervention result\.
*Keywords:* Causal Discovery · Large Language Models · Predictive State Representations · Sheaves · Topos World Models
## 1 Introduction
Large language models are now capable summarizers of scientific papers, financial filings, product reviews, and operational records\. They can retrieve relevant passages, restate conclusions, and extract many local causal claims\. For deep research, however, this is not enough\. A researcher often needs to know which causal claims recur across a corpus, which claims are regime\-specific, which apparent disagreements are due to different populations or measurements, and where a literature has enough support to justify a follow\-up query, experiment, or decision\. Ordinary summaries are too flat for this task\.
Prometheus starts from the thesis that language\-derived causal knowledge should be represented as a family of local predictive models over a corpus\. A paper, section, time window, population, product\-use context, or regulatory workflow can each induce a local model\. These local models overlap\. When their shared predictions and causal claims agree, they may glue into larger coherent regions\. When they do not, the disagreement is not noise to be averaged away; it is a research signal\. It may indicate drift, confounding, incompatible measurement, source\-quality variation, or a genuine regime boundary\.
We call the resulting object a *Topos World Model*\. Operationally, it is a sheaf\-like causal atlas: local predictive\-state representations indexed by contexts, connected by restriction maps, and annotated with support, gluing tension, drift, and provenance\. The atlas is designed for use by a human or agentic researcher who wants to ask: What is the main causal spine of this literature? Which regions support it? Which contexts break it? Which passages and tables justify a local claim? What changes between two retrieval runs or two time periods?
This paper positions Prometheus as a research instrument rather than as a claim that a single benchmark score captures the value of the system\. The intended contribution is causal extraction plus topological organization plus navigable evidence\. The resulting object is not merely a knowledge graph, not merely a retrieval\-augmented summary, and not merely a structural causal model\. It is a finite, inspectable approximation to a sheaf of local causal predictive states over text\.
#### Contributions\.
This paper makes six contributions\.
1. We introduce Prometheus, a language\-to\-Topos\-World\-Model pipeline for deep causal research from text\.
2. We extend the earlier Democritus line of causal claim extraction and cSQL\-style causal tables into local causal predictive\-state models with covers, restrictions, gluing diagnostics, persistent states, and causal atlases\.
3. We formalize causal events, episodes, contexts, covers, restrictions, local predictive\-state tables, and gluing tensions in a finite sheaf\-theoretic setting suitable for implementation\.
4. We describe the Claims Atlas, an interface object that organizes a corpus by causal spine, local regions, support, drift, regime tension, and provenance drill\-downs\.
5. We propose an evaluation program appropriate to causal research instruments: not only extraction accuracy, but also coverage, drift visibility, support aggregation, provenance quality, expert navigation time, and consistency across reruns\.
6. We demonstrate grounded counterfactual layers in four scientific domains: a Nature Climate Change microplastics paper whose source tables support an optical\-forcing intervention, an Indus Valley hydrology paper whose VIC\-derived figure data and model code support a drought\-restoration intervention, the Sachs protein\-signaling benchmark whose single\-cell environment panel supports an experimental\-regime substitution, and a Nature singing\-mouse study whose MAPseq projection matrices support a species\-level projection\-attenuation intervention\. In each case, modified causal observations are used to rebuild the sheaf world model\.
## 2 Related Work
Retrieval\-augmented generation improves factual grounding by conditioning answers on retrieved passages \(Lewis et al\., [2020](https://arxiv.org/html/2605.12835#bib.bib11)\)\. Causal extraction pipelines go further by identifying cause\-effect statements in text \(Girju, [2003](https://arxiv.org/html/2605.12835#bib.bib2); Hendrickx et al\., [2010](https://arxiv.org/html/2605.12835#bib.bib5)\)\. Yet both approaches often collapse the corpus into a single answer, and the structure discarded in that collapse is precisely what a deep researcher cannot afford to lose\.
Consider a corpus on ocean warming and fish populations\. One region of the literature may emphasize thermal stress and migration\. Another may focus on food\-web disruption\. A third may show adaptation or local resilience in particular species\. A flat answer such as “warming reduces fish populations” is directionally useful but structurally poor\. It hides the population, location, temperature range, time scale, measurement protocol, and ecological mediators that decide whether the claim should be transported to a new case\.
The same issue appears in product reviews\. A shoe can be comfortable for short runs but painful for long mileage; waterproof in light rain but poor after repeated washing; well\-rated overall but return\-prone in a narrow sizing regime\. In SEC or operational workflows, a filing may describe investment, optimization, supply\-chain risk, regulatory exposure, and expected margin effects, but each relationship may only be valid under specific market and time\-window assumptions\.
Prometheus therefore treats each text\-derived claim as local\. It asks where the claim lives, which neighboring contexts it overlaps, which tests it predicts, and whether its restrictions agree with nearby local models\.
#### Causal relation extraction from text\.
There is a long line of work on identifying causal relations in natural language, from cue\-phrase and pattern\-based systems to neural classifiers; see surveys of causal relation extraction and event causality identification \(Yang et al\., [2022](https://arxiv.org/html/2605.12835#bib.bib25); He et al\., [2023](https://arxiv.org/html/2605.12835#bib.bib4)\)\. Classical systems usually predict whether a pair of spans or events in a sentence stands in a causal relation, while corpus\-scale work connects such local predictions to event forecasting or explanatory retrieval \(Radinsky et al\., [2012](https://arxiv.org/html/2605.12835#bib.bib20)\)\. Prometheus uses such extracted relations as evidence units, but the paper’s object of study is downstream: how thousands of local claims should be localized, compared, transported, or blocked across corpus regions\.
#### Causal knowledge bases and graphs from corpora\.
Causal knowledge\-base projects mine cause–effect tuples from large corpora and aggregate them into graph\-structured resources \(Hassanzadeh et al\., [2020](https://arxiv.org/html/2605.12835#bib.bib3)\)\. This graph\-building perspective is close to the first Democritus contribution, where LLM\-generated causal statements are compiled into local causal models and larger causal atlases \(Mahadevan, [2025a](https://arxiv.org/html/2605.12835#bib.bib15)\)\. Prometheus extends that line by treating local graphs and cSQL rows as observations for local causal PSRs\. The global object is therefore not one merged graph but a sheaf\-like family of charts whose overlaps reveal agreement, drift, contradiction, and underdetermination\.
#### LLMs for causal discovery and reasoning\.
A growing literature asks whether LLMs can propose causal directions, graph structures, interventions, or counterfactual explanations from variable descriptions and textual context \(Kıcıman et al\., [2024](https://arxiv.org/html/2605.12835#bib.bib9); Le et al\., [2024](https://arxiv.org/html/2605.12835#bib.bib10)\)\. Prometheus is deliberately more conservative\. It does not treat the LLM as an oracle for ground\-truth causal discovery\. Instead, the LLM helps surface causal discourse: claims, mechanisms, modifiers, regimes, and source passages that can be normalized, audited, and compared\. Local intervention probes in the atlas are therefore model\-internal research tests unless paired with external data and identification assumptions\.
#### Agentic systems for automated scientific discovery\.
Recent systems also aim to automate larger portions of the scientific workflow\. The AI Scientist\-v2, for example, uses agentic tree search to propose hypotheses, design and execute machine\-learning experiments, analyze and visualize results, and write scientific manuscripts \(Yamada et al\., [2025](https://arxiv.org/html/2605.12835#bib.bib24)\)\. This line of work is close in ambition to Prometheus: both ask how AI systems can participate in scientific discovery rather than merely answer questions about existing papers\. The emphasis is different\. AI Scientist\-v2 organizes autonomous experimentation and manuscript generation, primarily in machine\-learning research settings\. Prometheus instead constructs an explicit causal topos world model from heterogeneous research artifacts \(text, data, figures, source code, and scientific models\) so that local claims, gluing failures, evidentiary limits, and grounded counterfactual revisions remain inspectable\. In this sense, Prometheus can be viewed as a complementary world\-model layer for scientific agents: it records what a research substrate supports, where it does not glue, and which counterfactuals can actually be evaluated\.
#### Causality\-aware NLP\.
More broadly, causal ideas have been used to study text effects, counterfactual augmentation, representation robustness, and explanations for NLP systems \(Jin et al\., [2021](https://arxiv.org/html/2605.12835#bib.bib8)\)\. Prometheus points in the opposite direction: it uses NLP and LLM extraction to construct explicit causal artifacts for human research\. The Claims Atlas is meant to be inspected, corrected, extended, and rerun, so provenance and gluing failures are part of the output rather than post\-hoc debugging aids\.
## 3 From Democritus to Prometheus
Democritus is the predecessor pipeline: a language\-to\-causal\-model system for compiling documents into local causal models, causal databases, and interactive diagnostic artifacts \(Mahadevan, [2025a](https://arxiv.org/html/2605.12835#bib.bib15)\)\. A public implementation of the released Democritus client is available as Democritus\_OpenAI \(Mahadevan, [2025d](https://arxiv.org/html/2605.12835#bib.bib18)\)\. The broader categorical machine learning background for this line of work is developed in Mahadevan \([2025c](https://arxiv.org/html/2605.12835#bib.bib17)\)\. Democritus extracts local causal claims, organizes them into causal triples or local causal models, and stores structured outputs in cSQL\-like causal tables\. This is already useful: it turns unstructured text into queryable causal objects\.
Prometheus changes the central object\. Instead of treating a local DAG or causal table as the final representation, Prometheus treats it as evidence for a local predictive\-state model\. A local model records not only that $X$ causes or influences $Y$, but which histories and tests are present, what the model predicts under those tests, how much support each cell has, and where the evidence came from\. The global object is not one merged DAG\. It is a sheaf\-like family of local models, equipped with restriction and gluing diagnostics\.
Table 1: Prometheus inherits the extraction discipline of Democritus but shifts the representation from graph\-centered synthesis to predictive\-state sheaves\.
This shift matters because deep research is rarely about finding one graph\. It is about understanding which local graphs, claims, and predictions are transportable\. Topological organization gives the system a way to say: these regions agree, these overlap but pull apart, and this claim should not be moved without additional evidence\.
## 4 Which Parts of the Democritus Pipeline Are Reused?
The original Democritus arXiv paper described a six\-module pipeline for constructing large causal models from language \(Mahadevan, [2025a](https://arxiv.org/html/2605.12835#bib.bib15)\)\. Prometheus should be read as a continuation of that pipeline, but not as a simple rebranding of it\. The first four modules still provide the extraction discipline: they turn a seed research domain or corpus into topics, causal questions, causal statements, and typed relational triples\. The new contribution begins when those triples and cSQL rows are no longer treated as the terminal graph artifact\. Instead, they become observations from which Prometheus constructs local causal predictive\-state models indexed by an explicit cover\.
Table 2: How Prometheus reuses and extends the six\-module Democritus pipeline\. Modules 1–4 supply causal extraction and normalization; the new Topos World Model construction begins when extracted claims are assembled into local predictive states over a cover\.
This mapping clarifies the boundary between extraction and world modeling\. In Democritus, Module 4 produces a relational graph, Module 5 geometrically organizes that graph, and Module 6 stores the result as a topos slice\. In Prometheus, the graph is an intermediate observation layer\. A triple such as
$$(\text{food limitation},\ \texttt{reduces},\ \text{thermal tolerance})$$
is not merely an edge in a global graph\. It is assigned to one or more local contexts, connected to evidence units, folded into causal episodes, and used to populate predictive tests such as
$$\text{food limitation}\to\text{thermal stress}\Rightarrow\text{larval-survival change}.$$
The output is therefore a table\-valued local section with support and provenance, not only a point or edge in an embedding\.
#### Stage A: inherited causal extraction\.
The inherited Democritus side of Prometheus performs topic or retrieval expansion, causal\-question generation, causal\-statement extraction, and typed triple normalization\. In open\-ended exploratory runs, topic expansion still acts as a breadth\-first search over a domain\. In artifact\-backed runs such as the ocean\-temperature case study, the retrieval layer supplies the primary corpus and the Democritus\-style modules operate on acquired documents, paragraphs, tables, abstracts, and metadata\. Either way, this stage produces the familiar claim substrate: causal rows, local graphs, relation types, modifiers, confidence scores, and provenance links\.
#### Stage B: context and episode construction\.
The first Prometheus\-specific step is to stop treating all extracted triples as commensurable\. Each claim is assigned to one or more contexts: document, subtopic, species group, measurement regime, product\-use stage, company workflow, fiscal year, or agent role\. Claims are then assembled into causal episodes
$$h=(e_1,\ldots,e_k),$$
where each event $e_i$ carries an action or condition, an observation or outcome, time or ordering information when available, and provenance\. This is the point at which a causal graph becomes a history/test object suitable for predictive\-state estimation\.
#### Stage C: local predictive\-state induction\.
For each context $U$, Prometheus enumerates supported tests $\tau$, estimates $M_U[h,\tau]$, and stores support and uncertainty\. The claim table is still present, but it is now embedded inside a local causal PSR\. The practical effect is that a researcher can ask not only whether a claim was extracted, but which future continuations it supports, how strongly it is supported in a local regime, and whether neighboring regimes agree\.
#### Stage D: topos world\-model construction\.
The new topos layer begins with a cover of contexts and a family of local PSR sections\. Restriction maps align shared histories, tests, claims, and provenance across overlaps\. Gluing diagnostics then determine whether local sections assemble into a larger section or whether their disagreement should be recorded as contradiction, drift, regime dependence, or underdetermination\. Persistent state makes this object cumulative: later runs can add evidence, compare against earlier sections, and report whether a tension was repaired or made sharper\.
Thus, Prometheus is best understood as:
$$\begin{aligned}\text{Prometheus} &= \text{Democritus extraction}+\text{context-indexed causal PSRs}\\ &\quad+\text{restriction/gluing diagnostics}+\text{persistent causal atlas}.\end{aligned}$$
The inherited modules supply breadth and causal normalization\. The new modules supply locality, predictive state, transport tests, and explicit certificates of non\-gluing\.
## 5 The Prometheus Pipeline
Figure [1](https://arxiv.org/html/2605.12835#S5.F1) shows the pipeline\. Prometheus begins with a research question or seed corpus\. A retrieval and acquisition layer gathers documents, sections, passages, tables, and metadata\. Extraction converts these units into causal events, episodes, claims, and cSQL\-style rows with provenance\. A cover constructor builds local contexts such as topic regions, document clusters, time windows, populations, product\-use situations, regulatory stages, or agent roles\. Each context receives a local causal predictive\-state representation\. Restriction maps compare overlaps, gluing diagnostics expose tensions, and the Claims Atlas renders the resulting world model for research navigation\.
Pipeline stages shown in Figure 1, in order:
1. Research question and retrieved corpus
2. Document acquisition: papers, filings, reviews, tables, reports
3. Claim extraction: causal events, episodes, cSQL rows, provenance
4. Context cover: topics, documents, populations, time windows, workflows, regimes
5. Local causal PSRs: histories, tests, predictions, support, uncertainty
6. Restrictions and gluing diagnostics: agreement, drift, contradiction, underdetermination
7. Claims Atlas: causal spine, local regions, tensions, provenance drill\-downs

Figure 1: Prometheus turns a corpus into a navigable causal atlas\.
#### Evidence units\.
An evidence unit may be a paper, paragraph, table row, review, filing section, transcript segment, benchmark report, or agent trace\. Each extracted object retains source identity, offsets or page references when available, retrieval metadata, model prompts, and extraction confidence\.
#### LLM backend and usage accounting\.
The current artifact runs use an OpenAI API\-based LLM backend for extraction, normalization, and local synthesis calls\. Prometheus records request counts, token counts, and estimated API cost as run metadata, so the resulting world model can be audited not only for evidence and provenance but also for computational budget\. These numbers are implementation\-dependent rather than theoretical properties of the framework, but they are useful for reproducing the scale of a run and comparing alternative extraction strategies\.
#### Causal episodes\.
Prometheus represents text\-derived causal structure as episodes:
$$h=(e_1,\ldots,e_k),\qquad e_i=(\text{actor},\text{action/condition},\text{observation},t_i,p_i),$$
where $p_i$ is provenance\. Episodes can encode observational relations, declared interventions, temporal order, and qualitative outcomes\. They are converted into histories and tests for local predictive\-state estimation\.
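As a minimal sketch of how such an episode might be held in code, the block below stores each event as a record and splits an episode into a history and a future test. The class name `Event`, the helper `split_history_test`, the cut index `k`, and the `document#passage` provenance strings are illustrative assumptions, not names from the Prometheus implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One event e_i = (actor, action/condition, observation, t_i, p_i)."""
    actor: str
    action: str            # action or condition
    observation: str       # observation or outcome
    t: Optional[int]       # ordering index; None when the text gives no order
    provenance: str        # p_i, here a hypothetical "document#passage" string


Episode = Tuple[Event, ...]


def split_history_test(episode: Episode, k: int):
    """Treat the first k events as the history and the remainder as the test."""
    if not 0 <= k <= len(episode):
        raise ValueError("k must lie between 0 and len(episode)")
    as_pairs = [(e.action, e.observation) for e in episode]
    return tuple(as_pairs[:k]), tuple(as_pairs[k:])


episode = (
    Event("ocean", "warming", "thermal stress", 0, "doc1#p3"),
    Event("larvae", "thermal stress", "larval-survival change", 1, "doc1#p4"),
)
history, future_test = split_history_test(episode, 1)
print(history)
print(future_test)
```

Keeping provenance on the event rather than on the episode means every history or test cell derived later can still be traced to a passage.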
#### cSQL and predictive state\.
cSQL tables store normalized causal rows: cause, effect, mediator, modifier, polarity, strength, context, and provenance\. Prometheus keeps these rows, but uses them to populate predictive\-state tables\. A row such as “thermal stress reduces juvenile survival in species $s$ under region $r$” becomes a claim, a context assignment, and a set of tests concerning survival, migration, and population change under nearby histories\.
#### Persistent state\.
A Prometheus run emits a durable world\-model artifact\. Follow\-up runs can be conditioned on a previous state and compared against it\. This allows the system to report whether new evidence stabilized a region, introduced drift, repaired a gluing tension, or opened a new local context\.
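The comparison between a stored state and a follow\-up run can be sketched as a diff over per\-overlap gluing gaps. Everything specific here is an assumption made for illustration: the JSON layout, the `"contextA|contextB"` key format, the tolerance `tol`, and the four labels are hypothetical and are not taken from the Prometheus artifact format.

```python
import json


def compare_runs(previous: dict, current: dict, tol: float = 0.05) -> dict:
    """Label each overlap by how its gluing gap moved between two runs."""
    report = {}
    for overlap, gap in current.items():
        if overlap not in previous:
            report[overlap] = "new"        # a context pair the earlier run lacked
        elif gap < previous[overlap] - tol:
            report[overlap] = "reduced"    # tension eased by new evidence
        elif gap > previous[overlap] + tol:
            report[overlap] = "sharper"    # tension grew
        else:
            report[overlap] = "stable"
    return report


# A previous state as it might be read back from disk (round-tripped via JSON).
previous_run = json.loads(json.dumps({
    "thermal|food_web": 0.40,
    "thermal|seagrass": 0.20,
    "food_web|seagrass": 0.30,
}))
current_run = {
    "thermal|food_web": 0.10,
    "thermal|seagrass": 0.35,
    "food_web|seagrass": 0.31,
    "thermal|gyre": 0.15,
}
report = compare_runs(previous_run, current_run)
print(report)
```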
## 6 From Text to Local Causal PSRs
We now describe the predictive\-state construction used by Prometheus\. Classical predictive state representations model a controlled process through observable predictions rather than latent variables \(Littman et al\., [2001](https://arxiv.org/html/2605.12835#bib.bib12); Singh et al\., [2004](https://arxiv.org/html/2605.12835#bib.bib22)\)\. A controlled history is a finite sequence
$$h=(a_1,o_1)\cdots(a_t,o_t),$$
with actions $a_i\in\mathcal{A}$ and observations $o_i\in\mathcal{O}$\. A future test $\tau$ is an action–observation continuation, and the state at $h$ is the vector of probabilities $\Pr(\tau\mid h)$ over a selected family of tests\. Spectral PSR learning organizes these quantities into controlled Hankel matrices
$$H_{p,\tau}=P(p\tau),$$
whose rows are history prefixes and whose columns are suffix tests\. Under a finite\-rank assumption, a truncated factorization of $H$ yields observable operators $W_{a,o}$ and a finite predictive realization\.
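To make the Hankel table concrete, the sketch below estimates $H_{p,\tau}$ empirically as the fraction of recorded trajectories that begin with the prefix followed by the suffix. It is a simplification under stated assumptions: each action–observation pair is treated as a single symbol, the action\-policy correction needed for a genuinely controlled process is ignored, and the factorization step is not shown. The function name `empirical_hankel` and the toy trajectories are hypothetical.

```python
from fractions import Fraction


def empirical_hankel(trajectories, prefixes, suffixes):
    """Estimate H[p, tau] = P(p tau) as the share of trajectories starting with p + tau."""
    n = len(trajectories)
    table = {}
    for p in prefixes:
        for s in suffixes:
            word = p + s  # concatenate prefix and suffix test
            hits = sum(1 for traj in trajectories if traj[:len(word)] == word)
            table[(p, s)] = Fraction(hits, n)
    return table


# Two symbols, each an (action, observation) pair.
A, B = ("a", "x"), ("a", "y")
trajectories = [(A, A), (A, B), (B, A), (A, A)]
prefixes = [(), (A,)]      # rows: the empty history and the history A
suffixes = [(A,), (B,)]    # columns: one-step tests
H = empirical_hankel(trajectories, prefixes, suffixes)
for key, value in H.items():
    print(key, value)
```

Exact fractions are used so that the small table can be checked by hand; a spectral learner would apply a truncated factorization to a numeric version of the same table.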
Prometheus keeps this observable\-prediction viewpoint but modifies the learning recipe for text\. Literature and filing corpora do not give one clean controlled trajectory with complete action–observation pairs\. They give many short, heterogeneous fragments: claims, mechanisms, plans, measurements, caveats, and outcomes\. A direct Hankel/SVD pipeline would therefore be brittle\. The implemented estimator instead constructs a compressed, context\-indexed Hankel family from language\-derived episodes\.
Algorithm 1: Language\-to\-local\-PSR construction
1. Input: retrieved corpus $D$, extracted evidence units $E$, context cover $\mathcal{U}$\.
2. Output: local PSRs $\{\mathcal{P}(U)\}_{U\in\mathcal{U}}$, restrictions, gluing diagnostics\.
3. Extract causal events and normalized cSQL rows with provenance\.
4. Assemble events into episodes $h=(e_1,\ldots,e_k)$\.
5. Assign each event and episode to one or more local contexts $U$\.
6. Enumerate supported compressed tests $\tau$, e\.g\. $a_1\to\cdots\to a_k\Rightarrow o$, cause–effect continuations, or mechanism–outcome motifs\.
7. Estimate smoothed local probabilities $\hat{p}_U(\tau)$ from support in $U$, neighboring contexts, and the full corpus\.
8. Form local PSR tables $M_U[h,\tau]$ together with support, uncertainty, and provenance\.
9. Build restriction maps by aligning shared histories, tests, claims, and provenance across overlaps\.
10. Record gluing diagnostics from mismatch on shared cells or projected local sections\.
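The middle of Algorithm 1 (assigning episodes to contexts, enumerating compressed tests, and collecting support with provenance) can be sketched in a few lines. The episode dictionary schema, the helper names, and the `->` / `=>` string encoding of a compressed test are assumptions made for illustration; smoothing, restriction, and gluing are left out of this block.

```python
from collections import defaultdict


def compressed_test(actions, outcome):
    """Encode a1 -> ... -> ak => o as a single string signature."""
    return " -> ".join(actions) + " => " + outcome


def build_support(episodes):
    """Count support and collect sources for every (context, test) pair."""
    support = defaultdict(int)
    provenance = defaultdict(set)
    for ep in episodes:
        tau = compressed_test(ep["actions"], ep["outcome"])
        for ctx in ep["contexts"]:  # an episode may live in several contexts
            support[(ctx, tau)] += 1
            provenance[(ctx, tau)].add(ep["source"])
    return dict(support), {k: sorted(v) for k, v in provenance.items()}


episodes = [
    {"contexts": ["larval fish"], "actions": ["warming", "thermal stress"],
     "outcome": "larval-survival change", "source": "doc1"},
    {"contexts": ["larval fish", "food web"], "actions": ["warming", "thermal stress"],
     "outcome": "larval-survival change", "source": "doc2"},
    {"contexts": ["food web"], "actions": ["food limitation"],
     "outcome": "reduced thermal tolerance", "source": "doc3"},
]
support, provenance = build_support(episodes)
for key in sorted(support):
    print(key, support[key], provenance[key])
```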
In the ocean\-temperature artifact, tests are not product\-review sentiment tests; they are causal and mechanistic continuations such as thermal stress leading to larval survival changes, food limitation reducing thermal tolerance, subpolar gyre weakening affecting heat\-wave regimes, or sea\-grass habitat suitability shifting under warming\. A local context $U$ stores a table
$$M_U[h,\tau]\approx\hat{p}_U(\tau\mid h),$$
plus support counts and provenance links\. When evidence is too sparse for a literal conditional probability, $M_U$ should be read as a normalized predictive support score\. This is why the HTML artifact calls the result a *finite local predictive\-state family*: it is an inspectable, language\-adapted PSR object, not an opaque embedding\.
Restrictions and gluing are computed directly on overlapping test signatures\. If two contexts share histories or tests, Prometheus compares the corresponding cells and reports mean and maximum gaps\. Compatible overlaps can be summarized as larger sections; incompatible overlaps become obstruction data\. This is the practical compromise made throughout this implementation: exact spectral operator recovery is deferred, while the current system exposes the objects needed for prediction, intervention\-style comparison, and multi\-context consistency\.
## 7 Formal Model
We give a finite operational formalization, drawing on sheaf and topos semantics \(Mac Lane and Moerdijk, [1992](https://arxiv.org/html/2605.12835#bib.bib14); Abramsky and Brandenburger, [2011](https://arxiv.org/html/2605.12835#bib.bib1)\) and the categorical machine learning perspective developed in Mahadevan \([2025c](https://arxiv.org/html/2605.12835#bib.bib17)\)\. The goal is not to claim that the implementation realizes all of topos theory, but to make the modeling contract precise enough to support inspection, testing, and extension\.
#### Predictive contexts and covers\.
The basic modeling choice is to treat world modeling from text as a local problem\. A context is not merely a document identifier; it is a region in which the extracted claims are expected to have comparable semantics\. In the three case studies, contexts include literature subtopics and ecological regimes, filing workflow stages, product\-use stages, and agent roles\. Covers encode the comparability relation used for aggregation: two documents may cover a literature theme, four workflow slices may cover a company\-year, and a set of sector\-matched company\-years may cover a fiscal\-year regime\.
###### Definition 7\.1 \(Context site\)\.
Let $\mathcal{C}$ be a finite category of corpus contexts\. Objects $U\in\mathcal{C}$ are local regions such as documents, topics, time windows, populations, product\-use situations, workflow stages, or agent roles\. A morphism $V\to U$ represents inclusion, overlap, refinement, projection, or a declared translation\. A cover $\{U_i\to U\}_{i\in I}$ declares that the family $U_i$ covers $U$ for a specified modeling purpose\.
In the ideal set\-valued case, a model over $\mathcal{C}$ is an object of the presheaf topos $[\mathcal{C}^{op},\mathbf{Set}]$\. In the numerical implementation, the values are finite tables, vectors, and metadata records, so it is often more accurate to view the object as a presheaf into a finite data category or a vector\-valued functor category\. The set\-valued topos supplies the semantics of locality and gluing; the finite tables supply the artifact that users can inspect\.
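One simple finite instance of a context site can be sketched by modeling each context as a set of evidence\-unit identifiers. This is a deliberately narrow assumption: only inclusion morphisms are represented (not refinement, projection, or declared translations), and a family is taken to cover a context when its members are included in it and jointly exhaust it. The function names and document identifiers are hypothetical.

```python
from itertools import combinations


def is_cover(parts, whole):
    """Check that every part includes into `whole` and that together they exhaust it."""
    parts = [frozenset(p) for p in parts]
    whole = frozenset(whole)
    return all(p <= whole for p in parts) and frozenset().union(*parts) == whole


def overlaps(named_parts):
    """Return the nonempty pairwise intersections, where restriction maps are needed."""
    result = {}
    for a, b in combinations(sorted(named_parts), 2):
        shared = set(named_parts[a]) & set(named_parts[b])
        if shared:
            result[(a, b)] = shared
    return result


theme = {"d1", "d2", "d3"}                       # a literature theme U
regions = {"thermal": {"d1", "d2"},              # local contexts U_i
           "food_web": {"d2", "d3"}}

print(is_cover(regions.values(), theme))
print(overlaps(regions))
```

The overlap dictionary is the place where shared cells would later be compared; contexts with an empty intersection impose no gluing condition on each other in this simplified reading.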
###### Definition 7\.2 \(Local causal predictive\-state representation\)\.
For each context $U$, Prometheus assigns a local object
$$\mathcal{P}(U)=(H_U,T_U,M_U,S_U,\Pi_U,D_U).$$
Here $H_U$ is a finite set of histories, $T_U$ is a finite set of tests, $M_U[h,t]$ is a prediction or score for test $t$ after history $h$, $S_U[h,t]$ records support, $\Pi_U[h,t]$ records provenance, and $D_U$ stores diagnostics such as sparsity, uncertainty, extraction confidence, and local rank\.
This definition is inspired by predictive state representations \(Littman et al\., [2001](https://arxiv.org/html/2605.12835#bib.bib12); Singh et al\., [2004](https://arxiv.org/html/2605.12835#bib.bib22)\)\. The difference is that Prometheus learns or estimates many local PSRs from text and then studies their compatibility over a context cover\.
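The six\-component local object of Definition 7\.2 maps naturally onto a small record type. The sketch below is one possible container, assuming dictionary\-backed tables keyed by `(history, test)`; the class name `LocalPSR`, its methods, and the particular sparsity formula (share of empty cells in $H_U\times T_U$) are illustrative choices rather than the implementation's.

```python
from dataclasses import dataclass, field


@dataclass
class LocalPSR:
    """Finite local object P(U) = (H_U, T_U, M_U, S_U, Pi_U, D_U)."""
    context: str
    histories: set = field(default_factory=set)   # H_U
    tests: set = field(default_factory=set)       # T_U
    M: dict = field(default_factory=dict)         # (h, t) -> prediction or score
    S: dict = field(default_factory=dict)         # (h, t) -> support count
    Pi: dict = field(default_factory=dict)        # (h, t) -> provenance ids
    D: dict = field(default_factory=dict)         # diagnostics

    def add_cell(self, h, t, score, support, sources):
        """Populate one table cell and keep the index sets in sync."""
        self.histories.add(h)
        self.tests.add(t)
        self.M[(h, t)] = score
        self.S[(h, t)] = support
        self.Pi[(h, t)] = list(sources)

    def refresh_diagnostics(self):
        """Record sparsity as the share of (h, t) cells with no entry."""
        total = len(self.histories) * len(self.tests)
        self.D["sparsity"] = 1.0 - len(self.M) / total if total else 0.0
        return self.D


psr = LocalPSR("larval fish")
psr.add_cell("warming", "thermal stress => survival change", 0.7, 5, ["doc1", "doc2"])
psr.add_cell("warming", "migration => range shift", 0.4, 2, ["doc3"])
psr.add_cell("food limitation", "thermal stress => survival change", 0.8, 3, ["doc4"])
print(psr.refresh_diagnostics())
```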
#### Connection to classical PSR learning\.
Classical spectral PSR learning starts with controlled histories
$$h=(a_1,o_1)\cdots(a_t,o_t)$$
and future tests $\tau$ consisting of action–observation continuations\. The central empirical object is a controlled Hankel matrix
$$H_{p,\tau}=P(p\tau),$$
where $p$ ranges over prefixes and $\tau$ over suffix tests\. Under a finite\-rank assumption, a truncated factorization of $H$ yields a finite predictive realization with observable operators $W_{a,o}$\. Prometheus uses this as a reference model rather than as a literal estimator\. Text corpora rarely provide one synchronized controlled trajectory with complete action–observation pairs; they provide heterogeneous fragments with missing intermediate observations, uneven support, and context\-dependent meanings\.
The implemented estimator therefore replaces one global Hankel matrix by a family of compressed local tables\. For a context $U$, a compressed test has the schematic form
$$\tau=a_1\to\cdots\to a_k\Rightarrow o,$$
where $a_i$ may be a normalized action, condition, mechanism, or causal motif, and $o$ is a downstream observation or outcome label\. In the ocean case, tests include mechanism–outcome continuations such as warming $\to$ thermal stress $\Rightarrow$ larval\-survival change; in filing and review cases they correspond to workflow and usage continuations\.
#### Local estimation.
Let $n_{U}(h,\tau)$ denote the support for test $\tau$ after history $h$ in context $U$, and let $N_{U}(h)$ be the supported mass for histories comparable to $h$. The simplest local estimate is a smoothed frequency

$$\hat{p}_{U}(\tau\mid h)=\frac{n_{U}(h,\tau)+\alpha\hat{p}_{0}(\tau)}{N_{U}(h)+\alpha},$$

where $\hat{p}_{0}$ is a corpus-, cover-, or domain-level backoff distribution. The concrete pipelines use the same principle with several backoff levels. For example, in domains with nested context levels, the estimator can blend local, neighboring-cover, and corpus-level support:

$$\hat{p}_{U}(\tau)=w_{\mathrm{loc}}(U,\tau)\hat{p}_{\mathrm{loc}}(\tau)+w_{\mathrm{nbr}}(U,\tau)\hat{p}_{\mathrm{nbr}}(\tau)+w_{\mathrm{corp}}(U,\tau)\hat{p}_{\mathrm{corp}}(\tau),$$

with weights determined by available support. The ocean-temperature artifact uses analogous local, neighboring-context, and corpus backoff. Each table cell therefore carries both a predictive value and the evidential basis needed to audit it: counts, smoothing/backoff level, extraction confidence, and provenance.
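The two estimators above can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: the function names are hypothetical, and the choice of weights proportional to support in `blended_estimate` is an assumption, since the text says only that the weights are determined by available support.

```python
def smoothed_estimate(n, N, p0, alpha=1.0):
    """Smoothed frequency (n + alpha * p0) / (N + alpha).

    n: support for the test after the history, N: supported mass for
    comparable histories, p0: backoff probability, alpha: smoothing strength.
    """
    return (n + alpha * p0) / (N + alpha)


def blended_estimate(levels):
    """Blend (support, estimate) pairs, e.g. local / neighbor / corpus.

    Assumption: weights are proportional to support and sum to one.
    """
    total = sum(support for support, _ in levels)
    if total == 0:
        return None  # nothing to blend
    return sum(support * p for support, p in levels) / total


# With no local support the estimate falls back to p0 exactly.
fallback = smoothed_estimate(n=0, N=0, p0=0.2, alpha=5.0)
local = smoothed_estimate(n=3, N=10, p0=0.2, alpha=5.0)   # (3 + 1) / 15
blend = blended_estimate([(10, 0.30), (40, 0.25), (50, 0.20)])
```

As $N_{U}(h)$ grows, the smoothed estimate approaches the raw frequency $n_{U}(h,\tau)/N_{U}(h)$, so $\alpha$ controls how much local evidence is needed before the backoff distribution stops dominating.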
###### Definition 7.3 (Restriction map).
For a morphism $r:V\to U$, a restriction map

$$\rho_{UV}:\mathcal{P}(U)\to\mathcal{P}(V)$$

aligns histories, tests, claims, support, and provenance from $U$ to $V$. In the finite implementation, $\rho_{UV}$ may be a partial alignment map plus a comparison of shared cells.
Operationally, restriction has three parts. First, histories and tests are projected onto the vocabulary shared by the two contexts. Second, claim and provenance identifiers are aligned so that apparent agreement can be traced back to evidence rather than only to surface labels. Third, the resulting shared cells are scored for mismatch. If

$$\Omega_{UV}=(H_{U}\times T_{U})\cap(H_{V}\times T_{V})$$

is the shared signature, a typical overlap discrepancy is

$$\Delta(U,V)=\frac{1}{|\Omega_{UV}|}\sum_{(h,\tau)\in\Omega_{UV}}\lambda_{h,\tau}\left|M_{U}[h,\tau]-M_{V}[h,\tau]\right|,$$

where $\lambda_{h,\tau}$ downweights unsupported or low-confidence cells. The artifact also records maxima and provenance-level explanations for high-gap cells, because a small mean can hide a scientifically important contradiction.
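A minimal sketch of the overlap discrepancy, assuming each local table is stored as a dictionary keyed by `(history, test)` pairs so that the shared signature is the intersection of the key sets. The function name and the toy tables are hypothetical; the sketch returns the worst cell alongside the mean, in line with the remark that a small mean can hide a large gap.

```python
def overlap_discrepancy(M_U, M_V, weight=None):
    """Return (weighted mean gap, (worst cell, worst gap)) over shared cells.

    M_U, M_V: dicts mapping (history, test) -> predictive value.
    weight:   optional function cell -> lambda in [0, 1]; defaults to 1.
    """
    shared = M_U.keys() & M_V.keys()
    if not shared:
        return None, None  # nothing to compare
    lam = weight or (lambda cell: 1.0)
    gaps = {cell: abs(M_U[cell] - M_V[cell]) for cell in shared}
    mean_gap = sum(lam(cell) * gaps[cell] for cell in shared) / len(shared)
    worst = max(gaps, key=gaps.get)
    return mean_gap, (worst, gaps[worst])


# Toy tables: two shared cells, one of which disagrees strongly.
M_U = {("h1", "t1"): 0.5, ("h1", "t2"): 0.1, ("h2", "t1"): 0.3}
M_V = {("h1", "t1"): 0.4, ("h1", "t2"): 0.6, ("h3", "t1"): 0.9}
mean_gap, (worst_cell, worst_gap) = overlap_discrepancy(M_U, M_V)
```

Here the mean gap is moderate while the cell `("h1", "t2")` carries most of the disagreement, which is the situation the maxima are recorded for.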
###### Definition 7.4 (Gluing tension).
For local sections $s_{i}\in\mathcal{P}(U_{i})$ over a cover $\{U_{i}\to U\}$, the pairwise gluing tension on overlaps is

$$\tau_{ij}=w_{ij}\left\|\rho_{U_{i},U_{i}\cap U_{j}}(s_{i})-\rho_{U_{j},U_{i}\cap U_{j}}(s_{j})\right\|^{2},$$

where $w_{ij}$ is an overlap-confidence or support weight. The total gluing tension is $\tau(\{s_{i}\})=\sum_{i<j}\tau_{ij}$.
Low tension indicates that local models agree on shared tests or claims\. High tension is classified by the atlas as contradiction, drift, regime dependence, or underdetermination depending on support, directionality, time, and context metadata\.
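The tension in Definition 7.4 can be illustrated with a small sketch. It assumes each local section is a dictionary of cell values and that restriction to an overlap simply keeps the cells both sections share; the norm is taken to be the squared Euclidean norm over those cells. The names and toy values are hypothetical.

```python
from itertools import combinations


def pairwise_tension(s_i, s_j, w_ij=1.0):
    """w_ij * || s_i - s_j ||^2 restricted to the cells both sections share."""
    shared = s_i.keys() & s_j.keys()
    return w_ij * sum((s_i[c] - s_j[c]) ** 2 for c in shared)


def total_tension(sections, weights=None):
    """Sum of pairwise tensions over unordered pairs i < j.

    sections: {context name: {cell: value}}
    weights:  optional {(i, j): w_ij} with i < j in sorted order; default 1.
    """
    weights = weights or {}
    return sum(
        pairwise_tension(sections[i], sections[j], weights.get((i, j), 1.0))
        for i, j in combinations(sorted(sections), 2)
    )


sections = {
    "A": {"t1": 0.5, "t2": 0.2},
    "B": {"t1": 0.4, "t3": 0.9},
    "C": {"t2": 0.2, "t3": 0.5},
}
tension = total_tension(sections)  # 0.01 (A,B) + 0.0 (A,C) + 0.16 (B,C)
```

The scalar alone does not say whether a high value is contradiction, drift, or regime dependence; that classification needs the support, time, and context metadata described above.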
#### Approximate sheaf condition\.
Exact sheaf gluing would require compatible local sections to determine a unique global section. Prometheus uses a finite tolerance version: local sections are considered $\epsilon$-compatible on a cover when their weighted overlap gaps are below a declared tolerance on sufficiently supported shared signatures. A candidate glued section is then formed only from the compatible cells, for example by support-weighted aggregation,

$$M_{U}[h,\tau]=\frac{\sum_{i}\omega_{i}(h,\tau)M_{U_{i}}[h,\tau]}{\sum_{i}\omega_{i}(h,\tau)},$$

where $\omega_{i}$ combines support, extraction confidence, and relevance to the cover. Unsupported cells remain local. Incompatible cells are not averaged away; they become obstruction records with source passages and context metadata.
###### Proposition 7.5 (Operational sheaf condition).
If all local sections over a declared cover have compatible restrictions on overlaps and the support of the shared cells exceeds a user-specified threshold, then Prometheus may construct a glued section over $U$. If compatibility fails, Prometheus does not force a global merge; it records the obstruction as a gluing diagnostic.
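The operational condition can be sketched cell by cell. This is an illustration under simplifying assumptions, not the implemented procedure: each cell stores a `(value, support)` pair, the weight $\omega_{i}$ is taken to be the support alone, and compatibility is checked as the spread of values among sufficiently supported contexts. The function name, the thresholds, and the toy data are all hypothetical.

```python
def glue(sections, eps=0.05, min_support=2):
    """Split cells into glued values, obstruction records, and local-only cells.

    sections: {context: {cell: (value, support)}}
    """
    cells = set().union(*(s.keys() for s in sections.values()))
    glued, obstructions, local_only = {}, {}, {}
    for cell in cells:
        entries = {ctx: s[cell] for ctx, s in sections.items() if cell in s}
        supported = {ctx: vs for ctx, vs in entries.items()
                     if vs[1] >= min_support}
        if len(supported) < 2:
            local_only[cell] = entries       # not enough overlap to glue
            continue
        values = [v for v, _ in supported.values()]
        if max(values) - min(values) > eps:
            obstructions[cell] = supported   # kept as a record, not averaged
            continue
        total = sum(w for _, w in supported.values())
        glued[cell] = sum(v * w for v, w in supported.values()) / total
    return glued, obstructions, local_only


sections = {
    "larval": {"c1": (0.30, 10), "c2": (0.10, 5)},
    "gyre":   {"c1": (0.32, 30), "c2": (0.40, 6), "c3": (0.20, 4)},
}
glued, obstructions, local_only = glue(sections)
```

In this toy run `c1` glues to a support-weighted value, `c2` becomes an obstruction because the two contexts disagree by more than the tolerance, and `c3` stays local because only one context reports it.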
#### Two\-stage gluing\.
The same formalism can be iterated across levels of analysis. In the filing experiments, the local workflow slices

$$x_{C,y}^{\mathrm{ops}},\quad x_{C,y}^{\mathrm{mkt}},\quad x_{C,y}^{\mathrm{fin}},\quad x_{C,y}^{\mathrm{inn}}$$

may first glue into a company-year section $s_{C,y}$. These company-year sections can then be compared over a second cover, such as sector-year or temporal-neighborhood covers. The important point is that there is no single unconditional global average over all firms, papers, reviews, or agents. Globality is always relative to a declared cover, and non-gluing at a coarser cover is a meaningful result.
#### Localized interventions\.
Prometheus treats interventions as local tests. A $j$-do query $\operatorname{do}_{j}(X=x)$ modifies histories or tests inside a context $U$ and asks how the local predictive-state table changes under comparable covers. The result is not automatically an identified causal effect in Pearl’s sense (Pearl, [2009](https://arxiv.org/html/2605.12835#bib.bib19)); it is an intervention-conditioned probe of the language-derived world model. Prometheus reports the support and provenance behind the probe rather than presenting it as a source-free causal estimate.

More explicitly, let

$$j(U)=\{u_{i}:U_{i}\to U\}$$

be a cover of contexts considered comparable for the query, and let

$$I^{a}_{U_{i}}:\mathcal{P}(U_{i})\to\mathcal{P}^{\operatorname{do}(a)}(U_{i})$$

be a local intervention map that edits a test, fixes an action, inserts a repair step, or conditions on an explicitly declared regime. The $j$-localized intervention state is computed by restriction, local intervention, and aggregation:

$$\operatorname{do}_{j}(a)_{U}(s)=\operatorname{Agg}_{u_{i}:U_{i}\to U\in j(U)}\left(I^{a}_{U_{i}}\bigl(\rho_{U,U_{i}}(s)\bigr)\right).$$

Compatibility is then checked after the intervention. If the intervened local sections glue, the atlas may report a coherent intervention-conditioned prediction over $U$. If they do not, the query is only locally supported, and the failed overlaps identify where comparability, measurement, or evidence breaks down.
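The restrict, intervene, aggregate loop can be sketched as follows. Everything here is a toy under stated assumptions: sections are dictionaries of cell values, restriction keeps the cells belonging to each cover element, the aggregation is a plain per-cell mean, and the context-dependent intervention values are invented for illustration. None of these choices is claimed to match the Prometheus implementation.

```python
def do_j(section, cover, intervene):
    """Return (aggregated table, max post-intervention gap across contexts).

    section:   {cell: value} over the whole context U
    cover:     {context name: set of cells in that cover element}
    intervene: function (context name, local table) -> intervened local table
    """
    local = {}
    for name, cells in cover.items():
        restricted = {c: v for c, v in section.items() if c in cells}  # rho
        local[name] = intervene(name, restricted)                      # I^a
    # Agg: average each cell over the cover elements that contain it.
    merged = {}
    for table in local.values():
        for cell, value in table.items():
            merged.setdefault(cell, []).append(value)
    aggregated = {c: sum(vs) / len(vs) for c, vs in merged.items()}
    max_gap = max((max(vs) - min(vs) for vs in merged.values()), default=0.0)
    return aggregated, max_gap


section = {("warm", "stress"): 0.6, ("warm", "decline"): 0.3,
           ("cool", "decline"): 0.1}
cover = {
    "lab":   {("warm", "stress"), ("warm", "decline")},
    "field": {("warm", "stress"), ("cool", "decline")},
}
# Hypothetical regime-specific effect of fixing the warming action.
EFFECT = {"lab": 1.0, "field": 0.8}


def fix_warming(name, table):
    edited = dict(table)
    if ("warm", "stress") in edited:
        edited[("warm", "stress")] = EFFECT[name]
    return edited


aggregated, max_gap = do_j(section, cover, fix_warming)
```

The returned gap is the quantity a compatibility check would compare against a tolerance: a small gap supports reporting the aggregated prediction over $U$, while a large one means the probe is only locally supported.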
## 8 The Claims Atlas
The primary user\-facing object is the Claims Atlas\. It is designed to answer research questions that flat summaries obscure\.
#### Main causal spine\.
The atlas extracts recurrent, high\-support causal paths that organize the corpus\. In an ocean\-warming corpus, a spine may include warming, stratification, oxygen loss, prey availability, migration, recruitment, and population change\. In SEC workflows, a spine may include investment, supply\-chain constraints, margin pressure, capital allocation, and realized outcomes\.
#### Local context regions\.
Each spine is decomposed into local regions\. A region may correspond to a species group, geography, time period, document cluster, product aspect, or workflow stage\. Users can enter a region and inspect its local PSR, support, claims, and provenance\.
#### Drift detection\.
When local models change across time, retrieval runs, or document strata, Prometheus reports drift. Drift can be textual, causal, predictive, or topological: the support distribution changes, a causal polarity changes, a test prediction changes, or the overlap graph itself changes.
#### Regime tensions\.
The atlas highlights where local models resist gluing\. Some tensions are contradictions; others are legitimate regime boundaries\. The interface should make this distinction visible by exposing modifiers, populations, measurement protocols, and source provenance\.
#### Provenance drill\-downs\.
Every atlas claim points back to evidence units\. A user can inspect source passages, extracted rows, normalized claims, support counts, and neighboring contexts\. Provenance is not decorative metadata; it is the mechanism by which the atlas remains corrigible\.
## 9 Ocean-Temperature Artifact Case Study
We use a Prometheus GUI run on the query “analyze 10 recent studies of the impact of rising ocean temperatures on fish populations” as the paper’s concrete artifact case study. The run retrieved and acquired eleven documents because the acquisition layer retained an additional closely related study. The corpus includes studies on marine fish endoparasites, larval thermal tolerance, the North Atlantic heat wave, Argo temperature artifacts, kuruma shrimp aquaculture, global fisheries economics, coral mortality, sea-grass carbon storage, mixotrophic phytoplankton, and *Vibrio vulnificus*. This is a good stress test for Prometheus because the query is nominally about fish populations, but the retrieved literature naturally fans out into ecological, measurement, aquaculture, microbial, economic, and carbon-cycle regimes.
The run produced 3,065 extracted events, 11 causal episodes, 199 local contexts, 199 local PSRs, 199 sheaf objects, 198 restriction and gluing diagnostics, 160 compatible restrictions, 194 compatible gluing overlaps, and 4 tense gluing overlaps. The LLM backend made 4,383 requests using 2,361,749 total tokens, with an estimated API cost of about \$1.24. The most frequent causal relations in the extracted atlas were *leads to* (1,152), *reduces* (577), *increases* (466), *influences* (441), *causes* (214), and *affects* (214). Prominent high-support local contexts included larval survival under food scarcity, subpolar gyre weakening, rising ocean surface temperatures, fisheries output and secondary activities, mixotrophic metabolic evolution, zooxanthellae density fluctuations, parasite transmission, and sea-grass habitat suitability.
Table 3: Summary statistics for the ocean-temperature Prometheus artifact.

#### Prometheus PSR bundle.
The PSR bundle is the computational layer beneath the Claims Atlas. It exposes the finite predictive-state family rather than only the extracted causal rows: the corpus-level PSR has rank 2,784, 2,896 histories, and 2,903 tests, while the local family contains 199 local PSRs. The bundle reports 160 compatible restriction arrows out of 198 root restriction checks, mean gluing loss 0.0179, no learned non-root overlap edges, and no attached $j$-do probes for this particular run. The absence of $j$-do probes is itself informative: this artifact is a literature atlas, so the main signal is not a computed intervention query but the support, drift, and gluing behavior of local causal charts.
Table [4](https://arxiv.org/html/2605.12835#S9.T4) shows representative local PSRs. These are the objects a researcher sees before reading source passages. Each local PSR has a finite set of histories and tests induced by extracted causal observation events. The restriction columns compare the local chart with the corpus chart; the gluing columns show whether the local latent section is geometrically compatible with the corpus section after projection.

Table 4: Representative local PSRs from the ocean-temperature bundle. “Hist.” denotes finite histories. Shared cells and mean gap are corpus-to-context restriction diagnostics. Weighted loss is the gluing diagnostic when reported in the bundle view. The table shows why the artifact is not a single graph: biological mechanisms, circulation mechanisms, economic consequences, measurement issues, disease pathways, and retrieval drift all become separate local predictive-state charts.

The focus-context matrix makes the PSR concrete. For the persistent focus *rising ocean surface temperatures*, the displayed local table has 30 histories by 30 tests and rank 24. Representative histories include surface temperatures affecting evolutionary shifts back toward photosynthesis, causing higher grazing rates that reduce prey abundance, increasing grazing rates that reduce prey abundance, and increasing photosynthetic reliance over heterotrophy. Representative tests include the same prey-abundance mechanisms plus a coastal-ecosystem-dynamics claim. Most displayed cells have predictive mass 0.0312, while two salient transitions have mass 0.0938: from the history “surface temperatures cause higher grazing rates which reduce prey abundance” to the corresponding grazing-rate test, and from the history “surface temperatures increase grazing rates which reduce prey abundance” to the coastal-ecosystem-dynamics test. Thus the local PSR does not merely store claim counts; it records which histories make which causal tests more probable inside a local chart.
The gluing-backpropagation rows provide examples of what the system treats as stable local sections. The larval-survival context has 45 overlap sections, confidence 0.9981, weighted loss 0.001950, and maximum section gap 0.002810; its representative sections include food scarcity reducing sea urchin larval thermal tolerance, decreasing thermal tolerance, and increasing physiological stress. The subpolar-gyre context has 41 overlap sections, confidence 0.9966, weighted loss 0.003436, and maximum section gap 0.004424; its sections connect reduced Arctic-water inflow and warm subtropical inflow to elevated North Atlantic temperatures. The mixotroph context has 33 overlap sections, confidence 0.9978, weighted loss 0.002163, and maximum section gap 0.005492; it connects greater heterotrophic reliance to higher grazing rates and reduced prey abundance. The focus context itself has 30 overlap sections, confidence 0.9974, weighted loss 0.002639, and maximum section gap 0.003835, with sections for reduced prey abundance and higher grazing rates. These rows are the mechanical reason the persistent state can recommend continuing from the surface-temperature chart.
Drift appears in two complementary ways. First, the Claims Atlas groups off-query material into drift regions rather than suppressing it. The human-health drift region contains 160 events over 11 contexts, including climate change and maternal health, malaria transmission and pregnancy risks, healthcare access, water insecurity, and food insecurity. The off-query climate drift region contains 58 events over 7 contexts, including Antarctic resource claims, treaty-meeting opposition, and Indian Antarctic research-base proposals. Second, the PSR bundle marks narrow restriction or gluing rows as divergent when a local chart has too little overlap or too large a gap to be transported. Examples include *rising ocean temperatures* itself as a one-section gluing divergence (weighted loss 0.1108), geopolitical tensions over Antarctic territory (0.1012), extreme weather and maternal health (0.0915), and sea-grass habitat-suitability shifts (0.0824). These rows are useful precisely because they prevent the paper’s example from pretending that broad retrieval produced a clean, single-topic corpus.
The Claims Atlas is the paper-facing compression of this run. Its first screen does not present a single answer to the query. Instead, it partitions the extracted claims into named causal regions, identifies the recurrent causal spine, and marks the places where the run drifted away from the requested fish-population question. The displayed atlas contains 11 documents, 3,065 claims, 200 displayed local contexts including the corpus-level view, 152 regime surfaces, 4 tense gluing overlaps, and mean glue loss 0.0179. The main atlas regions are summarized in Table [5](https://arxiv.org/html/2605.12835#S9.T5).

Table 5: Claims Atlas partition for the ocean-temperature run. The atlas makes both central evidence and retrieval drift visible: most events sit in the core query spine, but sizable neighboring, measurement, economic, and off-query regions are preserved rather than averaged into one global summary.

The highest-support spine claims show how the atlas differs from a generic summary. The most supported claim family is a larval-temperature mechanism: food limitation reduces the thermal tolerance of purple sea urchin larvae by lowering their capacity to maintain membrane fluidity and survive heat exposure (45 extracted claims, 8 regime aliases). A second family connects the weakening of the subpolar gyre to increased warm subtropical inflow, elevated North Atlantic temperatures, and changes in long-term ocean-temperature records (26 claims, 8 aliases). Other spine families connect rising surface temperatures to mixotrophic phytoplankton trade-offs, increased grazing pressure, lower prey abundance, endoparasite exposure through host feeding behavior, and fisheries value-chain effects. These are not merged into a single “warming harms fish” proposition. They are kept as local claim families with documents, surface variants, regime aliases, and evidence statements attached.
The atlas also exposes the run’s regime tensions. One obstruction involves the claim that reduced larval survival lowers recruitment and affects kelp-forest productivity: this link appears across kelp-forest and marine-heatwave regimes with competing *affects* and *reduces* surfaces. Several regime-sensitive rows involve host feeding behavior and endoparasite diversity, where the same subject-object pair appears across parasite-prevalence, environmental-factor, and feeding-behavior contexts with different relation surfaces. Additional tensions appear around coral sedimentation stress, *Vibrio* virulence activation, and fisheries output multipliers. The point is not that these rows are errors. They are exactly the places where a researcher should inspect modifiers, measurement conditions, population, and provenance before transporting a claim.

Table 6: Examples of regime tensions surfaced by the Claims Atlas. The important feature is not only whether a row is globally consistent, but whether the atlas can tell the researcher which local regimes make the claim portable, which require caveats, and which should remain obstructed.

The Persistent World State turns this atlas into a continuation state for research. The selected focus context is *rising ocean surface temperatures*. Its local PSR has 24 effective local observations over a $30\times 30$ history-test table. Representative histories and tests include surface temperatures affecting evolutionary shifts back toward photosynthesis, causing higher grazing rates that reduce prey abundance, increasing photosynthetic reliance over heterotrophy, and influencing coastal ecosystem dynamics. The state records one relevant restriction check from the corpus to the focus context; it is aligned, with 900 shared cells and maximum gap 0.0934. The corresponding gluing overlap is also aligned, with weighted loss 0.0026 over 30 sections. The state therefore recommends accepting the current world state for this focus: no blocking local issue was detected, and no focus-context repair probes were attached. In paper terms, the persistent state is the system’s explicit answer to “where should the next research step start?” rather than a passive dashboard.
This example makes the central Prometheus claim concrete. A conventional summary would likely report that warming affects marine organisms through heat stress, food limitation, disease, habitat shifts, and economic consequences. The Prometheus artifact instead exposes the local structure behind that answer: which studies support each region, which contexts glue cleanly to the corpus model, which narrow regions remain tense, and which provenance paths justify a claim. The output is therefore not just a summary of recent literature. It is a persistent causal atlas that a researcher can inspect, revise, and extend.
## 10 GLP-1 Weight-Loss Literature Case Study
We also ran Prometheus on the query “Analyze 10 recent studies of the weight loss drug GLP-1 and synthesize their joint support.” The acquisition layer retained 11 documents. The selected corpus includes work on a MOGAT2 inhibitor that increases GLP-1 concentrations in obese mice, systematic and network-review material on GLP-1 receptor agonists and co-agonists for adults without diabetes, pharmacist counseling and misuse concerns, chronic kidney disease, incretin polyagonists as bariatric-surgery alternatives, drug-target Mendelian randomization for gastrointestinal outcomes, pharmacovigilance studies of endocrine and dermatologic safety, tirzepatide and cocaine motivation in rodents, semaglutide approval for MASH, and oral GLP-1 drug development. This is a useful health-literature stress test because the retrieval is not a clean efficacy-only corpus. It mixes weight-loss efficacy, safety signals, access and adherence, cardiometabolic benefit, drug-delivery constraints, and off-query neighboring mechanisms.

Table 7: Summary statistics for the GLP-1 weight-loss Prometheus artifact. The corpus is intentionally heterogeneous: it contains direct weight-loss efficacy material, safety and pharmacovigilance material, implementation and access material, and adjacent metabolic-disease and addiction studies.

The most frequent relation families were *leads to* (1,203), *reduces* (831), *increases* (546), *influences* (489), *affects* (176), and *causes* (129). These counts already show why a flat answer is fragile: the corpus contains benefit claims, mechanism claims, access claims, and adverse-event claims, and these should not be collapsed into a single endorsement or warning.
#### Prometheus PSR bundle.
The GLP-1 PSR bundle exposes the corpus as a health-research atlas rather than as a single therapy summary. The corpus-level PSR has rank 2,997, 3,144 histories, and 3,152 tests. There are no learned non-root overlap edges and no attached $j$-do probes in this run; the signal is therefore carried by root-to-context restrictions and gluing diagnostics. The largest local PSRs are safety, access, dual-incretin, counseling, and metabolic mechanism charts, not only direct weight-loss-efficacy charts. This is the main practical difference between an atlas and a review abstract: the artifact keeps the benefit and risk surfaces inspectable as separate local sections.
Table 8: Representative local PSRs from the GLP-1 bundle. “Hist.” denotes finite histories. Shared cells and mean gap are corpus-to-context restriction diagnostics. The high-support charts show that the run is not simply an efficacy summary: safety, access, adherence, dual-agonist mechanisms, cardiorenal outcomes, and animal metabolic mechanisms all become separate local predictive-state charts.

Table 9: Paper-facing atlas lenses for the GLP-1 run. Unlike the ocean-temperature artifact, the automatic Claims Atlas placed almost all local contexts in a residual bucket plus a tiny measurement bucket. For exposition, we therefore group local contexts into non-exclusive health-research lenses. The non-exclusivity is itself informative: GLP-1 evidence crosses efficacy, safety, access, comorbidity, mechanism, and off-query reward-pathway surfaces.

The gluing diagnostics expose which parts of the corpus transport cleanly and which require local caveats. Large regions such as telogen effluvium and androgenetic alopecia, improved drug accessibility and storage, tirzepatide dual GLP-1/GIP agonism, obesity-treatment pharmacotherapy, patient education, glucose-tolerance and insulin testing, and safety profiles of GLP-1 receptor agonists align with the corpus section. The persistent state nevertheless marks the overall state as provisional because it finds 41 divergent restriction checks and 4 divergent gluing overlaps.

Table 10: Examples of regime tensions and glued surfaces in the GLP-1 Claims Atlas. The important behavior is selective transport: core efficacy and some cardiorenal claims glue across regimes, while GI mediation, hair-loss, hormonal, and reward-pathway claims remain narrow or obstructed.

This is exactly the behavior wanted in a health setting. Prometheus does not convert heterogeneous literature into individualized medical advice or a global causal verdict. It preserves locality: weight-loss efficacy in non-diabetic adults, mouse metabolic mechanisms, polyagonist alternatives, cardiorenal benefits, MASH material, pharmacist counseling, pharmacovigilance signals, and hair-loss/hormonal safety concerns are visible as different local sections. The artifact’s conclusion is not “GLP-1 works” or “GLP-1 is risky.” It is a navigable map of where the corpus supports treatment, which mechanisms and populations carry that support, and where safety or transportability claims should remain under inspection.
## 11 Resveratrol and Red-Wine Health-Benefit Case Study
The third artifact case study uses the query “Analyze 10 recent studies of the health benefits of Resveratrol in red wine and synthesize their joint support.” The acquisition layer retained 13 documents. The selected corpus includes a 2025 systematic review of red-wine consumption and cardiovascular risk, work on alcohol hypersensitivity in aspirin-exacerbated respiratory disease, winemaking-residue valorization, older studies on red-wine resveratrol stability and high-trans-resveratrol wine, a trauma-hemorrhage organ-function study, a review of red wine and resveratrol effects on human health, intestinal-cancer material, commentary on resveratrol, microbiota-derived resveratrol metabolites as biomarkers of red-wine consumption, and a Mediterranean-diet well-being study. This corpus is a good test of Prometheus because the phrase “health benefits of resveratrol in red wine” pulls together benefit claims, wine-processing claims, bioavailability objections, biomarker studies, cell-line and animal studies, and diet-measurement drift.

Table 11: Summary statistics for the Resveratrol/red-wine Prometheus artifact. The corpus contains direct cardiovascular and anti-inflammatory claims, mechanistic SIRT1 and cancer-cell claims, bioavailability concerns, wine-production and stability claims, biomarker claims, and diet-measurement drift.

The most frequent relation families were *leads to* (1,439), *reduces* (827), *increases* (727), *influences* (621), *causes* (256), and *affects* (182). The top spine claims include resveratrol reducing oxidative stress in cardiovascular tissues, poor bioavailability limiting systemic effects, SIRT1 activation influencing glucose metabolism and PCOS-related mechanisms, red-wine polyphenols improving lipid profile and endothelial function, cyclooxygenase inhibition in Caco-2 cell contexts, and microbiota-derived metabolite links to inflammation.
#### Prometheus PSR bundle.
The Resveratrol PSR bundle has a corpus-level rank of 3,632, with 3,802 histories and 3,813 tests. The largest local charts are not all clinical-benefit charts. They include resveratrol metabolism and cancer prevention, Caco-2 cell models, grape-extract intervention, lipid metabolism, fermentation temperature, food-frequency-questionnaire limitations, and anti-inflammatory effects. The bundle therefore surfaces a central limitation of the red-wine/resveratrol literature: transportability depends on whether the claim is about wine chemistry, consumed red wine, resveratrol as an isolated compound, microbial metabolites, a cell-line assay, or human cardiovascular risk.
Table 12: Representative local PSRs from the Resveratrol bundle. “Hist.” denotes finite histories. Shared cells and mean gap are corpus-to-context restriction diagnostics. The table shows why the artifact is not a single “red wine is healthy” graph: production chemistry, cell models, bioavailability, cardiovascular mechanisms, inflammation, measurement, and controversy become separate predictive-state charts.

Table 13: Paper-facing atlas lenses for the Resveratrol run. The generated Claims Atlas again placed most contexts in a residual bucket, so the paper groups local contexts into non-exclusive research lenses. The point is not to force one global health claim, but to show the distinct evidence surfaces that must be inspected before transporting any benefit claim.

The persistent state marks the Resveratrol artifact as provisional: the corpus has 49 divergent restriction checks and 5 divergent gluing overlaps. The tense gluing rows are concentrated in narrow but epistemically important regions: cardiovascular benefits of moderate red wine, red/white wine concentration differences, inflammation reduction via resveratrol metabolites, alcohol-sensitivity mechanisms in chronic rhinosinusitis with nasal polyps, and urinary biomarkers of red-wine intake. These are exactly the places where the literature should not be flattened into one answer.

Table 14: Examples of regime tensions surfaced by the Resveratrol Claims Atlas. The artifact distinguishes benefit mechanisms from wine-production chemistry, bioavailability constraints, cell-line evidence, hypersensitivity mechanisms, and measurement or controversy surfaces.

This case study shows why the atlas representation matters for nutrition and natural-product claims. A flat summary would be tempted to say that resveratrol in red wine is antioxidant and cardioprotective. The Prometheus artifact instead says something more useful: some local contexts support antioxidant, lipid, endothelial, inflammatory, and cell-line mechanisms; other contexts warn that bioavailability, wine chemistry, measurement, hypersensitivity, and credibility issues determine whether the claim transports. The resulting object is not advice to drink red wine. It is a map of which parts of the literature support which mechanistic claims, and where those claims should remain local.
## 12 Additional Case-Study Templates
We describe five case-study templates that exercise different parts of the Prometheus contract.
#### Ocean warming and fish populations\.
The corpus consists of papers and reports on ocean warming, oxygen loss, stratification, prey shifts, habitat migration, reproductive success, and fish population dynamics\. The atlas should reveal the main causal spine and show which species, geographies, and time scales support or break each link\.
#### Product reviews\.
Shoes-ACOSI and targeted-sentiment datasets provide product-feedback corpora in which local contexts include fit, comfort, activity, failure mode, return risk, service, price/value, and quality. Prometheus builds local review-experience PSRs and exposes gluing tensions such as “comfortable” in short-use contexts versus “painful” in long-mileage contexts.
#### SEC workflows\.
Annual filings and earnings\-call transcripts describe plans, risks, investments, and outcomes\. Contexts include company\-year, sector, strategic theme, and workflow stage\. The atlas can compare claims about digital investment, supply\-chain optimization, price actions, margin effects, and risk exposure across years\.
#### Health literature\.
Health corpora stress the need for locality. Population, dosage, study design, endpoint, and time horizon often determine whether claims transport. A Prometheus atlas should make non-transportability visible and should never present a local literature claim as individualized medical advice.
#### Network\-economy and agent traces\.
In small simulations, producers, transporters, and consumers emit local textual reports. Prometheus converts these reports into role-local PSRs, then measures whether supply, capacity, demand, and payoff contexts glue into a coherent operating regime. These traces test whether the same atlas machinery can serve agentic decision support.
## 13 Grounded Counterfactuals with Paper Source Code
The previous case studies treat counterfactuals as probes of the language-derived world model. This is useful, but it is still model-internal: the atlas can ask how a local causal neighborhood would change if a claim, mechanism, or repair were edited, but the result is only as grounded as the extracted claims and their support. A stronger opportunity appears when a scientific paper ships source data, executable code, simulation inputs, or plot-generation artifacts. In that setting, Prometheus can bind a symbolic sheaf intervention to an external scientific substrate, execute the intervention there, and then push the measured result back into the topos world model.
We tested this mode in four separate domains\. The first is a climate\-forcing example from airborne microplastics, where source tables and gridded figure data support a direct optical\-forcing intervention\. The second is a palaeohydrology example from the Indus Valley Civilization, where VIC\-derived discharge anomalies, climate figure data, and an accompanying VIC codebase support a drought\-restoration intervention\. The third is the well\-known Sachs protein\-signaling benchmark, where single\-cell perturbation data support measured experimental\-regime substitutions\. The fourth is a comparative\-neuroscience example from singing mice, where MAPseq projection matrices support a species\-level intervention on motor\-cortex projection expansion\. The point is not that the same numeric machinery applies everywhere\. The point is that the same Topos World Model loop recurs: extract a claim sheaf, locate an external scientific substrate, execute or evaluate an intervention there, and rebuild the local world model from the changed observations\.
### 13.1 Microplastics: Optical-Forcing Intervention
We first tested this mode on the Nature Climate Change paper *Atmospheric warming contributions from airborne microplastics and nanoplastics* (Liu et al., [2026](https://arxiv.org/html/2605.12835#bib.bib13)). The paper reports that atmospheric microplastic and nanoplastic particles (MNPs) have a mean direct radiative forcing of $0.039 \pm 0.019\,\mathrm{W\,m^{-2}}$, equivalent to about $16.2\%$ of black-carbon forcing, and that colored particles absorb much more strongly than pristine particles. The local paper folder contains both the PDF and the source-data spreadsheets used for the reported figures, including the workbook behind the paper’s gridded forcing map and source tables for microplastic and nanoplastic optical-forcing values. This makes it possible to evaluate a counterfactual that is not merely verbal:

$$\text{colored MNP optical forcing} \longmapsto \text{white/pristine MNP optical forcing}.$$
#### Language topos\.
We first built a Prometheus claims world model from the document-level causal extraction. The baseline paper topos contained 11 extracted causal events, 8 local PSRs, 8 sheaf objects, 7 root restriction checks, and 7 gluing diagnostics. The extracted local contexts included colored MNP light absorption, atmospheric ageing optical effects, direct MNP radiative forcing, regional forcing hotspots, radiative-transfer estimation, and the interpretation of MNPs as previously unrecognized climate-forcing agents. In the ordinary sheaf-query layer, a counterfactual question about suppressing colored-particle absorption would remain an internal claim-topos probe.
#### Executable intervention\.
We then used the paper’s source spreadsheets as an executable substrate. The intervention holds the published spatial MNP distribution fixed and replaces the colored-particle optical-forcing value with the white/pristine value derived from the paper’s Source Table 1 and Source Table 2. Let $F_{\mathrm{base}}$ be the published gridded all-sky MNP direct-radiative forcing map and let $m_{\mathrm{base}}$ and $m_{\mathrm{white}}$ be the table-derived MP+NP mean forcing values. The first executable proxy computes

$$F_{\mathrm{cf}} = F_{\mathrm{base}}\,\frac{m_{\mathrm{white}}}{m_{\mathrm{base}}}.$$

In the artifact, $m_{\mathrm{base}} = 0.039$ and $m_{\mathrm{white}} = 0.0036667\,\mathrm{W\,m^{-2}}$, so the scale factor is $0.0940$. Area-weighting the gridded map by latitude gives

$$\bar{F}_{\mathrm{base}} = 0.03914, \qquad \bar{F}_{\mathrm{cf}} = 0.00368, \qquad \bar{F}_{\mathrm{base}} - \bar{F}_{\mathrm{cf}} = 0.03546 \quad \mathrm{W\,m^{-2}}.$$

Thus the colored-to-white intervention suppresses about $90.6\%$ of the modeled MNP forcing in this source-table proxy. Relative to the extracted black-carbon benchmark map, the baseline MNP forcing is $14.29\%$ of black-carbon forcing by area-weighted mean, while the white-equivalent counterfactual is about $1.34\%$. [Figure 2](https://arxiv.org/html/2605.12835#S13.F2) shows the corresponding counterfactual sheaf slice and source-data proxy.
[Figure 2 panels. Left: cover diagram of the paper claim topos (11 events) with local contexts ageing optics (2), radiative transfer (1), white MNP optics (2), direct forcing (2), and regional hotspots (2); edge labels give restriction gaps $\Delta$ of $.030$, $.046$, and $.906$ (three edges); solid edges are compatible, dashed edges are revised by source tables. Right: bar chart of area-weighted forcing in $\mathrm{W\,m^{-2}}$: base $0.03914$, white $0.00368$, drop $90.6\%$; the source tables scale the map by $0.094$.]

Figure 2: A concrete microplastics artifact slice. Left: local contexts in the counterfactual sheaf after replacing colored-particle optics with the white/pristine source-table value. Right: the executable source-data proxy holds the published spatial distribution fixed and reduces the area-weighted MNP forcing from $0.03914$ to $0.00368\,\mathrm{W\,m^{-2}}$.
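The source-table proxy can be illustrated with a short sketch. It assumes a regular latitude-longitude grid and cosine-of-latitude area weights; the function names `area_weighted_mean` and `white_counterfactual` and the toy grid are hypothetical stand-ins, not the Prometheus artifact. Only the two table-derived means are taken from the text.

```python
import numpy as np

M_BASE = 0.039        # table-derived MP+NP mean forcing, W m^-2 (from the text)
M_WHITE = 0.0036667   # white/pristine equivalent, W m^-2 (from the text)


def area_weighted_mean(field, lat_deg):
    """Mean of a (lat, lon) field weighted by cos(latitude); assumes a regular grid."""
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(field)
    return float(np.sum(field * weights) / np.sum(weights))


def white_counterfactual(f_base, m_base=M_BASE, m_white=M_WHITE):
    """Scale the baseline map by m_white / m_base, holding the spatial pattern fixed."""
    return f_base * (m_white / m_base)


# Toy stand-in for the published forcing grid (hypothetical values).
lat = np.linspace(-88.0, 88.0, 45)
lon = np.linspace(0.0, 356.0, 90)
f_base = 0.02 + 0.04 * np.cos(np.deg2rad(lat))[:, None] * np.ones((lat.size, lon.size))

f_cf = white_counterfactual(f_base)
scale = M_WHITE / M_BASE
mean_base = area_weighted_mean(f_base, lat)
mean_cf = area_weighted_mean(f_cf, lat)
suppressed_fraction = 1.0 - mean_cf / mean_base
print(f"scale={scale:.4f} suppressed={suppressed_fraction:.3f}")
```

Because the proxy is a uniform rescaling, the suppressed fraction equals one minus the scale factor for any grid; the spatial pattern only affects the absolute area-weighted means.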
#### Back into the sheaf\.
The important step is not only computing the numeric proxy. After execution, Prometheus rewrites the relevant causal observations and rebuilds the world model from the modified episode. Six source claims are changed. The baseline observations
> colored\_mnp\_light\_absorption\|increase\|light\_absorption
and related direct\-forcing and hotspot claims are replaced by counterfactual observations such as
> counterfactual\_white\_mnp\_optics\|sets\_to\| white\_pristine\_absorption
> counterfactual\_mnp\_direct\_radiative\_forcing\|produces\| white\_equivalent\_mean\_direct\_radiative\_forcing
and
> counterfactual\_regional\_mnp\_forcing\_hotspots\|reduced\_by\| 0\.906\_forcing\_fraction\.
The modified episode is then passed through the ordinary Topos World Model builder. The resulting counterfactual world model again has 11 events, 8 local PSRs, 8 sheaf objects, 7 restriction checks, and 7 gluing diagnostics, but the local cover has changed: it now contains counterfactual_white_mnp_optics, counterfactual_mnp_direct_radiative_forcing, and counterfactual_regional_mnp_forcing_hotspots. The intervention is therefore not a dashboard annotation. It is a new sheaf world model.
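The rewrite step can be pictured as a substitution over `subject|relation|object` observations. The following is a minimal sketch under that assumption; the `Observation` type, the `apply_intervention` helper, and the second baseline observation are hypothetical illustrations and not the Prometheus implementation.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    subject: str
    relation: str
    obj: str


def parse(line: str) -> Observation:
    """Parse a 'subject|relation|object' line, tolerating spaces around fields."""
    subject, relation, obj = (part.strip() for part in line.split("|"))
    return Observation(subject, relation, obj)


def apply_intervention(episode, rewrites):
    """Return a new episode plus the number of observations that were replaced."""
    modified = 0
    rebuilt = []
    for obs in episode:
        if obs.subject in rewrites:
            rebuilt.append(rewrites[obs.subject])
            modified += 1
        else:
            rebuilt.append(obs)
    return rebuilt, modified


baseline = [
    parse("colored_mnp_light_absorption|increase|light_absorption"),
    # Hypothetical untouched observation, included only to show pass-through.
    parse("radiative_transfer_model|estimates|mnp_direct_radiative_forcing"),
]
rewrites = {
    "colored_mnp_light_absorption": parse(
        "counterfactual_white_mnp_optics|sets_to| white_pristine_absorption"
    ),
}
counterfactual, n_modified = apply_intervention(baseline, rewrites)
```

The rebuilt episode keeps the same length as the baseline, which is consistent with the event count staying fixed while the local cover changes.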
#### Interface result\.
The regenerated sheaf explorer exposes this mode as a *Grounded Counterfactual Layer*. Its top panel shows the intervention flow

$$\text{original sheaf section} \to \text{paper source tables and forcing-grid workbook} \to \text{modified sheaf section},$$

alongside the measured forcing drop and the number of modified causal events. Buttons jump directly into the modified local contexts, where the user can inspect the local Hankel table, sheaf sections, and gluing diagnostics. This is the interface analogue of the formal point: grounded counterfactuals are not only answers; they are new local worlds whose compatibility with the rest of the atlas can be inspected.
#### Why this changes the Prometheus contract.
This example separates two kinds of counterfactuals. An *internal* counterfactual is evaluated only inside the extracted language topos. It is a useful research probe, but it remains symbolic. A *grounded* counterfactual is evaluated against an external executable substrate supplied by the paper, then reincorporated into the topos as changed causal observations. The resulting loop is

$$\text{text-derived claim topos} \to \text{executable source-data intervention} \to \text{measured effect} \to \text{rebuilt counterfactual sheaf}.$$

For papers that include code, equations, tables, simulation outputs, or plot-generation data, this gives Prometheus a new source of power: counterfactual reasoning can be calibrated against a known scientific model when one is available, while still preserving the local, corrigible, and inspectable structure of the language-derived world model.
### 13.2 Indus Valley: VIC-Derived Drought-Restoration Intervention
The second grounded-counterfactual study revisits the Indus Valley Civilization example used in earlier Democritus work, but now with an external hydrological substrate. The paper *River drought forcing of the Harappan metamorphosis* (Solanki et al., [2025](https://arxiv.org/html/2605.12835#bib.bib23)) combines transient climate simulations with the Variable Infiltration Capacity (VIC) hydrological model to reason about severe river droughts during the Harappan transition. The local paper folder contains the PDF, paper-specific figure data, and an upstream VIC codebase. The figure data include station-level discharge anomalies for the four drought events, rainfall and SPI time series, Indus-region shapefiles, Harappan site locations, and drought anomaly grids.
This case is epistemically different from the microplastics example\. The folder does not contain the full transient Indus forcing grids needed to reproduce the authors’ complete basin\-scale VIC experiment\. It does, however, contain the paper’s VIC\-derived discharge\-anomaly table and the VIC codebase itself\. We therefore treat the figure data as the paper\-grounded evaluation substrate and the local VIC run as an executable hydrology harness that confirms the intervention path\. This distinction is recorded in the artifact, so the result is not misreported as a full reproduction of the authors’ simulation\.
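One way such a distinction could be recorded is sketched below. The `GroundingLevel` enum, the `GroundingRecord` fields, and the example table bytes are all hypothetical; the text only states that the artifact records the difference between partial figure-data grounding and full reproduction, not how.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class GroundingLevel(Enum):
    FULL_REPRODUCTION = "full_reproduction"
    PARTIAL_FIGURE_DATA = "partial_figure_data"
    INTERNAL_ONLY = "internal_only"


@dataclass(frozen=True)
class GroundingRecord:
    intervention: str
    level: GroundingLevel
    substrate_sha256: str   # hash of the source table the metric was computed from
    notes: tuple = ()


def sha256_of(payload: bytes) -> str:
    """Hex digest used to pin the exact substrate bytes behind a result."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical stand-in for a discharge-anomaly table; not the paper's data.
table_bytes = b"station,d3_discharge_anomaly_pct\nS01,-8.0\n"
record = GroundingRecord(
    intervention="d3_drought_restoration",
    level=GroundingLevel.PARTIAL_FIGURE_DATA,
    substrate_sha256=sha256_of(table_bytes),
    notes=("full transient forcing grids not available locally",),
)
```

Keeping the level as an explicit field, rather than free text, would let a downstream report refuse to label a partial grounding as a reproduction.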
#### Language topos\.
The baseline Indus claim world model contains a compact causal chain: transient climate forcings drive VIC hydrological reconstructions; the D3 rainfall deficit and warming reduce river flow; persistent river drought reduces freshwater availability; reduced water availability may contribute to population dispersal from major Harappan centers; and social and economic pressures also shape the transformation\. This last event is important: the atlas does not collapse the historical explanation into a monocausal climate claim\. The hydrology layer is one chart in a larger explanation\.
#### Grounded intervention\.
The executable question is a drought\-restoration counterfactual:

$$\text{D3 rainfall deficit and warming} \longmapsto \text{restored precipitation and baseline temperature}.$$

From the paper’s discharge-anomaly source table, D3 has a mean discharge anomaly of $-8.49\%$ across $18$ stations, with station anomalies ranging from $-15.37\%$ to $-5.22\%$. We normalize the D3 discharge state as an index of $91.51$, where $100$ denotes restored baseline discharge. The grounded counterfactual therefore recovers $8.49$ index points, or $9.28\%$ relative to the D3 drought state. The rainfall/SPI source-data panel provides a consistent climate-side check: over the D3 interval, mean rainfall is approximately $312.62\,\mathrm{mm}$ versus a Pre-Harappan reference mean of $346.53\,\mathrm{mm}$, a $-9.79\%$ deficit, with mean SPI $-0.38$. [Figure 3](https://arxiv.org/html/2605.12835#S13.F3) summarizes the induced sheaf revision and the paper-grounded hydrology index shift.
[Figure 3 panels. Left: cover diagram of the Indus claim topos (5 events) with local contexts VIC hydrology (1), multi-factor explanation (1), restored monsoon (1), freshwater availability (1), and Harappan metamorphosis (1); edge labels give restriction gaps $\Delta$ of $.041$, $.057$, $.093$ (two edges), and $.407$; solid edges are compatible, dashed edges are hydrology-layer revisions. Right: bar chart of the VIC-derived discharge index: D3 $91.51$, restored $100.00$, recovery $+8.49$; rainfall deficit $-9.79\%$, SPI $-0.38$.]

Figure 3: A concrete Indus Valley artifact slice. Left: local contexts in the counterfactual sheaf after the drought-restoration intervention. Right: the paper-grounded discharge-anomaly substrate moves the D3 hydrology index from $91.51$ to $100.00$, while the rainfall/SPI source data supply a climate-side consistency check.
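The index arithmetic behind these numbers is simple enough to restate as a check. The sketch below uses only the summary values quoted in the text; the helper name `discharge_index` is hypothetical, and the normalization (baseline discharge set to 100) follows the description above.

```python
def discharge_index(mean_anomaly_pct: float, baseline: float = 100.0) -> float:
    """Index where `baseline` denotes restored discharge and anomalies are in percent."""
    return baseline * (1.0 + mean_anomaly_pct / 100.0)


D3_MEAN_ANOMALY_PCT = -8.49              # mean over 18 stations (from the text)
d3_index = discharge_index(D3_MEAN_ANOMALY_PCT)
recovered_points = 100.0 - d3_index
recovery_rel_pct = 100.0 * recovered_points / d3_index

D3_RAIN_MM, REF_RAIN_MM = 312.62, 346.53  # D3 and Pre-Harappan means (from the text)
rain_deficit_pct = 100.0 * (D3_RAIN_MM - REF_RAIN_MM) / REF_RAIN_MM

print(round(d3_index, 2), round(recovered_points, 2),
      round(recovery_rel_pct, 2), round(rain_deficit_pct, 2))
```

Note that the $9.28\%$ recovery is relative to the drought state ($8.49 / 91.51$), not to the restored baseline, which is why it differs from the $8.49$-point gap.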
#### Back into the sheaf\.
As in the microplastics case, the numeric result is pushed back into the world model\. The baseline observation
> d3\_rainfall\_deficit\|reduces\|river\_flow
is replaced by the counterfactual observation
> counterfactual\_restored\_monsoon\_forcing\|increases\| vic\_water\_availability\_proxy\.
Freshwater-availability and Harappan-metamorphosis events are also rewritten, so the rebuilt model contains counterfactual contexts for restored monsoon forcing, freshwater availability, and the weakened hydrology-only support for drought-driven dispersal. In the generated artifact, three causal events are modified and the rebuilt counterfactual world model contains 5 events, 6 local PSRs, 6 sheaf objects, 5 restriction checks, and 5 gluing diagnostics. The grounded layer reports the primary metric as a *VIC-derived discharge index*: baseline $91.51$, counterfactual $100.00$, effect $8.49$.
### 13.3 Sachs: Experimental-Regime Substitution in Protein Signaling
The third grounded case uses the canonical Sachs et al. protein-signaling study (Sachs et al., [2005](https://arxiv.org/html/2605.12835#bib.bib21)). This domain is especially useful for Prometheus because it is both a famous causal-discovery benchmark and an experimentally perturbed biological system. We first ran the paper through the standard claims pipeline, producing a Sachs claims atlas centered on T-cell receptor stimulation, phosphorylation cascades, Bayesian network causal inference, and signaling proteins such as Raf, Mek, PLC-$\gamma$, PIP2, PIP3, Erk, Akt, PKA, PKC, p38, and Jnk. We then used the local sachs_with_env.csv panel as the external substrate. The file contains $853$ single-cell observations over $11$ markers, partitioned into four anonymous experimental environments: $e0$ with $88$ rows, $e1$ with $218$, $e2$ with $467$, and $e3$ with $80$.
Because the local data file preserves environment labels but not the full paper-condition names, the intervention is stated conservatively as an environment substitution rather than as a named inhibitor or activator. Taking the largest regime $e2$ as the baseline, Prometheus computes marker means on a $\log(1+x)$ abundance scale and evaluates substitutions from $e2$ to each other environment. The strongest coherent marker-index shift is

$$e2 \longmapsto e0,$$

where the mean log-abundance index over the three most shifted markers PKA/Akt/Erk changes from $3.968$ to $4.993$, a shift of $+1.025$ log-abundance units, or $25.8\%$ relative to the baseline index. The canonical Sachs network overlay makes the result biologically interpretable without claiming to re-infer the network: the largest measured co-shifts occur on $\mathrm{PKA}\!\to\!\mathrm{Akt}$ and $\mathrm{PKA}\!\to\!\mathrm{Erk}$, with co-shift scores $1.25$ and $1.06$, respectively.
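A minimal sketch of this kind of environment-substitution computation follows. It assumes the index is the average of per-marker $\log(1+x)$ means over the $k$ most shifted markers, which is one reading of the description above; the function names and the synthetic data are hypothetical and do not reproduce the sachs_with_env.csv panel.

```python
import numpy as np


def env_log_means(counts, env_labels):
    """Per-environment marker means on the log(1 + x) scale."""
    logged = np.log1p(counts)
    return {env: logged[env_labels == env].mean(axis=0) for env in np.unique(env_labels)}


def top_shift_index(means, base_env, target_env, markers, k=3):
    """Index over the k markers with the largest absolute mean shift."""
    delta = means[target_env] - means[base_env]
    top = np.argsort(-np.abs(delta))[:k]
    base_index = float(means[base_env][top].mean())
    target_index = float(means[target_env][top].mean())
    return [markers[i] for i in top], base_index, target_index


# Synthetic stand-in for the data panel (hypothetical: 3 markers, 2 environments).
rng = np.random.default_rng(0)
markers = ["PKA", "Akt", "Raf"]
env = np.array(["e2"] * 50 + ["e0"] * 20)
counts = rng.lognormal(mean=4.0, sigma=0.3, size=(70, 3))
counts[env == "e0", 0] *= 3.0   # inject a PKA shift in e0

means = env_log_means(counts, env)
top, base_idx, target_idx = top_shift_index(means, "e2", "e0", markers, k=1)
relative_shift = (target_idx - base_idx) / base_idx
```

With the values reported in the text, the same relative-shift formula gives $(4.993 - 3.968) / 3.968 \approx 0.258$.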
This measured substitution is then pushed back into the Sachs claims world model. Thirty extracted signaling claims mentioning the shifted marker family are rewritten into counterfactual observations, and a data-grounded episode is added for the measured marker shifts and canonical-edge overlay. The rebuilt counterfactual world model contains 383 causal events, 22 local PSRs, 22 sheaf objects, 21 restriction checks, and 21 gluing diagnostics. The new local contexts include counterfactual_sachs_environment_shift, sachs_measured_environment_shift, and sachs_canonical_network_overlay. Thus Sachs supplies a third kind of grounding: not source-code execution, and not a hydrology simulation, but measured perturbation data from a benchmark causal system.
[Figure 4 panels. Left: cover diagram of the corpus (383 events) with local contexts Bayesian causal inference (36), protein signaling (18), counterfactual environment shift (30), measured environment shift (6), and canonical edge overlay (6); edge labels give restriction gaps $\Delta$ of $.025$, $.052$, $.995$, and $.163$ (two edges); solid edges are compatible, dashed edges are tense. Right: Hankel-style table with axis labels PIP2, PIP3, Akt, Erk, Jnk, PKA and start, PIP3, Akt, Erk, Jnk, PKA; legend: $0.125$ baseline, $0.375$ supported continuation.]

Figure 4: A concrete Sachs artifact slice. Left: five local contexts from the counterfactual Sachs sheaf, with corpus restriction gaps $\Delta$. Right: a $6\times 6$ Hankel-style PSR excerpt for sachs_measured_environment_shift, whose rows are histories and columns are tests for the measured $e2\!\to\!e0$ marker shifts.

For readers used to DAG-based causal discovery, Figure [4](https://arxiv.org/html/2605.12835#S13.F4) should not be read as a causal graph whose nodes are biological variables and whose arrows are causal effects. The DAG-like objects live inside local charts: a local chart may contain an extracted DAG, a Bayesian network, a mechanistic simulator, a data panel, or a local PSR. The displayed graph is a cover/restriction diagram over those local causal worlds. Thus Prometheus does not discard DAG models; it treats them as one kind of local causal artifact that can be glued, compared, revised, or marked non-transportable inside a larger sheaf world model. The aim is therefore not to recover a single global DAG, but to build a small universe of local DAG-like models, together with the restriction maps that show how they assemble, fail to assemble, or require revision inside the topos world model.
### 13.4 Singing Mice: MAPseq Projection-Attenuation Intervention
The fourth grounded case moves from climate, hydrology, and cell-signaling benchmarks to comparative neuroscience. Isko et al. report a specific expansion of motor cortical projections in the Alston’s singing mouse (*Scotinomys teguina*) relative to the laboratory mouse (*Mus musculus*) (Isko et al., [2026b](https://arxiv.org/html/2605.12835#bib.bib7)). The accompanying Dryad dataset (Isko et al., [2026a](https://arxiv.org/html/2605.12835#bib.bib6)) contains MAPseq matrices for 12 animals: 5 lab mice (MMus) and 7 singing mice (STeg). Each matrix row is a barcoded motor-cortical neuron and each column is a target brain region, with raw counts, binarized counts, and spike-in-normalized count matrices. This makes the paper a useful test of a different grounded-counterfactual pattern: the substrate is not a simulator, but a single-neuron projection atlas tied to a behavioral-evolution claim.
The baseline Prometheus world model contains a compact causal bridge: MAPseq barcodes measure target-region projections; STeg singing mice differ from MMus lab mice in motor-cortical projection structure; STeg motor-cortex pathways expand toward the auditory region (AUD) and periaqueductal gray (PAG); and the AUD/PAG-biased expansion supports the claim that motor-cortex pathway expansion broadens the vocal repertoire. The final bridge is marked as evidential rather than as a direct behavioral simulation: the local executable substrate is a projection matrix, not a full acoustic or vocal-motor dynamics model.
The intervention asks what happens to the claim support if the singing\-mouse AUD/PAG projection layer is attenuated to the lab\-mouse species mean:

$$\text{STeg AUD/PAG projection support} \longmapsto \text{MMus-like AUD/PAG projection support}.$$

Using the binarized MAPseq matrices, Prometheus computes per-animal species means for the fraction of barcoded neurons with a positive projection to each target. The AUD fraction changes from $0.042$ in MMus to $0.131$ in STeg, a $3.10\times$ increase. The PAG fraction changes from $0.017$ to $0.079$, a $4.61\times$ increase. Summing the two focus targets gives an AUD+PAG support index of

$$0.210 \quad \text{for STeg} \qquad \text{versus} \qquad 0.059 \quad \text{for the MMus-like counterfactual}.$$

Thus the species-level projection-attenuation counterfactual reduces the dataset-backed AUD+PAG support by $0.151$, or approximately $71.7\%$.
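The per-animal averaging can be sketched as follows. This is an illustration under stated assumptions: binarized matrices with neurons as rows and targets as columns, species means taken over animals rather than over pooled neurons, and a support index defined as the sum of the AUD and PAG fractions. The function names, target ordering, and toy matrices are hypothetical and unrelated to the Dryad dataset.

```python
import numpy as np


def projection_fractions(binary_matrix):
    """Fraction of barcoded neurons (rows) with a positive projection to each target (column)."""
    return np.asarray(binary_matrix, dtype=float).mean(axis=0)


def species_mean(animal_matrices):
    """Average per-animal fractions so each animal counts once, whatever its neuron count."""
    return np.mean([projection_fractions(m) for m in animal_matrices], axis=0)


def support_index(fractions, targets, focus=("AUD", "PAG")):
    """Sum of projection fractions over the focus targets."""
    return float(sum(fractions[targets.index(t)] for t in focus))


targets = ["AUD", "PAG", "STR"]  # hypothetical target ordering
# Toy binarized matrices (hypothetical): one animal per species, four neurons each.
mmus = [np.array([[0, 0, 1], [0, 0, 1], [1, 0, 1], [0, 0, 0]])]
steg = [np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])]

steg_index = support_index(species_mean(steg), targets)
mmus_index = support_index(species_mean(mmus), targets)
reduction = (steg_index - mmus_index) / steg_index
```

Averaging per animal first is a design choice that keeps an animal with many recovered barcodes from dominating the species mean.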
This measured result is then pushed back into the causal observation stream\. The baseline observations
> steg\_motor\_cortex\_pathway\|expands\_to\|auditory\_region
and
> steg\_motor\_cortex\_pathway\|expands\_to\|periaqueductal\_gray
are replaced by counterfactual observations in which STeg AUD and PAG projection support attenuates to the MMus species mean\. The vocal\-repertoire bridge is also rewritten as
> counterfactual\_projection\_attenuation\|weakens\| vocal\_repertoire\_claim\_support\.
The rebuilt counterfactual world model contains 5 causal events, 6 local PSRs, 6 sheaf objects, 5 restriction checks, and 5 gluing diagnostics. The new local contexts include counterfactual_auditory_projection_expansion, counterfactual_pag_vocal_motor_projection, and counterfactual_vocal_repertoire_claim_bridge. This case shows that grounded counterfactuals need not be limited to papers with source code: when a paper ships a structured scientific dataset, Prometheus can turn a verbal mechanism claim into a measured, dataset-backed sheaf revision.
Together, the microplastics, Indus, Sachs, and singing\-mouse studies make the grounded\-counterfactual contract much stronger\. One example intervenes on optical forcing in source tables and gridded climate\-forcing data\. A second intervenes on a palaeohydrological drought mechanism using VIC\-derived discharge data and a VIC executable harness\. A third substitutes experimental regimes in a single\-cell causal\-discovery benchmark\. A fourth attenuates a species\-specific motor\-projection expansion in a comparative\-neuroscience dataset\. In all four cases, the measured result is not a visual annotation layered on top of the atlas\. It changes the observations from which the local PSRs, sheaf objects, restrictions, and gluing diagnostics are rebuilt\.
## 14 Evaluation
Prometheus should be evaluated as a causal research instrument. Extraction accuracy matters, but it is only one part of the story. The case studies above require metrics that cover both the text-to-atlas pipeline and the grounded-counterfactual loop. We propose the following evaluation axes.

Table 15: Prometheus evaluation should measure navigational and epistemic value, not only benchmark extraction accuracy.

Some axes can be scored automatically. Claim quality can use annotated causal extraction sets. Rerun consistency can compare atlas topology and claim tables across seeds. Support aggregation can be tested against synthetic corpora with known repeated local claims. Grounding quality can be checked against source-data hashes, executable provenance, intervention parameters, and whether the regenerated sheaf records the difference between full reproduction and partial figure-data grounding. Other axes require expert studies. For example, domain experts can be asked to answer multi-hop literature questions with and without the atlas, measuring time, evidence recall, missed regime caveats, and whether the system correctly exposes the limits of current knowledge.
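As one possible instantiation of the rerun-consistency axis, claim tables from different seeds could be compared by set overlap. The sketch below uses mean pairwise Jaccard similarity over normalized claim triples; this metric, the normalization in `claim_key`, and the example claims are assumptions chosen for illustration, not a scoring rule specified by the paper.

```python
def claim_key(subject, relation, obj):
    """Normalize a claim triple so trivial casing or spacing differences do not count as drift."""
    return (subject.strip().lower(), relation.strip().lower(), obj.strip().lower())


def jaccard(a, b):
    """Jaccard overlap of two claim sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rerun_consistency(runs):
    """Mean pairwise Jaccard overlap of claim sets across reruns."""
    pairs = [(i, j) for i in range(len(runs)) for j in range(i + 1, len(runs))]
    if not pairs:
        return 1.0
    return sum(jaccard(runs[i], runs[j]) for i, j in pairs) / len(pairs)


# Hypothetical claim tables from two seeds.
run_a = {
    claim_key("ocean_warming", "reduces", "oxygen"),
    claim_key("oxygen_loss", "shifts", "habitat"),
}
run_b = {
    claim_key("Ocean_Warming", "reduces", "oxygen"),
    claim_key("prey_shift", "reduces", "recruitment"),
}
score = rerun_consistency([run_a, run_b])
```

A set-overlap score of this kind covers claim tables only; comparing atlas topology across seeds would need a separate graph-level measure.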
## 15 Limitations
Prometheus inherits the weaknesses of its sources and extractors. Retrieval drift can change the corpus before modeling begins. LLM extraction can be redundant, overconfident, or sensitive to prompt wording. Claim canonicalization remains difficult: two passages may express the same relation with different variables, or different relations with deceptively similar language. Source quality matters; a gluing procedure cannot make weak evidence strong. Local intervention probes are model-internal tests unless paired with identification assumptions or external validation. The microplastics and Indus counterfactuals in [Section 13](https://arxiv.org/html/2605.12835#S13) illustrate the stronger case where external source data, figure data, simulation outputs, or code exist, but many papers will not provide a runnable or source-table substrate suitable for this kind of grounding. Even when source artifacts exist, the grounding can be partial: the Indus folder contains paper-specific VIC-derived figure data and a VIC codebase, but not the complete transient forcing grids needed to reproduce the original basin-scale simulation.
Cost is also a practical limitation. Corpus-scale extraction, provenance tracking, and repeated reruns can be expensive. Prometheus therefore needs caching, incremental updates, model routing, and compact artifact schemas.
Finally, Prometheus is not meant to remove human steering. It is designed to make steering more informed: users should be able to exclude regions, split contexts, revise canonicalization, inspect provenance, and decide which gluing tensions deserve follow-up.
## 16 Research Roadmap
The next stage is interactive refinement\. Users should be able to enter a region, exclude unreliable sources, split a context, merge synonymous claims, and ask for a persistent\-state comparison against an earlier atlas\. Active corpus steering should let the system propose retrievals that would reduce a specific gluing tension\. Stronger claim canonicalization should combine embedding, symbolic normalization, ontology hints, and human corrections\.
On the modeling side, Prometheus should move from deterministic overlap diagnostics toward learned restriction maps, neural or kernel PSR estimators, uncertainty-aware gluing, and explicit sheafification procedures. On the interface side, the Claims Atlas should become a live research surface: causal spines, local regions, provenance, drift, and repair suggestions should be visible as first-class objects. The grounded-counterfactual layer adds another roadmap item: when papers include code, tables, simulations, figure-source data, or agent-executable action hooks, Prometheus should learn to discover executable intervention hooks, run them, distinguish full reproductions from partial figure-data groundings, and rebuild the affected sheaf charts automatically.
## 17 Future Directions: Substrate-Seeking Topos Construction
The grounded case studies also suggest a broader goal for Prometheus v2 and beyond. Scientific inquiry rarely begins with a single document. It begins with a question: Why did the Indus Valley Civilization transform? What is the climatic effect of airborne microplastics? Which mechanisms explain a drug’s benefit or risk? Scientists then marshal heterogeneous evidence (papers, proxy records, experiments, source tables, equations, simulation outputs, model code, figures, and caveats) and attempt to assemble a coherent explanation. In categorical terms, a scientific paper can be read as a compressed topos world model: a cover of local evidentiary charts together with an argument for how some of those charts glue into a global explanatory section.
The charts need not glue perfectly\. A hydrological model may support a drought hypothesis while archaeological timing leaves room for social mechanisms\. A radiative\-transfer calculation may support an optical\-forcing claim while particle aging, spatial distribution, or measurement assumptions remain local sources of uncertainty\. Experiments may validate one mechanism and leave another underdetermined\. These non\-gluing regions are not failures of the research process\. They are often the most valuable output: they mark the limits of current knowledge and identify where new data, new measurements, or new interventions are needed\.
This is where Prometheus should move beyond AI-enabled search engines. Search and retrieval systems can find relevant documents, synthesize textual answers, and summarize consensus. Some can cite sources and sketch plausible conclusions. But they generally do not construct an explicit world model whose local sections can be compared, whose failures to glue are visible, and whose claims can be revised by running a counterfactual against data or scientific models. They answer the question, but they do not usually expose the geometry of the evidence: which local worlds support the answer, which contradict it, which are merely compatible, and which remain unknown.
Substrate-seeking retrieval is therefore a central future direction. Starting from a causal query, Prometheus should not only retrieve text. It should search for the surrounding research substrate: supplementary tables, figure data, notebooks, repositories, simulation inputs, package versions, model checkpoints, data DOIs, experimental protocols, and agent-executable tools. Each artifact becomes a candidate chart in the topos world model. Text provides causal discourse and hypotheses; data and figures provide measured sections; code and scientific models provide executable transition rules; and the sheaf layer records how these local worlds agree, disagree, or fail to transport across regimes.
In this view, the long-term promise of Prometheus is not merely better summarization. It is the automation of a broader scientific practice: constructing, testing, and revising causal world models from heterogeneous evidence. A mature system should be able to say not only “here is the best answer supported by the current corpus,” but also “here are the local worlds that support it, here are the worlds that obstruct it, here is the counterfactual we can actually run, and here is the boundary beyond which the present evidence does not justify a conclusion.”
## 18 Conclusion
Prometheus reframes causal research as the construction of a Topos World Model: a sheaf-like atlas of local causal predictive states over a heterogeneous research substrate. The point is not to produce one more flat summary. It is to preserve locality, support, drift, contradiction, provenance, and epistemic limits while giving researchers a navigable causal structure. Large language models can extract local causal claims; Prometheus asks how those claims live together, where they fail to glue, and how those failures can guide deeper research. The grounded microplastics, Indus, Sachs, and singing-mouse counterfactuals show the next step: when a research substrate contains both language and external data or scientific models, the atlas can move from internal counterfactual probes to measured interventions and then rebuild the local world around the result.
## Code Availability
The predecessor Democritus codebase is publicly available as the Democritus_OpenAI repository (Mahadevan, [2025d](https://arxiv.org/html/2605.12835#bib.bib18)). Prometheus currently builds on this released causal-extraction lineage but adds an actively developing product layer for topos world-model construction, Claims Atlas navigation, persistent state, and grounded counterfactual execution. We therefore do not release the Prometheus code with this manuscript. A public code release is planned once the system has matured into a stable, documented research product.
## Appendix A System Genealogy and GUI Modes
Prometheus inherits part of its interface genealogy from our earlier CLIFF chatbot and local research interface (Mahadevan, [2025b](https://arxiv.org/html/2605.12835#bib.bib16)). CLIFF began as a Categories-for-AGI companion system for interactive retrieval, teaching, and research workflows. The Prometheus GUI reuses several lessons from that system: a natural-language query box, long-running local sessions, background execution, artifact dashboards, route-specific reports, and persistent run directories. The conceptual boundary is different. CLIFF remains oriented toward the Categories for AGI book, course material, and general retrieval-conditioned chatbot workflows, whereas Prometheus is reserved for causal research artifacts, local PSR construction, gluing diagnostics, persistent world state, and Claims Atlas navigation.
The GUI is designed to accept broad natural\-language research requests and route them to specialized workflows\. In the current implementation, route families include literature and paper\-corpus synthesis, Democritus\-style causal\-claim analysis, SEC and company\-filing workflows, product\-feedback world models, targeted\-sentiment review benchmarks, Rock–Paper–Scissors and network\-economy agent traces, and small Topos/OOM experiments\. A route may emit several artifacts: a human\-readable report, a technical dashboard, a JSON world\-model bundle, a persistent\-state file, and auxiliary provenance or Claims Atlas HTML\.
The GUI exposes three execution modes\. *Quick* mode runs the most compact version of a workflow and is useful for smoke tests or shallow artifact inspection\. *Interactive* mode keeps the local session open while background runs complete, letting a researcher launch follow\-up queries and inspect completed artifacts from the session list\. *Deep* mode allocates more work to acquisition, extraction, synthesis, and report generation, and is the intended setting for the case\-study style runs described in this paper\. The GUI also exposes an analysis\-mode choice: *standard* runs the routed workflow in its ordinary reporting mode, while *Topos World Model* attaches the Prometheus layer when supported, producing local PSRs, restrictions, gluing diagnostics, and persistent\-state artifacts\.
Several additional controls specialize particular routes rather than changing the overall framework\. Democritus\-style claim analysis can run with full, lightweight, or mixture\-of\-experts manifold modes, optional dry\-run behavior, and optional deep\-dive report generation\. Filing workflows can use dry\-run paths for debugging\. Product\-feedback and persistent\-state workflows can take a parent state or state query, allowing a follow\-up run to compare against an earlier world model\. These options are engineering controls, not separate theoretical models; they let users trade runtime, cost, and depth while keeping the same artifact contract\.
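The separation between engineering controls and the artifact contract can be made concrete with a small sketch\. The Prometheus code is not released, so everything below is a hypothetical illustration rather than the system's actual interface: the names `ExecutionMode`, `AnalysisMode`, `RunRequest`, and `expected_artifacts`, the artifact labels, and the split between report\-level and world\-model artifacts are all assumptions made for this example\.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ExecutionMode(Enum):
    """The three execution modes named in the text."""
    QUICK = "quick"
    INTERACTIVE = "interactive"
    DEEP = "deep"


class AnalysisMode(Enum):
    """The analysis-mode choice named in the text."""
    STANDARD = "standard"
    TOPOS_WORLD_MODEL = "topos_world_model"


# Hypothetical artifact labels; the grouping is an assumption for illustration.
REPORT_ARTIFACTS = ("report", "dashboard")
TOPOS_ARTIFACTS = ("world_model_bundle", "persistent_state", "claims_atlas_html")


@dataclass(frozen=True)
class RunRequest:
    """A hypothetical run request combining a query with its controls."""
    query: str
    execution_mode: ExecutionMode = ExecutionMode.QUICK
    analysis_mode: AnalysisMode = AnalysisMode.STANDARD
    dry_run: bool = False
    parent_state: Optional[str] = None  # path to an earlier persistent-state file


def expected_artifacts(request: RunRequest, route_supports_topos: bool = True) -> List[str]:
    """Return the artifact kinds a run is expected to emit.

    The result depends on the analysis mode and on whether the route
    supports the world-model layer. It deliberately ignores execution_mode
    and dry_run: those trade runtime, cost, and depth, not artifact kinds.
    """
    artifacts = list(REPORT_ARTIFACTS)
    wants_topos = request.analysis_mode is AnalysisMode.TOPOS_WORLD_MODEL
    if wants_topos and route_supports_topos:
        artifacts.extend(TOPOS_ARTIFACTS)
    return artifacts
```

Under these assumptions, a *Quick* run and a *Deep* run of the same query in *Topos World Model* mode yield the same list of artifact kinds and differ only in how much work goes into producing them\.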
## References
- Abramsky and Brandenburger \(2011\) Samson Abramsky and Adam Brandenburger\. The sheaf\-theoretic structure of non\-locality and contextuality\. *New Journal of Physics*, 13\(11\):113036, 2011\.
- Girju \(2003\) Roxana Girju\. Automatic detection of causal relations for question answering\. In *Proceedings of the ACL Workshop on Multilingual Summarization and Question Answering*, 2003\.
- Hassanzadeh et al\. \(2020\) Oktie Hassanzadeh, Debarun Bhattacharjya, Mark Feblowitz, Michael Perrone, Shirin Sohrabi, Kavitha Srinivas, and Michael Katz\. Causal knowledge extraction through large\-scale text mining\. In *Proceedings of the AAAI Conference on Artificial Intelligence*, volume 34, pages 13520–13527, 2020\.
- He et al\. \(2023\) Xiaomei He, Yi Guan, and Min Chen\. A survey of event causality identification: Taxonomy, resources, and techniques\. *ACM Computing Surveys*, 55\(14s\):1–35, 2023\. doi:[10\.1145/3582128](https://doi.org/10.1145/3582128)\.
- Hendrickx et al\. \(2010\) Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, and Stan Szpakowicz\. SemEval\-2010 task 8: Multi\-way classification of semantic relations between pairs of nominals\. In *Proceedings of the 5th International Workshop on Semantic Evaluation*, pages 33–38, 2010\.
- Isko et al\. \(2026a\) Emily Isko, Clifford Harpole, Xiaoyue Mike Zheng, Huiqing Zhan, Martin Davis, Anthony Zador, and Arkarup Banerjee\. Data from: Specific expansion of motor cortical projections in a singing mouse\. Dryad dataset, 2026a\.
- Isko et al\. \(2026b\) Emily C\. Isko, Clifford E\. Harpole, Xiaoyue Mike Zheng, Huiqing Zhan, Martin B\. Davis, Anthony M\. Zador, and Arkarup Banerjee\. Specific expansion of motor cortical projections in a singing mouse\. *Nature*, 2026b\. doi:[10\.1038/s41586\-026\-10458\-y](https://doi.org/10.1038/s41586-026-10458-y)\. Published May 6, 2026\.
- Jin et al\. \(2021\) Zhijing Jin, Bernhard Schölkopf, Peter Spirtes, and Kun Zhang\. Causal inference and natural language processing: A survey\. *arXiv preprint arXiv:2012\.14366*, 2021\.
- Kıcıman et al\. \(2024\) Emre Kıcıman, Robert Osazuwa Ness, Amit Sharma, and Chenhao Tan\. Causal reasoning and large language models: Opening a new frontier for causality\. *Transactions on Machine Learning Research*, 2024\. URL [https://openreview\.net/forum?id=6z4djmZK3c](https://openreview.net/forum?id=6z4djmZK3c)\. Preprint arXiv:2305\.00050\.
- Le et al\. \(2024\) Hao Duong Le, Xin Xia, and Zhang Chen\. Multi\-agent causal discovery using large language models\. *arXiv preprint arXiv:2407\.15073*, 2024\.
- Lewis et al\. \(2020\) Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen\-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela\. Retrieval\-augmented generation for knowledge\-intensive NLP tasks\. In *Advances in Neural Information Processing Systems*, 2020\.
- Littman et al\. \(2001\) Michael L\. Littman, Richard S\. Sutton, and Satinder Singh\. Predictive representations of state\. In *Advances in Neural Information Processing Systems*, 2001\.
- Liu et al\. \(2026\) Yu Liu et al\. Atmospheric warming contributions from airborne microplastics and nanoplastics\. *Nature Climate Change*, 2026\. doi:[10\.1038/s41558\-026\-02620\-1](https://doi.org/10.1038/s41558-026-02620-1)\. Source data DOI: 10\.5281/zenodo\.19042838\.
- Mac Lane and Moerdijk \(1992\) Saunders Mac Lane and Ieke Moerdijk\. *Sheaves in Geometry and Logic: A First Introduction to Topos Theory*\. Springer, 1992\.
- Mahadevan \(2025a\) Sridhar Mahadevan\. Large causal models from large language models, 2025a\. URL [https://arxiv\.org/abs/2512\.07796](https://arxiv.org/abs/2512.07796)\.
- Mahadevan \(2025b\) Sridhar Mahadevan\. CLIFF\_CatAgi: Categories for AGI local research interface\. GitHub repository, 2025b\. URL [https://github\.com/sridharmahadevan/CLIFF\_CatAgi](https://github.com/sridharmahadevan/CLIFF_CatAgi)\.
- Mahadevan \(2025c\) Sridhar Mahadevan\. Categories for AGI\. Book manuscript, 2025c\. URL [https://people\.cs\.umass\.edu/~mahadeva/papers/catagi\.pdf](https://people.cs.umass.edu/~mahadeva/papers/catagi.pdf)\.
- Mahadevan \(2025d\) Sridhar Mahadevan\. Democritus\_OpenAI: Whygraphs from large language models\. GitHub repository, 2025d\. URL [https://github\.com/sridharmahadevan/Democritus\_OpenAI](https://github.com/sridharmahadevan/Democritus_OpenAI)\.
- Pearl \(2009\) Judea Pearl\. *Causality: Models, Reasoning, and Inference*\. Cambridge University Press, 2nd edition, 2009\.
- Radinsky et al\. \(2012\) Kira Radinsky, Sagie Davidovich, and Shaul Markovitch\. Learning causality for news events prediction\. In *Proceedings of the 21st International Conference on World Wide Web*, pages 909–918, 2012\. doi:[10\.1145/2187836\.2187958](https://doi.org/10.1145/2187836.2187958)\.
- Sachs et al\. \(2005\) Karen Sachs, Omar Perez, Dana Pe’er, Douglas A\. Lauffenburger, and Garry P\. Nolan\. Causal protein\-signaling networks derived from multiparameter single\-cell data\. *Science*, 308\(5721\):523–529, 2005\. doi:[10\.1126/science\.1105809](https://doi.org/10.1126/science.1105809)\.
- Singh et al\. \(2004\) Satinder Singh, Michael R\. James, and Matthew R\. Rudary\. Predictive state representations: A new theory for modeling dynamical systems\. In *Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence*, 2004\.
- Solanki et al\. \(2025\) Hiren Solanki, Vikrant Jain, Kaustubh Thirumalai, Balaji Rajagopalan, and Vimal Mishra\. River drought forcing of the Harappan metamorphosis\. *Communications Earth & Environment*, 6:926, 2025\. doi:[10\.1038/s43247\-025\-02901\-1](https://doi.org/10.1038/s43247-025-02901-1)\.
- Yamada et al\. \(2025\) Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha\. The AI scientist\-v2: Workshop\-level automated scientific discovery via agentic tree search\. *arXiv preprint arXiv:2504\.08066*, 2025\. doi:[10\.48550/arXiv\.2504\.08066](https://doi.org/10.48550/arXiv.2504.08066)\.
- Yang et al\. \(2022\) Jie Yang, Soyeon Caren Han, and Josiah Poon\. A survey on extraction of causal relations from natural language text\. *Knowledge and Information Systems*, 64\(5\):1161–1186, 2022\. doi:[10\.1007/s10115\-022\-01665\-w](https://doi.org/10.1007/s10115-022-01665-w)\.