causal-analysis

#causal-analysis

Observable Patterns Are Not Explanations: A Causal-Geometric Analysis of Latent Reasoning Models

arXiv cs.CL · 2026-06-12

This paper analyzes latent reasoning models (LRMs) and demonstrates that observable patterns in latent states are not causal explanations of reasoning; it advocates for matched controls and causal tests in interpretability research.

ORCA: An End-to-End Interactive Copilot for Optimized Root Cause Analysis

arXiv cs.AI · 2026-05-27

ORCA is a copilot for end-to-end causal analysis that uses agents to guide users through workflows including causal discovery, effect estimation, and root cause analysis, and presents the results as structured reports.

From Correlation to Cause: A Five-Stage Methodology for Feature Analysis in Transformer Language Models

arXiv cs.CL · 2026-05-22

This paper proposes a five-stage methodology for causal feature analysis in transformer language models, demonstrated on GPT-2 small for the IOI task. It finds that features are specifically causal but not necessary, and exposes a gap between detection and causal robustness.

Diagnosis Is Not Prescription: Linguistic Co-Adaptation Explains Patching Hazards in LLM Pipelines

arXiv cs.CL · 2026-05-22

This paper identifies a 'Diagnostic Paradox' in multi-module LLM agents: the module most causally responsible for failures (the routing module) is not the best place to intervene, and patching it can harm performance. The authors propose the 'Linguistic Contract' hypothesis and present empirical evidence across three agent families.

Judge Circuits

arXiv cs.CL · 2026-05-18

This paper investigates the internal mechanisms of LLM-as-a-judge. It finds a shared Latent Evaluator sub-graph in mid-to-late MLPs across models that handles abstract judging, while format-specific terminal branches map the judgment to output tokens. This structure reveals the cause of format-induced inconsistency.

Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation

arXiv cs.CL · 2026-04-20

This paper presents causal evidence that hallucination in autoregressive language models results from early trajectory commitment governed by asymmetric attractor dynamics. Same-prompt bifurcation and activation patching experiments on Qwen2.5-1.5B show that hallucinated trajectories diverge at the first token and exhibit strong causal asymmetry across model layers.
