mechanistic-analysis

#mechanistic-analysis

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Hugging Face Daily Papers · 4d ago

SWITCH is a switchable latent reasoning framework that uses explicit boundary tokens to enable trainable and interpretable recurrent hidden-state reasoning via on-policy reinforcement learning, outperforming prior approaches.

#mechanistic-analysis

Mechanistic Analysis of Alignment Algorithms in Language Models

arXiv cs.LG · 5d ago

This paper presents a systematic mechanistic analysis of six preference optimization methods (PPO, DPO, SimPO, ORPO, GRPO, KTO) across three open-weight model families, using probing and sparse autoencoders to reveal how alignment algorithms reshape internal representations in qualitatively distinct ways.
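
Probing, one of the two tools named above, usually means training a small classifier on a model's hidden states to test whether a property is linearly decodable. The sketch below is a generic logistic-regression probe in pure Python, assuming the hidden states have already been extracted as vectors; the toy 2-D "hidden states", labels, and function names are hypothetical, and the paper's actual probe design is not described here.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_probe(xs: List[List[float]], ys: List[int],
                       lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def probe_accuracy(w: List[float], b: float,
                   xs: List[List[float]], ys: List[int]) -> float:
    """Fraction of examples where the probe's thresholded logit matches the label."""
    correct = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        correct += int(pred == y)
    return correct / len(xs)

# Hypothetical toy "hidden states" where the property is encoded in dimension 0.
hidden_states = [[2.0, 1.0], [1.0, -1.0], [3.0, 0.5],
                 [-2.0, 1.0], [-1.0, -1.0], [-3.0, 0.0]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_linear_probe(hidden_states, labels)
print(probe_accuracy(w, b, hidden_states, labels))
```

High probe accuracy only shows the property is linearly readable from the representation, not that the model uses it; that is why probing is often paired with interventional tools such as sparse autoencoders.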

#mechanistic-analysis

Trust, but Don't Verify: Epistemic Blind Spots in LLM Source Evaluation

arXiv cs.LG · 2026-06-05

This paper identifies a failure mode in LLMs where they do not verify the validity of numerical statistics when synthesizing multiple sources, instead relying on the stylistic markers of analytical rigor. The authors term this 'epistemic alignment' and show that it persists across models and domains, resisting prompting-based mitigations.

#mechanistic-analysis

Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations

arXiv cs.CL · 2026-05-27

This paper presents a mechanistic analysis of why LLMs hallucinate when reasoning over linearized structured knowledge. It finds that these hallucinations stem from systematic internal dynamics, such as attention to shortcut cues and failures of semantic grounding in the feed-forward layers, rather than from random noise.

#mechanistic-analysis

In-Context Learning Operates as Concept Subspace Learning

arXiv cs.LG · 2026-05-20

This paper proposes that in-context learning in LLMs operates through low-dimensional concept subspaces, where task-relevant information concentrates in a small fraction of the representation space, supported by experiments on Llama-3-8B and Qwen2.5-7B.
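
One way to make "information concentrates in a small fraction of the representation space" concrete is to measure how much of a hidden state's squared norm falls inside a low-dimensional subspace. The sketch below assumes the subspace is already given as an orthonormal basis; how the paper actually identifies its concept subspaces is not described here, and the function names and toy vectors are hypothetical.

```python
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def project(h: Vector, basis: List[Vector]) -> Vector:
    """Orthogonal projection of h onto the span of an orthonormal basis."""
    out = [0.0] * len(h)
    for u in basis:
        coeff = dot(h, u)
        out = [o + coeff * ui for o, ui in zip(out, u)]
    return out

def subspace_energy_fraction(h: Vector, basis: List[Vector]) -> float:
    """Fraction of ||h||^2 captured by the subspace, between 0.0 and 1.0."""
    total = dot(h, h)
    if total == 0.0:
        return 0.0
    p = project(h, basis)
    return dot(p, p) / total

# Toy 4-dimensional hidden state and a hypothetical 1-dimensional concept direction.
concept_basis = [[1.0, 0.0, 0.0, 0.0]]
h = [3.0, 0.0, 4.0, 0.0]
print(subspace_energy_fraction(h, concept_basis))  # 0.36
```

Here the single concept direction holds 9 of the 25 units of squared norm; a subspace hypothesis of this kind predicts that, for task-relevant directions, a few dimensions capture most of the signal that matters.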

#mechanistic-analysis

Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

arXiv cs.CL · 2026-04-20

This paper investigates prompt-induced hallucinations (PIH) in vision-language models through mechanistic analysis, identifying specific attention heads responsible for the models' tendency to favor textual prompts over visual evidence. The authors show that ablating these PIH-heads reduces hallucinations by at least 40% without additional training, revealing model-specific mechanisms underlying this failure mode.
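
Head ablation of the kind described above is commonly implemented by zeroing a head's output before the attention layer's output projection. The toy sketch below shows that zero-ablation for a single token position, assuming standard multi-head attention with concatenated head outputs; the paper's exact ablation scheme (for example zero versus mean ablation), the head indices, and all names here are hypothetical.

```python
from typing import List, Set

Matrix = List[List[float]]

def concat_heads(head_outputs: List[List[float]], ablate: Set[int]) -> List[float]:
    """Concatenate per-head outputs for one token, zeroing the ablated heads."""
    out: List[float] = []
    for i, head in enumerate(head_outputs):
        out.extend([0.0] * len(head) if i in ablate else head)
    return out

def output_projection(concat: List[float], w_o: Matrix) -> List[float]:
    """Multiply by W_O (shape d_model x n_heads*d_head) to get the layer output."""
    return [sum(w * c for w, c in zip(row, concat)) for row in w_o]

def attention_output(head_outputs: List[List[float]], w_o: Matrix,
                     ablate: Set[int]) -> List[float]:
    return output_projection(concat_heads(head_outputs, ablate), w_o)

# Toy layer: 2 heads, d_head = 2, d_model = 2; this W_O simply sums the two heads.
heads = [[1.0, 2.0], [10.0, 20.0]]
w_o = [[1.0, 0.0, 1.0, 0.0],
       [0.0, 1.0, 0.0, 1.0]]
print(attention_output(heads, w_o, ablate=set()))  # [11.0, 22.0]
print(attention_output(heads, w_o, ablate={1}))    # [1.0, 2.0]
```

Because the intervention only masks activations at inference time, it needs no retraining, which matches the "without additional training" framing above.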
