transformer-internals

#transformer-internals

The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason

arXiv cs.LG · 2026-04-20

A spectral analysis across 11 LLMs reporting that transformers exhibit phase transitions in hidden activation space when reasoning as opposed to recalling facts. The paper describes seven phenomena, including spectral compression, instruction-tuning reversal, and correctness prediction from spectral properties alone with a reported AUC of 1.0.


Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation

arXiv cs.CL · 2026-04-20

This paper presents causal evidence that hallucination in autoregressive language models results from early trajectory commitment governed by asymmetric attractor dynamics. Using same-prompt bifurcation and activation patching experiments on Qwen2.5-1.5B, it shows that hallucinated trajectories diverge at the first token and exhibit strong causal asymmetry across model layers.


Disentangling Mathematical Reasoning in LLMs: A Methodological Investigation of Internal Mechanisms

arXiv cs.CL · 2026-04-20

This paper investigates how large language models perform arithmetic by using early decoding to analyze their internal mechanisms, finding that proficient models exhibit a clear division of labor between attention and MLP modules in reasoning tasks.
