hidden-state-geometry

Help interpreting metrics: a strong target text appears to induce a measurable latent-state shift in Gemma 3 12B IT

Reddit r/AI_Agents · 2026-05-29

A researcher presents evidence that a strong target text can induce a measurable latent-state shift in Gemma 3 12B IT before the final output is produced, a shift distinct from lexical or content overlap, and discusses the implications for AI safety evaluation beyond output-only testing.

Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination

arXiv cs.AI · 2026-05-08

This paper presents a unified geometric framework for understanding transformer memory failures, distinguishing between conflict arbitration and hallucination through hidden-state attractor basins. It demonstrates that geometric margin is a superior diagnostic for detecting these failures compared to output entropy, particularly as model scale increases.
