hidden-state-geometry

Help interpreting metrics: a strong target text appears to induce a measurable latent-state shift in Gemma 3 12B IT

Reddit r/AI_Agents · 2026-05-29

A researcher presents evidence that strong target text can induce a measurable latent-state shift in Gemma 3 12B IT before the final output is produced, a shift that is distinct from lexical or content overlap, and discusses the implications for AI safety evaluation beyond output-only checks.

Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination

arXiv cs.AI · 2026-05-08

This paper presents a unified geometric framework for understanding transformer memory failures, distinguishing between conflict arbitration and hallucination through hidden-state attractor basins. It demonstrates that geometric margin is a superior diagnostic for detecting these failures compared to output entropy, particularly as model scale increases.
