#sae

Help interpreting metrics: a strong target text appears to induce a measurable latent-state shift in Gemma 3 12B IT

Reddit r/AI_Agents · 2026-05-29

A researcher presents evidence that strong target text can induce a measurable latent-state shift in Gemma 3 12B IT before the final output is produced. The shift is reported as distinct from lexical or content overlap, and the post discusses what this implies for AI safety evaluation beyond output-only testing.
