transluce

Frame-Conditioned Moral Computation in LLaMA 3.1-8B-Instruct: A Mechanistic Interpretability Audit of Ethical Reasoning

arXiv cs.AI · 2026-06-16

This paper uses mechanistic interpretability to audit ethical reasoning in LLaMA 3.1-8B-Instruct. It reports a 'Situational Anchor Effect', in which domain-specific representations dominate the model's moral computation, and proposes 'Mechanistic Alignment' as a research program.
