transluce

Tag

Cards List
#transluce

Frame-Conditioned Moral Computation in LLaMA 3.1-8B-Instruct: A Mechanistic Interpretability Audit of Ethical Reasoning

arXiv cs.AI · 2026-06-16 Cached

This paper uses mechanistic interpretability to audit ethical reasoning in LLaMA 3.1-8B-Instruct, finding a 'Situational Anchor Effect' where domain-specific representations dominate moral computation, and proposing 'Mechanistic Alignment' as a research program.

0 favorites 0 likes
← Back to home

Submit Feedback