knowledge-elicitation

MechELK: A Mechanistic Interpretability Framework for Eliciting Latent Knowledge in Large Language Models

arXiv cs.CL · 2026-05-29

MechELK is a three-stage framework that combines mechanistic interpretability tools (sparse autoencoders, activation patching, and causal probing) with representation engineering to elicit latent knowledge from LLMs. It reports 84.7% accuracy, outperforming existing methods such as Contrast-Consistent Search (CCS) and linear probing.
