edge-attribution-patching

How Much Do Circuits Tell Us? Measuring the Consistency and Specificity of Language Model Circuits

arXiv cs.CL · 2026-05-12

This paper evaluates the consistency and specificity of language model circuits. It finds that circuits are consistent within a task but are not task-specific, because the circuits for different tasks overlap substantially.
