Tag
This paper studies the exact certification problem for neural networks. It shows that even minimal overparametrization can make certification exponentially hard for threshold circuits of depth ≥ 2 and for log-precision Transformers. It also characterizes approximate certification, showing that allowing polynomially many mistakes still requires exponentially large certificates.
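To make "exact certification" concrete, the sketch below exhaustively checks that a tiny depth-2 threshold circuit gives the same output on every input within a Hamming ball around a point. It is an illustration under stated assumptions, not the paper's construction. The robustness property, the toy circuit `(x0 OR x1) AND (x2 OR x3)`, and all helper names (`threshold_gate`, `depth2_circuit`, `certify_hamming_ball`) are hypothetical. The paper's hardness result concerns the size of certificates, not this naive algorithm.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# A threshold gate is (weights, bias); it fires when w·x + b >= 0.
Gate = Tuple[Sequence[int], int]


def threshold_gate(gate: Gate, x: Sequence[int]) -> int:
    weights, bias = gate
    return int(sum(w * xi for w, xi in zip(weights, x)) + bias >= 0)


def depth2_circuit(hidden: List[Gate], out: Gate, x: Sequence[int]) -> int:
    # One hidden layer of threshold gates feeding a single output gate.
    h = [threshold_gate(g, x) for g in hidden]
    return threshold_gate(out, h)


def certify_hamming_ball(f: Callable[[Sequence[int]], int],
                         x0: Sequence[int], radius: int) -> bool:
    """Exhaustively check that f(x) == f(x0) for every x within Hamming distance `radius`."""
    y0 = f(x0)
    for r in range(1, radius + 1):
        for flips in combinations(range(len(x0)), r):
            x = list(x0)
            for i in flips:
                x[i] ^= 1  # flip bit i
            if f(x) != y0:
                return False  # counterexample found
    return True


# Hypothetical toy circuit computing (x0 OR x1) AND (x2 OR x3).
HIDDEN = [((1, 1, 0, 0), -1), ((0, 0, 1, 1), -1)]
OUT = ((1, 1), -2)


def f(x: Sequence[int]) -> int:
    return depth2_circuit(HIDDEN, OUT, x)


print(certify_hamming_ball(f, [1, 1, 1, 1], 1))  # True: no single flip changes the output
print(certify_hamming_ball(f, [1, 1, 1, 1], 2))  # False: flipping x0 and x1 does
```

The exhaustive check visits the sum of C(n, k) for k = 1, …, r inputs. This count grows exponentially in n once the radius scales with n, which is why the existence of short certificates matters.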
This paper evaluates the consistency and specificity of language model circuits, finding that while circuits are consistent within tasks, they lack task-specificity due to substantial overlap across different tasks.
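One way to quantify "consistent within tasks" versus "overlapping across tasks" is to treat each circuit as a set of model components and compare sets by Jaccard similarity. The sketch below assumes that metric; the paper may measure overlap differently. The component names (e.g. `"L9.H6"`) and the toy circuits are hypothetical and not taken from the paper's results.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    # |A ∩ B| / |A ∪ B|; two empty circuits are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def within_task_consistency(circuits: List[Set[str]]) -> float:
    """Mean pairwise Jaccard among circuits found for the same task (needs >= 2 circuits)."""
    return mean(jaccard(a, b) for a, b in combinations(circuits, 2))


def cross_task_overlap(task_circuits: Dict[str, Set[str]]) -> float:
    """Mean pairwise Jaccard between circuits of different tasks (needs >= 2 tasks)."""
    names = sorted(task_circuits)
    return mean(jaccard(task_circuits[s], task_circuits[t])
                for s, t in combinations(names, 2))


# Hypothetical toy data: two runs on one task, and one circuit per task for two tasks.
runs_task_a = [{"L5.H1", "L7.H3", "L9.H6"},
               {"L5.H1", "L7.H3", "L9.H6", "L10.MLP"}]
per_task = {"task_a": {"L5.H1", "L7.H3", "L9.H6"},
            "task_b": {"L5.H1", "L7.H3", "L8.H2"}}

print(within_task_consistency(runs_task_a))  # 0.75
print(cross_task_overlap(per_task))          # 0.5
```

Under this measure, high within-task similarity combined with high cross-task similarity is the pattern the summary describes: circuits are consistent within a task but not specific to it.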