This paper proposes Adversarial Concept Search, a method that uses the representational geometry of large language models to predict compositional failures without running the model on specific inputs. The approach identifies high-risk scenarios by measuring interference between the internal representations of salient features.
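The abstract does not say how interference is measured. The sketch below is a minimal illustration that rests on an assumption: each salient feature is taken to be a direction vector in the model's activation space, and the absolute cosine similarity between two directions serves as a simple proxy for interference. Under that proxy, strongly overlapping directions are the pairs most likely to clash when composed. The score uses only feature geometry and needs no input prompts. The functions `interference_scores` and `rank_risky_pairs` and the toy vectors are hypothetical and are not the paper's method or API.

```python
import numpy as np


def interference_scores(features: np.ndarray) -> np.ndarray:
    """Hypothetical interference proxy: absolute cosine similarity
    between feature directions (one direction per row of `features`).
    Self-pairs on the diagonal are zeroed out."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # avoid division by zero
    sims = np.abs(unit @ unit.T)
    np.fill_diagonal(sims, 0.0)
    return sims


def rank_risky_pairs(features: np.ndarray, names: list[str], top_k: int = 3):
    """Return the top_k feature pairs with the highest proxy interference,
    as (name_a, name_b, score) tuples, highest score first."""
    sims = interference_scores(features)
    i, j = np.triu_indices(len(names), k=1)  # each unordered pair once
    order = np.argsort(-sims[i, j])[:top_k]
    return [(names[i[o]], names[j[o]], float(sims[i[o], j[o]])) for o in order]


# Toy example: "red" and "crimson" point in nearly the same direction,
# while "square" is orthogonal to both.
toy = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])
print(rank_risky_pairs(toy, ["red", "crimson", "square"], top_k=1))
```

A real system would need the model's actual feature directions, for example from a probe or a sparse dictionary. It would also need a failure criterion calibrated against observed errors. Both are outside what the abstract specifies.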