intervention

When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

Hugging Face Daily Papers · 6d ago

This paper introduces the concept of the audit gap between behavioral safety and representation-level robustness in LLMs. It proposes an intervention-based evaluation framework and the Latent Vulnerability Score (LVS) to measure hidden vulnerabilities.
