When Behavioral Safety Evaluation Fails: A Representation-Level Perspective

Hugging Face Daily Papers · 6d ago

This paper introduces the concept of the audit gap between behavioral safety and representation-level robustness in LLMs, proposing an intervention-based evaluation framework and the Latent Vulnerability Score (LVS) to measure hidden vulnerabilities.