anticipatory-detection


AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

arXiv cs.CL · 2026-05-26

Introduces AERIC, a lightweight hidden-state monitoring method for detecting implicit harmful content in LLM dialogue without extra forward passes, achieving improved AUROC over strong baselines with minimal latency overhead.
