Tag
Introduces AERIC, a lightweight hidden-state monitoring method that detects implicit harmful content in LLM dialogue without extra forward passes, improving AUROC over strong baselines with minimal latency overhead.