same-pass-monitoring

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

arXiv cs.CL · 2026-05-26

Introduces AERIC, a lightweight hidden-state monitoring method that detects implicit harmful content in LLM dialogue without extra forward passes. It reports higher AUROC than strong baselines with minimal latency overhead.
