Tag
The article introduces a technique for classification without text generation: it extracts the LLM's hidden state at the last prompt token and uses a small MLP to read the model's internal decision, enabling fast and cheap zero-shot classifiers.
Introduces AERIC, a lightweight hidden-state monitoring method for detecting implicit harmful content in LLM dialogue without extra forward passes, achieving improved AUROC over strong baselines with minimal latency overhead.
This paper introduces DiHAL, a diffusion-transformer hybrid that uses geometry-based proxies to select a layer in a pretrained language model for hidden-state replacement with a diffusion bridge, improving continuous diffusion language modeling by avoiding direct token recovery.
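The hidden-state readout described in the first entry above can be illustrated with a minimal sketch. It is not the article's implementation: the toy hidden states, the layer sizes, the random untrained weights, and the names `last_token_state`, `mlp_probe`, `classify`, and `params` are all assumptions made for illustration. In practice the per-token vectors would come from a real LLM forward pass over the prompt, and the MLP weights would be trained on labelled examples.

```python
import math
import random


def last_token_state(hidden_states):
    """Return the hidden vector at the final prompt position.

    hidden_states: list of per-token vectors (seq_len x d_model),
    assumed here to come from one chosen layer of the model.
    """
    if not hidden_states:
        raise ValueError("prompt produced no hidden states")
    return hidden_states[-1]


def mlp_probe(x, w1, b1, w2, b2):
    """One-hidden-layer MLP (ReLU) followed by a softmax over classes."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w2, b2)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(hidden_states, params):
    """Classify a prompt from its last-token hidden state, with no generation."""
    probs = mlp_probe(last_token_state(hidden_states), *params)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical sizes: d_model=3, probe hidden width=4, 2 classes.
# Weights are random placeholders, not trained values.
rng = random.Random(0)
params = (
    [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)],  # w1
    [0.0] * 4,                                                   # b1
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)],  # w2
    [0.0] * 2,                                                   # b2
)

# Toy stand-in for the hidden states of a two-token prompt.
toy_states = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
label, probs = classify(toy_states, params)
print(label, probs)
```

Because the probe only consumes a vector the model already computes while reading the prompt, the cost of classification in this sketch is one small matrix-vector pass on top of the prompt's forward pass.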