entropy-dynamics


What Intermediate Layers Know: Detecting Jailbreaks from Entropy Dynamics

arXiv cs.CL · 2026-06-25

This paper investigates how large language models internally encode jailbreak attempts by using the logit lens to track token-level predictive entropy trajectories across layers. It finds that entropy dynamics at intermediate layers discriminate jailbreaks better than aggregate statistics, yielding a training-free detection method that behaves consistently across multiple models.
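To make "entropy trajectories via the logit lens" concrete, here is a minimal sketch of the feature itself, not the paper's detector. It assumes each layer's hidden state is decoded with the model's unembedding matrix after a parameter-free layer norm (real models use a learned final norm), and the names `logit_lens_entropy`, `layer_profile`, and `mid_layers` are hypothetical. Random arrays stand in for real activations.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm over the hidden dimension. Assumption: real
    # models apply a learned final norm (with scale/shift) before unembedding.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def logit_lens_entropy(hidden_states, W_U):
    """Predictive entropy (nats) per layer and token via the logit lens.

    hidden_states: (num_layers, seq_len, d_model) residual-stream activations
    W_U: (d_model, vocab_size) unembedding matrix
    Returns: (num_layers, seq_len) array of entropies.
    """
    # Decode every layer with the same unembedding: (L, T, D) @ (D, V) -> (L, T, V).
    logits = layer_norm(hidden_states) @ W_U
    # Numerically stable log-softmax over the vocabulary axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -(np.exp(log_probs) * log_probs).sum(axis=-1)


# Toy usage: random arrays stand in for a real model's activations.
rng = np.random.default_rng(0)
num_layers, seq_len, d_model, vocab = 12, 8, 64, 500
hidden = rng.normal(size=(num_layers, seq_len, d_model))
W_U = rng.normal(size=(d_model, vocab)) / np.sqrt(d_model)

H = logit_lens_entropy(hidden, W_U)   # entropy trajectory, shape (12, 8)
layer_profile = H.mean(axis=1)        # one aggregate value per layer
mid_layers = H[num_layers // 3 : 2 * num_layers // 3]  # intermediate layers only
```

A classifier could then be fit on, or a threshold applied to, features drawn from `mid_layers` instead of a single pooled statistic; which layers and features the paper actually uses is not stated in this summary.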



When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

arXiv cs.LG · 2026-05-25

This paper investigates when chain-of-thought reasoning benefits LLMs. It shows that early-stage entropy dynamics reliably indicate whether reasoning will help, and introduces EDRM, a lightweight, training-free framework that adaptively selects an inference strategy, achieving significant token savings while maintaining or improving accuracy.
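As a rough illustration of routing on early entropy, the sketch below summarizes the entropy of the first few generated tokens by its mean and compares it with a threshold. This is a hypothetical rule, not EDRM's published criterion: the direction (high entropy routes to chain-of-thought), the use of a plain mean, the `threshold` value, and the names `token_entropies` and `choose_strategy` are all assumptions.

```python
import numpy as np


def token_entropies(logits):
    """Shannon entropy (nats) of each row of a (num_tokens, vocab) logit array."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.exp(log_p) * log_p).sum(axis=-1)


def choose_strategy(early_logits, threshold):
    """Hypothetical router, NOT EDRM's published rule.

    Summarizes the early entropy trajectory by its mean. If the model looks
    uncertain (mean entropy above `threshold`), spend tokens on chain-of-thought;
    otherwise answer directly to save tokens.
    """
    mean_h = float(token_entropies(early_logits).mean())
    strategy = "chain_of_thought" if mean_h > threshold else "direct"
    return strategy, mean_h


vocab = 10
confident = np.zeros((5, vocab))
confident[:, 0] = 20.0               # nearly all mass on one token per step
uncertain = np.zeros((5, vocab))     # uniform distribution at every step
print(choose_strategy(confident, threshold=1.0)[0])  # direct
print(choose_strategy(uncertain, threshold=1.0)[0])  # chain_of_thought
```

A mean discards the shape of the trajectory; a closer match to "phase transitions" would look at how entropy changes over those early steps, which this sketch deliberately omits.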

