linear-probe

Rift: A Conflict Signature for Deception in Language Models

arXiv cs.LG · 2026-06-17

This paper introduces Rift, a method that detects deceptive responses in language models from the residual rank of their hidden states. The authors report perfect separation across deception types, model families, and languages, along with zero-shot transfer across model families without retraining.

Cosine Misleads: Auxiliary Losses Reshape Vision Language Models, Not Their Latents

Hugging Face Daily Papers · 2026-06-04

The paper challenges the assumption that higher cosine alignment between supervised latents and visual targets improves accuracy in vision-language models, and instead finds a strong negative correlation between the two. It introduces PRISM diagnostics, which indicate that answers are decoded downstream of the latents rather than within them, and that the auxiliary loss reshapes the language model through shared parameters.
