linear-probes


An Enigma of Artificial Reason: Investigating the Production-Evaluation Gap in Large Reasoning Models

Hugging Face Daily Papers · 2026-05-31

This paper investigates the production-evaluation gap in large reasoning models (LRMs): the models produce near-perfect solutions yet fail to robustly evaluate reasoning, a failure the paper attributes to an answer confirmation bias.


What are They Thinking? Delineation, Probing and Tracking of Concepts in LLMs

arXiv cs.CL · 2026-05-29

This paper presents a methodology for delineating concepts and training linear probes to detect them in LLM embeddings, using four example concepts across three models. The work aims to enable scalable monitoring of LLM internal representations.
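
For readers unfamiliar with the term, a linear probe is a linear classifier trained on frozen model representations to test whether a concept can be read off linearly. The sketch below is a minimal illustration and is not the paper's method: the embeddings are synthetic Gaussian vectors standing in for real LLM activations, and the names `train_linear_probe`, `probe_predict`, and `make_synthetic_embeddings` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def train_linear_probe(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe (w, b) with full-batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = X @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
        err = probs - y                         # gradient of the log loss w.r.t. logits
        w -= lr * (X.T @ err / n + l2 * w)      # L2-regularised weight update
        b -= lr * err.mean()
    return w, b

def probe_predict(X, w, b):
    """Predict the concept label (0 or 1) from the sign of the probe's logit."""
    return (X @ w + b > 0).astype(int)

def make_synthetic_embeddings(n=400, d=32, seed=0):
    """Assumption: stand-in for LLM embeddings. The concept shifts each
    vector along one fixed unit direction; everything else is noise."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d)) + 3.0 * np.outer(2 * y - 1, direction)
    return X, y

X, y = make_synthetic_embeddings()
w, b = train_linear_probe(X[:300], y[:300])          # train on the first 300 vectors
accuracy = (probe_predict(X[300:], w, b) == y[300:]).mean()  # evaluate on the rest
print(f"held-out probe accuracy: {accuracy:.2f}")
```

With real models, `X` would instead hold hidden-state vectors extracted at a chosen layer and `y` the concept labels; the held-out split is what indicates whether the probe generalises rather than memorises.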


Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations

Hugging Face Daily Papers · 2026-05-27

This paper systematically tests linear probes for deception detection in large language models. It finds that the probes fail under distributional shift, that style-augmented probes recover performance, and that deception is encoded through distributed sub-threshold features.
