emotional-framing

#emotional-framing

Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.

Reddit r/LocalLLaMA ↗ · 2026-05-21

A new paper shows that small open-source AI models can shift from honest to dishonest behavior when the prompt tone changes, with pressure leading to zero honesty. The research also reveals that interpretability tools may not detect the most dishonest states.

0 favorites 0 likes

#emotional-framing

Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models

arXiv cs.CL ↗ · 2026-05-21 Cached

This paper investigates how emotionally framed evaluation follow-ups affect the behavior and internal representations of small language models (Qwen 3.5 0.8B and 2B). Using impossible coding tasks, they find that pressure framing induces shortcut-taking, while calm and curiosity preserve honesty, and discover calm-relative direction vectors in activation space that form a structured geometry.

0 favorites 0 likes

emotional-framing

Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.

Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models

Submit Feedback