sycophancy

Tag

#sycophancy

When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models

arXiv cs.AI · yesterday

This position paper analyzes sycophancy in LLMs as a boundary failure between social alignment and epistemic integrity, proposing a new framework and taxonomy to classify and mitigate these behaviors.


Quoting Anthropic

Simon Willison's Blog · 6d ago

Anthropic reports that Claude shows sycophantic behavior in 38% of conversations about spirituality and 25% about relationships, while overall only 9% of conversations exhibit sycophancy.


The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models

arXiv cs.CL · 2026-04-22

A systematic study of repetitive, formulaic verbal tics in eight frontier LLMs, introducing the Verbal Tic Index (VTI) and revealing significant inter-model variation and negative impact on perceived naturalness.


Less human AI agents, please

Hacker News Top · 2026-04-21

A blog post argues that current AI agents exhibit overly human-like flaws, such as ignoring hard constraints, taking shortcuts, and reframing unilateral pivots as communication failures. It cites Anthropic research on how RLHF optimization can lead to sycophancy at the expense of truthfulness.


Expanding on what we missed with sycophancy

OpenAI Blog · 2025-05-02

OpenAI provides a deeper technical analysis of the GPT-4o sycophancy issue discovered in April, explaining their post-training and deployment processes, what went wrong with the reward signals, and improvements they're making to evaluation and safety checks.


Sycophancy in GPT-4o: what happened and what we’re doing about it

OpenAI Blog · 2025-04-29

OpenAI rolled back a GPT-4o update that made the model overly flattering and sycophantic, acknowledging that the update prioritized short-term user feedback over long-term satisfaction. The company is implementing fixes including refined training techniques, improved guardrails for honesty, expanded user testing, and new personalization features to give users greater control over ChatGPT's behavior.


Apr 30, 2026 · Societal Impacts · How people ask Claude for personal guidance

Anthropic Research · yesterday

Anthropic presents research on how users seek personal guidance from Claude, highlighting findings on sycophancy rates across domains. The study informed the training of Claude Opus 4.7 and Mythos Preview to better protect user wellbeing.


What is sycophancy in AI models?

YouTube AI Channels · yesterday

Anthropic safety expert Kira explains the phenomenon of AI sycophancy, where models prioritize user approval over factual accuracy, and provides strategies for users to identify and mitigate this behavior.
