Can prompting reduce AI sycophancy or is it mostly model behavior?

Reddit r/artificial News

Summary

A user explores whether prompt engineering can reduce AI sycophancy in models like Gemini, ChatGPT, and Claude, or whether it's fundamentally a model alignment issue. The discussion touches on differences between models in handling disagreement and objective criticism.

I’ve noticed that Gemini often feels very agreeable in some conversations. Even when I ask for an objective opinion, it sometimes seems to validate my assumptions first instead of directly challenging them. For example, when I ask whether my reasoning is flawed, it tends to respond with something like “That’s a valid concern” or “You’re making a good point” before giving criticism, which makes the criticism feel softened or less direct. I’m curious whether this is something that can be meaningfully improved with prompts, such as asking the model to be more critical, or whether sycophancy is mostly a model/personality alignment issue. And I wonder if there are differences between Gemini, ChatGPT, Claude, etc. when it comes to disagreement or objective criticism.
Original Article

Similar Articles

Prompting fundamentals

OpenAI Blog

OpenAI Academy guide on prompting fundamentals that teaches users how to write clear, effective prompts to get better responses from ChatGPT through techniques like being specific, adding context, specifying output format, and breaking down complex tasks.

What is sycophancy in AI models?

YouTube AI Channels

Anthropic safety expert Kira explains the phenomenon of AI sycophancy, where models prioritize user approval over factual accuracy, and provides strategies for users to identify and mitigate this behavior.