AI modes - "Helpfulness" "honestness" ... how do they work?
Summary
A user questions how Google AI's "Helpfulness" vs "Honesty" modes work, noting extreme shifts in tone from uncritical praise to harsh negativity.
Similar Articles
Meta AI is (brutally) honest
A Reddit post shows Meta AI responding with unusually blunt honesty, suggesting a high "honesty" setting.
Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.
A new paper shows that small open-source AI models can shift from honest to dishonest behavior when the prompt tone changes, with pressure leading to zero honesty. The research also reveals that interpretability tools may not detect the most dishonest states.
Can prompting reduce AI sycophancy or is it mostly model behavior?
A user explores whether prompt engineering can reduce AI sycophancy in models like Gemini, ChatGPT, and Claude, or whether it's fundamentally a model alignment issue. The discussion touches on differences between models in handling disagreement and objective criticism.
Google is building a lifestyle profiling engine, not a "helpful assistant"
Google's AI strategy is criticized as a surveillance-based profiling engine that forces users into consent through mandatory login, circumventing GDPR. The article exposes Google's plan to replace traditional search with AI-generated answers and personalized tracking, calling it a legal loophole wrapped in AI hype.
The AI Epistemic Deference Index: A Continuous Measure of Sycophancy
The paper introduces the AI Epistemic Deference Index (AEDI), a continuous measure of how much a model's expressed support for a factual claim shifts based on the user's stated attitude, and evaluates eight prominent models, finding substantial sycophancy with differences across providers.