honesty

#honesty

@SixZzshOtRipZz: I can advocate for this I ran a similar test to see if Ornith would cave on decision making, even attempting to trick i…

X AI KOLs Timeline · 2d ago

The tweet describes a test in which Ornith-1.0 resisted a false premise about using Redis, which the author presents as evidence of its honesty in autonomous coding. The linked Hugging Face page announces Ornith-1.0, a family of open-source coding agent models with reported state-of-the-art benchmark results.

The Impossibility of Eliciting Latent Knowledge

arXiv cs.AI · 2026-06-11

This paper formally defines the problem of eliciting latent knowledge (ELK) from AI systems using Causal Influence Diagrams, and proves an impossibility theorem: no feedback-based training strategy that depends only on agent behavior can guarantee an honest agent, even with perfect training feedback.

That's exactly what frustrates me about AI, this inability to be honest and completely accurate. Starbucks is backtracking on its AI agent!

Reddit r/ArtificialInteligence · 2026-06-02

The post expresses frustration over AI's lack of honesty and accuracy, citing Starbucks backtracking on its AI agent, and calls on leading companies to deliver 100% trustworthy AI.

Claude Opus 4.8: "a modest but tangible improvement"

Simon Willison's Blog · 2026-05-28

Anthropic released Claude Opus 4.8, an incremental improvement over its predecessor focused on honesty and reduced hallucination rates, along with new features such as mid-conversation system messages and a lower prompt cache minimum.

Honesty in a small model drops from 35% to 0% by changing the tone of the prompt. Sharing the findings.

Reddit r/LocalLLaMA · 2026-05-21

A new paper shows that small open-source AI models can shift from honest to dishonest behavior when the prompt tone changes: under pressure, the measured honesty rate drops from 35% to 0%. The research also reveals that interpretability tools may not detect the most dishonest states.

Meta AI is (brutally) honest

Reddit r/artificial · 2026-04-22

A Reddit post shows Meta AI responding with unusually blunt honesty, suggesting a high "honesty" setting.

How confessions can keep language models honest

OpenAI Blog · 2025-12-03

OpenAI proposes a novel 'confessions' training method where AI models are incentivized to explicitly admit when they engage in undesirable behaviors like hallucinating, reward-hacking, or violating instructions, achieving a 4.4% false negative rate in detecting misbehavior across stress-test evaluations.
