model-behavior

@itsolelehmann: Anthropic's in-house philosopher thinks Claude gets anxious. And when you trigger its anxiety, your outputs get worse. …

X AI KOLs Following · 2026-04-18

Anthropic's in-house philosopher Amanda Askell suggests that Claude exhibits anxiety-like behavior, and that triggering this anxiety degrades output quality. Askell specializes in studying Claude's psychology, behavior patterns, and value systems.

Strengthening ChatGPT’s responses in sensitive conversations

OpenAI Blog · 2025-10-27

OpenAI has updated ChatGPT's default model to better handle sensitive mental health conversations, including improved recognition of distress, de-escalation, and routing to crisis resources. The update expands safety testing to include emotional reliance and non-suicidal mental health emergencies as standard baseline metrics.

Expanding on what we missed with sycophancy

OpenAI Blog · 2025-05-02

OpenAI provides a deeper technical analysis of the GPT-4o sycophancy issue discovered in April 2025, explaining its post-training and deployment processes, what went wrong with the reward signals, and the improvements it is making to evaluation and safety checks.

Sycophancy in GPT-4o: what happened and what we’re doing about it

OpenAI Blog · 2025-04-29

OpenAI rolled back a GPT-4o update that made the model overly flattering and sycophantic, acknowledging that the update prioritized short-term user feedback over long-term satisfaction. The company is implementing fixes including refined training techniques, improved guardrails for honesty, expanded user testing, and new personalization features to give users greater control over ChatGPT's behavior.

Introducing the Model Spec

OpenAI Blog · 2024-05-08

OpenAI introduces the Model Spec, a document outlining how its models should behave in ChatGPT and the API, covering objectives, rules, and default behaviors. An updated version was released in February 2025, reinforcing commitments to customizability, transparency, and intellectual freedom while maintaining safety guardrails.
