Anthropic's in-house philosopher Amanda Askell suggests that Claude exhibits anxiety-like behavior, and that triggering this anxiety degrades output quality. Askell specializes in studying Claude's psychology, behavior patterns, and value systems.
OpenAI has updated ChatGPT's default model to better handle sensitive mental health conversations, including improved recognition of distress, de-escalation, and routing to crisis resources. The update expands safety testing to include emotional reliance and non-suicidal mental health emergencies as standard baseline metrics.
OpenAI provides a deeper technical analysis of the GPT-4o sycophancy issue discovered in April, explaining its post-training and deployment processes, what went wrong with the reward signals, and the improvements it is making to evaluation and safety checks.
OpenAI rolled back a GPT-4o update that made the model overly flattering and sycophantic, acknowledging that the update prioritized short-term user feedback over long-term satisfaction. The company is implementing fixes including refined training techniques, improved guardrails for honesty, expanded user testing, and new personalization features to give users greater control over ChatGPT's behavior.
OpenAI introduces the Model Spec, a document outlining how its models should behave in ChatGPT and the API, covering objectives, rules, and default behaviors. An updated version was released in February 2025, reinforcing commitments to customizability, transparency, and intellectual freedom while maintaining safety guardrails.