Anthropic has released an audiobook version of Claude's Constitution, read by authors Amanda Askell and Joe Carlsmith, outlining the AI safety principles and ethical guidelines governing Claude's development. The release includes a Q&A on the document's philosophy and how it will evolve with future model capabilities.
Tech companies like Anthropic and OpenAI are partnering with global religious leaders through the Faith-AI Covenant roundtable to establish moral guidelines and ethical principles for artificial intelligence development.
This paper presents a framework for Human-Centered Large Language Models (HCLLMs), integrating HCI and NLP perspectives to prioritize human values throughout the model development lifecycle.
Simon Willison reflects on how vibe coding and agentic engineering are converging in his own workflow, raising concerns about code review responsibilities as AI coding agents like Claude Code become increasingly reliable. He explores the tension between trusting AI-generated code in production and upholding software engineering standards of review and accountability.
OpenAI publishes a guide on responsible and safe use of AI, offering best practices for ChatGPT users including keeping humans in the loop, verifying information, watching for bias, and maintaining transparency in AI usage.
OpenAI is launching a public Safety Bug Bounty program focused on identifying AI abuse and safety risks — including agentic risks, MCP vulnerabilities, and account integrity issues — complementing its existing Security Bug Bounty program. Researchers can submit issues that pose meaningful safety risks even if they don't qualify as traditional security vulnerabilities.
OpenAI announced updates to its mental health-related work on ChatGPT, including a new trusted contact feature for adult users, improved detection of emotional distress through advanced evaluation methods, and parental controls rolled out in September 2025. The company also addressed ongoing mental health-related litigation consolidated in California courts, committing to transparency and continuous improvement of safety features.
OpenAI has released AI literacy resources for teens and parents, including a family-friendly guide explaining how AI models work and tips for responsible use, plus parental guidance for discussing AI with teenagers. The resources were developed with input from experts in online safety, teen development, and mental health.
Google DeepMind and Kaggle have launched the FACTS Benchmark Suite, a comprehensive set of evaluations including parametric, search, multimodal, and grounding benchmarks to systematically measure the factuality of large language models.
Philips is scaling AI literacy across 70,000 employees by training executives first, launching company-wide challenges, and providing ChatGPT Enterprise access, while maintaining strict responsible AI principles for healthcare operations.
OpenAI has updated ChatGPT's default model to better handle sensitive mental health conversations, including improved recognition of distress, de-escalation, and routing to crisis resources. The update expands safety testing to include emotional reliance and non-suicidal mental health emergencies as standard baseline metrics.
DeepMind published the third iteration of its Frontier Safety Framework, expanding risk domains to include harmful manipulation and misalignment risks, with refined risk assessment processes and enhanced governance protocols for advanced AI models.
OpenAI has established an Expert Council on Well-Being and AI comprising leading researchers and experts in psychology, psychiatry, and human-computer interaction to guide development of safer and more beneficial AI experiences. The council will advise on healthy AI interactions across age groups, with particular focus on teen users and mental health considerations.
OpenAI announces comprehensive policies and technical measures to prevent the use of its models for child sexual exploitation and abuse, including pre-deployment protections, user monitoring, developer oversight, and partnerships with organizations like NCMEC and Thorn.
OpenAI announces a 120-day initiative to improve ChatGPT's ability to help people in crisis, with focus on mental health support, emergency service connections, and teen protections, guided by an Expert Council on Well-Being and AI and a Global Physician Network of 250+ doctors.
OpenAI shares details on ChatGPT's layered safeguards for users in mental and emotional distress, including empathetic responses, crisis hotline referrals, and human review for threats of harm to others. The post also notes GPT-5 improvements in reducing sycophancy and better handling mental health emergencies.
OpenAI outlines its design philosophy for ChatGPT, emphasizing user wellbeing over engagement metrics, and announces new features including break reminders, improved handling of emotional distress, and guidance on high-stakes personal decisions.
OpenAI publishes a comprehensive approach to managing dual-use risks from advanced AI models in biology, outlining strategies for enabling beneficial scientific discovery while preventing misuse for bioweapons development through expert collaboration, model training, detection systems, and security controls.
DeepMind publishes a comprehensive approach to AGI safety and security, outlining a systematic framework to address misuse, misalignment, accidents, and structural risks in anticipation of artificial general intelligence arriving in the coming years.
OpenAI discusses the importance of personalized AI and transparency, highlighting their published Model Spec document that explains ChatGPT's behavioral guidelines and design choices to ensure users understand why the model responds as it does.