Confidence-Building Measures for Artificial Intelligence: Workshop proceedings

OpenAI Blog Events

Summary

OpenAI and UC Berkeley's workshop on Confidence-Building Measures for Artificial Intelligence brought together stakeholders to develop strategies for mitigating geopolitical risks from foundation models. Participants identified six key CBMs: crisis hotlines, incident sharing, model transparency, content provenance, red teaming, and dataset sharing.


Cached at: 04/20/26, 02:54 PM

# Confidence-Building Measures for Artificial Intelligence: Workshop proceedings

Source: [https://openai.com/index/confidence-building-measures-for-artificial-intelligence/](https://openai.com/index/confidence-building-measures-for-artificial-intelligence/)

## Abstract

Foundation models could eventually introduce several pathways for undermining state security: accidents, inadvertent escalation, unintentional conflict, the proliferation of weapons, and interference with human diplomacy are just a few on a long list. The Confidence-Building Measures for Artificial Intelligence workshop, hosted by the Geopolitics Team at OpenAI and the Berkeley Risk and Security Lab at the University of California, brought together a multistakeholder group to think through the tools and strategies for mitigating the potential risks that foundation models introduce to international security.

Originating in the Cold War, confidence-building measures (CBMs) are actions that reduce hostility, prevent conflict escalation, and improve trust between parties. The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape. Participants identified the following CBMs, which apply directly to foundation models and are explained further in these proceedings:

1. crisis hotlines
2. incident sharing
3. model, transparency, and system cards
4. content provenance and watermarks
5. collaborative red teaming and table-top exercises
6. dataset and evaluation sharing

Because most foundation model developers are non-government entities, many CBMs will need to involve a wider stakeholder community. These measures can be implemented either by AI labs or by relevant government actors.
