Strengthening our Frontier Safety Framework

Google DeepMind Blog News

Summary

DeepMind published the third iteration of its Frontier Safety Framework, expanding risk domains to include harmful manipulation and misalignment risks, with refined risk assessment processes and enhanced governance protocols for advanced AI models.

We're strengthening the Frontier Safety Framework (FSF) to help identify and mitigate severe risks from advanced AI models.


# Strengthening our Frontier Safety Framework

Source: https://deepmind.google/blog/strengthening-our-frontier-safety-framework/

September 22, 2025 | Responsibility & Safety

We're expanding our risk domains and refining our risk assessment process.

**Updated April 17, 2026**

AI breakthroughs are transforming our everyday lives, from advancing mathematics, biology and astronomy to realizing the potential of personalized education. As we build increasingly powerful AI models, we're committed to responsibly developing our technologies and taking an evidence-based approach to staying ahead of emerging risks.

Today, we're publishing the third iteration of our Frontier Safety Framework (FSF) — our most comprehensive approach yet to identifying and mitigating severe risks from advanced AI models. This update builds upon our ongoing collaborations with experts across industry, academia and government. We've also incorporated lessons learned from implementing previous versions, along with evolving best practices in frontier AI safety.

## Key updates to the Framework

### Addressing the risks of harmful manipulation

With this update, we're introducing a Critical Capability Level (CCL)* focused on harmful manipulation — specifically, AI models with powerful manipulative capabilities that could be misused to systematically and substantially change beliefs and behaviors in identified high-stakes contexts over the course of interactions with the model, reasonably resulting in additional expected harm at severe scale.

This addition builds on and operationalizes research we've done to identify and evaluate mechanisms that drive manipulation from generative AI. Going forward, we'll continue to invest in this domain to better understand and measure the risks associated with harmful manipulation.

### Adapting our approach to misalignment risks

We've also expanded our Framework to address potential future scenarios where misaligned AI models might interfere with operators' ability to direct, modify or shut down their operations.

While the previous version of the Framework included an exploratory approach centered on instrumental reasoning CCLs (i.e., warning levels specific to when an AI model starts to think deceptively), this update provides further protocols for our machine learning research and development CCLs, which focus on models that could accelerate AI research and development to potentially destabilizing levels. In addition to the misuse risks arising from these capabilities, there are also misalignment risks stemming from a model's potential for undirected action at these capability levels, and from the likely integration of such models into AI development and deployment processes.

To address risks posed by CCLs, we conduct safety case reviews prior to external launches when relevant CCLs are reached. This involves performing detailed analyses demonstrating how risks have been reduced to manageable levels. Because large-scale internal deployments of models at advanced machine learning research and development CCLs can also pose risk, we are now expanding this approach to cover such deployments.

### Sharpening our risk assessment process

Our Framework is designed to address risks in proportion to their severity. We've sharpened our CCL definitions to identify the critical threats that warrant the most rigorous governance and mitigation strategies. We continue to apply safety and security mitigations before specific CCL thresholds are reached, as part of our standard model development approach.

Lastly, in this update we go into more detail about our risk assessment process. Building on our core early-warning evaluations, we describe how we conduct holistic assessments that include systematic risk identification, comprehensive analyses of model capabilities and explicit determinations of risk acceptability.

### FSF 3.1: Introducing tracked capability levels

As of April 17, 2026, we are adding Tracked Capability Levels (TCLs) in certain domains to our Frontier Safety Framework, introducing a new capability level that helps us spot and evaluate potential less-extreme risks sooner. We've also provided more detail on our full risk management process, from initial identification to mitigation.

## Advancing our commitment to frontier safety

The Frontier Safety Framework represents our continued commitment to taking a scientific and evidence-based approach to tracking and staying ahead of AI risks as capabilities advance toward AGI. By expanding our risk domains and strengthening our risk assessment processes, we aim to ensure that transformative AI benefits humanity while minimizing potential harms.

Our Framework will continue evolving based on new research, stakeholder input and lessons from implementation. We remain committed to working collaboratively across industry, academia and government. The path to beneficial AGI requires not just technical breakthroughs, but also robust frameworks to mitigate risks along the way. We hope that our updated Frontier Safety Framework contributes meaningfully to this collective effort.

Similar Articles

Updating the Frontier Safety Framework

Google DeepMind Blog

DeepMind has published an updated Frontier Safety Framework (v2.0) with stronger security protocols for frontier AI models, including new Critical Capability Level (CCL) security recommendations and enhanced approaches to deceptive alignment risks. The framework aims to prevent unauthorized model weight exfiltration and manage risks as AI systems become more powerful.

Frontier Model Forum updates

OpenAI Blog

The Frontier Model Forum announces the creation of a new AI Safety Fund with over $10 million in initial funding from major AI companies (Anthropic, Google, Microsoft, OpenAI) and philanthropic partners to support independent AI safety research. The fund will focus on developing model evaluations and red-teaming techniques to assess frontier AI systems' dangerous capabilities.

OpenAI’s Approach to Frontier Risk

OpenAI Blog

OpenAI publishes details on its approach to frontier AI risks and announces progress on voluntary safety commitments made in July 2023, including the release of DALL-E 3 system card and the development of a new Preparedness Framework to manage catastrophic risks from advanced AI systems.

Frontier AI regulation: Managing emerging risks to public safety

OpenAI Blog

OpenAI proposes a regulatory framework for 'frontier AI' models that pose potential public safety risks, advocating for standard-setting processes, registration/reporting requirements, and compliance mechanisms including pre-deployment risk assessments and post-deployment monitoring.

Frontier Model Forum

OpenAI Blog

OpenAI, Google, Microsoft, and Anthropic launch the Frontier Model Forum to coordinate on AI safety standards, research, and information sharing among industry, government, and civil society. The initiative focuses on identifying best practices, advancing AI safety research, and establishing secure mechanisms for sharing safety-related information.