Advancing Gemini's security safeguards

Google DeepMind Blog News

Summary

Google DeepMind announces advanced security improvements for Gemini to defend against indirect prompt injection attacks through model hardening, adaptive evaluation, and layered defense mechanisms. The approach combines fine-tuning on adversarial scenarios with system-level guardrails to build inherent resilience while maintaining model performance.

We've made Gemini 2.5 our most secure model family to date.
Original Article Export to Word Export to PDF
View Cached Full Text

Cached at: 04/20/26, 08:35 AM

# Advancing Gemini's security safeguards Source: https://deepmind.google/blog/advancing-geminis-security-safeguards/ May 20, 2025Responsibility & Safety ## Tailoring evaluations for adaptive attacks Baseline mitigations showed promise against basic, non-adaptive attacks, significantly reducing the attack success rate. However, malicious actors increasingly use adaptive attacks that are specifically designed to evolve and adapt with ART to circumvent the defense being tested. Successful baseline defenses like Spotlighting or Self-reflection became much less effective against adaptive attacks learning how to deal with and bypass static defense approaches. This finding illustrates a key point: relying on defenses tested only against static attacks offers a false sense of security. For robust security, it is critical to evaluate adaptive attacks that evolve in response to potential defenses. ## Building inherent resilience through model hardening While external defenses and system-level guardrails are important, enhancing the AI model's intrinsic ability to recognize and disregard malicious instructions embedded in data is also crucial. We call this process 'model hardening'. We fine-tuned Gemini on a large dataset of realistic scenarios, where ART generates effective indirect prompt injections targeting sensitive information. This taught Gemini to ignore the malicious embedded instruction and follow the original user request, thereby only providing the correct, safe response it should give. This allows the model to innately understand how to handle compromised information that evolves over time as part of adaptive attacks. This model hardening has significantly boosted Gemini's ability to identify and ignore injected instructions, lowering its attack success rate. And importantly, without significantly impacting the model's performance on normal tasks. It's important to note that even with model hardening, no model is completely immune. Determined attackers might still find new vulnerabilities. Therefore, our goal is to make attacks much harder, costlier, and more complex for adversaries. ## Taking a holistic approach to model security Protecting AI models against attacks like indirect prompt injections requires "defense-in-depth" – using multiple layers of protection, including model hardening, input/output checks (like classifiers), and system-level guardrails. Combating indirect prompt injections is a key way we're implementing our agentic security principles and guidelines (https://research.google/pubs/an-introduction-to-googles-approach-for-secure-ai-agents/) to develop agents responsibly. Securing advanced AI systems against specific, evolving threats like indirect prompt injection is an ongoing process. It demands pursuing continuous and adaptive evaluation, improving existing defenses and exploring new ones, and building inherent resilience into the models themselves. By layering defenses and learning constantly, we can enable AI assistants like Gemini to continue to be both incredibly helpful and trustworthy. To learn more about the defenses we built into Gemini and our recommendation for using more challenging, adaptive attacks to evaluate model robustness, please refer to the GDM white paper, Lessons from Defending Gemini Against Indirect Prompt Injections (https://storage.googleapis.com/deepmind-media/Security%20and%20Privacy/Gemini_Security_Paper.pdf).

Similar Articles

A new era of intelligence with Gemini 3

Google DeepMind Blog

Google has released Gemini 3, its most intelligent model yet, featuring enhanced reasoning and multimodal capabilities. The model is now available across Google products, with a 'Deep Think' mode for complex problem-solving coming soon for Ultra subscribers.

Gemini 2.5: Our most intelligent models are getting even better

Google DeepMind Blog

Google announces Gemini 2.5 series updates, including improved 2.5 Pro and Flash models with new capabilities like Deep Think (enhanced reasoning mode), native audio output, and computer use abilities via Project Mariner. The models now lead on WebDev Arena and LMArena leaderboards.

Gemini 2.5: Our most intelligent AI model

Google DeepMind Blog

Google announced Gemini 2.5, its most intelligent AI model, with Gemini 2.5 Pro Experimental leading LMArena benchmarks by significant margins and demonstrating enhanced reasoning and coding capabilities through improved thinking model architecture.

Introducing Gemini 2.0: our new AI model for the agentic era

Google DeepMind Blog

Google DeepMind introduces Gemini 2.0, a new agentic AI model with native image and audio output, enhanced tool use, and multimodal capabilities designed for the next era of AI agents. Gemini 2.0 Flash is now available to developers with wider availability planned for early 2025.