BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Hugging Face Daily Papers 06/02/26, 12:00 AM Papers

safety computer-use-agents guard-models self-evolving-defense open-world-threats agent-trajectories benchmark

Summary

BraveGuard is a self-evolving defense framework that trains guard models using open-world threat signals and realistic agent trajectories to improve safety detection in computer-use agents, achieving significant accuracy gains on the AgentHazard benchmark.

Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models from open-world threat signals and realistic agent trajectories. BraveGuard mines recent research sources to identify emerging risks and attack patterns, instantiates them as executable computer-use tasks, collects agent rollouts, and derives trajectory-level supervision for guard model training. As new threats and validation failures appear, the pipeline can be repeated, yielding an adaptive defense loop rather than a static, benchmark-driven training process. We instantiate BraveGuard by training multiple guard backbones, including Qwen3-Guard and Llama-Guard variants, and evaluate the resulting guards on trajectory-level agent-safety benchmarks. BraveGuard consistently improves safety detection across computer-use trajectories. On AgentHazard, it substantially improves detection accuracy over off-the-shelf guard models, with accuracy increasing from 38.79% to 82.38% under the averaged guard-model setting. These results show that guard supervision grounded in open-world threat discovery and realistic agent execution can improve safety monitoring beyond fixed taxonomies and synthetic prompt-level data. BraveGuard offers a scalable path toward adaptive defenses for computer-use agents facing evolving real-world risks.

Original Article

View Cached Full Text

Cached at: 06/04/26, 03:41 AM

Paper page - BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Source: https://huggingface.co/papers/2606.01166 Authors:

Abstract

BraveGuard is a self-evolving defense framework that trains guard models using open-world threat signals and realistic agent trajectories to improve safety detection in computer-use agents.

Computer-use agentsextend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift createssafety risksthat are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for trainingguard modelsfromopen-world threat signalsand realisticagent trajectories. BraveGuard mines recent research sources to identify emerging risks and attack patterns, instantiates them asexecutable computer-use tasks, collects agent rollouts, and derivestrajectory-level supervisionfor guard model training. As new threats and validation failures appear, the pipeline can be repeated, yielding anadaptive defense looprather than a static, benchmark-driven training process. We instantiate BraveGuard by training multipleguard backbones, includingQwen3-GuardandLlama-Guardvariants, and evaluate the resulting guards on trajectory-level agent-safety benchmarks. BraveGuard consistently improvessafety detectionacross computer-use trajectories. OnAgentHazard, it substantially improves detection accuracy over off-the-shelfguard models, with accuracy increasing from 38.79% to 82.38% under the averaged guard-model setting. These results show that guard supervision grounded in open-world threat discovery and realistic agent execution can improve safety monitoring beyond fixed taxonomies and synthetic prompt-level data. BraveGuard offers a scalable path toward adaptive defenses forcomputer-use agentsfacing evolving real-world risks.

View arXiv page View PDF GitHub27 Add to collection

Get this paper in your agent:

hf papers read 2606\.01166

Don’t have the latest CLI?curl \-LsSf https://hf\.co/cli/install\.sh \| bash

Models citing this paper1

#### Yunhao-Feng/BraveGuard Text Generation• Updatedabout 21 hours ago • 5

Datasets citing this paper0

No dataset linking this paper

Cite arxiv.org/abs/2606.01166 in a dataset README.md to link it from this page.

Spaces citing this paper0

No Space linking this paper

Cite arxiv.org/abs/2606.01166 in a Space README.md to link it from this page.

Collections including this paper0

No Collection including this paper

Add this paper to acollectionto link it from this page.

BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Paper page - BraveGuard: From Open-World Threats to Safer Computer-Use Agents

Abstract

Models citing this paper1

Datasets citing this paper0

Spaces citing this paper0

Collections including this paper0

Similar Articles

OSGuard: A Benchmark for Safety in Computer-Use Agents

OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

Armorer Guard Learning Loop: local live feedback for AI-agent security

Submit Feedback

Similar Articles

OSGuard: A Benchmark for Safety in Computer-Use Agents

OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

Armorer Guard Learning Loop: local live feedback for AI-agent security