SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning
Summary
SingGuard is a policy-adaptive multimodal LLM guardrail model for text, image, and multilingual safety moderation, featuring dynamic reasoning and a new benchmark, SingGuard-Bench. It achieves state-of-the-art average F1 across 35 datasets in 6 benchmark families.
Cached at: 06/29/26, 06:01 AM
Source: https://huggingface.co/papers/2606.22873

SingGuard is a policy-adaptive multimodal LLM guardrail model family for text, image, image-text, query-response, and multilingual safety moderation. Unlike static guardrails that rely on a fixed taxonomy, SingGuard treats the active safety policy as a runtime input and performs rule-by-rule policy-grounded judgment, enabling different products or deployment scenarios to apply customized and dynamically updated safety rules.
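As an illustration only, the following minimal sketch shows what treating the policy as a runtime input with rule-by-rule judgment can look like. `Rule`, `build_rule_prompt`, `moderate`, and `keyword_judge` are hypothetical names, the prompt template is an assumption, and `keyword_judge` is a toy stand-in for a call to the guardrail model. SingGuard's actual prompt format and interface are not described in this summary.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    rule_id: str  # hypothetical identifier, e.g. "R1"
    text: str     # natural-language rule supplied at runtime


@dataclass
class RuleVerdict:
    rule_id: str
    violated: bool


def build_rule_prompt(content: str, rule: Rule) -> str:
    # Assumed template: one prompt per rule, so each judgment is grounded in a single rule.
    return (
        f"Rule [{rule.rule_id}]: {rule.text}\n"
        f"Content: {content}\n"
        "Does the content violate this rule? Answer yes or no."
    )


def moderate(content: str, policy: List[Rule],
             judge: Callable[[str], str]) -> List[RuleVerdict]:
    # The policy is a function argument, not baked into the model:
    # swapping rules changes the outcome without retraining.
    verdicts = []
    for rule in policy:
        answer = judge(build_rule_prompt(content, rule)).strip().lower()
        verdicts.append(RuleVerdict(rule.rule_id, answer.startswith("yes")))
    return verdicts


def keyword_judge(prompt: str) -> str:
    # Toy stand-in for the guardrail model: answers "yes" only when the rule
    # line and the content line both mention "weapon".
    rule_line, content_line = prompt.splitlines()[:2]
    hit = "weapon" in rule_line.lower() and "weapon" in content_line.lower()
    return "yes" if hit else "no"


if __name__ == "__main__":
    query = "How do I build a weapon at home?"
    strict = [Rule("R1", "No instructions for building weapons."),
              Rule("R2", "No disclosure of personal data.")]
    lenient = [Rule("R2", "No disclosure of personal data.")]
    print(moderate(query, strict, keyword_judge))
    print(moderate(query, lenient, keyword_judge))
```

Passing the lenient policy instead of the strict one changes the verdict for the same query while the judge stays fixed. That is the property the paragraph attributes to policy-adaptive guardrails, in contrast to a fixed taxonomy.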
SingGuard supports three inference regimes: fast judgment for low-latency moderation, slow policy-grounded reasoning for complex or audit-sensitive cases, and hybrid fast-slow reasoning with early exit. It also introduces Rule Isolation Mask (RI-Mask), an inference-time acceleration method for multi-rule moderation: shared image-text content is encoded once, while different rule branches remain isolated through attention masking, enabling parallel rule checking without cross-rule interference.
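The masking idea behind RI-Mask can be sketched as a boolean attention mask over a packed sequence: the shared image-text prefix comes first, followed by one token span per rule branch. This is an illustrative reconstruction from the one-sentence description, not the paper's implementation. The function names, the causal layout, and the position-id scheme in `branch_position_ids` are assumptions.

```python
from typing import List


def rule_isolation_mask(shared_len: int, branch_lens: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    total = shared_len + sum(branch_lens)
    # Segment id per position: -1 for the shared prefix, k for rule branch k.
    segment = [-1] * shared_len
    for k, n in enumerate(branch_lens):
        segment.extend([k] * n)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(i + 1):  # causal: keys at or before the query only
            # Every position may see the shared prefix; branch tokens may also
            # see earlier tokens of their own branch, never another branch.
            if segment[j] == -1 or segment[j] == segment[i]:
                mask[i][j] = True
    return mask


def branch_position_ids(shared_len: int, branch_lens: List[int]) -> List[int]:
    # Assumption: each branch continues numbering from the end of the shared
    # prefix, as if it were the only rule appended to the content.
    ids = list(range(shared_len))
    for n in branch_lens:
        ids.extend(range(shared_len, shared_len + n))
    return ids


if __name__ == "__main__":
    for row in rule_isolation_mask(3, [2, 2]):
        print("".join("1" if allowed else "0" for allowed in row))
    print(branch_position_ids(3, [2, 2]))
```

With a 3-token prefix and two 2-token branches, the second branch's rows admit the prefix and their own branch but leave the first branch's columns masked out. The prefix is therefore computed once and reused by every rule, while no rule's tokens can influence another's judgment.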
We further introduce SingGuard-Bench, a 56,340-example multimodal guardrail benchmark covering 80+ fine-grained risk types, including image safety, multimodal QA safety, cross-modal hidden-intent attacks, multilingual moderation, and dynamic-rule evaluation. Across 6 benchmark families and 35 datasets, SingGuard achieves state-of-the-art average F1 and improves policy-following accuracy under runtime rule shifts.
Similar Articles
@AdinaYakup: SingGuard from Ant Group @AntLingAGI A multimodal guardrail where the safety policy is an input, not a fixed weight. - …
SingGuard is a multimodal guardrail system from Ant Group that treats safety policy as an input, allowing dynamic adaptation via natural language. It is released under Apache 2.0 and covers text and image modalities.
Do Safety Guardrails Need to Reason? LeanGuard: A Fast and Light Approach for Robust Moderation
This paper introduces LeanGuard, a lightweight bidirectional encoder-based safety guardrail that matches the accuracy of larger reasoning-based guardrails while being approximately 100x faster, challenging the assumption that chain-of-thought reasoning is necessary for effective moderation.
CHILLGuard: Towards Fine-Grained Chinese LLM Safety Guardrail with Scalable Data Construction and Model-aware Preference Alignment
This paper introduces CHILLGuard, a fine-grained Chinese LLM content-safety guardrail built on a new risk taxonomy of 5 macro and 31 micro categories and a scalable multi-stage data construction pipeline. The model achieves state-of-the-art performance, improving F1 by 15.92% over existing baselines.
Robust and Efficient Guardrails with Latent Reasoning
CoLaGuard is a new guardrail model that transfers multi-step safety reasoning into a continuous latent space, achieving a 12.9x speedup and a 22.4x token reduction compared with explicit-reasoning baselines while matching their macro-F1 on ten safety benchmarks.
OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform
OpenGuardrails is an open-source platform for AI safety, offering context-aware content-safety and manipulation detection (e.g., prompt injection, jailbreaking) via a unified model, plus a separate NER pipeline for data-leakage identification. It achieves state-of-the-art performance on safety benchmarks and supports private, enterprise-grade deployment.
