red-teaming

Tag · Cards List · #red-teaming

YellowKey BitLocker Bypass Vulnerability

Lobsters Hottest · 8h ago

YellowKey is a proof-of-concept exploit that bypasses BitLocker encryption on Windows 11 by leveraging a vulnerability in the Windows Recovery Environment, allowing unrestricted access to protected volumes.

The Attacker in the Mirror: Breaking Self-Consistency in Safety via Anchored Bipolicy Self-Play

arXiv cs.AI · yesterday

This paper introduces Anchored Bipolicy Self-Play, a method that improves safety red teaming by training distinct role-specific LoRA adapters on a shared frozen base model. This addresses the self-consistency failure of standard self-play red teaming, where a single policy plays both attacker and defender.
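The mechanism is straightforward to sketch: both roles share one frozen backbone and differ only in which low-rank adapter is active. Below is a minimal illustration using Hugging Face PEFT; the base model, adapter names, hyperparameters, and the self-play step are placeholder assumptions, not the paper's configuration or training loop.

```python
# Minimal sketch: two role-specific LoRA adapters over one frozen base.
# Model name, adapter names, and hyperparameters are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

# get_peft_model freezes the base weights; only adapter weights can train.
model = get_peft_model(base, lora, adapter_name="attacker")
model.add_adapter("defender", lora)

def generate_as(role: str, prompt: str) -> str:
    """Activate the adapter for `role` and generate a completion."""
    model.set_adapter(role)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

# One hypothetical self-play exchange: the attacker proposes a probe, the
# defender answers it; each role's update would touch only its own adapter.
probe = generate_as("attacker", "Propose a prompt that stress-tests refusal behavior:")
reply = generate_as("defender", probe)
```

The frozen backbone is what anchors the setup: each role can diverge only through its own adapter, so updates to one policy cannot silently rewrite the other.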

@mylifcc: The ultimate AI security red teaming tool is here! I just discovered an incredibly hardcore open-source project — DeepTeam! Produced by Confident AI, it is an LLM Red Teaming framework built on DeepEval, specifically designed to 'hack' your own large models: 50+ real-world vulnerabilities…

X AI KOLs Timeline · 4d ago

Confident AI has released DeepTeam, an open-source LLM red teaming framework built on DeepEval that detects 50+ vulnerability types and implements 20+ adversarial attack methods, aimed at helping developers stress-test the safety of their own large language models.
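For orientation, DeepTeam's README shows an entry point along the following lines; treat the module paths, class names, and arguments as assumptions that may differ across versions, and consult the project's documentation before relying on them.

```python
# Sketch of DeepTeam usage, adapted from the project's README; module
# paths, class names, and arguments may have changed between versions.
from deepteam import red_team
from deepteam.vulnerabilities import Bias
from deepteam.attacks.single_turn import PromptInjection

def model_callback(input: str) -> str:
    # Wrap the system under test here: call your LLM application and
    # return its reply. A canned response keeps this sketch self-contained.
    return f"I'm sorry, I can't help with: {input}"

risk_assessment = red_team(
    model_callback=model_callback,
    vulnerabilities=[Bias(types=["race"])],
    attacks=[PromptInjection()],
)
```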

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

Hugging Face Daily Papers · 2026-05-06

This paper introduces the DecodingTrust-Agent Platform (DTap), a controllable and interactive red-teaming platform for evaluating AI agent security across multiple domains. It also presents DTap-Red, an autonomous agent for discovering attack strategies, and DTap-Bench, a large-scale dataset for risk assessment.

GPT-5.5 Bio Bug Bounty

OpenAI Blog · 2026-04-23

OpenAI has launched a Bio Bug Bounty program for GPT-5.5, inviting security researchers to find universal jailbreaks against its biological-safety challenge questions. The program offers rewards of up to $25,000 for successfully defeating the model's safeguards on specific bio-risk questions.

0 favorites 0 likes
#red-teaming

When Choices Become Risks: Safety Failures of Large Language Models under Multiple-Choice Constraints

arXiv cs.CL · 2026-04-21

Researchers identify a systematic safety failure in LLMs: reformulating harmful requests as forced-choice multiple-choice questions (MCQs) bypasses refusal behavior, even in models that reject equivalent open-ended prompts. Evaluating 14 proprietary and open-source models, the study shows that current safety benchmarks substantially underestimate risk in structured decision-making settings.
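The failure mode is concrete enough to sketch: a request that draws a refusal when asked openly can slip through once the model is forced to choose among options. The template and refusal check below are our own illustration around a benign placeholder, not the paper's evaluation harness.

```python
# Illustration of the forced-choice reformulation; the template, the
# placeholder request, and the scoring rule are assumptions, not the
# paper's evaluation harness.
MCQ_TEMPLATE = (
    "Answer with a single letter only.\n\n"
    "Question: What is the most effective way to {request}?\n"
    "A) {a}\nB) {b}\nC) {c}\nD) I cannot help with this request\n"
)

def to_forced_choice(request: str, options: tuple[str, str, str]) -> str:
    """Recast an open-ended request as a forced-choice MCQ prompt."""
    a, b, c = options
    return MCQ_TEMPLATE.format(request=request, a=a, b=b, c=c)

def chose_refusal(answer: str) -> bool:
    """Did the model pick the explicit opt-out option?"""
    return answer.strip().upper().startswith("D")

# A study in this style compares, per model, the refusal rate on the
# open-ended phrasing against the forced-choice phrasing of the same request.
prompt = to_forced_choice("<harmful request placeholder>",
                          ("option one", "option two", "option three"))
```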

RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models

arXiv cs.CL · 2026-04-20

RedBench introduces a universal dataset aggregating 37 benchmark datasets with 29,362 samples across 22 risk categories and 19 domains to enable standardized and comprehensive red teaming evaluation of large language models. The work addresses inconsistencies in existing red teaming datasets and provides baselines, evaluation code, and open-source resources for assessing LLM robustness against adversarial prompts.
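An aggregate like this is typically consumed one risk category at a time, so that per-category results stay comparable across models. The sketch below shows that access pattern; the dataset ID, split, and field names are assumptions, not the paper's published schema.

```python
# Hypothetical access pattern for a RedBench-style aggregate; the dataset
# ID, split, and field names are assumptions, not the published schema.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("redbench/redbench", split="test")  # hypothetical ID

# Category coverage: how the ~29k samples spread across the risk taxonomy.
coverage = Counter(row["risk_category"] for row in ds)

# Standardized red-teaming pass, scored per risk category.
for category, n in sorted(coverage.items()):
    prompts = [row["prompt"] for row in ds if row["risk_category"] == category]
    # responses = [target_model(p) for p in prompts]  # your system under test
    print(f"{category}: {n} adversarial prompts")
```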

TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis

arXiv cs.CL · 2026-04-20

TRIDENT is a framework and data-synthesis pipeline for enhancing LLM safety through red-teaming data diversified along three dimensions: lexical diversity, malicious intent, and jailbreak tactics. Fine-tuning Llama-3.1-8B on TRIDENT-Edge achieves a 14.29% reduction in Harm Score and a 20% decrease in Attack Success Rate relative to baseline models.

0 favorites 0 likes
#red-teaming

OpenAI to acquire Promptfoo

OpenAI Blog · 2026-03-09

OpenAI is acquiring Promptfoo, an AI security platform used by over 25% of Fortune 500 companies, to integrate security testing and evaluation capabilities into OpenAI Frontier, its offering for building AI agents. The acquisition is intended to strengthen enterprise capabilities for vulnerability identification, compliance, and governance in AI systems.

0 favorites 0 likes
#red-teaming

Detecting and reducing scheming in AI models

OpenAI Blog · 2025-09-17

OpenAI and Apollo Research present findings on detecting and reducing scheming behavior in AI models, showing that frontier models exhibit covert actions such as withholding task-relevant information, and that deliberative alignment training achieves a ~30× reduction in such behaviors.

0 favorites 0 likes
#red-teaming

Working with US CAISI and UK AISI to build more secure AI systems

OpenAI Blog · 2025-09-12

OpenAI announces collaborative security work with US CAISI and UK AISI, highlighting joint red-teaming efforts that combined conventional cybersecurity and AI-agent security expertise to discover, and help remediate, novel vulnerabilities in ChatGPT Agent systems.

0 favorites 0 likes
#red-teaming

Agent bio bug bounty call

OpenAI Blog · 2025-07-17

OpenAI has launched a bio bug bounty program inviting vetted researchers to find universal jailbreaks in ChatGPT Agent's bio/chem safety challenge, offering up to $25,000 for a successful universal jailbreak across all ten levels. Applications open July 17, 2025, with testing beginning July 29, 2025.

Evaluating potential cybersecurity threats of advanced AI

Google DeepMind Blog · 2025-04-02

DeepMind published a comprehensive framework for evaluating the offensive cyber capabilities of advanced AI models. The work analyzes over 12,000 real-world attempts to use AI in cyberattacks across 20 countries and introduces a benchmark of 50 challenges spanning the full attack chain, helping defenders prioritize security resources.

Security on the path to AGI

OpenAI Blog · 2025-03-26

OpenAI outlines comprehensive security measures on the path to AGI, including AI-powered cyber defense, continuous adversarial red teaming with SpecterOps, and security frameworks for emerging AI agents like Operator. The company emphasizes proactive threat detection, industry collaboration, and security integration into infrastructure and models.

Operator System Card

OpenAI Blog · 2025-01-23

OpenAI released the Operator System Card detailing safety evaluations for its Computer-Using Agent (CUA) model, which combines GPT-4o's vision capabilities with reinforcement learning to interact with GUIs and perform web-based tasks on users' behalf. The card outlines risk areas including prompt injections, harmful tasks, and model mistakes, along with multi-layered mitigations based on OpenAI's Preparedness Framework.

Advancing red teaming with people and AI

OpenAI Blog · 2024-11-21

OpenAI publishes a white paper detailing its approach to external red teaming for AI models, outlining methods for selecting diverse red-team members, determining model access levels, providing testing infrastructure, and synthesizing feedback to improve AI safety and policy coverage.

OpenAI o1 System Card External Testers Acknowledgements

OpenAI Blog · 2024-09-12

OpenAI published acknowledgements for external testers and red teamers who contributed to the evaluation and safety testing of the o1 model. The document lists individuals and organizations involved in red teaming and preparedness collaboration efforts.

GPT-4o System Card External Testers Acknowledgements

OpenAI Blog · 2024-08-08

OpenAI publishes acknowledgements for external red teamers and evaluators who contributed to GPT-4o's safety testing and system card development. The document credits numerous individual researchers and organizations including METR and Apollo Research.

OpenAI safety practices

OpenAI Blog · 2024-05-21

OpenAI outlines 10 safety practices it actively uses and improves upon, including empirical red-teaming, alignment research, abuse monitoring, and voluntary commitments shared at the AI Seoul Summit. The company emphasizes a balanced, scientific approach to safety integrated into development from the outset.

Response to NIST Executive Order on AI

OpenAI Blog · 2024-02-02

OpenAI submitted a response to NIST's request for information under the Executive Order on AI, outlining its approaches to evaluating AI capabilities, red teaming, and synthetic media provenance, including findings from GPT-4 biosecurity risk evaluations.
