dual-use

#dual-use

OpenSafeIntent: Evaluating Intent-Calibrated Safe Completion Across Dual-Use Prompt Sets

arXiv cs.CL · 9h ago

OpenSafeIntent introduces a benchmark of controlled prompt sets that vary intent while holding tasks fixed, enabling evaluation of whether models calibrate assistance across benign, dual-use, and malicious variants rather than appearing safe on average.

‘Dangerous’ AI Models Are Coming No Matter What

Wired · 2026-06-16

Anthropic's Claude Fable 5 and Mythos 5 AI models were taken offline due to a US government export-control directive, highlighting the dual-use nature of advanced AI and the inevitability that similar models will be developed by others.

@Dan_Jeffries1: The most revealing thing about this AI leadership paper is that it reads less like a vision for innovation and more lik…

X AI KOLs Following · 2026-05-15

The thread critiques AI leadership for centralizing control under safety rhetoric, drawing parallels to 1990s encryption export restrictions. It argues that sanctions against China have accelerated its domestic chip and AI development, potentially leading to geopolitical escalation and fragmentation of the global software ecosystem.

From hard refusals to safe-completions: toward output-centric safety training

OpenAI Blog · 2025-08-07

OpenAI introduced 'safe completions,' a new safety-training approach in GPT-5 that replaces binary refusal-based training with output-centric rewards, improving both safety and helpfulness, especially for dual-use prompts. The method penalizes unsafe outputs and rewards helpful responses, resulting in fewer and less severe safety violations than refusal-trained models such as o3.

Preparing for future AI risks in biology

OpenAI Blog · 2025-06-18

OpenAI publishes a comprehensive approach to managing dual-use risks from advanced AI models in biology, outlining strategies for enabling beneficial scientific discovery while preventing misuse for bioweapons development through expert collaboration, model training, detection systems, and security controls.

Preparing for malicious uses of AI

OpenAI Blog · 2018-02-20

OpenAI co-authors a comprehensive paper forecasting malicious uses of AI and proposing mitigation strategies, developed in collaboration with leading research institutions. The work emphasizes acknowledging AI's dual-use nature, learning from cybersecurity practices, and broadening stakeholder discussions around AI security risks.

Jul 2, 2026 · Announcements · More details on Fable 5’s cyber safeguards and our jailbreak framework

Anthropic News · 11h ago

Anthropic provides detailed information on the cyber safety classifiers for Claude Fable 5 and introduces a draft jailbreak severity framework developed with Glasswing, aiming to standardize communication about AI jailbreak risks. The company also launched a HackerOne program for reporting potential cyber jailbreaks.
