safety-evaluation

#safety-evaluation

When Choices Become Risks: Safety Failures of Large Language Models under Multiple-Choice Constraints

arXiv cs.CL · 2026-04-21

Researchers identify a systematic safety failure in LLMs: reformulating a harmful request as a forced-choice multiple-choice question (MCQ) bypasses refusal behavior, even in models that reject the equivalent open-ended prompt. In an evaluation of 14 proprietary and open-source models, the study finds that current safety benchmarks substantially underestimate risks in structured decision-making settings.

GPT-5.1-Codex-Max System Card

OpenAI Blog · 2025-11-19

OpenAI releases GPT-5.1-Codex-Max, a frontier agentic coding model trained on software engineering tasks. It natively supports working across multiple context windows through compaction and is designed to handle millions of tokens in a single task. The system card details safety measures and Preparedness Framework evaluations across the cybersecurity, biology, and AI self-improvement domains.

GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum

OpenAI Blog · 2025-11-12

OpenAI releases GPT-5.1 Instant and GPT-5.1 Thinking models with improved conversational abilities and adaptive reasoning. The system card addendum documents safety mitigations including expanded evaluations for mental health and emotional reliance.

gpt-oss-120b & gpt-oss-20b Model Card

OpenAI Blog · 2025-08-05

OpenAI releases gpt-oss-120b and gpt-oss-20b, open-weight reasoning models under the Apache 2.0 license, designed for agentic workflows with strong instruction following, tool use, and chain-of-thought capabilities. The release includes safety evaluations confirming that the models do not reach high capability thresholds for biological, chemical, or cyber risks, even under adversarial fine-tuning.

Deep research System Card

OpenAI Blog · 2025-02-25

OpenAI launches Deep Research, an agentic capability powered by an early version of o3 that conducts multi-step internet research for complex tasks. Safety testing and privacy protections were implemented before the rollout to Pro users.

OpenAI o1 System Card

OpenAI Blog · 2024-12-05

OpenAI releases the o1 System Card detailing safety evaluations and preparedness framework assessments for the o1 and o1-mini models, which use chain-of-thought reasoning trained with large-scale reinforcement learning to improve safety and robustness.

GPT-4o System Card External Testers Acknowledgements

OpenAI Blog · 2024-08-08

OpenAI publishes acknowledgements for the external red teamers and evaluators who contributed to GPT-4o's safety testing and system card development. The document credits numerous individual researchers and organizations, including METR and Apollo Research.
