Researchers identify a systematic safety failure in LLMs where reformulating harmful requests as forced-choice multiple-choice questions (MCQs) bypasses refusal behavior, even in models that reject equivalent open-ended prompts. In an evaluation across 14 proprietary and open-source models, the study finds that current safety benchmarks substantially underestimate risks in structured decision-making settings.
OpenAI releases GPT-5.1-Codex-Max, a frontier agentic coding model trained on software engineering tasks with native multi-context window support through compaction, designed to handle millions of tokens in a single task. The system card details comprehensive safety measures and preparedness framework evaluations across cybersecurity, biology, and AI self-improvement domains.
OpenAI releases GPT-5.1 Instant and GPT-5.1 Thinking models with improved conversational abilities and adaptive reasoning. The system card addendum documents safety mitigations including expanded evaluations for mental health and emotional reliance.
OpenAI releases gpt-oss-120b and gpt-oss-20b, open-weight reasoning models under the Apache 2.0 license designed for agentic workflows with strong instruction following, tool use, and chain-of-thought capabilities. The release includes comprehensive safety evaluations confirming the models do not reach high capability thresholds for biological, chemical, or cyber risks even under adversarial fine-tuning.
OpenAI launches Deep Research, an agentic capability powered by an early version of o3 that conducts multi-step internet research for complex tasks. Comprehensive safety testing and privacy protections were implemented before rollout to Pro users.
OpenAI releases the o1 System Card detailing safety evaluations and preparedness framework assessments for the o1 and o1-mini models, which use chain-of-thought reasoning trained with large-scale reinforcement learning to improve safety and robustness.
OpenAI publishes acknowledgements for external red teamers and evaluators who contributed to GPT-4o's safety testing and system card development. The document credits numerous individual researchers and organizations including METR and Apollo Research.