Tag
This paper introduces LegalHalluLens, a framework for auditing hallucinations in legal AI, providing typed hallucination profiles and a Risk Direction Index to improve trustworthy deployment.
This paper proposes a hierarchical Bayesian credibility framework for pricing autonomous vehicle liability insurance under operational design domain (ODD) shift, pooling sparse experience across cities and software versions using a learned ODD-similarity kernel. Demonstrated on Waymo crash data, the method outperforms no-pooling approaches and addresses the prospective ratemaking challenge for autonomous driving systems.
OpenAI introduces Deployment Simulation, a method that simulates future model deployments by replaying past conversations with candidate models in a privacy-preserving manner, in order to predict real-world behavior and identify novel misalignment before release.
M3 achieves solid benchmark scores, but its standout trait is that it performs risk assessment and pre-mortem analysis before making code changes, reflecting a more cautious and thorough approach to refactoring messy legacy repos.
This paper proposes Latent-Predictive Counterfactual Decoupling (LPCD) to address tactical out-of-distribution shifts in live-streaming risk assessment by decoupling stable malicious intent from evolving narrative tactics at the latent level, achieving superior performance on large-scale industrial datasets.
This paper presents PrivacyAkinator, an interactive tool that helps novice developers articulate privacy design decisions via LLM-generated multiple-choice questions, yielding 47% more key decisions in 73% less time than NIST's PRAM methodology.
The article discusses growing concerns over AI tools' potential to design dangerous bioweapons, citing a recent Chinese study on conotoxin design as a flashpoint for debate between biosecurity risks and scientific benefits.
This paper introduces WLDS, a large-model-driven system for simulating and deducing emergency instances by leveraging controllable randomness and cross-domain knowledge. It presents the Emergency Instances Deduction (EID) benchmark and demonstrates high-fidelity simulation capabilities across multiple domains.
This paper introduces Agent-BOM, a unified graph representation for security auditing in LLM-based agentic systems. It addresses the semantic gap in post-hoc auditing by modeling static capabilities and dynamic runtime states to detect complex attack chains like memory poisoning and tool misuse.
METR evaluated an early version of Claude Mythos Preview in March 2026 using its time-horizons task suite and estimated a 50% time horizon of at least 16 hours. This places the model at the upper end of what current benchmarks can measure, with caveats about the stability of estimates at longer time horizons.
DeepMind published the third iteration of its Frontier Safety Framework, expanding risk domains to include harmful manipulation and misalignment risks, with refined risk assessment processes and enhanced governance protocols for advanced AI models.
OpenAI researchers study worst-case frontier risks of releasing open-weight LLMs through malicious fine-tuning (MFT) in the biology and cybersecurity domains, finding that open-weight models underperform frontier closed-weight models and do not substantially advance harmful capabilities.
OpenAI released an updated Preparedness Framework with sharper focus on high-risk AI capabilities, introducing clearer criteria for prioritizing risks and new Research Categories for emerging threats like autonomous replication and sandbagging alongside established Tracked Categories for biological, chemical, and cybersecurity capabilities.
DeepMind publishes a comprehensive approach to AGI safety and security, outlining a systematic framework to address misuse, misalignment, accidents, and structural risks as artificial general intelligence potentially becomes a reality within the coming years.
OpenAI conducted a study with 100 participants to evaluate whether GPT-4 meaningfully increases access to dangerous biological threat creation information compared to internet-only baselines, as part of its Preparedness Framework for AI safety. The research introduces an early-warning evaluation methodology to detect AI-enabled biorisk uplift and serves as a potential tripwire for flagging models that require further safety testing.
OpenAI announced the winners of its Preparedness Challenge, which identified unique risks associated with frontier AI systems. The top ten submissions highlighted concerns including financial system manipulation, information leakage, medical harm, cyberattacks, and persuasion-based threats, with 70% of entries emphasizing AI's potential to enhance malicious persuasion capabilities.
OpenAI presents a hazard analysis framework for evaluating safety risks associated with code synthesis LLMs like Codex, examining technical, social, political, and economic impacts through a novel evaluation methodology for code generation capabilities.