risk-assessment

#risk-assessment

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

arXiv cs.AI · 2026-06-17

This paper introduces LegalHalluLens, a framework for auditing hallucinations in legal AI that provides typed hallucination profiles and a Risk Direction Index to support trustworthy deployment.


Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift

arXiv cs.LG · 2026-06-17

This paper proposes a hierarchical Bayesian credibility framework for pricing autonomous vehicle liability insurance under operational design domain (ODD) shift, pooling sparse experience across cities and software versions using a learned ODD-similarity kernel. Demonstrated on Waymo crash data, the method outperforms no-pooling approaches and addresses the prospective ratemaking challenge for autonomous driving systems.


Predicting model behavior before release by simulating deployment

OpenAI Blog · 2026-06-16

OpenAI introduces Deployment Simulation, a method that replays past conversations with candidate models in a privacy-preserving manner to predict real-world behavior and identify novel misalignment before release.


M3 scores well on SWE-Bench, but that's not why I'm impressed; it's the stuff no benchmark measures.

Reddit r/AI_Agents · 2026-06-04

M3 achieves solid benchmark scores but impresses with its ability to perform risk assessment and pre-mortem analysis before making code changes, highlighting a more cautious and thorough approach to refactoring in messy legacy repos.


Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment

arXiv cs.LG · 2026-06-03

Proposes Latent-Predictive Counterfactual Decoupling (LPCD) to address tactical out-of-distribution shifts in live streaming risk assessment by decoupling stable malicious intent from evolving narrative tactics at the latent level, achieving superior performance on large-scale industrial datasets.


PrivacyAkinator: Articulating Key Privacy Design Decisions by Answering LLM-Generated Multiple-choice Questions

arXiv cs.AI · 2026-05-22

This paper presents PrivacyAkinator, an interactive tool that helps novice developers articulate privacy design decisions by answering LLM-generated multiple-choice questions. Developers using it articulated 47% more key decisions in 73% less time than with NIST's PRAM methodology.


AI can design viruses, toxins and other bioweapons. How worried should we be?

Reddit r/ArtificialInteligence · 2026-05-13

The article discusses growing concerns over AI tools' potential to design dangerous bioweapons, citing a recent Chinese study on conotoxin design as a flashpoint for debate between biosecurity risks and scientific benefits.


What Will Happen Next: Large Models-Driven Deduction for Emergency Instances

arXiv cs.AI · 2026-05-12

This paper introduces WLDS, a large-model-driven system for simulating and deducing emergency instances by leveraging controllable randomness and cross-domain knowledge. It presents the Emergency Instances Deduction (EID) benchmark and demonstrates high-fidelity simulation capabilities across multiple domains.


Towards Security-Auditable LLM Agents: A Unified Graph Representation

arXiv cs.AI · 2026-05-11

This paper introduces Agent-BOM, a unified graph representation for security auditing in LLM-based agentic systems. It addresses the semantic gap in post-hoc auditing by modeling static capabilities and dynamic runtime states to detect complex attack chains like memory poisoning and tool misuse.


METR evaluated an early version of Claude Mythos

Reddit r/singularity · 2026-05-09

In March 2026, METR evaluated an early version of Claude Mythos Preview using its time-horizons task suite and estimated a 50% time horizon of at least 16 hours. This places the model at the upper end of what current benchmarks can measure, with caveats about the stability of estimates at longer time ranges.


Strengthening our Frontier Safety Framework

Google DeepMind Blog · 2025-10-23

DeepMind published the third iteration of its Frontier Safety Framework, expanding risk domains to include harmful manipulation and misalignment risks, with refined risk assessment processes and enhanced governance protocols for advanced AI models.


Estimating worst case frontier risks of open weight LLMs

OpenAI Blog · 2025-08-05

OpenAI researchers study worst-case frontier risks of releasing open-weight LLMs through malicious fine-tuning (MFT) in biology and cybersecurity domains, finding that open-weight models underperform frontier closed-weight models and don't substantially advance harmful capabilities.


Our updated Preparedness Framework

OpenAI Blog · 2025-04-15

OpenAI released an updated Preparedness Framework with sharper focus on high-risk AI capabilities, introducing clearer criteria for prioritizing risks and new Research Categories for emerging threats like autonomous replication and sandbagging alongside established Tracked Categories for biological, chemical, and cybersecurity capabilities.


Taking a responsible path to AGI

Google DeepMind Blog · 2025-04-02

DeepMind publishes a comprehensive approach to AGI safety and security, outlining a systematic framework to address misuse, misalignment, accidents, and structural risks as artificial general intelligence approaches reality within the coming years.


Building an early warning system for LLM-aided biological threat creation

OpenAI Blog · 2024-01-31

OpenAI conducted a study with 100 participants to evaluate whether GPT-4 meaningfully increases access to dangerous biological threat creation information compared to internet-only baselines, as part of their Preparedness Framework for AI safety. The research introduces an early warning evaluation methodology to detect AI-enabled biorisk uplift and serves as a potential tripwire for flagging models that require further safety testing.


Frontier risk and preparedness

OpenAI Blog · 2023-10-26

OpenAI announced the winners of its Preparedness Challenge, which identified unique risks associated with frontier AI systems. The top ten submissions highlighted concerns including financial system manipulation, information leakage, medical harm, cyberattacks, and persuasion-based threats, with 70% of entries emphasizing AI's potential to enhance malicious persuasion capabilities.


A hazard analysis framework for code synthesis large language models

OpenAI Blog · 2022-07-25

OpenAI presents a hazard analysis framework for evaluating safety risks associated with code synthesis LLMs like Codex, examining technical, social, political, and economic impacts through a novel evaluation methodology for code generation capabilities.
