trustworthy-ai

#trustworthy-ai

Theoria: Rewrite-Acceptability Verification over Informal Reasoning States

arXiv cs.AI ↗ · 21h ago Cached

Theoria is a verification architecture that rewrites AI solutions into auditable state transitions, achieving high precision on HLE problems and detecting subtle errors like hidden premises and fabricated citations.

Decentralized Assessment for Trustworthy AI (DATA)

Reddit r/artificial · 6d ago

The Decentralized Assessment for Trustworthy AI (DATA) is an ethical evaluation tool that lets users and communities objectively audit AI companies against leading ethical frameworks, such as the UNESCO and EU guidelines.

Auditing Framing-Sensitive Behavioral Instability in Large Language Models for Mental Health Interactions

arXiv cs.CL · 6d ago

This paper investigates how contextual framing affects LLM responses in mental health interactions, finding systematic behavioral variation and demonstrating that internal representations encode framing information throughout transformer layers.

From Sparse Features to Trustworthy Proxies: Certifying SAE-Based Interpretability

arXiv cs.LG · 2026-06-18

This paper proposes a post-hoc certification framework for sparse autoencoder (SAE) based interpretability, deriving an upper bound on the frozen language model's risk using measurable quantities. The framework is validated on GPT-2 Small, Gemma-2B, and Llama-3-8B, showing non-vacuous bounds and revealing depth-dependent behavior.

Upsolve AI

Product Hunt · 2026-06-17

Upsolve AI is a tool for building grounded, governed, and trustworthy data agents.

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

arXiv cs.AI · 2026-06-17

This paper introduces LegalHalluLens, a framework for auditing hallucinations in legal AI, providing typed hallucination profiles and a Risk Direction Index to improve trustworthy deployment.

NeuroSymbolic AI for Legal AI-TRISM: Trustworthy, Reliable, Interpretable, Safe Models

arXiv cs.AI · 2026-06-16

This position paper proposes the TRISM framework that integrates NeuroSymbolic AI with LLMs and RAG to address hallucination and interpretability issues in legal AI, introducing RASOR RAG for generating interpretable rationales and formalizing symbolic legal knowledge bases.

The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements

arXiv cs.AI · 2026-06-12

This paper audits LangChain, AutoGPT, and the OpenAI Agents SDK for architectural safety guarantees and finds that none natively complies with containment principles. It demonstrates that memory poisoning can cause persistent failures and introduces lightweight mechanisms to eliminate such attacks.

Google DeepMind is worried about what happens when millions of agents start to interact

MIT Technology Review · 2026-06-11

Google DeepMind, together with Schmidt Sciences, ARIA, the Cooperative AI Foundation, and Google.org, has launched a $10 million funding initiative for research on the safety of multi-agent AI systems, aiming to prevent risks such as scams, prompt injections, and cyberattacks as AI agents become widespread.

Toward Trustworthy AI: Multi-Target Adversarial Attacks and Robust Defenses for Continuous Data Summarization

arXiv cs.AI · 2026-06-11

This paper studies adversarial attacks on continuous data summarization under similarity-level perturbations via DR-submodular optimization. It formulates multi-target attack generation as a min-max problem and robust defense as a regularized max-min problem, and supports both with theoretical guarantees and experiments.

Supporting Europe’s work in ensuring a trustworthy AI ecosystem

OpenAI Blog · 2026-06-11

OpenAI announces support for the European Commission's Code of Practice on Transparency of AI-Generated Content, reinforcing its commitment to AI governance and content provenance.

Mitigating Manifold Departure: Uncertainty-Aware Subspace Rectification for Trustworthy MLLM Decoding

arXiv cs.LG · 2026-06-10

This paper introduces MGAP, a training-free decoding method that reduces hallucinations in Multimodal Large Language Models by adaptively suppressing only the harmful parts of language priors while preserving the model's semantic manifold. The method outperforms prior baselines on the POPE and CHAIR benchmarks.

Sequential statistical inference for Large Language Models: Representation, validity, and monitoring

arXiv cs.LG · 2026-06-09

This paper argues for a sequential inference framework to enhance LLM trustworthiness by modeling interactions as dependent stochastic processes, ensuring validity under repeated use, and enabling online monitoring for behavioral shifts.

The new Claude scored 0% on "confidently reporting wrong answers" in testing. Here's a prompt that takes advantage of it on anything important.

Reddit r/ArtificialInteligence · 2026-05-31

Anthropic's Claude Opus 4.8 update dramatically reduces confident but incorrect answers, scoring 0% on the "confidently reporting wrong answers" test. The post provides a prompt that leverages this improvement for critical self-critique.

A shared playbook for trustworthy third party evaluations

OpenAI Blog · 2026-05-29

OpenAI shares lessons and recommended approaches for designing trustworthy third-party evaluations of frontier models, emphasizing the critical role of evaluation harnesses and validity checks.

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning

arXiv cs.AI · 2026-05-27

This paper introduces a relevance-sensitive evaluation suite for legal AI, demonstrating that LLMs are overly sensitive to legally irrelevant perturbations, and proposes LexGuard, an adversarial multi-agent framework using formal reasoning to improve legal reasoning reliability.

Faithful or Fabricated? A Causal Framework for Rationalization Bias in LLM Judges

arXiv cs.CL · 2026-05-26

This paper introduces a causal framework to quantify rationalization bias in LLM judges, where verdicts and explanations are driven by non-evidential cues rather than by the underlying texts. It proposes cue interventions, anchoring metrics, and the Proof-Before-Preference mitigation protocol, and demonstrates improved cue invariance.

Ontological Knowledge Blocks: Executable Compliance and Profile-Based Validation for Trustworthy AI Systems

arXiv cs.AI · 2026-05-25

This paper introduces Ontological Knowledge Blocks (OKBs), a programmable governance infrastructure that compiles regulatory obligations into machine-checkable constraints for trustworthy AI systems, with prototype evaluation in HPC resource allocation.

The Deterministic Horizon: Impossibility Results as Design Specifications for Trustworthy AI Systems

arXiv cs.AI · 2026-05-25

This paper proposes that impossibility results can serve as design specifications for building trustworthy AI systems, presenting a theoretical framework for ensuring reliability and safety.

Why We Build

Reddit r/artificial · 2026-05-24

An opinion piece advocating for AI systems that deliver transparent, verifiable knowledge from domain experts, enabling discovery-based learning and countering centralized propaganda.