Tag
This paper proposes a causal auditing framework that evaluates forgetting in Limited Memory Language Models by varying the database state during inference. It finds that parametric leakage is negligible and that post-deletion correctness arises primarily from retrieval artifacts rather than residual parametric memory.
SentryCode is an open-source kernel-level behavior auditing tool for AI coding agents that logs file/network/cue activity, uses honeypot tokens for zero-false-positive data breach detection, detects steganographic covert channels, and enforces policies, all running locally without network calls.
This paper uses evolutionary game theory to model competition between a harm-minimizing AI agent and an approval-seeking (RLHF) agent in a community, analyzing conditions for adoption and welfare outcomes. The results show that a self-audited agent can fixate, but fixation alone is not sufficient to prevent community harm; alignment and timeframe are critical.
Miles Brundage calls for federal AI regulation with transparency and auditing requirements, noting that being pro-regulation helped a candidate in a primary.
Google published an updated AI policy framework with stronger and more detailed positions on auditing and other areas, marking a notable shift in their public stance.
This paper introduces natural identifiers (NIDs) for post-hoc privacy auditing and dataset inference in large language models, eliminating the need for retraining or held-out datasets.
This paper audits eight automatic attribution metrics across three evaluation constructs for RAG systems, finding that no single metric transfers across datasets within the same construct, challenging the common practice of treating them as interchangeable.
A practitioner shares challenges and tools for monitoring autonomous AI agents in production, covering runtime prompt injection detection, tool-call auditing with reasoning traces, behavioral drift detection, and multi-agent authorization, while testing tools like Arize Phoenix, Protect AI Guardian, Metoro, Alice, Asqav, and Microsoft Agent Governance Toolkit.
ReasoningLens is an open-source framework that provides hierarchical visualization and diagnostic auditing for complex reasoning chains in large reasoning models, enabling structured analysis and error detection.
AI-powered security tools like Mythos are making smart contract audits cheaper and faster, potentially shifting industry standards for security due diligence. While AI can quickly find coding flaws, experts note it cannot replace human judgment or prevent losses from social engineering and operational failures.
This paper proposes PreUnlearn, a framework for auditing collateral knowledge damage in LLM unlearning before execution, using data-centric analysis to predict downstream damage across semantic layers.
Charlie Marsh announces uv audit, a native vulnerability scanning feature for project dependencies in the uv package manager.
This project adds an auditable academic research pipeline to Claude Code, including checkpoints such as citation verification and experiment claim alignment, ensuring the credibility of research outputs.
The paper introduces the Arbiter, an agent that continually monitors multi-agent conversations under a limited inspection budget to detect emergent misalignment, demonstrating reliable early detection across various misalignment conditions.
This work introduces ModSleuth, an agentic system that recursively reconstructs large-scale dependency graphs for LLM development by analyzing public artifacts, revealing multi-hop license obligations and documentation inconsistencies.
uv announces new security features: a fast dependency auditing command (uv audit) and optional malware scanning on sync operations, both currently in preview.
A tool built on pure math and deterministic logic that targets indirect prompt injection and agent drift and provides a pure audit trace chain. The creator is seeking pilot interest.
This paper studies how LLM-based stance simulation in online discussions is sensitive to counterfactual revisions of conversational context, and proposes an auditing framework comparing text-only and multimodal strategies.
A follow-up blog post from elttam covering new Go language features that improve security, problematic coding patterns (footguns) discovered during code audits, and accompanying Semgrep rules to catch them.
LLM-FACETS is an open-source evaluation framework designed to help practitioners assess LLM transparency and accountability with a focus on privacy and data flow transparency. It provides a browser interface, plugin architecture, and supports multiple auditing mechanisms including token-level log-probability visualization and RAG Triad metrics.