auditability

#auditability

Towards a Harness That Can Do Anything

Hacker News Top · 6d ago

The article discusses principles for designing LLM harnesses that are intuitive, transparent, and lean, drawing inspiration from Unix philosophy to reduce cognitive load and improve reliability.

TRACE: An Operational Reasoning Schema for Auditable Agentic Commitments

arXiv cs.AI · 6d ago

This paper introduces TRACE (Typed Reasoning And Commitment Evidence), a typed, versioned schema for recording reasoning traces in agentic systems to enable auditability and improve reasoning quality. It defines a reference writer, measurement regime, and consumer contract, with two worked examples illustrating the approach.

Toward Auditable AI Scientists: A Hypothesis Evolution Protocol for LLM Agents

arXiv cs.AI · 2026-07-13

This paper introduces the Hypothesis Evolution Protocol (HEP) for LLM agents, which makes hypothesis generation, testing, and belief updates explicit and auditable. Experiments on materials-science tasks show that HEP-equipped agents generalize across research questions and become more effective with stronger base LLMs.

AI agents may need an identity before they need more intelligence

Reddit r/artificial · 2026-07-13

The article argues that before AI agents can be widely deployed, they need verifiable identity and auditability to ensure trust and accountability. The ITU is working on international standards for this.

From Prompts to Contracts: Harness Engineering for Auditable Enterprise LLM Agents

arXiv cs.AI · 2026-07-10

Introduces a harness engineering approach for building auditable enterprise LLM agents by moving deterministic behavior into code, schemas, and validation artifacts, demonstrated on Korean corporate data with fault-injection and model-substitution tests.

Are AI agents reintroducing problems software engineering already solved?

Reddit r/ArtificialInteligence · 2026-07-07

The article explores how AI agent workflows are reintroducing software engineering challenges around reproducibility, auditability, and state management that were previously solved with version control, CI/CD, and static code practices, while noting emerging solutions like GitHub's Agentic Workflows and git-native approaches.

Human approval is too vague for production agents

Reddit r/AI_Agents · 2026-07-07

The article argues that human-in-the-loop in agent systems should move from vague approvals to explicit, auditable step-level signed decision records with detailed evidence, payloads, idempotency keys, rollback paths, and ownership. It highlights the danger of approving a black-box story rather than a specific operation.

What if AI agents had a public memory?

Reddit r/AI_Agents · 2026-07-06

The author explores the idea of AI agents having a public, auditable memory to record important decisions, which could enhance trust but also introduce new complexities.

From Explicit Elements to Implicit Intent: A Predefined Library for Auditable Behavioral Inference

arXiv cs.AI · 2026-06-11

Presents SemantiClean, a modular framework for extracting structured semantic signals from e-commerce session data to drive pluggable inference targets (purchase intent, customer segmentation, product affinity) while prioritizing auditability and structural transparency over pure accuracy.

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

arXiv cs.AI · 2026-06-08

This technical report introduces DuMate-DeepResearch, a multi-agent framework for deep research tasks that decouples the agent core from a tool ecosystem, and incorporates graph-based dynamic planning, recursive two-level execution, and rubric-based test-time optimization. The system achieves state-of-the-art results on two deep research benchmarks, demonstrating the value of auditable agent infrastructure.

Stateful Swarms are 2x more Effective at 39x lower Cost

Reddit r/ArtificialInteligence · 2026-06-05

Irys introduces Stateful Swarms, an open-source paradigm for AI agents that uses structured blackboard memory to improve performance and reduce cost. On Harvey AI's Legal Agent Benchmark, it achieved an 83.74% criteria pass rate at $1.30 per task, versus 10.4% at $50.90 for the state of the art.

Our AI agent's chain broke in production. Here's what we built to fix it, and why the break was actually the point.

Reddit r/AI_Agents · 2026-06-03

A blog post describing how the author's production AI agent (PiQ) experienced a broken hash-chain after a server restart, and how they built a workflow for detection, human-in-the-loop resolution, and persistent audit trails, turning the failure into a feature.

PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration

arXiv cs.CL · 2026-05-29

PatchBoard replaces natural-language dialogue in LLM multi-agent systems with validated JSON Patch mutations over a shared structured state, achieving higher success rates and significantly lower token usage on ALFWorld benchmarks.

From Accuracy to Auditability: A Survey of Determinism in Financial AI Systems

arXiv cs.AI · 2026-05-26

This survey examines computational nondeterminism in financial AI systems, covering tabular models, graph networks, and LLM-based workflows, and proposes a layered evaluation framework for auditability.

The Real Truth About AI Agents

Reddit r/AI_Agents · 2026-05-22

An experienced practitioner shares hard-won lessons from deploying 25+ AI agents to production, arguing that memory, orchestration, and auditability matter far more than model choice. The article details common failure modes like context loss and silent cost loops, and recommends a stack including Claude Sonnet 4, Pydantic AI, and dedicated memory layers like Octopodas.

@yoheinakajima: babyagi has ~200 citations, but 0 papers... i just published my first paper on arXiv "The Log is the Agent: Event-Sourc…

X AI KOLs Following · 2026-05-22

Yohei Nakajima publishes a paper proposing ActiveGraph, a runtime where the event log is the source of truth and agents coordinate through persistent replayable state, enabling auditability, forking, and causal lineage.

Mechanical Enforcement for LLM Governance: Evidence of Governance-Task Decoupling in Financial Decision Systems

arXiv cs.CL · 2026-05-15

This paper introduces five governance metrics that quantify policy compliance at the decision-rationale level for LLMs in regulated financial workflows. It finds that mechanical enforcement (operating outside the model's interpretive loop) reduces non-informative deferrals by 73% and reveals governance-task decoupling: text-only governance degrades on both dimensions under stress, while mechanical enforcement preserves governance quality even as task performance drops.

Preregistered Belief Revision Contracts

arXiv cs.CL · 2026-04-20

This paper introduces Preregistered Belief Revision Contracts (PBRC), a protocol-level mechanism for multi-agent systems (including LLM-based agents) that separates open communication from admissible belief changes by publicly fixing evidence triggers and revision operators. The work addresses dangerous conformity effects in agent deliberation and provides formal guarantees that social-only pressure cannot drive false consensus.
