Need brutal feedback: I built a recorder for AI agent runs

Reddit r/AI_Agents 05/18/26, 07:54 PM Tools

Summary

A developer built agentproof-recorder to record AI agent runs and detect rule violations, seeking feedback on whether this is a common pain point.

I have been using AI coding agents more seriously lately and one thing started annoying me. I needed something to control access of sensitive material. Being high on caffine I went to codex and built agentproof-recorder. It records an agent run and checks whether the agent violated the rules during execution. I am not sure yet if this is a real pain for others or just something I personally got annoyed by while working with agents. So I want brutal feedback: Are you facing this while using agents and how are you solving it? Would a tool like this be useful?

Original Article

Similar Articles

How are people handling long-term memory + replay/debugging for AI agents?

Reddit r/AI_Agents

A developer discusses limitations in current AI agent memory systems and proposes a new memory layer tool with episode storage and replay debugging, seeking community validation.

How do you actually debug your AI agents?

Reddit r/AI_Agents

Developer shares struggles debugging AI agents in production, highlighting issues with hallucinations, regression from prompt changes, and high API costs, asking the community for strategies.

How do you catch when an AI agent skips something it was supposed to do?

Reddit r/AI_Agents

A developer discusses challenges in detecting when AI agents silently skip actions, highlighting the difficulty of distinguishing legitimate omissions (e.g., policy blocks) from failures, and calls for collaboration on agent reliability tooling.

Wasting hundreds on API credits with runaway agents is basically a rite of passage at this point. Here's mine.

Reddit r/artificial

A developer built a real-time 3D visualization dashboard for monitoring AI agent working memory after losing $400+ to runaway agent loops, using color-coded nodes and edges to detect reasoning loops before they become costly. The post reflects on agent observability as an emerging category distinct from traditional microservice monitoring.

AI agents still suck, so I built my own

Reddit r/AI_Agents

The author built a custom AI agent application wrapping Claude Code and upcoming Codex support, focusing on composable workflows and seeking community feedback.

Similar Articles

How are people handling long-term memory + replay/debugging for AI agents?

How do you actually debug your AI agents?

How do you catch when an AI agent skips something it was supposed to do?

Wasting hundreds on API credits with runaway agents is basically a rite of passage at this point. Here's mine.

AI agents still suck, so I built my own

Submit Feedback