AI safety is arguing about the wrong boundary

Reddit r/AI_Agents News

Summary

This article argues that the AI safety debate is misdirected, focusing on model alignment and internal controls instead of the critical boundary: external admission authority over agent execution. It warns that systems capable of self-authorizing high-impact actions (e.g., deploying code, moving money) pose a fundamental risk that logging and monitoring cannot mitigate.

The entire AI safety debate is still focused on the wrong object. Everyone is obsessed with:

* what the model thinks
* what it refuses
* how it explains itself
* whether it is aligned enough to behave nicely

That is not where the dangerous boundary is. The dangerous moment is not thought. The dangerous moment is authority. When an AI agent crosses from suggestion into execution, the problem changes completely. We are no longer talking about chatbots. We are talking about agents that can:

* deploy code to production
* change production data
* move money
* rotate secrets
* approve a release
* trigger infrastructure
* call a privileged tool

At that point, alignment is not the boundary. Logging is not the boundary. Monitoring is not the boundary. Rollback is too late. Those are after-the-fact or inside-the-loop controls. You do not debug a bullet after it has already been fired.

The real question is brutally simple: who admits execution? If the same system can:

1. generate the action
2. evaluate the action
3. approve the action
4. execute the action

then it is self-authorizing. That is not governance. That is a closed loop with a permission label glued on top.

This is the category error most AI agent infrastructure is walking into. People are building:

* smarter agents
* better policies
* better logs
* better monitors
* approval flows
* runtime guardrails

All of that can be useful. But if final authority still lives inside the execution environment, the executor remains the judge of its own action. For high-impact automation, that is the wrong boundary. The executor should not be the final authority over its own execution.

Here is the test. Can the action proceed without an external allow decision? If yes, you have internal controls. You do not have an external admission boundary. If no, then there is at least a real separation between execution and authority (a code sketch of the difference follows after the TL;DR).

And when AI agents start touching deployment, money, credentials, infrastructure, and production data at scale, that difference stops being philosophical. It becomes the line between controlled automation and self-authorizing machines. We are building systems that can act, then letting the acting system decide whether it should be allowed to act. That is the problem.

TL;DR: If your agent can approve its own high-impact actions, you do not have safety. You have self-authorizing automation. The boundary is not alignment. The boundary is external admission.
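To make that test concrete, here is a minimal sketch in Python of what an external admission boundary could look like. Everything in it is an illustrative assumption rather than a description of any real system the post references: the ADMISSION_URL endpoint, the decision payload shape ({"allow": ..., "token": ..., "reason": ...}), and the run_with_token helper are all hypothetical. The point is the shape of the loop: the executor cannot mint its own approval.

```python
# Hypothetical sketch of an external admission boundary.
# The authority endpoint, payload shape, and helper names are assumptions.

import json
import urllib.request
from dataclasses import dataclass

ADMISSION_URL = "https://admission.example.internal/decide"  # hypothetical authority service

@dataclass(frozen=True)
class Action:
    kind: str      # e.g. "deploy", "move_money", "rotate_secret"
    payload: dict  # full description of what the agent wants to do

class AdmissionDenied(Exception):
    """Raised when the external authority does not return an explicit allow."""

def request_admission(action: Action) -> str:
    """Ask the external authority for a decision. Fail closed on any error."""
    body = json.dumps({"kind": action.kind, "payload": action.payload}).encode()
    req = urllib.request.Request(
        ADMISSION_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            decision = json.load(resp)
    except Exception as exc:
        # An unreachable authority means no authority. Fail closed, not open.
        raise AdmissionDenied(f"no decision from authority: {exc}")
    if decision.get("allow") is not True or "token" not in decision:
        raise AdmissionDenied(f"denied: {decision.get('reason', 'unspecified')}")
    # One-time token binding this specific approval to this specific action.
    return decision["token"]

def execute(action: Action) -> None:
    """The executor. Note what it cannot do: approve its own action."""
    token = request_admission(action)  # external allow, or we never get here
    run_with_token(action, token)      # hypothetical: downstream verifies the token too

def run_with_token(action: Action, token: str) -> None:
    print(f"executing {action.kind} under admission token {token[:8]}...")

if __name__ == "__main__":
    execute(Action(kind="deploy", payload={"service": "billing", "ref": "v1.4.2"}))
```

The design point is the failure mode: if the authority is unreachable, or the response is anything other than an explicit allow, execution stops. The one-time token also lets downstream systems check that this specific action was admitted, so the agent cannot reuse a stale approval. Under these assumptions, the allow decision lives in a different trust domain from the executor, which is exactly the separation the test above asks for.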

Similar Articles

External admission is not interception

Reddit r/AI_Agents

The author argues that current AI agent safety measures like guardrails and monitoring are insufficient, proposing 'external admission' as a stricter pattern where execution authority is withheld until an external authority explicitly allows high-impact actions.

AI safety via debate

OpenAI Blog

OpenAI proposes a novel approach to AI safety where two AI agents debate each other while a human judge evaluates their arguments, allowing humans to supervise AI systems whose behavior is too complex to directly understand. The method leverages debate and adversarial reasoning to align advanced AI with human values and preferences.

The other half of AI safety

Hacker News Top

The article critiques the AI safety field's focus on catastrophic risks while neglecting everyday mental health harms from chatbots like ChatGPT, citing OpenAI's own data showing millions of users with signs of psychosis, mania, or suicidal ideation who receive only gentle redirects rather than hard gating.