The Most Dangerous Procurement Agent Is the One That Works Perfectly

Reddit r/artificial News

Summary

An analysis of the danger posed by procurement AI agents that execute their tasks perfectly but optimize for the wrong metrics, producing systemic failures that are harder to detect than hallucinations. The article warns that over-optimizing for proxies such as cost or delivery time can collapse suppliers or create exposure under sustainability due-diligence regulations, and that these systems lack the human intuition that normally softens such optimization.

Imagine a procurement agent doing exactly what it was supposed to do. A supplier flags a delay. The agent reads the email, finds the affected PO, scans the network for alternate inventory, and reroutes the order. Twelve seconds, end to end. In a demo, the room nods. Someone asks about hallucinations. The vendor says the right things about guardrails. Everyone walks away reassured. The interesting question is a different one: not whether the agent could be wrong, but what happens on the day it's completely, devastatingly right.

The failure mode nobody is demoing: a financial agent told to minimise cost on a category executes a renegotiation perfectly. Margin is squeezed. Terms are tightened. The supplier, who was already thin, collapses six months later. The agent didn't malfunction. It succeeded. The metric was the bug. This isn't a hallucination. It's what any well-built system will do when it takes action at machine speed against a number that was written down before the system was fully understood.

Why procurement and supplier sustainability get hit hardest: humans intuitively soften optimisation. We hesitate. We pick up the phone. We notice when a supplier sounds tired on a call and quietly extend payment terms by two weeks. An agent does none of that. It does exactly what the metric says, at the speed of the API. And the regulatory surface is expanding, not shrinking. The moment an agent is recommending renegotiations, sourcing alternates, or flagging tier-N suppliers, the firm is generating supplier-treatment decisions at a volume no human ever did. Each one is auditable under due-diligence regimes that didn't get rolled back.

Two design principles that actually hold up:

An agent should never optimise on a single proxy. Price without supplier-health constraints, or an ESG score without context: each one alone becomes the flawed metric. The reward needs to be a joint function across commercial, resilience, and compliance dimensions.
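The single-proxy principle can be made concrete with a small, hypothetical sketch. Everything in it is an illustrative assumption rather than anything from the post: the `Option` fields, the supplier names, the health floor of 0.5, and the 0.6/0.4 weights. "Hard constraints first, then a weighted score" is also just one way to build a joint function, not the author's prescribed method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    """A hypothetical sourcing or renegotiation option the agent could execute."""
    supplier: str
    unit_cost: float        # commercial dimension: lower is better
    supplier_health: float  # resilience dimension: 0.0 (failing) to 1.0 (robust)
    compliance_ok: bool     # compliance dimension: passes due-diligence checks


def single_proxy_choice(options: Sequence[Option]) -> Option:
    """Optimise one proxy only: the cheapest option wins, whatever else is true."""
    return min(options, key=lambda o: o.unit_cost)


def joint_choice(
    options: Sequence[Option],
    min_health: float = 0.5,  # assumed supplier-health floor
    w_cost: float = 0.6,      # assumed weight on the commercial score
    w_health: float = 0.4,    # assumed weight on the resilience score
) -> Optional[Option]:
    """Optimise a joint objective: hard constraints first, then a weighted score."""
    # Compliance and a minimum supplier health are hard constraints, so they
    # cannot be silently traded away for a lower price.
    feasible = [o for o in options if o.compliance_ok and o.supplier_health >= min_health]
    if not feasible:
        return None  # no acceptable option: hand the decision back to a human

    # Normalise cost across the feasible set so both terms share a 0..1 scale.
    lo = min(o.unit_cost for o in feasible)
    hi = max(o.unit_cost for o in feasible)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all costs are equal

    def score(o: Option) -> float:
        cost_score = 1.0 - (o.unit_cost - lo) / span  # 1.0 = cheapest feasible option
        return w_cost * cost_score + w_health * o.supplier_health

    return max(feasible, key=score)


options = [
    Option("ThinMargin Ltd", unit_cost=8.50, supplier_health=0.2, compliance_ok=True),
    Option("FlaggedCo", unit_cost=8.90, supplier_health=0.9, compliance_ok=False),
    Option("MidRange", unit_cost=9.60, supplier_health=0.6, compliance_ok=True),
    Option("SteadyParts", unit_cost=10.00, supplier_health=0.8, compliance_ok=True),
]
print(single_proxy_choice(options).supplier)  # ThinMargin Ltd
print(joint_choice(options).supplier)         # MidRange
```

With price as the only metric, the agent picks the already-thin supplier from the post's scenario. The joint version removes that supplier and the non-compliant offer before price is considered at all. Returning `None` instead of acting when nothing is feasible is another assumed design choice here.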
The audit trail has to be designed at the same time as the agent, not bolted on after. If you can't answer "why did the agent treat this supplier this way, on this date, against which constraints" in under a minute, you don't have a deployable agent. You have a liability waiting for a regulator.

The question worth asking before you deploy: if the only thing you're asking your vendor is "how do you prevent hallucinations," you're asking the easy question. The harder one: when the agent is working perfectly, what is it optimising for, and who decided that was the right thing? The answer is not in the model. It's in the design choices made before the model ever existed.

Full write-up here: https://medium.com/@georgekar91/the-most-dangerous-procurement-agent-is-the-one-that-works-perfectly-3ed2f8c43119

Curious whether anyone building or evaluating agentic procurement tools is actually stress-testing the objective function, not just the accuracy.

Cached at: 05/31/26, 03:32 PM

# The Most Dangerous Procurement Agent Is the One That Works Perfectly

Source: [https://medium.com/@georgekar91/the-most-dangerous-procurement-agent-is-the-one-that-works-perfectly-3ed2f8c43119](https://medium.com/@georgekar91/the-most-dangerous-procurement-agent-is-the-one-that-works-perfectly-3ed2f8c43119)

## Designing what to optimize for is going to be more important than what model or methods you use in the years to come.

By [George Karapetyan](https://medium.com/@georgekar91?source=post_page---byline--3ed2f8c43119---------------------------------------)

Imagine a procurement agent doing exactly what it was supposed to do. A supplier flags a delay. The agent reads the email, finds the affected PO, scans the network for alternate inventory, and reroutes the order. Twelve seconds, end to end. In a demo, the room would nod. Someone would ask about hallucinations. The vendor would say the right things about guardrails and human-in-the-loop. Everyone would walk away reassured.

The interesting question is a different one. Not whether the agent could be wrong, but what would happen on the day it was completely, devastatingly right.

## The failure mode nobody is demoing

Most of the conversation about agentic AI in procurement starts with the same worry. Will it hallucinate? Will it confirm a refund that didn’t go through? Will it imagine a supplier that doesn’t exist? These are real concerns. They are also the easy ones, because they are recognisable. A hallucination looks like a bug. You can write a test for it.

The harder failure mode is an agent that performs its task flawlessly while optimising for the wrong objective. It is easy to picture how this would land in procurement. A financial agent, told to minimise cost on a category, executes a renegotiation perfectly. Margin is squeezed. Terms are tightened.
The supplier, who was already thin, collapses six months later. The agent did not malfunction. It succeeded. The metric was the bug.

This is not a hallucination. It is not a glitch. It is what any well-built system will do when it takes action at machine speed against a number that was written down before the system was fully understood.

## Why this would hit procurement and sustainability harder than most functions

A lot of enterprise functions can absorb this kind of failure. If a marketing agent over-optimises for click-through, you notice in a week and adjust the brief. If a procurement agent were to over-optimise for unit cost across a tier-2 supplier base, you would notice when a critical part stops arriving, or when a forced-labour finding lands on your supplier scorecard eighteen months later, or when an auditor under the new CSDDD scope wants to know why your due-diligence trail says nothing happened.

The metrics we use in procurement and supplier risk are proxies. Price is a proxy for value. On-time delivery is a proxy for reliability. ESG score is a proxy for whether a supplier will still be operating, ethically, in five years.

These proxies are tolerable when humans act on them, because humans intuitively soften the optimisation. We hesitate. We pick up the phone. We notice when a supplier sounds tired on a call and quietly extend the payment terms by two weeks. An agent does none of that. It does exactly what the metric says, at the speed of the API.

That is not a problem the model can fix. The model is doing what it was told.

## The CSDDD problem hiding behind the agentic stack

There is a second layer that does not get discussed enough.
In the same month that vendors have been [rolling out twenty-plus agents across the procurement workflow](https://supplychaindigital.com/news/coupa-inspire-2026-new-orchestration-products-announced) and SAP has announced its [autonomous supply chain vision](https://news.sap.com/2026/05/more-autonomous-supply-chain/), the [Omnibus simplification package has narrowed CSDDD scope and pushed core compliance to 2029](https://www.wsgr.com/en/insights/eu-rolls-back-csrd-reporting-and-corporate-sustainability-due-diligence-obligations.html). The narrative is that the regulatory pressure has eased.

For anyone thinking about deploying agents, the opposite is closer to the truth. The moment an agent is recommending renegotiated terms, sourcing alternates, or flagging suppliers across a tier-N network, the firm is generating supplier-treatment decisions at a volume no human ever did. Each one of those decisions is, in principle, auditable under the due-diligence regimes that survived the omnibus, and certainly under the German LkSG, the French Devoir de Vigilance, and the various sector laws that did not get rolled back. The directive may have moved. The exposure did not.

The decision surface area multiplies. The natural human friction that previously slowed the worst calls disappears. This is the part most boards are missing. They are reading the agentic AI narrative as a productivity story. They should also be reading it as a due-diligence story.

## This is a design choice, not a model choice

The instinct, when you see this risk, is to assume it is a model problem. Better evals. Smarter guardrails. A bigger frontier model that “understands” supplier health. None of that is the actual fix. The model is not where this fails. The agent is not where this fails. It fails in the design choices made before the model is ever invoked: in the objective the agent is given, and in the constraints inside which it is allowed to act.
The second instinct is to bolt on a human-in-the-loop checkbox. That is also not enough. Human review at machine velocity quickly becomes rubber-stamping. If an agent surfaces forty supplier decisions a day, no human is reviewing them meaningfully by Friday afternoon.

Two design principles look like they would hold up.

First, an agent should never optimise on a single proxy. Price without supplier-health constraints, ESG score without context, on-time delivery without a fragility check: each of these, alone, becomes the flawed metric. The agent’s reward needs to be a joint function of at least the commercial, the resilience, and the compliance dimensions, or it will silently trade one against the other.

Second, the audit trail has to be designed at the same time as the agent, not bolted on after. If you cannot answer the question “why did the agent treat this supplier this way, on this date, against which constraints” in under a minute, you do not have a deployable agent. You have a liability waiting for a regulator.

## The question worth asking before you deploy

If the only question you are asking your vendor is “how do you prevent hallucinations,” you are asking the easy question. The harder one, and the one that will matter more in three years when the first significant CSDDD-style enforcement actions land on companies that automated their supplier decisions, is this: when the agent is working perfectly, what is it optimising for, and who decided that was the right thing?

The answer to that question is not in the model. It is not in the agent. It is in the design choices made before either of them existed. That is where the work is. That is where the risk lives. And that is the part of the agentic AI story almost nobody is demoing.
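The article's second principle is an audit trail that can answer "why did the agent treat this supplier this way, on this date, against which constraints" in under a minute. A minimal sketch of what "designed at the same time as the agent" might look like follows. The record schema (`DecisionRecord`, `ConstraintCheck`), the `AuditLog` class, and the example values are all hypothetical illustrations and are not taken from any vendor product.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ConstraintCheck:
    """One constraint evaluated at decision time (hypothetical schema)."""
    name: str         # e.g. "min_supplier_health"
    threshold: float
    observed: float
    passed: bool


@dataclass(frozen=True)
class DecisionRecord:
    """What the agent did, to whom, when, and under which objective and constraints."""
    supplier: str
    decided_on: date
    action: str                          # e.g. "renegotiate_terms"
    objective_version: str               # identifies the objective/weights in force
    checks: Tuple[ConstraintCheck, ...]


class AuditLog:
    """Append-only in-memory log, indexed by (supplier, date) at write time."""

    def __init__(self) -> None:
        self._records: List[DecisionRecord] = []
        self._index: Dict[Tuple[str, date], List[int]] = defaultdict(list)

    def record(self, rec: DecisionRecord) -> None:
        # Index on write, so answering "why" later is a lookup, not a log search.
        self._index[(rec.supplier, rec.decided_on)].append(len(self._records))
        self._records.append(rec)

    def why(self, supplier: str, on: date) -> List[str]:
        """Why did the agent treat this supplier this way, on this date, against which constraints?"""
        answers = []
        for i in self._index.get((supplier, on), []):  # .get does not insert empty keys
            rec = self._records[i]
            checks = "; ".join(
                f"{c.name}: observed {c.observed} vs threshold {c.threshold} -> "
                + ("pass" if c.passed else "FAIL")
                for c in rec.checks
            )
            answers.append(f"{rec.decided_on} {rec.action} under {rec.objective_version} [{checks}]")
        return answers


log = AuditLog()
log.record(DecisionRecord(
    supplier="MidRange",
    decided_on=date(2026, 5, 31),
    action="renegotiate_terms",
    objective_version="joint-v1",
    checks=(ConstraintCheck("min_supplier_health", threshold=0.5, observed=0.6, passed=True),),
))
print(log.why("MidRange", date(2026, 5, 31)))
# ['2026-05-31 renegotiate_terms under joint-v1 [min_supplier_health: observed 0.6 vs threshold 0.5 -> pass]']
```

The record is written with the objective version and every constraint result at decision time, which is what lets the "why" question be answered by lookup. This in-memory version is only an illustration. A real trail would likely need durable, tamper-evident storage and the inputs the agent actually saw.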

Similar Articles

I think a lot of people are underestimating how expensive unreliable agents are

Reddit r/AI_Agents

The author argues that the hidden cost of unreliable AI agents lies in the cognitive overhead of constant human monitoring, emphasizing that predictability and environmental stability matter more than raw intelligence for real-world deployment. Practical workflows improve significantly when agents operate within controlled, validated environments rather than unpredictable ones.

The Real Truth About AI Agents

Reddit r/AI_Agents

An experienced practitioner shares hard-won lessons from deploying 25+ AI agents to production, arguing that memory, orchestration, and auditability matter far more than model choice. The article details common failure modes like context loss and silent cost loops, and recommends a stack including Claude Sonnet 4, Pydantic AI, and dedicated memory layers like Octopodas.