Cached at:
05/31/26, 03:32 PM
# The Most Dangerous Procurement Agent Is the One That Works Perfectly
Source: [https://medium.com/@georgekar91/the-most-dangerous-procurement-agent-is-the-one-that-works-perfectly-3ed2f8c43119](https://medium.com/@georgekar91/the-most-dangerous-procurement-agent-is-the-one-that-works-perfectly-3ed2f8c43119)
## Designing what to optimise for will matter more than which model or methods you use in the years to come\.
Imagine a procurement agent doing exactly what it was supposed to do\. A supplier flags a delay\. The agent reads the email, finds the affected PO, scans the network for alternate inventory, and reroutes the order\. Twelve seconds, end to end\. In a demo, the room would nod\. Someone would ask about hallucinations\. The vendor would say the right things about guardrails and human\-in\-the\-loop\. Everyone would walk away reassured\.
The interesting question is a different one\. Not whether the agent could be wrong, but what would happen on the day it was completely, devastatingly right\.
## The failure mode nobody is demoing
Most of the conversation about agentic AI in procurement starts with the same worry\. Will it hallucinate? Will it confirm a refund that didn’t go through? Will it imagine a supplier that doesn’t exist? These are real concerns\. They are also the easy ones, because they are recognisable\. A hallucination looks like a bug\. You can write a test for it\.
The harder failure mode is an agent that performs its task flawlessly while optimising for the wrong objective\. It is easy to picture how this would land in procurement\. A financial agent, told to minimise cost on a category, executes a renegotiation perfectly\. Margin is squeezed\. Terms are tightened\. The supplier, who was already thin, collapses six months later\. The agent did not malfunction\. It succeeded\. The metric was the bug\.
This is not a hallucination\. It is not a glitch\. It is what any well\-built system will do when it takes action at machine speed against a number that was written down before the system was fully understood\.
## Why this would hit procurement and sustainability harder than most functions
A lot of enterprise functions can absorb this kind of failure\. If a marketing agent over\-optimises for click\-through, you notice in a week and adjust the brief\. If a procurement agent were to over\-optimise for unit cost across a tier\-2 supplier base, you would notice when a critical part stops arriving, or when a forced\-labour finding lands on your supplier scorecard eighteen months later, or when an auditor under the new CSDDD scope wants to know why your due\-diligence trail says nothing happened\.
The metrics we use in procurement and supplier risk are proxies\. Price is a proxy for value\. On\-time delivery is a proxy for reliability\. ESG score is a proxy for whether a supplier will still be operating, ethically, in five years\. These proxies are tolerable when humans act on them, because humans intuitively soften the optimisation\. We hesitate\. We pick up the phone\. We notice when a supplier sounds tired on a call and quietly extend the payment terms by two weeks\.
An agent does none of that\. It does exactly what the metric says, at the speed of the API\. That is not a problem the model can fix\. The model is doing what it was told\.
## The CSDDD problem hiding behind the agentic stack
There is a second layer that does not get discussed enough\. The same month vendors have been [rolling out twenty\-plus agents across the procurement workflow](https://supplychaindigital.com/news/coupa-inspire-2026-new-orchestration-products-announced) and SAP has announced its [autonomous supply chain vision](https://news.sap.com/2026/05/more-autonomous-supply-chain/), the [Omnibus simplification package has narrowed CSDDD scope and pushed core compliance to 2029](https://www.wsgr.com/en/insights/eu-rolls-back-csrd-reporting-and-corporate-sustainability-due-diligence-obligations.html)\. The narrative is that the regulatory pressure has eased\. For anyone thinking about deploying agents, the opposite is closer to the truth\.
The moment an agent starts recommending renegotiated terms, sourcing alternates, or flagging suppliers across a tier\-N network, the firm is generating supplier\-treatment decisions at a volume no human team ever did\. Each one of those decisions is, in principle, auditable under the due\-diligence regimes that survived the Omnibus, and certainly under the German LkSG, the French Devoir de Vigilance, and the various sector laws that did not get rolled back\. The directive may have moved\. The exposure did not\. The decision surface area multiplies\. The natural human friction that previously slowed the worst calls disappears\.
This is the part most boards are missing\. They are reading the agentic AI narrative as a productivity story\. They should also be reading it as a due\-diligence story\.
## This is a design choice, not a model choice
The instinct, when you see this risk, is to assume it is a model problem\. Better evals\. Smarter guardrails\. A bigger frontier model that “understands” supplier health\. None of that is the actual fix\. The model is not where this fails\. The agent is not where this fails\. It fails in the design choices made before the model is ever invoked — in the objective the agent is given, and the constraints inside which it is allowed to act\.
The second instinct is to bolt on a human\-in\-the\-loop checkbox\. Also not enough\. Human review at machine velocity quickly becomes rubber\-stamping\. If an agent surfaces forty supplier decisions a day, no human is reviewing them meaningfully by Friday afternoon\.
Two design principles look like they would hold up\. First, an agent should never optimise on a single proxy\. Price without supplier\-health constraints, ESG score without context, on\-time delivery without a fragility check — each of these, alone, becomes the flawed metric\. The agent’s reward needs to be a joint function of at least the commercial, the resilience, and the compliance dimensions, or it will silently trade one against the other\.
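A minimal Python sketch of that first principle, under stated assumptions\. The `Offer` fields, the `MIN_HEALTH` floor, the `HEALTH_WEIGHT` weight, and the sample numbers are all hypothetical, chosen only to illustrate the idea\. They do not come from any vendor product or real supplier data\. Compliance and a minimum supplier\-health floor are treated as hard constraints, and only the offers that pass are traded off on cost against fragility\.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    supplier: str
    unit_cost: float        # commercial dimension, lower is better
    supplier_health: float  # resilience dimension, 0.0 (fragile) to 1.0 (robust)
    compliance_ok: bool     # compliance dimension, treated as a hard gate


# Hypothetical values; in practice these are governance decisions, not code defaults.
MIN_HEALTH = 0.4
HEALTH_WEIGHT = 0.5


def cheapest(offers):
    """Single-proxy objective: minimise unit cost and nothing else."""
    return min(offers, key=lambda o: o.unit_cost)


def joint_choice(offers):
    """Joint objective: hard constraints first, then a cost/resilience trade-off."""
    eligible = [
        o for o in offers
        if o.compliance_ok and o.supplier_health >= MIN_HEALTH
    ]
    if not eligible:
        return None  # nothing admissible: escalate instead of relaxing a constraint

    best_cost = min(o.unit_cost for o in eligible)

    def penalty(o):
        cost_premium = o.unit_cost / best_cost - 1.0  # 0.0 for the cheapest eligible offer
        fragility = 1.0 - o.supplier_health
        return cost_premium + HEALTH_WEIGHT * fragility

    return min(eligible, key=penalty)


offers = [
    Offer("A", unit_cost=9.20, supplier_health=0.25, compliance_ok=True),
    Offer("B", unit_cost=9.60, supplier_health=0.80, compliance_ok=True),
    Offer("C", unit_cost=9.00, supplier_health=0.90, compliance_ok=False),
    Offer("D", unit_cost=9.50, supplier_health=0.45, compliance_ok=True),
]

print(cheapest(offers).supplier)      # C
print(joint_choice(offers).supplier)  # B
```

In this toy data the single\-proxy objective picks the non\-compliant supplier because it is cheapest, while the joint objective pays a small premium for a healthy, compliant one\. Returning `None` when nothing is admissible is a deliberate choice in this sketch: the agent escalates rather than quietly relaxing a constraint\.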
Second, the audit trail has to be designed at the same time as the agent, not bolted on after\. If you cannot answer the question “why did the agent treat this supplier this way, on this date, against which constraints” in under a minute, you do not have a deployable agent\. You have a liability waiting for a regulator\.
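One way to picture that second principle is a decision record written at the moment the agent acts, sketched below in Python\. Every name here (`DecisionRecord`, `AuditLog`, `why`, and the field names) is a hypothetical illustration, not an existing product API, and the in\-memory list stands in for whatever append\-only store a real deployment would use\.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DecisionRecord:
    supplier: str
    decided_on: str            # ISO 8601 date, e.g. "2026-06-01"
    action: str                # what the agent did
    objective: str             # what it was optimising for
    constraints_checked: dict  # constraint name -> whether it passed
    approved_by: str           # agent or human identifier


class AuditLog:
    """Append-only log kept in memory for illustration only."""

    def __init__(self):
        self._lines = []

    def record(self, rec):
        # Serialise immediately so later mutation cannot alter history.
        self._lines.append(json.dumps(asdict(rec), sort_keys=True))

    def why(self, supplier, decided_on):
        """Answer: how was this supplier treated on this date, against which constraints?"""
        matches = []
        for line in self._lines:
            rec = json.loads(line)
            if rec["supplier"] == supplier and rec["decided_on"] == decided_on:
                matches.append(rec)
        return matches


log = AuditLog()
log.record(DecisionRecord(
    supplier="B",
    decided_on="2026-06-01",
    action="reroute_po",
    objective="minimise cost subject to health and compliance constraints",
    constraints_checked={"compliance_ok": True, "min_health": True},
    approved_by="agent:auto",
))

for rec in log.why("B", "2026-06-01"):
    print(rec["action"], rec["objective"], rec["constraints_checked"])
```

The point of the sketch is that the objective and the constraints are stored alongside the action, so the "why" question is a lookup rather than a forensic reconstruction\.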
## The question worth asking before you deploy
If the only question you are asking your vendor is “how do you prevent hallucinations,” you are asking the easy question\. The harder one, and the one that will matter more in three years when the first significant CSDDD\-style enforcement actions land on companies that automated their supplier decisions, is this\. When the agent is working perfectly, what is it optimising for, and who decided that was the right thing?
The answer to that question is not in the model\. It is not in the agent\. It is in the design choices made before either of them existed\. That is where the work is\. That is where the risk lives\. And that is the part of the agentic AI story almost nobody is demoing\.