@MaximeRivest: https://x.com/MaximeRivest/status/2017688441404764174
Summary
A skeptical software engineer argues that chatbots are overhyped and that the real value of AI lies in using language models as reliable compute components in engineering systems, encouraging developers to integrate AI into practical applications beyond conversational interfaces.
The Skeptical Software Engineer’s Guide to AI That Actually Works
Why should you listen to me? I am not a formally trained software engineer, nor a computer scientist, nor do I work for a big AI lab. I am a scientist, a farmer, a small business owner, a bioinformatician, a computational biologist, an R and Python library maker, a manager, a data analyst, and sometimes a software engineer. If you look at that list you will notice that I care about solving real problems and I learn what I need when I need it. For instance, I trained deep neural networks on 25 million scientific articles before GPT-1 was released. I recently created a platform for distributed AI inference and a new markdown editor. I have put code into production. Altogether, I think I have a 'balanced' experience of AI, software, and physical, pragmatic concerns. I guess some would call that experience and a no-nonsense attitude. But I am also ambitious; I want things to be better. Now I want the world to start reaping the strong, tangible rewards that deep learning and large language models offer. There is so much to build, and software engineers have (almost) all it takes to do that building. If you want to contribute, read on.
Chatbots are not it. Chatbots are unreliable and often not the best UX for the job. But that does not mean AI is not it; AI is often the best tool for the job.
If you are a software engineer (anywhere in the stack) and you are asked to work on some AI feature that turns out to be a chatbot/conversation thing, I understand why you are skeptical about AI's usefulness, reliability, and practicality.
We are super early. ChatGPT is what exploded in popularity, and most (all?) big newsworthy AI products currently come in the form of chat, so it biases us all to focus on chat and to associate chat with AI. But AI is deeper than that. AI is more general than that, more useful than that, more engineering than that.
To me, chatbots are like browsers. There will be only a few winners. They are and will stay a big deal; humans are social and like conversations, and the conversation UX soothes that part of us. With that said, humans are multi-faceted, and when they want a task to get done they happily deprioritize conversation in favor of command and control. Riding a horse can be nice and soothing, but a motorcycle is easier to control. With the horse, everything is a conversation. Getting back to browsers: while they are important, much, much more was built and engineered inside and around them. That is where the real opportunity was, and that is where engineers of all kinds found meaningful work. Now there is a lot to build with, in, and around the chatbots. There are thousands of AI-powered functions, services, and systems that will run quietly inside other products: extracting data, classifying inputs, transforming documents, making decisions at scale. This is engineering work. It needs software engineers.
In this article I will try to show you what AI engineering actually looks like, why your existing skills matter more than you think, and what (if anything) you might need to learn.
AI as a New Kind of Compute, Not a New Kind of Colleague
This is an analogy. It’s not literally true. But I find it helpful. You already work with different kinds of compute. CPUs. GPUs. Databases. Third-party APIs. They’re components in your systems. They take inputs, produce outputs, cost money, and sometimes fail.
Now there’s another kind: language models, or more generally, deep neural networks. They take inputs. They produce outputs. They cost money. They sometimes fail. In that sense, they’re just another component.
But there’s one real difference. With normal code, you write the function body. You write the logic that transforms input to output. With an AI program, you don’t. Instead, you specify what you want: the input type, the output type, a description of what it should do (like a docstring), maybe some examples (either from the start or as you go), and a model figures out how to produce the output. The performance and outputs are not explainable by any reason other than:
That pattern is in the data and it was successfully modeled.
It’s opaque and it’s probabilistic, not deterministic. You won’t get exactly the same result every time (except under very specific setups). That inability to mechanistically and deterministically explain and deduce the output is the new part.
Everything else, breaking problems down, defining contracts, handling errors, testing, observing how things behave in production… that’s engineering. That’s what you already do.
What You Already Have
If you’re a software engineer, you have most of what you need.
Decomposition. When you look at a big, messy prompt, you can find well-defined tasks hiding inside it. You can pull them apart. You’ve been doing this with code your whole career. Decomposing, separating concerns, making things testable. Same skill.
Contracts. You know how to define function signatures. Input types. Output types. Descriptions of expected behavior. This is API design. This is specs. You’ve done this.
Error handling. Validation. Retries. Fallbacks. Graceful degradation. You know that systems fail, and you know how to build systems that handle failure.
Observability. Logging. Tracing. Analytics. Understanding what’s happening when you’re not watching. Knowing where to look when something goes wrong.
Composition. Building systems out of smaller pieces. Services that call other services. Pipelines. Graphs. Orchestration. You’ve thought about this, probably a lot.
All of this applies to AI programs. The shape is the same. The components are different.
The One Thing You Might Need to Learn
There’s one piece that might be new. I want to name it clearly so it doesn’t feel bigger than it is.
It’s evaluation thinking. A little bit of data science. A little bit of the philosophy of measurement.
Here’s what I mean.
With normal code, you write tests. Assertions. assert f(x) == y. It passes or it fails. Green or red.
With AI programs, it’s different. The output is probabilistic. You can’t assert exact equality. Instead, you provide examples of correct behavior—input-output pairs that define what “working” means—and you measure how often the system matches.
This means asking questions like:
- Is 95% accuracy good enough for this use case? Or do I need 99%? Or 99.9%?
- How do I build a set of examples that actually represents the real inputs I’ll see?
- When something changes, how do I know if it got better or worse?
- How do I find the cases where it fails and understand why?
This is a real skill. It takes practice. You’re building intuitions about systems that don’t behave exactly the same way twice.
But it’s bounded. It’s not “go get a PhD in machine learning.” It’s one new discipline, and it’s learnable. You can start with ten examples and grow from there.
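As a sketch, here is roughly what that measurement loop can look like. The support-ticket task and the labeled examples are hypothetical; the shape is the point.

```python
# A minimal sketch of evaluation thinking: labeled input-output
# examples plus a score. The task and labels are hypothetical.
labeled_examples = [
    ("refund my order", "refund"),
    ("where is my package", "shipping"),
    ("the app crashes on login", "bug"),
    # ...start with ten or so, grow toward real production inputs
]

def accuracy(classify, examples) -> float:
    """Fraction of examples where the AI program's output matches the label."""
    hits = sum(1 for text, expected in examples if classify(text) == expected)
    return hits / len(examples)

# Rerun after every change (prompt tweak, new model, added examples)
# and compare scores, instead of asserting exact equality.
```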
That’s what you need to learn. The rest you already have.
What an AI Program Looks Like
Let me make this concrete.
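A minimal sketch in Python; the Receipt fields and names are illustrative assumptions:

```python
# A sketch of the contract: typed input, typed output, a short
# description. The field names here are illustrative.
from dataclasses import dataclass

@dataclass
class Receipt:
    merchant: str
    date: str      # e.g. "2026-05-16"
    total: float

def extract_receipt(receipt_text: str) -> Receipt:
    """Pull the merchant, date, and total out of raw receipt text."""
    ...
```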
This is a function signature. Input type: a string of receipt text. Output type: a structured object with specific fields. A short description of what it should do.
This is a contract. A spec. You’ve written hundreds of these.
The difference is inside. The ... isn’t code you write. It’s an AI call that gets optimized. You might start with a simple prompt. Then you add examples. Then you try a different model. Then you adjust the instructions. You measure against your eval set. You keep what works.
The function signature stays stable. The implementation gets optimized from data.
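One possible body, sketched assuming the OpenAI Python client; the model id, the prompt, and the JSON keys are placeholders you would tune against your evals:

```python
# One way the body might be filled in behind the same signature,
# reusing the Receipt type from the sketch above. Everything inside
# this function is subject to optimization; the signature is not.
import json
from openai import OpenAI

client = OpenAI()

def extract_receipt(receipt_text: str) -> Receipt:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model id; swap as your evals dictate
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Extract merchant, date (YYYY-MM-DD), and total "
                        "from this receipt. Reply as JSON with exactly "
                        "those keys."},
            {"role": "user", "content": receipt_text},
        ],
    )
    data = json.loads(response.choices[0].message.content)
    return Receipt(merchant=data["merchant"], date=data["date"],
                   total=float(data["total"]))
```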
Now imagine you have a harder problem: processing receipt images end-to-end. The naive approach is one big call.
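Sketched with the same illustrative names:

```python
# The naive shape: one opaque, do-everything call.
def process_receipt_image(image_bytes: bytes) -> Receipt:
    """OCR the image, find the fields, validate, return a Receipt."""
    ...  # one giant prompt to one multimodal model
```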
This works sometimes. But when it fails, you won’t know where or why. You can’t test the pieces. You can’t improve them separately.
The engineering approach: decompose.
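A sketch of the same task, decomposed; the step boundaries are illustrative:

```python
# The decomposed shape: each step has its own contract and its own
# tests, and each can run on a different model.
def ocr_receipt(image_bytes: bytes) -> str:
    """Image in, raw text out. A fast, cheap vision model can do this."""
    ...

def extract_fields(receipt_text: str) -> Receipt:
    """Raw text in, typed Receipt out. The signature from above."""
    ...

def validate(receipt: Receipt) -> Receipt:
    """Plain code, no AI: the total is positive, the date parses, etc."""
    ...

def process_receipt_image(image_bytes: bytes) -> Receipt:
    return validate(extract_fields(ocr_receipt(image_bytes)))
```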
Now each piece can be tested. Each piece can be improved. When something fails, you know where. You can use a fast cheap model for the easy steps and a more capable one where it matters.
There’s another benefit. You’re giving the model a thinking structure. Spreading reasoning across steps often makes the whole system more accurate. The model doesn’t have to solve everything at once.
This is decomposition. Same instinct you use when code gets tangled. Same skill.
Composing Systems
Once you have AI programs, you connect them.
Sometimes it’s a straight pipeline. Sometimes there are branches—parallel paths that merge. Sometimes there are loops—retry with a different strategy, or refine until a quality threshold is met.
The system’s behavior comes from:
- The structure: what calls what
- The contracts: types at each node
- The routing: which models, what retry logic
- The optimization: how each piece is tuned
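As a sketch of the routing piece, with placeholder model names and labels:

```python
# One routing pattern: try a cheap model, retry once, then fall back
# to a stronger one, then degrade gracefully. Names are placeholders.
ALLOWED_LABELS = {"invoice", "receipt", "other"}

def classify(text: str, model: str) -> str:
    """A single AI call; body elided, same pattern as above."""
    ...

def classify_with_fallback(text: str) -> str:
    for model in ("cheap-model", "cheap-model", "strong-model"):
        try:
            label = classify(text, model=model)
            if label in ALLOWED_LABELS:  # enforce the contract at the boundary
                return label
        except TimeoutError:
            pass
    return "needs_human_review"  # graceful degradation
```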
This is service orchestration. You’ve thought about systems like this before, just with different components.
And because each piece has a clear contract, you can trace through the system. You can log inputs and outputs at each step. You can see where failures cluster. You can build dashboards. You can run analytics.
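For example, a sketch of step-level tracing, assuming standard-library logging:

```python
# Wrap any step so each call logs its input, output, and latency.
# Logger configuration is assumed to live elsewhere.
import functools, logging, time

log = logging.getLogger("ai_pipeline")

def traced(step_name: str):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            log.info("%s in=%r out=%r took=%.2fs",
                     step_name, args, result, time.perf_counter() - start)
            return result
        return inner
    return wrap

# Usage: decorate each step, then watch where failures cluster.
# ocr_receipt = traced("ocr")(ocr_receipt)
```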
You can bring engineering to this.
The Work That Needs Doing
There’s a lot of it.
Functions that extract structured data from messy inputs. Services that classify, summarize, route, transform. Programs that take something ambiguous and produce something typed and validated.
These aren’t glamorous. They won’t get breathless press coverage. But they’re useful. They’re the kind of thing that makes real systems work. They are reliable bricks from which cathedrals can be built.
And they need to be reliable. Not “works in a demo” reliable. Actually reliable. Tested against real cases. Monitored in production. Improved over time.
That’s engineering. That’s what software engineers know how to do well.
An Invitation
There’s a new kind of component. It’s powerful and weird and opaque (deep neural networks have low explainability). It opens up new possibilities. And building with it well—reliably, thoughtfully, at scale—is an engineering problem.
Your skills matter here. Decomposition. Contracts. Types. Error handling. Observability. Testing. Composition. All of it transfers.
There’s one new thing to learn: evaluation thinking. How to specify behavior with examples. How to measure systems that cannot be proven correct or fully analyzed through logic. How to build intuitions about probabilistic outputs. It’s real, but it’s bounded. You can learn it.
The chat layer will consolidate. A few assistants will win. There might be a bubble. There might be a correction. I don’t know, but the underlying technology is real.
The engineers who focus on building things that actually work (reliable systems, real problems, measured performance) will create tremendous value in the world.
We need software engineers here.
I hope you’ll come.
To learn more about building reliable compound AI systems, see DSPy and this blog post.