@MaximeRivest: https://x.com/MaximeRivest/status/2055293570119065875

X AI KOLs Following Tools

Summary

MaximeRivest explains DSPy's five core components—Optimizers, Signatures, LMs, Modules, and Adapters—and argues that effective AI engineering requires mastering these elements, highlighting the often-overlooked role of rendering structured outputs.

https://t.co/7CEdeQqEpK
Original Article
View Cached Full Text

Cached at: 05/15/26, 03:05 PM

A Simple Explanation of What DSPy Can Teach You About AI Engineering

Exactly one year ago, I tried DSPy for the first time. It felt magical. It took me a whole year of wanting to look into it before I finally sat down one morning and actually ran the example snippets in the Getting Started docs. They felt too short and magical to be “enough”—but they are enough.

Anyway, today this post is not so much about why DSPy is so magical, but rather about what DSPy is doing a bit differently that makes it so important for the future of integrating AI into our society.

Why listen to me? In the last year, while I was working for a big academic publisher, I used DSPy to build a pipeline that runs on virtually all scientific publications in the world—roughly 100 million times per week—fully releasing data analysts from the tedious task of creating custom scientific classifications. That would have cost $400K per week with ChatGPT. With vLLM, Llama 8B, Qwen embeddings, and DSPy, it cost just $50. I also built a pipeline to parse millions of scanned PDFs at human-level quality while being 10× faster. I have since moved on and am now working full-time in open-source AI engineering. I’ve made several DSPy community libraries and am now a contributor to DSPy. Just this morning I pushed my first PR to DSPy, where we’re taking the first step toward formalizing DSPy’s contract between its five key components. Those five components are what I want to teach you about.

Optimizers, Signatures, LMs, Modules, and Adapters

I’ve stated them with their DSPy names and in the order people tend to encounter them.

  • Optimizers: Automatically change your prompts and/or model weights to improve performance on an eval.

  • Signatures: A high-level way to specify input and output names and types so the details can be left to automatic optimization.

  • LM: The connection between DSPy and the outside world—that’s where tokens are generated.

  • Modules: Where programming, inference strategies, and several LLM calls can be put together into a compute graph, working together as one system (a compound AI system).

  • Adapters: Where task-independent, type- and structure-related inference strategies live. These render the task inputs and the optimized instructions into text prompts and request parameters.

Any effective AI programming will need these components. Many AI frameworks have several of them; few (if any, other than DSPy) have all of them. My favorite—and the one that is most underappreciated—is the adapters.

Let’s rename them in more general terms. The work of an AI engineer will be about:

  • Evals: Evaluating and improving

  • Interface: Defining your task, its inputs and outputs at the highest level

  • Inference: Making your pipeline run on different providers and models

  • Call Graph: Considering how you decompose the task (if you do), what you do with AI, what you do in code or traditional ML, whether you’re using reasoning, whether you’re using tools

  • Rendering: How you render, format, and parse the domain-specific prompt and input/output types into the actual complete request

Rendering

That is probably the least obvious part to most readers, so let’s start here.

Rendering is about how you render your instructions and inputs to the model and how you instruct the model to render its output. The two often go together. If you tell the model to use XML tags, you’ll use XML tags in your prompt. The same goes for JSON and custom delimiters.

When you decide to ask for structured output using XML tags, you are using an inference strategy. That inference strategy is independent of your task—it’s about how you will render your prompt to show to the model and how you ask it to render its output so you can parse it.

To get structured output, XML is just one of many options. Alternatives include: JSON, native structured outputs, custom delimiters, BAML, CSV, and many more.

Structured output is only one axis of rendering. How you render reasoning, images, tool calls, videos, PDFs, and citations—these are all rendering-related, task-independent inference strategies you need to make. You can keep it simple and just use whatever is “native” from the provider, but that is rarely the best option. It’s just delegating the decision to them.

For example, JSON tool calling is the default now, but there are many other (often superior) ways of rendering a request for tool usage. You could parse and run all Markdown code cells that start with #!run. You could parse and run text inside <toolcall></toolcall> delimiters, etc.

For PDFs, you could extract the text with traditional OCR and provide an image of the document. You could provide just the text, just the image, or the binary (probably with low success), etc.

For images, if it’s like a logo, you could turn it into SVG and provide just the SVG. You could do two steps: a model that describes it, then a model that receives just the description. You could lower the resolution or tile multiple images together into one, etc.

For reasoning, you could use <thinking></thinking> at the top of the document. You could require the model to have a #REASONING: comment before any lines of code. You could have thinking tags throughout the outputs, etc.

This is simple. It’s done for you if you’re not doing it yourself. The three biggest recent advancements from the big AI providers were all related to rendering: reasoning, structured outputs, and tool calls.

Call Graph

Decomposing a task into many sub-calls to the LLM and delegating each to the appropriate model is one of the most effective ways to change the cost, performance, and latency profile of your AI pipeline.

You can call the same model many times. You can use specialized models (guards). You can call the best models and combine their responses. You can have a task done in many different languages and programs and take the majority response. You can have “specialized” model personas, each focusing on different elements. You can mix AI calls with code and traditional programming.

This is all done inside a module, and you should have an end-to-end way of calling it that is independent of your decomposition. These are compound AI systems—and they are powerful.

Inference

You will need to shop around and evolve. Open-source and commercial models are released pretty much daily now. You need all of your work on prompts, rendering, and call graphs to be easily plug-and-play with any provider and model.

The most effective way to do that is to target one specific universal format for your AI request, then map that format once to all the providers and models you want to try, and map their responses back into a universal format that your pipeline can parse, evaluate, render, etc.

Interface

To be useful and impactful, your AI program needs to interface with the world. It needs to be called by an app. It needs to run daily on some data stream, etc. That interface needs to be stable—because it is your true task.

You have to keep that separate and abstracted away from all the hacking, fiddling, optimizing, decomposing, and rendering you’re doing underneath to reach a satisfactory cost, performance, and accuracy profile. Define your system’s signature once, then fiddle inside it.

Evals

None of the above means anything if you’re not trying to improve your performance. You need to evaluate your work.

Don’t build big, beautiful evals too early, though. On many tasks, a single obvious example won’t even work. Once you’re making your program go from zero to a few working examples, just evaluate by hand: interact, look at your data and traces. Is there a bug in your rendering? In your request to the language models? In your parsing? Etc.

After that, make a small dataset—that’s enough to run automatic prompt optimization. Then run it in production, collect your inputs and outputs so you have a real data distribution, and maybe you’ll even have enough for fine-tuning!

Conclusion

AI engineering has five important components. For any given task, a subset of these will be more important to focus on, but they are all always there—you might just be delegating the decisions to others and to circumstances.

DSPy lets me geek out on any one of them without worrying too much about the others, and it lets all of us share best practices and general solutions to those problems.

Similar Articles