@dotey: Q: Our company has a dozen microservices, and we want developers to use AI Agents for system design and coding. The problem is that a user story often requires collaboration among multiple microservices, and the Agent must understand each service's responsibility boundaries and business concepts to make reasonable designs. We plan to put all microservices into a single …

X AI KOLs Timeline 06/30/26, 02:17 PM News

ai-agent microservices context-engineering best-practices monorepo contract-testing

Summary

The article discusses in a Q&A format how to enable AI Agents to perform system design and coding in a multi-microservice scenario, focusing on practical experiences with context quality (via monorepo, layered documentation) and validation loops (via contract testing, mock servers).

Q: Our company has a dozen microservices, and we want developers to use AI Agents for system design and coding. The problem is that a user story often requires collaboration among multiple microservices, and the Agent must understand each service's responsibility boundaries and business concepts to make reasonable designs. We plan to put all microservices into a single workspace, each with its own documentation, and let the AI handle it. Is this approach reasonable? Are there better practices?

A: The key to using Agents effectively lies in two aspects: context quality and validation loops. Let's start with context quality.

Putting everything in a single workspace is a practice currently recommended by the community. A monorepo is naturally well-suited for working with AI because the Agent can see schema definitions, API protocols, and implementation code for all services in one place. If, for historical reasons, moving to a monorepo is inconvenient, a compromise is the "virtual monorepo": cloning multiple repositories into the same local directory.

Beyond co-location, documentation is a great way to provide context to the Agent. Ideally, give the Agent a map plus on-demand loading:

1. Place a master AGENTS.md (or CLAUDE.md) in the root directory as an index, listing all services, what each is responsible for, and instructing the Agent to read a specific service's documentation when modifying it.
2. Each microservice's own directory should contain its own documentation, clearly describing its responsibility boundaries and business concepts, essentially DDD's bounded context.
3. Let the Agent first read the root index, locate the relevant services, and then load their details.

However, documentation must be kept up to date, especially when microservice protocols change; otherwise, it can mislead. Anything that can be auto-generated from code or specifications should be, rather than written manually. Manually written documentation will inevitably fall out of sync with code. Machine-readable interface specifications like OpenAPI serve double duty: they are documentation, and they can be used to generate mocks and tests.

Besides documentation, another often overlooked context source is protocol test code. High-quality contract tests are the most accurate living documentation: they precisely describe the actual interaction protocols between services and are less likely to become outdated than human-written docs, because the tests would fail if something changed. If you already have OpenAPI specs or Pact contract files, they are highly valuable for helping the Agent understand service boundaries.

Now, about validation. In microservice scenarios, validation is the trickiest part because a user story may involve multiple services collaborating. You cannot have the Agent run an end-to-end test for the entire system every time it changes a line of code. A practical approach: provide each microservice with a mock server or a simulated service auto-generated from OpenAPI specs. After writing code, the Agent can run contract tests locally to verify whether its changes break protocol agreements with other services, without relying on real online APIs or a full integration environment. This creates a "write code → run tests → self-correct" loop for the Agent, minimizing the need for human intervention during the process.

To take it a step further, consider consumer-driven contract testing (common tool: Pact). The idea is that the consumer records the actual interface shape it uses, generates a contract file, and the provider then verifies whether it can satisfy that contract.

In summary: a workspace provides a unified global view; layered documentation plus protocol tests offer precise context; mock servers plus contract tests form a validation loop. With these three layers in place, the Agent can handle cross-microservice system design more reliably.

Some references:

1. Anthropic's Effective context engineering for AI agents, on treating context as a scarce resource and loading on demand: https://anthropic.com/engineering/effective-context-engineering-for-ai-agents…
2. Anthropic's Effective harnesses for long-running agents, on scaffolding for long tasks (e.g., using progress files and git to bridge context windows): https://anthropic.com/engineering/effective-harnesses-for-long-running-agents…
3. How to organize AGENTS.md in a monorepo for Agents; see this article on dev.to: Steering AI Agents in Monorepos with AGENTS.md: https://dev.to/datadog-frontend-dev/steering-ai-agents-in-monorepos-with-agentsmd-13g0…

For an introduction to contract testing, search for Pact and consumer-driven contract testing guides.

Original Article

Cached at: 06/30/26, 03:43 PM

Q: Our company has a dozen or so microservices, and we now want developers to use AI agents for system design and coding. The problem is that a user story often requires collaboration among multiple microservices, and the agent must understand the responsibility boundaries and business concepts of each service to make a reasonable design. We plan to put all microservices under one workspace, each with its own documentation, and let the AI handle it. Is this approach reasonable? Are there better practices?

A: The key to using an agent well lies in two points: the quality of context, and a closed loop for verification. Let’s talk about context quality first.

Putting everything under one workspace is currently a recommended practice in the community. Monorepos are naturally well-suited for working with AI because the agent can see schema definitions, API protocols, and implementation code for all services in one place. If historical reasons make it inconvenient to merge everything into a monorepo, a compromise is the virtual monorepo: clone the separate repositories into the same local directory.
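The virtual monorepo described above can be sketched as a small planning script. This is a minimal sketch under stated assumptions: the repository URLs and the workspace path are hypothetical, and the function only builds the `git clone` commands so the plan can be inspected before anything is run.

```python
from pathlib import Path

# Hypothetical repository URLs; replace with your own services.
REPOS = [
    "https://example.com/acme/orders-service.git",
    "https://example.com/acme/billing-service.git",
]


def repo_name(url: str) -> str:
    """Derive the checkout directory name from a repository URL."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_commands(repos: list[str], workspace: Path) -> list[list[str]]:
    """Build one `git clone` argv per repository that is not checked out yet."""
    commands = []
    for url in repos:
        target = workspace / repo_name(url)
        if not target.exists():
            commands.append(["git", "clone", url, str(target)])
    return commands


# Print the plan instead of executing it; each argv could be passed to
# subprocess.run(argv, check=True) once the list looks right.
for argv in clone_commands(REPOS, Path("workspace")):
    print(" ".join(argv))
```

Skipping directories that already exist keeps the script safe to re-run when a new service is added to the list.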

In addition to co-location, documentation is also a great way for the agent to obtain context. It’s best to give the agent a map with on-demand loading:

1. Place a master AGENTS.md (or CLAUDE.md) in the root directory as an index, listing all services, their responsibilities, and which directory to read for a given service.
2. Each microservice's own directory should have its own document describing its responsibility boundaries and business concepts: this is essentially DDD's bounded context.
3. Let the agent first look at the root index, locate the relevant services, and then load their details.

However, documentation must be kept up to date, especially when microservice protocols change; otherwise, it can mislead. Anything that can be automatically generated from code or specifications should not be written manually, because manual documentation will eventually become inconsistent with the code. Machine-readable interface specifications like OpenAPI serve double duty: they are documentation, and they can be used to generate mocks and tests.
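One way to keep the root index from drifting is to generate it from the per-service documents. The sketch below assumes a hypothetical layout in which every service directory holds a `README.md` whose first non-heading line summarizes its responsibility; the file name and the output format are illustrative choices, not a convention required by AGENTS.md.

```python
from pathlib import Path


def first_summary_line(doc: Path) -> str:
    """Return the first non-empty, non-heading line of a service document."""
    for line in doc.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped
    return "(no summary found)"


def build_index(root: Path, doc_name: str = "README.md") -> str:
    """Render a root index listing each service directory that has a document."""
    lines = ["Service index (generated; do not edit by hand)", ""]
    for service_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        doc = service_dir / doc_name
        if doc.is_file():
            summary = first_summary_line(doc)
            lines.append(
                f"- {service_dir.name}: {summary} "
                f"(details: {service_dir.name}/{doc_name})"
            )
    return "\n".join(lines) + "\n"
```

Writing the result of `build_index(Path("."))` into the root AGENTS.md as part of CI would make a stale index show up as a diff rather than as a misled agent.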

Beyond documentation, there is another source of context that many overlook: protocol test code. High-quality contract tests are themselves the most accurate living documentation, precisely describing the actual interaction protocols between services, and are less likely to become outdated than human-written documentation because if they are wrong, the tests will fail. If you already have OpenAPI specs or Pact contract files, these are very valuable for the agent to understand service boundaries.

Now regarding verification. Verification is the trickiest part in a microservices scenario because a user story might involve collaboration across several services. You can't ask the agent to run the entire system for end-to-end testing every time it changes a line of code. A practical approach is for each microservice to provide a mock server or a simulated service automatically generated from its OpenAPI spec. After writing code, the agent can run contract tests locally to verify whether its changes break the protocol agreements with other services, without relying on real live APIs or a full integration environment. This way, the agent forms a closed loop of "write code → run tests → self-correct", without requiring frequent human intervention.
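To make the mock idea concrete, here is a minimal sketch that answers requests from the examples recorded in an OpenAPI document. The spec fragment, the service, and the helper names are hypothetical; a real setup would more likely load the spec from a file and use an existing mock-server tool rather than hand-rolled matching.

```python
# Hypothetical OpenAPI 3 fragment for an orders service, inlined as a dict.
SPEC = {
    "paths": {
        "/orders/{orderId}": {
            "get": {
                "responses": {
                    "200": {
                        "content": {
                            "application/json": {
                                "example": {"id": "o-1", "status": "PAID"}
                            }
                        }
                    }
                }
            }
        }
    }
}


def path_matches(template: str, path: str) -> bool:
    """Match a concrete path against a template such as /orders/{orderId}."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return False
    return all(
        (t.startswith("{") and t.endswith("}") and p != "") or t == p
        for t, p in zip(t_parts, p_parts)
    )


def mock_response(spec: dict, method: str, path: str) -> tuple[int, object]:
    """Return (status, body) from the example recorded in the spec, or 404."""
    for template, operations in spec["paths"].items():
        operation = operations.get(method.lower())
        if operation and path_matches(template, path):
            media = operation["responses"]["200"]["content"]["application/json"]
            return 200, media["example"]
    return 404, {"error": "no matching operation in spec"}
```

A consumer's tests can call `mock_response(SPEC, "GET", "/orders/o-1")` in place of the real orders service, so the agent gets fast feedback without an integration environment.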

To go further, look into consumer-driven contract testing, for which Pact is a common tool. The idea is that the caller (consumer) records the actual interface shape it uses and generates a contract file, and the callee (provider) then verifies that it can satisfy that contract. In short: the workspace provides a unified global view; layered documentation plus protocol tests provide precise context; mock servers plus contract tests provide a closed verification loop. With these three layers in place, the agent can handle cross-microservice system design more reliably.
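The consumer-driven idea can be illustrated without any framework. The sketch below is an assumption-laden toy: the contract structure, service names, and `verify` helper are hypothetical and only mimic the concept; they are not Pact's file format or API.

```python
# Contract recorded by a hypothetical consumer (billing) against a provider
# (orders). It lists only the fields the consumer actually relies on.
CONTRACT = {
    "consumer": "billing",
    "provider": "orders",
    "request": {"method": "GET", "path": "/orders/o-1"},
    "response_fields": {"id": str, "total_cents": int},
}


def provider_handler(method: str, path: str) -> dict:
    """Stand-in for the provider's real handler; extra fields are allowed."""
    return {"id": "o-1", "total_cents": 1250, "status": "PAID"}


def verify(contract: dict, handler) -> list[str]:
    """Return contract violations; an empty list means the provider complies."""
    request = contract["request"]
    body = handler(request["method"], request["path"])
    problems = []
    for field, expected_type in contract["response_fields"].items():
        if field not in body:
            problems.append(f"missing field: {field}")
        elif not isinstance(body[field], expected_type):
            problems.append(
                f"wrong type for {field}: {type(body[field]).__name__}"
            )
    return problems
```

Because the contract names only the fields the consumer uses, the provider stays free to add fields, while removing or retyping a relied-upon field fails verification immediately.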

Some references

1. Anthropic's Effective context engineering for AI agents, which discusses how to treat context as a scarce resource and load it on demand: https://anthropic.com/engineering/effective-context-engineering-for-ai-agents…
2. Anthropic's Effective harnesses for long-running agents, which discusses how to scaffold agents for long tasks (e.g., using progress files with git records for handoff across context windows): https://anthropic.com/engineering/effective-harnesses-for-long-running-agents…
3. For how to organize AGENTS.md in a monorepo for agents, see this post on dev.to, Steering AI Agents in Monorepos with AGENTS.md: https://dev.to/datadog-frontend-dev/steering-ai-agents-in-monorepos-with-agentsmd-13g0…

For an introduction to contract testing, search for guides on Pact and consumer-driven contract testing.

Effective context engineering for AI agents

Source: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

After a few years of prompt engineering being the focus of attention in applied AI, a new term has come to prominence: context engineering. Building with language models is becoming less about finding the right words and phrases for your prompts, and more about answering the broader question of “what configuration of context is most likely to generate our model’s desired behavior?”

Context refers to the set of tokens included when sampling from a large-language model (LLM). The engineering problem at hand is optimizing the utility of those tokens against the inherent constraints of LLMs in order to consistently achieve a desired outcome. Effectively wrangling LLMs often requires thinking in context, in other words: considering the holistic state available to the LLM at any given time and what potential behaviors that state might yield.

In this post, we’ll explore the emerging art of context engineering and offer a refined mental model for building steerable, effective agents. At Anthropic, we view context engineering as the natural progression of prompt engineering. Prompt engineering refers to methods for writing and organizing LLM instructions for optimal outcomes (see our docs for an overview and useful prompt engineering strategies). Context engineering refers to the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference, including all the other information that may land there outside of the prompts.

In the early days of engineering with LLMs, prompting was the biggest component of AI engineering work, as the majority of use cases outside of everyday chat interactions required prompts optimized for one-shot classification or text generation tasks. As the term implies, the primary focus of prompt engineering is how to write effective prompts, particularly system prompts. However, as we move towards engineering more capable agents that operate over multiple turns of inference and longer time horizons, we need strategies for managing the entire context state (system instructions, tools, Model Context Protocol (MCP), external data, message history, etc). An agent running in a loop generates more and more data that could be relevant for the next turn of inference, and this information must be cyclically refined. Context engineering is the art and science of curating what will go into the limited context window from that constantly evolving universe of possible information.

Figure: Prompt engineering vs. context engineering. In contrast to the discrete task of writing a prompt, context engineering is iterative and the curation phase happens each time we decide what to pass to the model.

Why context engineering is important to building capable agents

Despite their speed and ability to manage larger and larger volumes of data, we’ve observed that LLMs, like humans, lose focus or experience confusion at a certain point. Studies on needle-in-a-haystack style benchmarking have uncovered the concept of context rot: as the number of tokens in the context window increases, the model’s ability to accurately recall information from that context decreases. While some models exhibit more gentle degradation than others, this characteristic emerges across all models. Context, therefore, must be treated as a finite resource with diminishing marginal returns. Like humans, who have limited working memory capacity, LLMs have an “attention budget” that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount, increasing the need to carefully curate the tokens available to the LLM.

This attention scarcity stems from architectural constraints of LLMs. LLMs are based on the transformer architecture, which enables every token to attend to every other token across the entire context. This results in n² pairwise relationships for n tokens. As its context length increases, a model’s ability to capture these pairwise relationships gets stretched thin, creating a natural tension between context size and attention focus. Additionally, models develop their attention patterns from training data distributions where shorter sequences are typically more common than longer ones. This means models have less experience with, and fewer specialized parameters for, context-wide dependencies. Techniques like position encoding interpolation allow models to handle longer sequences by adapting them to the originally trained smaller context, though with some degradation in token position understanding. These factors create a performance gradient rather than a hard cliff: models remain highly capable at longer contexts but may show reduced precision for information retrieval and long-range reasoning compared to their performance on shorter contexts.
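The n² relationship is easy to check with a few lines of arithmetic. The context sizes below are arbitrary illustrations, not figures from the article.

```python
def pairwise_relationships(n_tokens: int) -> int:
    """Number of ordered token pairs when every token attends to every token."""
    return n_tokens * n_tokens


# Doubling the context length quadruples the number of pairwise relationships.
for n in (1_000, 2_000, 4_000):
    print(f"{n} tokens -> {pairwise_relationships(n):,} pairwise relationships")
```

The quadratic growth is the reason each additional token costs more "attention budget" than its share of the context length alone would suggest.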

These realities mean that thoughtful context engineering is essential for building capable agents.

The anatomy of effective context

Given that LLMs are constrained by a finite attention budget, good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome. Implementing this practice is much easier said than done, but in the following section, we outline what this guiding principle means in practice across the different components of context.

System prompts should be extremely clear and use simple, direct language that presents ideas at the right altitude for the agent. The right altitude is the Goldilocks zone between two common failure modes. At one extreme, we see engineers hardcoding complex, brittle logic in their prompts to elicit exact agentic behavior. This approach creates fragility and increases maintenance complexity over time. At the other extreme, engineers sometimes provide vague, high-level guidance that fails to give the LLM concrete signals for desired outputs or falsely assumes shared context. The optimal altitude strikes a balance: specific enough to guide behavior effectively, yet flexible enough to provide the model with strong heuristics to guide behavior.

Figure: Calibrating the system prompt in the process of context engineering. At one end of the spectrum, we see brittle if-else hardcoded prompts, and at the other end we see prompts that are overly general or falsely assume shared context.

We recommend organizing prompts into distinct sections (like `## Behavior`, `## Tool guidance`, `## Output description`, etc) and using techniques like XML tagging or Markdown headers to delineate these sections, although the exact formatting of prompts is likely becoming less important as models become more capable. Regardless of how you decide to structure your system prompt, you should be striving for the minimal set of information that fully outlines your expected behavior. (Note that minimal does not necessarily mean short; you still need to give the agent sufficient information up front to ensure it adheres to the desired behavior.) It’s best to start by testing a minimal prompt with the best model available to see how it performs on your task, and then add clear instructions and examples to improve performance based on failure modes found during initial testing.
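As an illustration of sectioned prompts, the sketch below assembles a system prompt from named sections under Markdown headers. The helper name and the sample section contents are hypothetical; the section titles are the ones the article mentions.

```python
def build_system_prompt(sections: dict[str, str]) -> str:
    """Join named sections under Markdown headers, skipping empty ones."""
    parts = []
    for title, body in sections.items():
        body = body.strip()
        if body:
            parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)


prompt = build_system_prompt(
    {
        "Behavior": "Answer questions about the orders service concisely.",
        "Tool guidance": "Read a service's own documentation before editing it.",
        "Output description": "Reply with a short plan, then the code changes.",
    }
)
print(prompt)
```

Keeping sections as data makes it straightforward to start from a minimal prompt and add one instruction at a time as failure modes show up in testing.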

Tools allow agents to operate with their environment and pull in new, additional context as they work. Because tools define the contract between agents and their information/action space, it’s extremely important that tools promote efficiency, both by returning information that is token efficient and by encouraging efficient agent behaviors. In Writing tools for AI agents – with AI agents, we discussed building tools that are well understood by LLMs and have minimal overlap in functionality. Similar to the functions of a well-designed codebase, tools should be self-contained, robust to error, and extremely clear with respect to their intended use. Input parameters should similarly be descriptive, unambiguous, and play to the inherent strengths of the model. One of the most common failure modes we see is bloated tool sets that cover too much functionality or lead to ambiguous decision points about which tool to use. If a human engineer can’t definitively say which tool should be used in a given situation, an AI agent can’t be expected to do better. As we’ll discuss later, curating a minimal viable set of tools for the agent can also lead to more reliable maintenance and pruning of context over long interactions.
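A small sketch of what a token-efficient, unambiguous tool might look like. Everything here is hypothetical (the `Order` type, the sample data, and the function name); the point is the shape: one clear purpose, descriptive parameters, a cap on results, and only the fields an agent is likely to need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    status: str
    internal_notes: str  # verbose field the agent rarely needs


# Hypothetical in-memory data standing in for a real data store.
ORDERS = [
    Order("o-1", "c-1", "PAID", "long free-text note about the payment"),
    Order("o-2", "c-1", "SHIPPED", "long free-text note about the shipment"),
    Order("o-3", "c-2", "PAID", "long free-text note about the refund policy"),
]


def find_orders_by_customer(customer_id: str, max_results: int = 5) -> list[dict]:
    """Return compact summaries of one customer's orders.

    Only order_id and status are returned, and the list is capped at
    max_results, so a single call cannot flood the context window.
    """
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    matches = [order for order in ORDERS if order.customer_id == customer_id]
    return [
        {"order_id": order.order_id, "status": order.status}
        for order in matches[:max_results]
    ]
```

Raising a clear error on a bad argument, rather than silently returning nothing, gives the agent something concrete to correct on its next attempt.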

Providing examples, otherwise known as few-shot prompting, is a well known best practice that we continue to strongly advise. However, teams will often stuff a laundry list of edge cases into a prompt in an attempt to articulate every possible rule the LLM should follow for a particular task. We do not recommend this. Instead, we recommend working to curate a set of diverse, canonical examples that effectively portray the expected behavior of the agent. For an LLM, examples are the “pictures” worth a thousand words. Our overall guidance across the different components of context (system prompts, tools, examples, message history, etc) is to be thoughtful and keep your context informative, yet tight. Now let’s dive into dynamically retrieving context at runtime.

Context retrieval and agentic search

In Building effective AI agents, we highlighted the differences between LLM-based workflows and agents. Since we wrote that post, we’ve gravitated towards a simple definition for agents: LLMs autonomously using tools in a loop. Working alongside our customers, we’ve seen the field converging on this simple paradigm. As the underlying models become more capable, the level of autonomy of agents can scale: smarter models allow agents to independently navigate nuanced problem spaces and recover from errors.

We’re now seeing a shift in how engineers think about designing context for agents. Today, many AI-native applications employ some form of embedding-based pre-inference time retrieval to surface important context for the agent to reason over. As the field transitions to more agentic approaches, we increasingly see teams augmenting these retrieval systems with “just in time” context strategies. Rather than pre-processing all relevant data up front, agents built with the “just in time” approach maintain lightweight identifiers (file paths, stored queries, web links, etc.) and use these references to dynamically load data into context at runtime using tools. Anthropic’s agentic coding solution Claude Code uses this approach to perform complex data analysis over large databases. The model can write targeted queries, store results, and leverage Bash commands like head and tail to analyze large volumes of data without ever loading the full data objects into context. This approach mirrors human cognition: we generally don’t memorize entire corpuses of information, but rather introduce external organization and indexing systems like file systems, inboxes, and bookmarks to retrieve relevant information on demand.
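The "just in time" pattern can be sketched with Python equivalents of head and tail: the agent holds only a file path and pulls in a few lines when needed. The helper names are assumptions for illustration and do not describe how Claude Code is implemented.

```python
from collections import deque
from itertools import islice
from pathlib import Path


def head(path: Path, n: int = 10) -> list[str]:
    """Read at most the first n lines; the rest of the file is never read."""
    with path.open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]


def tail(path: Path, n: int = 10) -> list[str]:
    """Return the last n lines, streaming the file with a bounded buffer."""
    with path.open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

In both helpers the amount of text that reaches the context is bounded by `n`, regardless of how large the file on disk is.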

Beyond storage efficiency, the metadata of these references provides a mechanism to efficiently refine behavior, whether explicitly provided or intuitive. To an agent operating in a file system, the presence of a file named `test_utils.py` in a `tests` folder implies a different purpose than a file with the same name located in `src/core_logic/`. Folder hierarchies, naming conventions, and timestamps all provide important signals that help both humans and agents understand how and when to utilize information.

Letting agents navigate and retrieve data autonomously also enables progressive disclosure—in other words, allows agents to incrementally discover relevant context through exploration. Each interaction yields context that informs the next decision: file sizes suggest complexity; naming conventions hint at purpose; timestamps can be a proxy for relevance. Agents can assemble understanding layer by layer, maintaining only what’s necessary in working memory and leveraging note-taking strategies for additional persistence. This self-managed context window keeps the agent focused on relevant subsets rather than drowning in exhaustive but potentially irrelevant information.
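Progressive disclosure can start from metadata alone. The following minimal sketch, with a hypothetical helper name, lists paths, sizes, and modification times without opening any file, leaving the decision about what to read next to the agent.

```python
from pathlib import Path


def describe_directory(root: Path) -> list[dict]:
    """List files with path, size, and modification time, without reading them."""
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            entries.append(
                {
                    "path": path.relative_to(root).as_posix(),
                    "bytes": stat.st_size,
                    "mtime": stat.st_mtime,
                }
            )
    return entries
```

The listing costs a few tokens per file, and the signals it carries (size, name, recency) are the ones the paragraph above names as hints for what to load next.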

Of course, there’s a trade-off: runtime exploration is slower than retrieving pre-computed data. Not only that, but opinionated and thoughtful engineering is required to ensure that an LLM has the right tools and heuristics for effectively navigating its information landscape. Without proper guidance, an agent can waste context by misusing tools, chasing dead-ends, or failing to identify key information. In certain settings, the most effective agents might employ a hybrid strategy, retrieving some data up front for speed, and pursuing further autonomous exploration at its discretion. The decision boundary for the ‘right’ level of autonomy depends on the task. Claude Code is an agent that employs this hybrid model: CLAUDE.md files are naively dropped into context up front, while primitives like glob and grep allow it to navigate its environment and retrieve files just-in-time, effectively bypassing the need to load all files into the context window at once.
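The hybrid strategy can be approximated in a few lines: one document is loaded up front, and everything else is found on demand with glob and a grep-like search. This is a sketch under stated assumptions; the function names are hypothetical and it does not reproduce Claude Code's actual tooling.

```python
import re
from pathlib import Path


def load_upfront(root: Path, name: str = "CLAUDE.md") -> str:
    """Return the always-loaded guidance file, or an empty string if absent."""
    doc = root / name
    return doc.read_text(encoding="utf-8") if doc.is_file() else ""


def grep(root: Path, pattern: str, glob: str = "**/*.py") -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for every matching line."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.glob(glob)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if regex.search(line):
                    hits.append(
                        (path.relative_to(root).as_posix(), number, line.rstrip("\n"))
                    )
    return hits
```

Only the matching lines and their locations enter the context, so the agent can decide which files deserve a full read.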
