@wsl8297: As you code more and more with AI, you'll find many problems are actually stuck on "terms not clear": context window, turn, tool call, harness, permission mode, parametric knowledge, RAG, memor…

X AI KOLs Timeline Tools

Summary

This is an open-source AI programming glossary that helps developers understand common concepts like context window, tool call, etc., organized by usage path and cross-linked.

As you code more and more with AI, you'll find many problems are actually stuck on "terms not clear": context window, turn, tool call, harness, permission mode, parametric knowledge, RAG, memory... You've heard them, but may not truly understand them. AI Coding Dictionary is an open-source AI programming glossary that explains these concepts in a way closer to developers' daily work. GitHub: https://github.com/mattpocock/dictionary-of-ai-coding… Instead of just listing terms alphabetically, it's organized by the path you take when using AI to code: - Section 1: The model itself, e.g., parameters, training, inference, token, provider - Section 2: Sessions, context windows, and turns - Section 3: Tools, environments, permission mode, harness - Then it continues with common failures, context management, memory, collaboration patterns, etc. - Terms are cross-linked, making it easy to build up your conceptual network as you look things up. If you're using tools like Claude Code, Codex, or Cursor, go through this dictionary first. Many questions of "why did it go off track again" will be easier to pinpoint.
Original Article
View Cached Full Text

Cached at: 05/26/26, 09:14 PM

As you write more code with AI, you’ll find that many problems are actually stuck on unclear terminology: context window, turn, tool call, harness, permission mode, parametric knowledge, RAG, memory… you’ve heard them, but don’t necessarily truly understand them. AI Coding Dictionary is an open-source glossary of AI programming terms, explaining these concepts in a way that’s closer to a developer’s daily work. GitHub: https://github.com/mattpocock/dictionary-of-ai-coding … It’s not a stack of terms alphabetically, but organized by the path you follow when using AI for coding: - Section 1: The model itself, e.g. parameters, training, inference, token, provider - Section 2: Sessions, context windows, and turns - Section 3: Tools, environment, permission mode, harness - Later sections cover common failure modes, context management, memory, collaboration patterns, etc. - Terms are cross-linked, so you can fill in your conceptual map as you look things up. If you’re using tools like Claude Code, Codex, or Cursor, go through this dictionary first — many of those “why did it go off the rails again” questions will be easier to diagnose.


mattpocock/dictionary-of-ai-coding

Source: https://github.com/mattpocock/dictionary-of-ai-coding

AI Coding Dictionary

AI coding can feel like it’s just for experts. Unexplained jargon. Mysterious failures. Bills that don’t seem to match the work. It isn’t, really. A lot of the confusion is manufactured: there’s a whole VC-funded economy that benefits from keeping it hard to understand. The basic terms of engagement are learnable in an afternoon. Once you have them, the whole thing stops feeling like guesswork. Why does context degrade? Why is the bill so high? Why does the same prompt behave differently from one day to the next? Each has a clean answer, once someone tells you the words to use. That’s what this dictionary is for. The vocabulary of AI coding, translated into plain English.

Want more than the vocabulary? Join 62,000+ developers at aihero.dev/newsletter (https://www.aihero.dev/s/dictionary-newsletter) for my latest skills, thinking on AI engineering, and the resources that’ll keep you ahead of the curve.


Table of contents

Section 1 — The Model

Section 2 — Sessions, Context Windows & Turns

Section 3 — Tools & Environment

Section 4 — Failure Modes

Section 5 — Handoffs

Section 6 — Memory and Steering

Section 7 — Patterns of Work

Section 1 — The Model

Model

The parameters. Stateless — does next-token prediction and nothing else. “Claude Opus 4.7” and “GPT-5” are models. On its own a model can’t do anything agentic; it has to be harnessed.

Usage: “Should we switch the model from Sonnet to Opus for the planning step?” “Try it — but the harness is doing most of the lifting on this task. The model swap won’t help if the system prompt and tools are wrong.”

Parameters

The numbers inside a model — often billions of them — tuned during training. Everything the model “knows” lives in them. Training sets them; inference uses them unchanged. Also called weights.

Usage: “Can we fine-tune it on our codebase?” “That’d update the parameters — different model afterwards. For one project it’s almost always cheaper to load the codebase as context than to retrain.”

Training

The process that sets a model’s parameters, by exposing it to vast amounts of text and adjusting parameters to improve next-token prediction. A one-time, expensive process done by the model provider. Encompasses both pre-training (the bulk run) and post-training (later refinements like instruction-following and safety); the distinction doesn’t matter at this glossary’s level.

Usage: “Can we get it to know our internal API?” “Not via training — that’s a months-long process by the model provider. Load the API docs into context instead, that’s the lever you actually have.”

Inference

Running a trained model to generate output — what happens on every model provider request. Parameters stay fixed; the model just does next-token prediction over the context it’s given. Cheap relative to training, but billed per token and the dominant cost of using a model.

Usage: “Why does the bill scale with usage instead of being a flat license?” “You’re paying for inference — every model provider request runs the model on the provider’s hardware. Training already happened, but inference costs accrue per request, and a single turn can expand into many requests when tools are called.”

Token

The atomic unit a model reads and writes. Roughly word-sized but not exactly — common words are one token, rare or long ones split into several. Context window size, cost, and latency are all counted in tokens.

Avoid: “word” — token boundaries don’t match word boundaries, and tokens-per-second / tokens-per-dollar are the units that actually matter.

Usage: “How big is this prompt going to be?” “Run it through the tokenizer — the schema’s compact but the JSON keys are weird, so they’ll split into more tokens than you think.”

Next-token prediction

What the model actually does. Given a context, it samples one next token, appends it, and runs again. Every output — a sentence, a tool call, a thousand-line file — is built one token at a time. The model has no other mode of operation.

Usage: “How does the agent ‘decide’ to call a tool?” “It doesn’t — it’s next-token prediction all the way down. The tool call is just a structured string the harness parses out of the output stream.”

Non-determinism

The same input can produce different output. Run a model twice with identical context and you may get two different answers — sometimes a word, sometimes a completely different approach. Nothing in your code has to change for this to happen. It’s a property of how models generate text, and how model providers serve requests. There’s no setting you can flip to make it go away. Expect a spread of results from an agent on the same task. Some days the model will feel sharp; some days it’ll feel like it’s lost the plot. Same task, different rolls of the dice.

Be careful not to over-narrativize this. Humans are pattern-matching machines, and a string of bad runs can feel like proof that “the model got worse this week.” Usually it’s just the distribution.

Usage: “Claude has been awful today. Did they ship a worse version?” “Probably not — model output is non-deterministic. You’re going to have good days and bad days on the same task. Try again tomorrow before you go looking for a cause.”

Model provider

Whatever serves a model for inference. Usually a remote service (Anthropic, OpenAI, Google), but can also be local — Ollama, LM Studio, llama.cpp running on your own machine. The harness doesn’t run the model itself; it asks a provider to.

Usage: “Can we run this offline for the air-gapped client?” “Swap the model provider to a local one — Ollama or llama.cpp on their box. The harness doesn’t care, it just hits a different endpoint.”

Harness

Everything around the model that turns it into an agent: tools, system prompt, context-window management, permissions, hooks. Claude.ai and Claude Code run on the same model but behave differently because their harnesses differ.

Usage: “Same model, why is Claude Code editing files and Claude.ai just answering questions?” “Different harnesses — Claude Code has filesystem tools, a different system prompt, and a permission layer. The model isn’t the variable here.”

Model provider request

One round-trip from the harness to the model provider. The harness sends the current context; the provider returns one response (a tool call or a final answer). A single user message can spawn many model provider requests if the agent calls tools — each tool result triggers another request.

Usage: “One question burned forty thousand tokens?” “Look at the tool calls — twelve grep, eight read, four edits. Each tool result spawns another model provider request, and the whole session prefix re-sends every time.”

Input tokens

Tokens the harness sends on each model provider request. Billed at a lower rate than output tokens.

Usage: “Bill’s high but the agent’s barely writing anything.” “It’s the input tokens — every turn re-sends the whole session. Without the prefix cache you re-pay for the history each request.”

Output tokens

Tokens the model generates back. Billed at a higher rate than input tokens, since they cost more compute to produce.

Usage: “The refactor session is burning through credit even though the inputs are small.” “Agent’s rewriting whole files instead of patching. Output tokens cost roughly five times the input rate — get it emitting edits and the bill drops.”

Prefix cache

The provider-side store that lets consecutive model provider requests skip re-processing a shared prefix. When the start of a request matches the start of a recent one — same system prompt, same history up to some point — the provider reuses its prior work and bills those tokens as cache tokens at a much lower rate. Anything that changes the prefix (reordering files, rewriting the system prompt mid-session, injecting a timestamp near the top) invalidates the cache from that point on, and the rest of the request bills at full input token rate.

Usage: “Why did the bill spike halfway through the session?” “Harness started injecting the current time into the system prompt every turn. Prefix cache breaks at the first changed token, so every request after that billed at full rate.”

Cache tokens

Input tokens the provider has cached from a previous model provider request so it doesn’t have to re-process them. When consecutive requests share a prefix, the provider reuses the work via its prefix cache and bills the cached portion at a much lower rate. The lever that makes long sessions affordable — without it, every turn re-pays for the whole history.

Usage: “Cost on long sessions is brutal — eight bucks for a refactor.” “Check the cache tokens. If the harness is reordering the system prompt or files between turns, the prefix breaks and you re-pay full input rate every request.”

Section 2 — Sessions, Context Windows & Turns

Stateless

Carries no information forward. The model is stateless across model provider requests — each request resends the full context window, because the model has no way to see anything else. An agent is stateless across sessions by default: a new session starts empty, with no trace of prior ones. Counterpart to stateful.

Usage: “Why does it forget the convention every time I clear?” “The model’s stateless — the new session starts empty. If you want it carried, write it to AGENTS.md or a memory file the harness loads at session start.”

Context

The relevant information the agent has access to right now. The abstract noun — not the raw input the model sees (that’s the context window), not the running history (that’s the session), but what the agent knows that’s pertinent to the task. “Loading something into context” means making it part of this set; “context engineering” is the discipline of curating it.

Usage: “It keeps inventing fields that aren’t in the type.” “The type file isn’t in context — it’s reading the call sites and guessing. Read the definition in first.”

Context window

Everything the model sees on each model provider request. Finite, model-specific, and the only surface through which the model perceives anything.

Avoid: “memory” — the context window is working state and doesn’t persist across sessions. Memory is a separate concept layered on top.

Usage: “Can I just paste the whole monorepo into the prompt?” “The context window’s 200k tokens — that’s maybe a fifth of the repo. Pick the files the task touches, leave the rest behind a tool call.”

Stateful

Carries information forward. A session is stateful across turnscontext accumulates as the session runs, which is why long sessions drift into the dumb zone. An agent can be made stateful across sessions by adding a memory system that persists information into the environment and reloads it at the start of future sessions. The model is never stateful; any apparent continuity is the harness re-feeding context. Counterpart to stateless.

Usage: “It remembered my preferences from yesterday — does that mean the model learned them?” “No, the agent’s stateful because the harness wrote them to a memory file and reloaded them at session start. The model itself saw nothing of yesterday.”

Agent

A model harnessed with tools, a system prompt, and a context window, that takes turns with a user. Claude Code is an agent. Cursor is an agent. Claude.ai is an agent. An agent is what you actually talk to — it’s the model in motion, configured for a purpose.

Avoid: “the AI”, “the bot” (too vague — they hide whether you mean the parameters or the harnessed thing).

Usage: “Which agent are you using for the migration?” “Claude Code locally, Cursor for the UI work — same model underneath, different harnesses.”

Similar Articles

@cevenif: Using Claude Code or Codex for development, but feel like AI is running wild? This course might be the missing piece you need. There's an open-source course on GitHub called Learn Harness Engineering, which teaches you to establish a controllable workflow framework for AI coding assistants, centered around five core mechanisms...

X AI KOLs Timeline

GitHub open-source course Learn Harness Engineering teaches you to build a controllable workflow framework for AI coding assistants (e.g., Claude Code, Codex). It includes 12 theory lessons and 6 hands-on projects, covering core mechanisms: instruction, state, validation, scope, and session.

@VincentLogic: AI coding assistants scan the entire project every time they modify code, and the token consumption breaks my heart. After installing CodeGraph, it no longer fumbles around like a headless fly using grep to search files. It first builds a local index graph, organizing function definitions, variable references, and call relationships. When AI needs to work, it directly queries…

X AI KOLs Timeline

CodeGraph reduces the number of times an AI coding assistant scans the entire project by building a local index graph, significantly lowering token consumption and improving speed, compatible with VS Code, Claude Code, and Cursor.

@Zesee: https://x.com/Zesee/status/2064994135602286765

X AI KOLs Timeline

Discussed how GitHub Copilot CLI integrates language server (LSP) to obtain semantic information, thereby enhancing code understanding in AI programming, and points out that AI programming needs to evolve from the text search layer to the semantic layer and runtime layer.

@Potatoloogs: When using Claude Code, Cursor, Codex to understand large projects, you often encounter a problem: every time you ask a question, it has to re-read files, find clues, and piece together context. Code is in src, docs in docs, design specs, screenshots, papers, videos scattered in other directories. Lots of material, but the relationships haven't been captured...

X AI KOLs Timeline

Graphify is a software engineering knowledge graph tool for AI coding assistants. It organizes project materials such as code, documents, and images into a queryable relationship graph, helping AI skip the step of repeatedly reading files when understanding large projects.