@AlphaSignalAI: LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for variou…

X AI KOLs Timeline 06/16/26, 03:39 PM News

llm knowledge-base open-knowledge-format google ai-agents context knowledge-management

Summary

Google's Open Knowledge Format (OKF) proposes a portable standard for organizational knowledge to help AI agents retrieve correct context, addressing fragmentation across data catalogs, wikis, and code.

LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating


Cached at: 06/17/26, 07:51 AM


The Reason Your AI Agent Keeps Getting Context Wrong

AI agents struggle less with reasoning than with finding the right context. Google’s Open Knowledge Format proposes a portable standard for organizational knowledge.

In 6 minutes you’ll learn why AI agents struggle with fragmented organizational knowledge, how OKF works, and why Google believes formats, not platforms, will define the next layer of AI infrastructure.

The schema of a table lives in a data catalog with a proprietary API. The business definition of a metric lives in a Confluence wiki nobody updates.

The join path between two systems lives in the head of a senior engineer who left in March.

When an agent needs to answer “how do I compute weekly active users from our event stream,” it has to assemble the answer from these scattered, mutually incompatible surfaces.

The knowledge exists. The format to share it doesn’t.

Google just published a proposal for fixing that.

It’s called the Open Knowledge Format (OKF). Most teams are still solving this from scratch. OKF is an argument that they shouldn’t have to.

The problem is format fragmentation

The knowledge exists in most organizations: the schema of a table, the business definition of a metric, join paths between systems, incident runbooks, deprecation notices for old APIs.

These atoms of context are what separate an agent that gives a generic answer from one that gives a correct one.

The problem is where they live.

Data catalogs with proprietary APIs. Wikis in Confluence or Notion. Code comments and docstrings. The heads of senior engineers. Every system uses its own format, its own schema, its own access model.

Every team building agents solves this assembly problem independently. The knowledge is format-locked behind whichever system created it.

Before OKF, teams were already converging on the same informal solution.

Andrej Karpathy articulated it in a widely cited gist: give the LLM a shared markdown library that grows more useful over time, let agents read and update their own files, and curate the content like code.

Andrej Karpathy (@karpathy), Apr 3: LLM Knowledge Bases

Something I’m finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating…

The pattern kept reappearing independently: Obsidian vaults wired to coding agents, AGENTS.md and CLAUDE.md convention files, metadata-as-code repositories inside data teams.

Each instance was bespoke. They all looked alike (markdown, YAML frontmatter, cross-links) but none were designed to cooperate. No agreed-upon fields, no agreed-upon reserved filenames, no interoperability.

OKF is the formalization of this pattern.

What OKF actually is

A bundle is a directory of markdown files. Each file represents a concept: a table, a dataset, a metric, a playbook, an API endpoint, anything worth capturing. The file path is the concept’s identity.

Every concept has a YAML frontmatter block and a markdown body. The frontmatter has one required field: type. Everything else is optional.

Producers can add any additional fields. Consumers must tolerate unknown fields without rejecting the document.
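As a rough illustration of that contract, the sketch below parses a concept file, enforces only the type field, and keeps any fields it does not recognize. The sample file, its title and owner fields, and the parse_concept helper are all hypothetical. The parser handles only flat key: value lines, not full YAML, so read it as a simplification and not as the spec's reference behavior.

```python
# Hypothetical concept file. Only `type` is required by the format as described;
# `title` and `owner` are made-up extra fields a producer might add.
SAMPLE = """---
type: metric
title: Weekly active users
owner: growth-analytics
---
# Weekly active users

Distinct users with at least one event in a 7-day window.
"""


def parse_concept(text):
    """Split a concept file into (frontmatter dict, markdown body).

    Assumption: frontmatter is a block of flat `key: value` lines between
    two `---` lines. Real YAML allows much more than this.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter block")

    # Find the line that closes the frontmatter block.
    end = next(
        (i for i in range(1, len(lines)) if lines[i].strip() == "---"), None
    )
    if end is None:
        raise ValueError("unterminated frontmatter block")

    meta = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key: value line: {line!r}")
        # Unknown keys are stored, never rejected.
        meta[key.strip()] = value.strip()

    # The single hard requirement: a non-empty type.
    if not meta.get("type"):
        raise ValueError("frontmatter must define a non-empty 'type'")

    body = "\n".join(lines[end + 1:])
    return meta, body


meta, body = parse_concept(SAMPLE)
print(meta["type"], sorted(meta))
```

A consumer written this way keeps working when a producer starts emitting new fields, which is the tolerance rule the format asks for.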

Concepts link to each other with standard markdown links, turning the directory into a traversable graph of relationships.
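To make the graph idea concrete, here is a minimal sketch that extracts markdown links from concept bodies and walks them. The in-memory BUNDLE, its file names, and the helper functions are hypothetical. The sketch also assumes that relative links resolve against the linking file's directory, which the text above does not confirm.

```python
import posixpath
import re

# Inline markdown links of the form [label](target); image links (![...]) are skipped.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")

# Hypothetical bundle: bundle-relative path -> markdown body.
BUNDLE = {
    "metrics/weekly_active_users.md": "Computed from [events](../tables/events.md).",
    "tables/events.md": "Joined to [users](users.md). See [docs](https://example.com).",
    "tables/users.md": "One row per user.",
}


def outgoing_links(concept_path, body):
    """Return the bundle-relative paths that one concept links to."""
    base = posixpath.dirname(concept_path)
    targets = []
    for target in LINK_RE.findall(body):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL or in-page anchor, not another concept
        target = target.split("#", 1)[0]  # drop any fragment
        # Assumption: relative links resolve against the linking file's directory.
        targets.append(posixpath.normpath(posixpath.join(base, target)))
    return targets


def build_graph(bundle):
    """Map every concept path to the list of concept paths it links to."""
    return {path: outgoing_links(path, body) for path, body in bundle.items()}


def reachable(graph, start):
    """Collect every concept reachable from `start` by following links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


graph = build_graph(BUNDLE)
print(sorted(reachable(graph, "metrics/weekly_active_users.md")))
```

Starting from the metric, an agent following links this way would pick up the events table and then the users table, without needing a catalog API.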

Two reserved filenames carry special meaning: index.md for directory listings and log.md for chronological history of changes.
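A consumer walking a bundle might separate concept files from the two reserved files along these lines. This is a sketch under assumptions: the directory layout and the scan_bundle and demo helpers are hypothetical, the identity of a concept is taken to be its bundle-relative path, and the reserved names are treated as reserved in every directory, which the text above does not spell out.

```python
import tempfile
from pathlib import Path


def scan_bundle(root):
    """Group bundle-relative markdown paths into concepts, indexes, and logs."""
    found = {"concepts": [], "index": [], "log": []}
    for path in sorted(Path(root).rglob("*.md")):
        # Assumption: the bundle-relative path is the concept's identity.
        rel = path.relative_to(root).as_posix()
        if path.name == "index.md":
            found["index"].append(rel)  # directory listing
        elif path.name == "log.md":
            found["log"].append(rel)  # chronological history of changes
        else:
            found["concepts"].append(rel)
    return found


def demo():
    """Build a throwaway bundle on disk and scan it."""
    layout = (
        "index.md",
        "log.md",
        "tables/index.md",
        "tables/events.md",
        "metrics/weekly_active_users.md",
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in layout:
            file = root / rel
            file.parent.mkdir(parents=True, exist_ok=True)
            file.write_text("placeholder\n", encoding="utf-8")
        return scan_bundle(root)


print(demo())
```

Nothing here needs more than a filesystem walk, which is the point of keeping identity in the path.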

That’s the entire format. The full v0.1 specification fits on a single page.

Three design choices worth understanding

**One required field.**

OKF requires exactly one thing of every concept: a type field. What types exist, what other fields to include, and what sections the body contains are all left to the producer.

The spec defines the interoperability surface, not the content model. A deliberate bet that a minimal shared surface beats a comprehensive schema that nobody fully adopts.

**Producer and consumer independence.**

A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one LLM can be queried by another.

The format is the contract. The tooling at each end is independently swappable.
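One way to picture that independence is a producer and a consumer that share no code and agree only on the type field. Both functions below are hypothetical, as are the source and row_count fields. The frontmatter handling is again a flat key: value simplification, not full YAML.

```python
def render_concept(concept_type, body, **extra):
    """Producer side: emit frontmatter with the required type plus any extras."""
    fields = {"type": concept_type}
    # Extra fields are producer-specific; a stray `type` keyword is ignored.
    fields.update((k, v) for k, v in extra.items() if k != "type")
    header = "\n".join(f"{key}: {value}" for key, value in fields.items())
    return f"---\n{header}\n---\n{body}\n"


def read_type(text):
    """Consumer side: find the type without knowing which producer wrote the file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no frontmatter block
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "type":
            return value.strip()
    return None


# Two different "producers": a hand-written note and a pipeline export.
hand_written = render_concept("metric", "# Weekly active users")
exported = render_concept(
    "table", "# events", source="export-job", row_count="1200"
)

print(read_type(hand_written), read_type(exported))
```

The consumer never sees source or row_count and does not need to. Either side can be replaced as long as the file on disk keeps the same shape.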

**Format, not platform.**

No schema registry. No central authority. No required tooling. If you can cat a file, you can read OKF. If you can git clone a repo, you can ship it.

The value of a knowledge format comes from how many parties speak it, not from who owns it.

Google’s bet

The same way JSON gave teams a common exchange format for data, OKF is a bet that organizational knowledge needs the same thing.

JSON won because it was simple enough to adopt almost everywhere. The format was the contribution. The ecosystem followed.

OKF is making the same argument about knowledge. Not another platform. Not another catalog service. Not another knowledge graph with a proprietary API.

A format that anyone can produce without an SDK and anyone can consume without an integration.

Whether this thesis proves correct depends on adoption. OKF is a v0.1 draft from one vendor.

The field has prior attempts: DataHub, Amundsen, OpenMetadata, DCAT, Schema.org.

None are identical to OKF, but the landscape is not empty. What’s different is the minimalism and the git-native distribution model. Prior catalog formats require infrastructure.

OKF requires a text editor and a directory. That’s a meaningful difference in the barrier to adoption.

Google shipped a reference enrichment agent, a static visualizer, and three sample bundles alongside the spec.

It also updated Google Cloud’s Knowledge Catalog to ingest OKF and serve it to agents.

A signal that Google is using this internally, not just publishing a standard it wants others to adopt.

Why this matters if you’re building agents

The context assembly problem is not going away. As agents become more capable, the quality of their output depends increasingly on the quality of the context they can access.

The capability gap between frontier models is shrinking. The organizational knowledge gap is not.

If OKF gains adoption, knowledge becomes portable across teams, tools, and organizations without custom integration work.

Context assembly becomes a format problem rather than a bespoke engineering problem. If that thesis is right, formats, not platforms, will define how agent ecosystems share context.

Every agent framework ultimately runs into the same problem:

Before an agent can reason, it has to find the right context.

Most of the engineering effort isn’t in the reasoning step.

It’s in assembling the knowledge needed to make reasoning possible.

OKF is a proposal for standardizing that layer.


Similar Articles

@Saboo_Shubham_: Google just introduced Open Knowledge Format. An open standard for the context AI agents need based on Karpathy's LLM w…

@wey_gu: Google Cloud's Open Knowledge Format is great. It standardizes the structure of LLM-Wiki inspired textual knowledge that is hierarchical and relational. I really like this standard.

Introducing the Open Knowledge Format (9 minute read)

LLM Wiki v2 (16 minute read)

@tom_doerr: OpenKB builds a wiki-style knowledge base using PageIndex for vectorless retrieval. https://github.com/VectifyAI/OpenKB…
