@AlphaSignalAI: LLM Knowledge Bases Something I'm finding very useful recently: using LLMs to build personal knowledge bases for variou…
Summary
Google's Open Knowledge Format (OKF) proposes a portable standard for organizational knowledge to help AI agents retrieve correct context, addressing fragmentation across data catalogs, wikis, and code.
Cached at: 06/17/26, 07:51 AM
The Reason Your AI Agent Keeps Getting Context Wrong
AI agents struggle less with reasoning than with finding the right context. Google’s Open Knowledge Format proposes a portable standard for organizational knowledge.
In 6 minutes you’ll learn why AI agents struggle with fragmented organizational knowledge, how OKF works, and why Google believes formats, not platforms, will define the next layer of AI infrastructure.
The schema of a table lives in a data catalog with a proprietary API. The business definition of a metric lives in a Confluence wiki nobody updates.
The join path between two systems lives in the head of a senior engineer who left in March.
When an agent needs to answer “how do I compute weekly active users from our event stream,” it has to assemble the answer from these scattered, mutually incompatible surfaces.
The knowledge exists. The format to share it doesn’t.
Google just published a proposal for fixing that.
It’s called the Open Knowledge Format (OKF). Most teams are still solving this from scratch. OKF is an argument that they shouldn’t have to.
The problem is format fragmentation
The knowledge exists in most organizations: the schema of a table, the business definition of a metric, join paths between systems, incident runbooks, deprecation notices for old APIs.
These atoms of context are what separate an agent that gives a generic answer from one that gives a correct one.
The problem is where they live.
Data catalogs with proprietary APIs. Wikis in Confluence or Notion. Code comments and docstrings. The heads of senior engineers. Every system uses its own format, its own schema, its own access model.
Every team building agents solves this assembly problem independently. The knowledge is format-locked behind whichever system created it.
Before OKF, teams were already converging on the same informal solution.
Andrej Karpathy articulated it in a widely cited gist: give the LLM a shared markdown library that grows more useful over time, let agents read and update their own files, and curate the content like code.
Andrej Karpathy (@karpathy), Apr 3: LLM Knowledge Bases
Something I’m finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating…
The pattern kept reappearing independently: Obsidian vaults wired to coding agents, AGENTS.md and CLAUDE.md convention files, metadata-as-code repositories inside data teams.
Each instance was bespoke. They all looked alike (markdown, YAML frontmatter, cross-links) but none were designed to cooperate. No agreed-upon fields, no agreed-upon reserved filenames, no interoperability.
OKF is the formalization of this pattern.
What OKF actually is
A bundle is a directory of markdown files. Each file represents a concept: a table, a dataset, a metric, a playbook, an API endpoint, anything worth capturing. The file path is the concept’s identity.
Every concept has a YAML frontmatter block and a markdown body. The frontmatter has one required field, type. Everything else is optional.
Producers can add any additional fields. Consumers must tolerate unknown fields without rejecting the document.
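As a rough illustration of those rules, here is a minimal Python sketch of a tolerant reader. It is not taken from the OKF spec or Google's tooling: the `---` delimiters, the flat `key: value` parsing, the function names, and the sample fields other than `type` are all assumptions made for the example, and a real consumer would likely use a full YAML parser.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a concept file into (frontmatter fields, markdown body).

    Assumption: frontmatter sits between two '---' lines and holds only
    flat `key: value` pairs. Real YAML allows much more than this.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter block")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter block") from None
    fields: Dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:])


def load_concept(text: str) -> Tuple[Dict[str, str], str]:
    """Enforce the one required field; keep unknown fields untouched."""
    fields, body = split_frontmatter(text)
    if not fields.get("type"):
        raise ValueError("concept is missing the required 'type' field")
    return fields, body


# Hypothetical concept file: only `type` is required by the format;
# `owner` stands in for any producer-defined extra field.
SAMPLE = """---
type: metric
owner: growth-team
---
# Weekly active users

Distinct users with at least one event in a 7-day window.
"""

fields, body = load_concept(SAMPLE)
print(fields["type"], sorted(fields))
```

The reader rejects a document only when `type` is absent. Fields it does not recognize are carried along rather than treated as errors, which is the tolerance the format asks of consumers.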
Concepts link to each other with standard markdown links, turning the directory into a traversable graph of relationships.
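One way to picture that graph is the Python sketch below, which pulls relative markdown links out of concept bodies and resolves them against each file's own path. The regular expression, the function names, and the sample bundle contents are illustrative assumptions, not part of the specification.

```python
import posixpath
import re
from typing import Dict, List

# Matches [text](target) but not image links of the form ![alt](src).
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")


def outgoing_links(concept_path: str, body: str) -> List[str]:
    """Return bundle-relative paths that this concept links to."""
    base = posixpath.dirname(concept_path)
    targets = []
    for href in LINK_RE.findall(body):
        # Skip external URLs and in-page anchors; keep relative file links.
        if "://" in href or href.startswith(("#", "mailto:")):
            continue
        href = href.split("#", 1)[0]
        targets.append(posixpath.normpath(posixpath.join(base, href)))
    return targets


def build_graph(bundle: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each concept path to the concept paths it links to."""
    return {path: outgoing_links(path, body) for path, body in bundle.items()}


def reachable(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Depth-first traversal listing every concept reachable from start."""
    seen: List[str] = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        stack.extend(graph.get(node, []))
    return seen


# Hypothetical in-memory bundle: path -> markdown body.
BUNDLE = {
    "metrics/weekly_active_users.md": "Computed from [events](../tables/events.md).",
    "tables/events.md": "Joined to [users](users.md). See [docs](https://example.com).",
    "tables/users.md": "No outgoing links.",
}

graph = build_graph(BUNDLE)
print(reachable(graph, "metrics/weekly_active_users.md"))
```

Because links are ordinary relative paths, an agent answering a question about one metric can follow them to the tables it depends on without a separate lineage service.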
Two reserved filenames carry special meaning: index.md for directory listings and log.md for chronological history of changes.
That’s the entire format. The full v0.1 specification fits on a single page.
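To make the directory layout concrete, the following Python sketch walks a bundle and separates concept files from the two reserved filenames. It rests on several assumptions that the article does not confirm: that a concept's identity is its bundle-relative path including the `.md` extension, that reserved files may appear in any subdirectory, and that the sample file names are representative. The bundle is created in a temporary directory so the example runs on its own.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# The two reserved filenames named in the article.
RESERVED = {"index.md", "log.md"}


def scan_bundle(root: Path) -> Dict[str, List[str]]:
    """List concept identities and reserved files in a bundle directory.

    Assumption: identity is the path relative to the bundle root.
    """
    concepts: List[str] = []
    reserved: List[str] = []
    for path in root.rglob("*.md"):
        rel = path.relative_to(root).as_posix()
        (reserved if path.name in RESERVED else concepts).append(rel)
    return {"concepts": sorted(concepts), "reserved": sorted(reserved)}


def demo() -> Dict[str, List[str]]:
    """Build a small hypothetical bundle on disk and scan it."""
    sample_files = (
        "index.md",
        "log.md",
        "tables/index.md",
        "tables/events.md",
        "metrics/weekly_active_users.md",
    )
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in sample_files:
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text("---\ntype: example\n---\n", encoding="utf-8")
        return scan_bundle(root)


result = demo()
print(result["concepts"])
print(result["reserved"])
```

Nothing here needs more than the standard library and a filesystem, which is the point of a format that ships as a plain directory.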
Three design choices worth understanding
**One required field.**
OKF requires exactly one thing of every concept: a type field. What types exist, what other fields to include, what sections the body contains, all left to the producer.
The spec defines the interoperability surface, not the content model. A deliberate bet that a minimal shared surface beats a comprehensive schema that nobody fully adopts.
**Producer and consumer independence.**
A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one LLM can be queried by another.
The format is the contract. The tooling at each end is independently swappable.
**Format, not platform.**
No schema registry. No central authority. No required tooling. If you can cat a file, you can read OKF. If you can git clone a repo, you can ship it.
The value of a knowledge format comes from how many parties speak it, not from who owns it.
Google’s bet
The same way JSON gave teams a common exchange format for data, OKF is a bet that organizational knowledge needs the same thing.
JSON won because it was simple enough to adopt almost everywhere. The format was the contribution. The ecosystem followed.
OKF is making the same argument about knowledge. Not another platform. Not another catalog service. Not another knowledge graph with a proprietary API.
A format that anyone can produce without an SDK and anyone can consume without an integration.
Whether this thesis proves correct depends on adoption. OKF is a v0.1 draft from one vendor.
The field has prior attempts: DataHub, Amundsen, OpenMetadata, DCAT, Schema.org.
None is identical to OKF, but the landscape is not empty. What’s different is the minimalism and the git-native distribution model. Prior catalog formats require infrastructure.
OKF requires a text editor and a directory. That’s a meaningful difference in the barrier to adoption.
Google shipped a reference enrichment agent, a static visualizer, and three sample bundles alongside the spec.
It also updated Google Cloud’s Knowledge Catalog to ingest OKF and serve it to agents.
A signal that Google is using this internally, not just publishing a standard it wants others to adopt.
Why this matters if you’re building agents
The context assembly problem is not going away. As agents become more capable, the quality of their output depends increasingly on the quality of the context they can access.
The model capability gap between frontier models is shrinking. The organizational knowledge gap is not.
If OKF gains adoption, knowledge becomes portable across teams, tools, and organizations without custom integration work.
Context assembly becomes a format problem rather than a bespoke engineering problem. If that thesis is right, formats, not platforms, will define how agent ecosystems share context.
Every agent framework ultimately runs into the same problem:
Before an agent can reason, it has to find the right context.
Most of the engineering effort isn’t in the reasoning step.
It’s in assembling the knowledge needed to make reasoning possible.
OKF is a proposal for standardizing that layer.
Similar Articles
@Saboo_Shubham_: Google just introduced Open Knowledge Format. An open standard for the context AI agents need based on Karpathy's LLM w…
Google announced the Open Knowledge Format, an open standard based on Karpathy's LLM wiki concept, designed to provide context for AI agents using simple markdown files.
@wey_gu: Google Cloud's Open Knowledge Format is excellent. It standardizes the structure of LLM-Wiki-inspired hierarchical, relational textual knowledge, and I really like this standard.
Google Cloud introduces the Open Knowledge Format (OKF), an open specification that standardizes the LLM-wiki pattern for representing structured knowledge in markdown with YAML frontmatter, aiming to improve data sharing and interoperability for AI agents.
Introducing the Open Knowledge Format (9 minute read)
Google Cloud introduces the Open Knowledge Format (OKF), an open specification for representing metadata and curated knowledge in markdown files to improve data sharing and context for AI agents. The format aims to make knowledge from fragmented internal systems portable and interoperable.
LLM Wiki v2 (16 minute read)
This post presents a pattern for building personal knowledge bases using LLMs, offering a structured approach for leveraging large language models in knowledge management.
@tom_doerr: OpenKB builds a wiki-style knowledge base using PageIndex for vectorless retrieval. https://github.com/VectifyAI/OpenKB…
OpenKB is an open-source CLI tool that compiles documents into a wiki-style knowledge base using PageIndex for vectorless long document retrieval, enabling reasoning-based retrieval without vector databases.