@wey_gu: Google Cloud 的 Open Knowledge Format 非常棒,标准化了 LLM-Wiki inspired 的分层级、有关系的 textual 知识的结构,我非常喜欢这个标准

X AI KOLs Following Tools

Summary

Google Cloud introduces the Open Knowledge Format (OKF), an open specification that standardizes the LLM-wiki pattern for representing structured knowledge in markdown with YAML frontmatter, aiming to improve data sharing and interoperability for AI agents.

Google Cloud 的 Open Knowledge Format 非常棒,标准化了 LLM-Wiki inspired 的分层级、有关系的 textual 知识的结构,我非常喜欢这个标准 https://t.co/vHZXvjN2Vv
Original Article
View Cached Full Text

Cached at: 06/15/26, 10:59 AM

Google Cloud 的 Open Knowledge Format 非常棒,标准化了 LLM-Wiki inspired 的分层级、有关系的 textual 知识的结构,我非常喜欢这个标准

https://t.co/vHZXvjN2Vv


How the Open Knowledge Format can improve data sharing

Source: https://cloud.google.com/blog/products/data-analytics/how-the-open-knowledge-format-can-improve-data-sharing/ As foundation models continue to improve, the lack of relevant context often limits what they can do, especially as they are used to build agentic systems. While these models can help you write code, summarize documents, or analyze a dataset, they still need the right information to produce accurate and actionable results.

That’s why today, we’re introducing the Open Knowledge Format (OKF), an open specification that formalizes theLLM-wikipattern into a portable, interoperable format. This is a vendor-neutral, agent- and human-friendly standard for representing the metadata, context, and curated knowledge that modern AI systems need.

As published,OKF v0.1represents knowledge as a directory of markdown files with YAML frontmatter, with a small set of agreed-upon conventions that let wikis written by different producers be consumed by different agents without translation.

That’s it. No complex compression scheme, no new runtime, no required SDK. A bundle of OKF documents is:

  • Just markdown— readable in any editor, renderable on GitHub, indexable by any search tool
  • Just files— shippable as a tarball, hostable in any git repo, mountable on any filesystem
  • Just YAML frontmatter— for the small set of structured fields that need to be queryable:type,title,description,resource,tags, andtimestamp

If you’ve used Obsidian, Notion, Hugo, or any of the LLM wiki patterns that have emerged over the past year, the shape will feel familiar. OKF formalizes the small set of conventions needed to make these patterns interoperable.

Let’s take a look at the problem that OKF can solve for your organization, how it works, how to get started with it, and what’s next.

A fragmented context landscape

In most organizations, the information that foundation models use is overwhelmingly internal knowledge: the schema of a table, your business’ meaning of a metric, the runbook for an incident, the join paths between two systems, the deprecation notice for an old API, etc.

Today, these atoms of knowledge live in a variety of highly fragmented systems:

  • Metadata catalogs with their own APIs
  • Wikis, third-party systems, or in shared drives
  • Code comments, docstrings, or notebook cells
  • The heads of a few senior engineers

When an AI agent needs to answer“How do I compute weekly active users from our event stream?“it has to assemble the answer from these scattered, mutually incompatible surfaces. Every vendor offers its own catalog, its own SDK, its own knowledge-graph schema, and none of the knowledge is easily portable across products or organizations.

The result: Every agent builder is solving the same context-assembly problem from scratch, every catalog vendor is reinventing the same data models, and the knowledge itself is locked behind whichever surface created it.

Knowledge as a living wiki

Developer teams are changing how they build AI agents. Instead of using models to search the same documents for the same facts over and over, you can give your agents a shared markdown library that grows more useful over time. This lets your agents take on the drudgery of reading and updating their own files, while your team curates the content and manages it like code.

Andrej Karpathy, the prominent AI researcher and educator, articulates this idea most crisply in hisLLM Wiki gist. “LLMs don’t get bored, don’t forget to update a cross-reference, and can touch 15 files in one pass,” he writes. The bookkeeping that causes humans to abandon personal wikis is exactly what LLMs are good at.

Similar knowledge-as-Wiki pattern keeps reappearing under different names:Obsidian vaultswired to coding agents, theAGENTS.md/CLAUDE.mdfamily of convention files, repos full ofindex.mdandlog.mdartifacts that agents consult before doing real work, and “metadata as code” repositories inside data teams.

The pattern is compelling and powerful, but each instance is bespoke. Karpathy’s wiki and your team’s wiki and a vendor’s catalog export may alllookalike (markdown, frontmatter, cross-links), but none of them are intentionally designed to cooperate. There is no agreed-upon answer towhat fields every document should carry, orwhat filenames mean what. As a result, the knowledge encoded in wikis remains siloed within the original teams, leading to redundant effort whenever a new agent is built.

What’s missing is a format, not another service

The answer to this problem isn’t another knowledge service. You need aformat, a way to represent knowledge that:

  • Anyone can produce, without an SDK
  • Anyone can consume, without an integration
  • Survives moving between systems, organizations, and tools
  • Lives in version control alongside the code it describes
  • Is readable by humansandparseable by agents: the same file, no translation layer

By design, OKF is that format.

How OKF works: The design in one screen

An OKFbundleis a directory of markdown files representing**concepts:**anything you want to capture, including tables, datasets, metrics, playbooks, runbooks, and APIs. Each concept is one file. The file path is the concept’s identity:

Each concept document has a small block of YAML front matter for structured fields and a markdown body for everything else:

Concepts link to each other with normal markdown links, turning the directory into agraphof relationships that is richer than the parent/child links implied by the file system. Bundles can optionally includeindex.mdfiles (for progressive disclosure as agents navigate the hierarchy) andlog.mdfiles (for chronological history of changes).

The full v0.1 specification (including conformance criteria, cross-linking rules, and the small number of reserved filenames) fits on a single page.

Three principles behind the design

**1. Minimally opinionated.**OKF requires exactly one thing of every concept: atypefield. Everything else (e.g., what types exist, what other fields to include, what sections the body has) is left to the producer. The spec defines theinteroperability surface, not the content model.

**2. Producer/consumer independence.**OKF cleanly separateswho writes the knowledgefromwho consumes it. A bundle hand-authored by a human can be consumed by an AI agent. A bundle generated by a metadata export pipeline can be browsed in a visualizer. A bundle synthesized by one LLM can be queried by another. The format is the contract; the tooling at each end is independently swappable.

**3. Format, not platform.**OKF is not tied to any specific cloud, database, model provider, or agent framework. It will never require a proprietary account or SDK to read, write, or serve. We’re publishing it as an open standard because the value of a knowledge format comes from how many parties speak it, not from who owns it.

What we’re shipping with the spec

To make the format concrete, we’re publishingreference implementationsat both the producer and consumer ends:

  • Anenrichment agentthat walks a BigQuery dataset, drafts an OKF concept document for every table and view, then runs a second LLM pass that crawls authoritative documentation and enriches each concept with citations, schemas, and join paths.
  • Astatic HTML visualizerthat turns any OKF bundle into an interactive graph view in a single self-contained file; no backend, no install on the viewing side, no data leaves the page.
  • Three ready-to-browse sample bundles:GA4 e-commerce,Stack Overflow, andBitcoin public datasets, produced by the reference agent and committed to the repo as living examples of conformant OKF.

These are proofs of concept, deliberately. The agent demonstratesoneway to produce OKF; nothing about the format requires a specific agent framework or LLM. The visualizer demonstratesoneway to consume it; nothing about the format requires HTML or a graph view. We expect (and want!) the ecosystem of producers and consumers to grow far beyond what we’ve shipped.

Where we go from here

OKF v0.1 is a starting point, not a finished standard. The format will evolve as more producers and consumers emerge and as we collectively learn what knowledge representations agents actually need in practice.

We’re publishing in the open from day one because that’s the only way a knowledge format earns its name, whether you’re building a knowledge catalog, an enrichment pipeline, a wiki tailored to AI agents, or anything in the AI knowledge domain.

From here, we encourage you to:

  • Read the spec(it’s short!)
  • Write a producerfor your source system, your database, your documentation site
  • **Write a consumer:**a viewer, a search index, an agent that reasons over bundles
  • Try the reference implementationagainst your own data
  • **File issues, send PRs, or propose extensions:**The spec is versioned and explicitly designed for backward-compatible growth

The repo, the spec, and the sample bundles are available inGitHub. We have also updated Google Cloud’sKnowledge Catalogto be able to ingest Open Knowledge Format and serve it to our agents. You can find the relevant code and exampleshere.

The format itself is the contribution. The tools we’ve shipped exist to make it real, and to lower the cost of trying it out. Whatever shape your knowledge takes today, OKF is designed to be thelingua francait can be exchanged for tomorrow.


Published by the Google Cloud Data Cloud team. Open Knowledge Format is an open specification; contributions, alternative implementations, and adoption beyond Google products are all explicitly welcomed.

In addition to the authors, this work came together thanks to key ideas from many others at Google, and we thank them for their contributions.

Posted in- Data Analytics

Similar Articles

Introducing the Open Knowledge Format (9 minute read)

TLDR AI

Google Cloud introduces the Open Knowledge Format (OKF), an open specification for representing metadata and curated knowledge in markdown files to improve data sharing and context for AI agents. The format aims to make knowledge from fragmented internal systems portable and interoperable.

@Huanusa: The ceiling of personal knowledge bases has arrived! This GitHub LLM Wiki project has already garnered 2800+ Stars, completely leaving ordinary RAG in the dust! It's not the useless mode of "re-retrieving" every time, but lets AI directly help you incrementally build a truly structured Wiki — compile knowledge once, and it continuously evolves...

X AI KOLs Timeline

LLM Wiki is an open-source desktop application that uses LLM to incrementally build a structured knowledge base, supporting knowledge graphs, community detection, Obsidian integration, and Chrome clipping, aiming to replace traditional RAG approaches.