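@wsl8297: The hardest part of reading a thick book is not "asking an AI to summarize it"; it is preserving the conceptual relationships between chapters, the line of argument, and a structure you can come back to. SpineDigest is an open-source tool that uses an LLM pipeline to compress a long book into structured output that reads more like a "skeleton". GitHub: http…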

X AI KOLs Timeline: Tools

Summary

SpineDigest is an open-source tool that uses an LLM pipeline to transform long-form books into structured summaries with chapter topology and knowledge graphs, supporting EPUB, Markdown, and TXT input.


Cached: 2026/05/26 15:11

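The hardest part of reading a thick book is not "asking an AI to summarize it"; it is preserving the conceptual relationships between chapters, the line of argument, and a structure you can come back to.

SpineDigest is an open-source tool that uses an LLM pipeline to compress a long book into structured output that reads more like a "skeleton".

GitHub: https://github.com/oomol-lab/spinedigest…

How it differs from ordinary summarization tools:

  • Accepts EPUB, Markdown, and TXT input
  • Lets you write a prompt that reflects your reading goal, such as preserving character development, knowledge structure, or the main line of argument
  • Splits the long text into chunks first, then builds a knowledge graph of related concepts
  • Produces more than a text summary: it can also generate the .sdpub archive format
  • The companion Inkora reader opens .sdpub files and shows the chapter topology and knowledge graph
  • Also exports to EPUB, Markdown, or plain text

If you want to turn a book into structured notes you can review, search, and reorganize, SpineDigest is more useful than a one-line "summarize the whole text".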


oomol-lab/spinedigest

Source: https://github.com/oomol-lab/spinedigest

SpineDigest


License: Apache 2.0. Requires Node >=22.12.0.

[Figure: SpineDigest terminal demo]

Distill every book down to its spine: SpineDigest feeds long-form books into an LLM pipeline and distills them into their essential content. The output isn’t just a text summary — it also builds a chapter topology and a knowledge graph so the structure of the whole book is visible at a glance.

[Figure: Inkora opening a .sdpub file]

Install

Requirements:

  • Node >=22.12.0
  • For source digestion from EPUB, Markdown, or TXT: a supported LLM provider plus credentials
  • For .sdpub re-export or sdpub inspection only: no LLM access required

Try it without a global install:

npx spinedigest --help

Global install:

npm install -g spinedigest

To explore the CLI surface first, start with:

spinedigest --help
spinedigest help ai

Quick Start

The first two examples below create a new digest from source input, so they require LLM configuration first. If you need config setup details, run:

spinedigest help config

Digest an EPUB into Markdown:

spinedigest --input ./book.epub --output ./digest.md --prompt "Preserve emotional shifts for both major and supporting characters."

Save a reusable archive first, then export later:

spinedigest --input ./book.epub --output ./book.sdpub
spinedigest --input ./book.sdpub --output ./book.epub

Pipe from stdin, receive on stdout:

cat ./chapter.txt | spinedigest --input-format txt --output-format markdown

Full flag reference: CLI Reference.

Why We Built This

People say you can’t summarize a whole book with an LLM because the context window isn’t long enough. But consider this: human short-term memory holds only 7±2 items (Miller’s Law) — far shorter than any LLM context window. Humans still manage to read entire books and write summaries.

The bottleneck isn’t the window. It’s knowing what to cut.

A good summary can’t preserve everything, and deciding what to drop is harder than deciding what to keep. There’s no universal standard for what matters, either. It depends entirely on why you’re reading: “What practical advice does the author give?”, “What’s the central argument?”, “How does the protagonist change?” Each purpose leads to completely different trade-offs. Ask an AI to summarize without any direction and it genuinely doesn’t know how — there’s no single right answer that works for everyone.

SpineDigest solves this with a staged pipeline.

First, an LLM reads the source text section by section, simulating the way human attention is drawn to key ideas. It extracts a set of chunks — the term cognitive psychology uses for discrete units of information in working memory. Each chunk is an attention landing point: one independent knowledge unit from the original text.

Next, the pipeline hands off to a classical algorithm. We build a knowledge graph with chunks as nodes, connect them by conceptual relevance, then use graph traversal and community detection to cluster semantically related chunks together. Each cluster is serialized in original reading order into what we call a snake: a threaded knowledge chain that winds through the source text, linking related ideas end to end.
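
The graph stage described above can be sketched in a few lines of Python. This is a minimal illustration, not SpineDigest's implementation: the `Chunk` fields, the Jaccard overlap standing in for "conceptual relevance", the 0.3 threshold, and the use of plain connected components in place of real community detection are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Chunk:
    position: int        # reading order in the source text
    text: str
    concepts: frozenset  # hypothetical concept tags produced by the reading phase


def relevance(a, b):
    """Jaccard overlap of concept tags (an assumed proxy for conceptual relevance)."""
    union = a.concepts | b.concepts
    return len(a.concepts & b.concepts) / len(union) if union else 0.0


def build_graph(chunks, threshold=0.3):
    """Adjacency map keyed by chunk position; an edge means relevance >= threshold."""
    graph = {c.position: set() for c in chunks}
    for a, b in combinations(chunks, 2):
        if relevance(a, b) >= threshold:
            graph[a.position].add(b.position)
            graph[b.position].add(a.position)
    return graph


def clusters(graph):
    """Connected components via depth-first traversal (simplified community detection)."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result


def snakes(chunks, threshold=0.3):
    """Serialize each cluster in original reading order."""
    by_pos = {c.position: c for c in chunks}
    groups = clusters(build_graph(chunks, threshold))
    ordered = [[by_pos[p] for p in sorted(group)] for group in groups]
    return sorted(ordered, key=lambda snake: snake[0].position)


chunks = [
    Chunk(0, "Habits form through cue and reward.", frozenset({"habit", "reward"})),
    Chunk(1, "Markets price in expectations.", frozenset({"market", "expectation"})),
    Chunk(2, "Rewards reinforce repeated behavior.", frozenset({"reward", "behavior", "habit"})),
    Chunk(3, "Expectations shift when news arrives.", frozenset({"expectation", "news", "market"})),
]
result = snakes(chunks)
for snake in result:
    print([c.position for c in snake])  # prints [0, 2] then [1, 3]
```

Note how the two snakes interleave in the source (positions 0, 2 and 1, 3): each one threads through the text rather than covering a contiguous span.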

Finally, the summarization phase switches back to LLMs, using an adversarial Multi-Agent framework with two roles: a respondent who writes the summary, and a panel of professors who challenge it.

Every professor holds a snake.

Picture a dissertation defense. The respondent stands at the front. The professors sit around the table, each holding a section of the original text, each measuring the draft against your stated extraction goal. They take turns: you missed this point, you didn’t give that passage fair treatment. The respondent has to answer every challenge — they can’t fully ignore anyone, but they can’t fully satisfy everyone either. After several rounds, the final summary is the result of that pressure: a forced compromise where every part of the source gets some representation, even if it’s just a sentence, and nothing is erased entirely.
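
To make the dynamics of the defense concrete, here is a deterministic toy version in Python with no LLM calls. Everything in it is an assumption for illustration: the summary is modeled as a set of chunk ids under a fixed `budget`, a professor "objects" only when none of their snake's chunks survive in the draft, and the respondent answers by cutting from the best-represented snake. The real pipeline uses LLM agents and your stated goal as the evaluation standard; this sketch only shows the forced-compromise shape.

```python
def covered(draft, snake):
    """Chunk ids from `snake` that the current draft still keeps."""
    return [i for i in snake if i in draft]


def defend(snakes, budget, max_rounds=10):
    """Toy defense loop. `snakes` maps a professor name to chunk ids in reading order."""
    all_ids = sorted(i for snake in snakes.values() for i in snake)
    draft = set(all_ids[:budget])  # naive first draft: only the opening chunks
    for _ in range(max_rounds):
        # A professor objects when their snake has no representation at all.
        ignored = [name for name, snake in snakes.items() if not covered(draft, snake)]
        if not ignored:
            break
        for name in ignored:
            # Make room by cutting from the snake with the most coverage,
            # but never remove a snake's last remaining chunk.
            richest = max(snakes, key=lambda n: len(covered(draft, snakes[n])))
            kept = covered(draft, snakes[richest])
            if len(kept) <= 1:
                return sorted(draft)  # budget too small to satisfy everyone
            draft.remove(kept[-1])
            draft.add(snakes[name][0])
    return sorted(draft)


panel = {"prof_a": [0, 1, 2, 3], "prof_b": [4, 6], "prof_c": [5, 7]}
print(defend(panel, budget=4))  # prints [0, 1, 4, 5]
```

The first draft keeps only the opening chunks, so two professors object. After one round each snake has at least one chunk in the summary, and the best-covered snake has given up space to make that possible.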

[Figure: SpineDigest architecture]

Your intent runs through the whole pipeline. During the reading phase, the AI’s attention is already shaped by what you told it to care about — your interests determine where the chunks land. During the defense phase, the professors apply that same goal as their evaluation standard. Content that aligns with your stated purpose gets protected by multiple professors at once; content that doesn’t loses its advocates and gets pushed out under sustained pressure. The one sentence you wrote at the start keeps working at both ends.

The .sdpub Format

Every time SpineDigest finishes processing, it produces a .sdpub file. Think of it as a processed archive: it holds not just the summary text but the complete knowledge structure built along the way — chunks, snakes, the full concept graph.

With that archive on hand, you can export to EPUB, Markdown, or plain text any time without re-running the LLM pipeline. The trade-off: exported formats carry the text but lose the structural data. The chapter topology, snake connections, and knowledge graph live only inside .sdpub. If you might want to re-export later, or browse the book’s structure in a visualization tool, keep the file around.
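
As a sketch of the "digest once, export many times" workflow, the helper below builds one CLI invocation per export target from a single archive. It only uses the `--input` and `--output` flags shown in Quick Start. The helper name, the `out` directory, and the assumption that the output format follows from the file extension for every target (Quick Start shows this for `.epub` and `.md`) are mine, not documented behavior.

```python
from pathlib import Path

# Export targets named in the README: EPUB, Markdown, plain text.
EXPORT_SUFFIXES = (".epub", ".md", ".txt")


def export_commands(archive, out_dir="."):
    """Return one argv list per export format for a .sdpub archive (hypothetical helper)."""
    archive = Path(archive)
    if archive.suffix != ".sdpub":
        raise ValueError("expected a .sdpub archive")
    return [
        ["spinedigest", "--input", str(archive),
         "--output", str(Path(out_dir) / (archive.stem + suffix))]
        for suffix in EXPORT_SUFFIXES
    ]


for argv in export_commands("book.sdpub", "out"):
    print(" ".join(argv))
```

Each resulting command reads the archive only, so none of them should need LLM credentials.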

To open a .sdpub file, use Inkora — a free app built specifically for it, with chapter topology and knowledge graph views.

For the internal layout and parser guidance, see the format spec.

Inputs and Outputs

Format              Input   Output
.epub               yes     yes
.md                 yes     yes
.txt                yes     yes
.sdpub              yes     yes
stdin (txt / md)    yes     no
stdout              no      yes
Requirements: Node >=22.12.0 and a supported LLM provider with credentials. .sdpub input does not require LLM access.
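
A script that wraps the CLI might want to check up front whether a run will need LLM credentials. The following is a small preflight sketch based only on the rule stated above (source formats need an LLM, `.sdpub` input does not). The function name is hypothetical, and it covers file paths only, not stdin.

```python
from pathlib import Path

SOURCE_SUFFIXES = {".epub", ".md", ".txt"}  # inputs that go through the LLM pipeline


def requires_llm(input_path):
    """True for source formats, False for .sdpub archives, ValueError otherwise."""
    suffix = Path(input_path).suffix.lower()
    if suffix == ".sdpub":
        return False
    if suffix in SOURCE_SUFFIXES:
        return True
    raise ValueError(f"unsupported input format: {suffix or '(none)'}")


print(requires_llm("book.epub"))   # prints True
print(requires_llm("book.sdpub"))  # prints False
```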

Library Usage

SpineDigest also exposes a programmatic API for embedding the pipeline in your own Node or TypeScript code. See Library Usage.

Related Projects

  • PDF Craft: If your source material is a scanned PDF, PDF Craft can convert it into EPUB or Markdown before you feed it into SpineDigest.
  • EPUB Translator: If your goal is bilingual reading rather than summarization, EPUB Translator turns an EPUB into a bilingual edition while preserving the original layout.

For AI Agents

SpineDigest’s CLI-first design makes it easy to call directly, with no extra integration code.

  • Prefer the CLI. Use the programmatic API only when code-level integration is explicitly required.
  • Use help as the discovery surface. Start with spinedigest --help as the root page, then follow spinedigest help ai, topic pages, or command-specific --help before guessing behavior.
  • Trust --help. Every command in the CLI exposes usage guidance through --help.
  • Use explicit paths. Pass --input and --output for deterministic, repeatable runs.
  • Check exit codes. Success returns 0; failure returns non-zero with a plain-text error on stderr.
  • stdin is narrow. Only txt and md are accepted, and only in non-interactive flows.
  • No LLM needed for .sdpub. Re-exporting an archive never calls an LLM provider.
  • Keep the archive. If the same digest might need re-exporting, treat .sdpub as the intermediate artifact.
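
The exit-code and stderr conventions in the list above map naturally onto a thin wrapper. The sketch below is one way an agent written in Python might call the CLI; `run_cli` and `DigestError` are hypothetical names, and the demo uses the Python interpreter as a stand-in command so the block runs without SpineDigest installed.

```python
import subprocess
import sys


class DigestError(RuntimeError):
    """Raised when the command exits with a non-zero status."""


def run_cli(argv, stdin_text=None):
    """Run a command, return its stdout, and raise DigestError carrying stderr on failure."""
    proc = subprocess.run(argv, input=stdin_text, capture_output=True, text=True)
    if proc.returncode != 0:
        raise DigestError(f"exit {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout


# Stand-in commands so the example is runnable anywhere.
print(run_cli([sys.executable, "-c", "print('ok')"]).strip())  # prints ok

try:
    run_cli([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(3)"])
except DigestError as err:
    print(err)  # prints exit 3: boom
```

With SpineDigest installed, the same wrapper could run the documented stdin flow, for example `run_cli(["spinedigest", "--input-format", "txt", "--output-format", "markdown"], stdin_text=chapter_text)`, where `chapter_text` is your own string.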

Useful help entry points:

spinedigest help ai
spinedigest help task
spinedigest help config
spinedigest help env
spinedigest help config-file
spinedigest help sdpub

Full agent guidance: AI Agent Guide.
