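@wsl8297: The hardest part of reading a thick book is not asking an AI to "summarize it"; it is preserving the conceptual relationships between chapters, the thread of the argument, and a structure you can come back to. SpineDigest is an open-source tool that uses an LLM pipeline to compress long books into structured output that reads more like a "skeleton". GitHub: http…
Summary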
SpineDigest is an open-source tool that uses an LLM pipeline to transform long-form books into structured summaries with chapter topology and knowledge graphs, supporting EPUB, Markdown, and TXT input.
Cached: 2026/05/26 15:11
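The hardest part of reading a thick book is not asking an AI to "summarize it"; it is preserving the conceptual relationships between chapters, the thread of the argument, and a structure you can come back to.
SpineDigest is an open-source tool that uses an LLM pipeline to compress long books into structured output that reads more like a "skeleton".
GitHub: https://github.com/oomol-lab/spinedigest…
It differs from ordinary summarization tools in several ways:
- It accepts EPUB, Markdown, and TXT input.
- You can write a prompt that matches your reading purpose, for example to preserve character development, knowledge structure, or the main line of argument.
- It first splits the long text into chunks, then builds a knowledge graph of conceptually related chunks.
- The output is not only a text summary; it can also produce a .sdpub archive.
- The companion Inkora reader opens .sdpub files and shows the chapter topology and knowledge graph.
- It can also export to EPUB, Markdown, or plain text.
If you want to turn a book into structured notes that you can review, search, and reorganize later, SpineDigest is more useful than a one-line "summarize the whole text".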
oomol-lab/spinedigest
Source: https://github.com/oomol-lab/spinedigest

Distill every book down to its spine: SpineDigest feeds long-form books into an LLM pipeline and distills them into their essential content. The output isn’t just a text summary — it also builds a chapter topology and a knowledge graph so the structure of the whole book is visible at a glance.

Install
Requirements:
- Node >=22.12.0
- For source digestion from EPUB, Markdown, or TXT: a supported LLM provider plus credentials
- For .sdpub re-export or sdpub inspection only: no LLM access required
Try it without a global install:
npx spinedigest --help
Global install:
npm install -g spinedigest
To explore the CLI surface first, start with:
spinedigest --help
spinedigest help ai
Quick Start
The first two examples below create a new digest from source input, so they require LLM configuration first. If you need config setup details, run:
spinedigest help config
Digest an EPUB into Markdown:
spinedigest --input ./book.epub --output ./digest.md --prompt "Preserve emotional shifts for both major and supporting characters."
Save a reusable archive first, then export later:
spinedigest --input ./book.epub --output ./book.sdpub
spinedigest --input ./book.sdpub --output ./book.epub
Pipe from stdin, receive on stdout:
cat ./chapter.txt | spinedigest --input-format txt --output-format markdown
Full flag reference: CLI Reference.
Why We Built This
People say you can’t summarize a whole book with an LLM because the context window isn’t long enough. But consider this: human short-term memory holds only 7±2 items (Miller’s Law) — far shorter than any LLM context window. Humans still manage to read entire books and write summaries.
The bottleneck isn’t the window. It’s knowing what to cut.
A good summary can’t preserve everything, and deciding what to drop is harder than deciding what to keep. There’s no universal standard for what matters, either. It depends entirely on why you’re reading: “What practical advice does the author give?”, “What’s the central argument?”, “How does the protagonist change?” Each purpose leads to completely different trade-offs. Ask an AI to summarize without any direction and it genuinely doesn’t know how — there’s no single right answer that works for everyone.
SpineDigest solves this with a staged pipeline.
First, an LLM reads the source text section by section, simulating the way human attention is drawn to key ideas. It extracts a set of chunks — the term cognitive psychology uses for discrete units of information in working memory. Each chunk is an attention landing point: one independent knowledge unit from the original text.
Next, the pipeline hands off to a classical algorithm. I build a knowledge graph with chunks as nodes, connect them by conceptual relevance, then use graph traversal and community detection to cluster the semantically related ones together. Each cluster is serialized in original reading order into what I call a snake — a threaded knowledge chain that winds through the source text, linking related ideas end to end.
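The clustering step can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Chunk` and `Edge` shapes, the `buildSnakes` name, and the relevance threshold are hypothetical, and connected components over strong edges (via union-find) stand in for whatever traversal and community detection SpineDigest actually uses.

```typescript
// Hypothetical types; SpineDigest's real data model may differ.
interface Chunk {
  order: number; // position in the original reading order
  text: string;
}

interface Edge {
  from: number;      // index into the chunks array (assumed valid)
  to: number;        // index into the chunks array (assumed valid)
  relevance: number; // assumed conceptual relevance score in [0, 1]
}

// Cluster chunks whose edges meet the threshold, then serialize each cluster
// in reading order. Connected components are a simple stand-in for
// community detection.
function buildSnakes(chunks: Chunk[], edges: Edge[], minRelevance: number): Chunk[][] {
  // Union-find parent pointers, one per chunk.
  const parent: number[] = chunks.map((_, i) => i);

  const find = (i: number): number => {
    while (parent[i] !== i) {
      parent[i] = parent[parent[i]]; // path halving
      i = parent[i];
    }
    return i;
  };

  // Merge the endpoints of every sufficiently strong edge.
  for (const e of edges) {
    if (e.relevance >= minRelevance) {
      parent[find(e.from)] = find(e.to);
    }
  }

  // Group chunks by the root of their cluster.
  const clusters: { [root: number]: Chunk[] } = {};
  chunks.forEach((chunk, i) => {
    const root = find(i);
    (clusters[root] = clusters[root] || []).push(chunk);
  });

  // Sort each cluster by reading order; order snakes by their first chunk.
  return Object.keys(clusters)
    .map((root) => clusters[Number(root)].slice().sort((a, b) => a.order - b.order))
    .sort((a, b) => a[0].order - b[0].order);
}

// Usage: five chunks, two themes interleaved through the text.
const chunks: Chunk[] = ["A1", "B1", "A2", "B2", "A3"].map((text, order) => ({ order, text }));
const edges: Edge[] = [
  { from: 0, to: 2, relevance: 0.9 },
  { from: 2, to: 4, relevance: 0.8 },
  { from: 1, to: 3, relevance: 0.7 },
  { from: 0, to: 1, relevance: 0.2 }, // too weak, ignored
];
const snakes = buildSnakes(chunks, edges, 0.5);
console.log(snakes.map((s) => s.map((c) => c.text).join(" -> ")));
```

Sorting each cluster by `order` is what makes a cluster a snake in the sense above: related chunks stay in the sequence the reader met them, even when unrelated material sits between them in the source.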
Finally, the summarization phase switches back to LLMs, using an adversarial Multi-Agent framework with two roles: a respondent who writes the summary, and a panel of professors who challenge it.
Every professor holds a snake.
Picture a dissertation defense. The respondent stands at the front. The professors sit around the table, each holding a section of the original text, each measuring the draft against your stated extraction goal. They take turns: you missed this point, you didn’t give that passage fair treatment. The respondent has to answer every challenge — they can’t fully ignore anyone, but they can’t fully satisfy everyone either. After several rounds, the final summary is the result of that pressure: a forced compromise where every part of the source gets some representation, even if it’s just a sentence, and nothing is erased entirely.
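The defense loop described above might be sketched roughly as follows. This is not SpineDigest's implementation: `defend`, `DefenseRoles`, `Objection`, and the round limit are hypothetical names, and the two roles are injected as plain synchronous callbacks so the control flow can be read without any LLM client. A real pipeline would presumably make these calls asynchronously against a provider.

```typescript
// All names here are hypothetical; they are not SpineDigest's API.
type Snake = string[]; // chunk texts in reading order

interface Objection {
  professor: number; // index of the snake this professor holds
  message: string;
}

interface DefenseRoles {
  // Writes or revises the summary, given the goal and the outstanding objections.
  respond: (goal: string, draft: string, objections: Objection[]) => string;
  // Returns an objection if the draft neglects this snake, or null if satisfied.
  challenge: (goal: string, snake: Snake, draft: string) => string | null;
}

function defend(goal: string, snakes: Snake[], roles: DefenseRoles, maxRounds: number): string {
  // The respondent opens with a first draft before any challenge is raised.
  let draft = roles.respond(goal, "", []);

  for (let round = 0; round < maxRounds; round++) {
    // Each professor measures the draft against the one snake they hold.
    const objections: Objection[] = [];
    snakes.forEach((snake, professor) => {
      const message = roles.challenge(goal, snake, draft);
      if (message !== null) objections.push({ professor, message });
    });

    if (objections.length === 0) break; // no professor has anything left to say
    draft = roles.respond(goal, draft, objections);
  }
  return draft;
}

// Toy stand-ins for the LLM calls: a professor objects when the first chunk of
// their snake is missing from the draft, and the respondent appends whatever
// was requested.
const toyRoles: DefenseRoles = {
  respond: (_goal, draft, objections) =>
    objections.reduce((text, o) => text + " " + o.message, draft).trim(),
  challenge: (_goal, snake, draft) => (draft.indexOf(snake[0]) >= 0 ? null : snake[0]),
};

const summary = defend("central argument", [["alpha"], ["beta"]], toyRoles, 3);
console.log(summary);
```

The fixed `maxRounds` reflects the "several rounds" in the description: the loop stops either when no professor objects or when the round budget runs out, so the respondent is never required to satisfy everyone.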
Your intent runs through the whole pipeline. During the reading phase, the AI’s attention is already shaped by what you told it to care about — your interests determine where the chunks land. During the defense phase, the professors apply that same goal as their evaluation standard. Content that aligns with your stated purpose gets protected by multiple professors at once; content that doesn’t loses its advocates and gets pushed out under sustained pressure. The one sentence you wrote at the start keeps working at both ends.
The .sdpub Format
Every time SpineDigest finishes processing, it produces a .sdpub file. Think of it as a processed archive: it holds not just the summary text but the complete knowledge structure built along the way — chunks, snakes, the full concept graph.
With that archive on hand, you can export to EPUB, Markdown, or plain text any time without re-running the LLM pipeline. The trade-off: exported formats carry the text but lose the structural data. The chapter topology, snake connections, and knowledge graph live only inside .sdpub. If you might want to re-export later, or browse the book’s structure in a visualization tool, keep the file around.
To open a .sdpub file, use Inkora — a free app built specifically for it, with chapter topology and knowledge graph views.
For the internal layout and parser guidance, see the format spec.
Inputs and Outputs
| Format | Input | Output |
|---|---|---|
| .epub | ✓ | ✓ |
| .md | ✓ | ✓ |
| .txt | ✓ | ✓ |
| .sdpub | ✓ | ✓ |
| stdin (txt / md) | ✓ | — |
| stdout | — | ✓ |
Requirements: Node >=22.12.0 and a supported LLM provider with credentials. .sdpub input does not require LLM access.
Library Usage
SpineDigest also exposes a programmatic API for embedding the pipeline in your own Node or TypeScript code. See Library Usage.
Related Projects
- PDF Craft: If your source material is a scanned PDF, PDF Craft can convert it into EPUB or Markdown before you feed it into SpineDigest.
- EPUB Translator: If your goal is bilingual reading rather than summarization, EPUB Translator turns an EPUB into a bilingual edition while preserving the original layout.
For AI Agents
SpineDigest’s CLI-first design makes it easy to call directly, with no extra integration code.
- Prefer the CLI. Use the programmatic API only when code-level integration is explicitly required.
- Use help as the discovery surface. Start with spinedigest --help as the root page, then follow spinedigest help ai, topic pages, or command-specific --help before guessing behavior.
- Trust --help. Every command in the CLI exposes usage guidance through --help.
- Use explicit paths. Pass --input and --output for deterministic, repeatable runs.
- Check exit codes. Success returns 0; failure returns non-zero with a plain-text error on stderr.
- stdin is narrow. Only txt and md are accepted, and only in non-interactive flows.
- No LLM needed for .sdpub. Re-exporting an archive never calls an LLM provider.
- Keep the archive. If the same digest might need re-exporting, treat .sdpub as the intermediate artifact.
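As a rough illustration of how an agent could encode this checklist before and after invoking the CLI, here is a small set of pure helpers. The helper names and types are hypothetical and not part of SpineDigest; only the --input, --output, and --prompt flags and the exit-code and stderr behavior come from the text above. Inferring the format from the file extension is this sketch's own pre-flight convenience, and actually spawning the process is deliberately left out.

```typescript
// Hypothetical helper names; only the flags themselves come from the README.
type Format = "epub" | "md" | "txt" | "sdpub";

interface DigestRequest {
  input: string;   // explicit input path
  output: string;  // explicit output path
  prompt?: string; // optional extraction goal
}

const KNOWN_FORMATS: Format[] = ["epub", "md", "txt", "sdpub"];

// Build the argument list with explicit paths for repeatable runs.
function buildArgs(req: DigestRequest): string[] {
  const args = ["--input", req.input, "--output", req.output];
  if (req.prompt !== undefined) args.push("--prompt", req.prompt);
  return args;
}

// Guess the format from the file extension; null means "unknown, ask --help".
function formatOf(path: string): Format | null {
  const dot = path.lastIndexOf(".");
  const ext = dot >= 0 ? path.slice(dot + 1).toLowerCase() : "";
  for (const f of KNOWN_FORMATS) {
    if (f === ext) return f;
  }
  return null;
}

// Re-exporting a .sdpub archive never calls an LLM provider.
function needsLlm(input: Format): boolean {
  return input !== "sdpub";
}

// stdin accepts only txt and md.
function stdinAllowed(format: Format): boolean {
  return format === "txt" || format === "md";
}

// Success is exit code 0; otherwise stderr carries a plain-text error.
function interpretResult(exitCode: number, stderr: string): { ok: boolean; error: string | null } {
  const ok = exitCode === 0;
  return { ok, error: ok ? null : stderr.trim() };
}

// Usage: plan a re-export from a saved archive.
const request: DigestRequest = { input: "./book.sdpub", output: "./book.epub" };
const inputFormat = formatOf(request.input);
console.log(["spinedigest"].concat(buildArgs(request)).join(" "));
console.log(inputFormat !== null && needsLlm(inputFormat) ? "LLM config required" : "no LLM needed");
```

Keeping these checks separate from process spawning means an agent can decide up front whether credentials are needed, and treat any non-zero exit code uniformly afterwards.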
Useful help entry points:
spinedigest help ai
spinedigest help task
spinedigest help config
spinedigest help env
spinedigest help config-file
spinedigest help sdpub
Full agent guidance: AI Agent Guide.