
Summary

SpineDigest is an open-source CLI tool that uses a multi-stage AI pipeline to distill long books into structured digests, generating a chapter topology and a knowledge graph, which can be browsed with the companion Inkora reader.

@GitHub_Daily: SpineDigest, an open-source tool on GitHub, distills an entire book into structured highlights and lets you decide what to keep based on your own reading purpose. Its approach is interesting: an AI first extracts key knowledge points chapter by chapter, then an algorithm builds a knowledge graph that links related concepts together. Finally, multiple AI roles generate the final summary through a dissertation-defense-style adversarial process, making sure nothing important gets dropped. GitHub: http://github.com/oomol-lab/spinedigest It accepts EPUB, Markdown, and plain-text input, and alongside the text summary it also produces a chapter topology and a knowledge graph, so you can see the whole book's structure at a glance. There is also a free companion visual reader, Inkora, for browsing chapter relationships and the knowledge graph. If you want to quickly turn a thick book into structured, reviewable notes, this tool is worth trying.


oomol-lab/spinedigest

Source: https://github.com/oomol-lab/spinedigest

SpineDigest

English | 中文

npm version License: Apache 2.0 Node >=22.12.0

SpineDigest Terminal Demo

Distill every book down to its spine: SpineDigest feeds long-form books into an LLM pipeline and distills them into their essential content. The output isn’t just a text summary — it also builds a chapter topology and a knowledge graph so the structure of the whole book is visible at a glance.

Inkora screenshot

Inkora opening a .sdpub file

Install

Requirements:

  • Node >=22.12.0
  • For source digestion from EPUB, Markdown, or TXT: a supported LLM provider plus credentials
  • For .sdpub re-export or sdpub inspection only: no LLM access required

Try it without a global install:

npx spinedigest --help

Global install:

npm install -g spinedigest

To explore the CLI surface first, start with:

spinedigest --help
spinedigest help ai

Quick Start

The first two examples below create a new digest from source input, so they require LLM configuration first. If you need config setup details, run:

spinedigest help config

Digest an EPUB into Markdown:

spinedigest --input ./book.epub --output ./digest.md --prompt "Preserve emotional shifts for both major and supporting characters."

Save a reusable archive first, then export later:

spinedigest --input ./book.epub --output ./book.sdpub
spinedigest --input ./book.sdpub --output ./book.epub

Pipe from stdin, receive on stdout:

cat ./chapter.txt | spinedigest --input-format txt --output-format markdown

Full flag reference: CLI Reference.

Why We Built This

People say you can’t summarize a whole book with an LLM because the context window isn’t long enough. But consider this: human short-term memory holds only 7±2 items (Miller’s Law) — a far smaller capacity than any LLM context window. Humans still manage to read entire books and write summaries.

The bottleneck isn’t the window. It’s knowing what to cut.

A good summary can’t preserve everything, and deciding what to drop is harder than deciding what to keep. There’s no universal standard for what matters, either. It depends entirely on why you’re reading: “What practical advice does the author give?”, “What’s the central argument?”, “How does the protagonist change?” Each purpose leads to completely different trade-offs. Ask an AI to summarize without any direction and it genuinely doesn’t know how — there’s no single right answer that works for everyone.

SpineDigest solves this with a staged pipeline.

First, an LLM reads the source text section by section, simulating the way human attention is drawn to key ideas. It extracts a set of chunks — the term cognitive psychology uses for discrete units of information in working memory. Each chunk is an attention landing point: one independent knowledge unit from the original text.

Next, the pipeline hands off to a classical algorithm. It builds a knowledge graph with chunks as nodes, connects them by conceptual relevance, then uses graph traversal and community detection to cluster the semantically related ones together. Each cluster is serialized in original reading order into what we call a snake — a threaded knowledge chain that winds through the source text, linking related ideas end to end.
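
The graph stage can be sketched in a few lines of TypeScript. This is an illustrative sketch, not SpineDigest's actual implementation: the Chunk shape is invented for the example, and a plain connected-components pass stands in for whatever community-detection algorithm the real pipeline uses.

```typescript
// A chunk: one knowledge unit extracted from the source, tagged with
// its position in the original reading order. (Shape invented for this sketch.)
interface Chunk {
  id: number;
  order: number;      // position in the original text
  concepts: string[]; // concepts the chunk touches
}

// Two chunks are conceptually related when they share at least one concept.
function related(a: Chunk, b: Chunk): boolean {
  return a.concepts.some((c) => b.concepts.includes(c));
}

// Cluster chunks with a simple connected-components traversal
// (a stand-in for real community detection), then serialize each
// cluster in original reading order to form a "snake".
function buildSnakes(chunks: Chunk[]): Chunk[][] {
  const visited = new Set<number>();
  const snakes: Chunk[][] = [];
  for (const start of chunks) {
    if (visited.has(start.id)) continue;
    const cluster: Chunk[] = [];
    const queue = [start];
    visited.add(start.id);
    while (queue.length > 0) {
      const cur = queue.pop()!;
      cluster.push(cur);
      for (const other of chunks) {
        if (!visited.has(other.id) && related(cur, other)) {
          visited.add(other.id);
          queue.push(other);
        }
      }
    }
    cluster.sort((a, b) => a.order - b.order); // original reading order
    snakes.push(cluster);
  }
  return snakes;
}

const chunks: Chunk[] = [
  { id: 1, order: 0, concepts: ["memory"] },
  { id: 2, order: 1, concepts: ["attention"] },
  { id: 3, order: 2, concepts: ["memory", "chunking"] },
  { id: 4, order: 3, concepts: ["attention", "focus"] },
];
console.log(buildSnakes(chunks).map((s) => s.map((c) => c.id)));
// logs [ [ 1, 3 ], [ 2, 4 ] ]: two snakes, each in reading order
```

Here chunks 1 and 3 thread into one snake via the shared "memory" concept, while 2 and 4 thread together via "attention", even though they interleave in the source.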

Finally, the summarization phase switches back to LLMs, using an adversarial Multi-Agent framework with two roles: a respondent who writes the summary, and a panel of professors who challenge it.

Every professor holds a snake.

Picture a dissertation defense. The respondent stands at the front. The professors sit around the table, each holding a section of the original text, each measuring the draft against your stated extraction goal. They take turns: "you missed this point," "you didn't give that passage fair treatment." The respondent has to answer every challenge — they can't fully ignore anyone, but they can't fully satisfy everyone either. After several rounds, the final summary is the result of that pressure: a forced compromise where every part of the source gets some representation, even if it's just a sentence, and nothing is erased entirely.
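
As a rough sketch of that control flow, with the respondent and professor LLM calls stubbed out as plain functions (every name here is invented for illustration, not taken from SpineDigest's code):

```typescript
// Hypothetical sketch of the defense loop.
type Snake = string[]; // a chain of related points from the source

interface Challenge { professor: number; missingPoint: string }

// Each professor checks whether every point in their snake is
// represented in the current draft.
function challengeDraft(draft: Set<string>, snakes: Snake[]): Challenge[] {
  const out: Challenge[] = [];
  snakes.forEach((snake, i) => {
    for (const point of snake) {
      if (!draft.has(point)) out.push({ professor: i, missingPoint: point });
    }
  });
  return out;
}

// The respondent revises: every challenged point gets at least a
// minimal mention, so nothing is erased entirely.
function revise(draft: Set<string>, challenges: Challenge[]): Set<string> {
  const next = new Set(draft);
  for (const c of challenges) next.add(c.missingPoint);
  return next;
}

function defend(initialDraft: string[], snakes: Snake[], rounds = 3): Set<string> {
  let draft = new Set(initialDraft);
  for (let i = 0; i < rounds; i++) {
    const challenges = challengeDraft(draft, snakes);
    if (challenges.length === 0) break; // every professor satisfied
    draft = revise(draft, challenges);
  }
  return draft;
}

const panel: Snake[] = [["a", "b"], ["c"]];
console.log([...defend(["a"], panel)].sort()); // every point represented
```

In this toy version the loop converges trivially because the respondent can always add a point; in the real system, an LLM respondent is simultaneously compressing toward the reader's goal, which is what makes the rounds an actual negotiation rather than simple accumulation.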

SpineDigest architecture

Your intent runs through the whole pipeline. During the reading phase, the AI’s attention is already shaped by what you told it to care about — your interests determine where the chunks land. During the defense phase, the professors apply that same goal as their evaluation standard. Content that aligns with your stated purpose gets protected by multiple professors at once; content that doesn’t loses its advocates and gets pushed out under sustained pressure. The one sentence you wrote at the start keeps working at both ends.

The .sdpub Format

Every time SpineDigest finishes processing, it produces a .sdpub file. Think of it as a processed archive: it holds not just the summary text but the complete knowledge structure built along the way — chunks, snakes, the full concept graph.

With that archive on hand, you can export to EPUB, Markdown, or plain text any time without re-running the LLM pipeline. The trade-off: exported formats carry the text but lose the structural data. The chapter topology, snake connections, and knowledge graph live only inside .sdpub. If you might want to re-export later, or browse the book’s structure in a visualization tool, keep the file around.
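
Based purely on the description above, a hypothetical TypeScript view of the archive might look like the sketch below. The field names are invented for illustration; the format spec is the authoritative source for the real layout.

```typescript
// Hypothetical shape of what a .sdpub archive carries, inferred only
// from the README's description. See the format spec for the real layout.
interface SdpubArchive {
  summary: string;                        // the distilled text
  chunks: { id: number; text: string }[]; // extracted knowledge units
  snakes: number[][];                     // chunk ids, in reading order
  graph: { from: number; to: number }[];  // concept-graph edges
  chapterTopology: string[];              // chapter structure
}

// Exporting to Markdown keeps only the summary text; the structural
// fields are exactly what you lose when you leave the .sdpub container.
function toMarkdown(a: SdpubArchive): string {
  return a.summary;
}

const demo: SdpubArchive = {
  summary: "# Digest of the book",
  chunks: [{ id: 1, text: "Miller's Law: 7±2 items." }],
  snakes: [[1]],
  graph: [],
  chapterTopology: ["Chapter 1"],
};
console.log(toMarkdown(demo));
```

This is why the README recommends keeping the archive around: re-export is a lossless read of `summary`-style data, while the structural fields can only be browsed from the .sdpub itself (for example, in Inkora).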

To open a .sdpub file, use Inkora — a free app built specifically for it, with chapter topology and knowledge graph views.

For the internal layout and parser guidance, see the format spec.

Inputs and Outputs

Format              Input   Output
.epub               ✓       ✓
.md                 ✓       ✓
.txt                ✓       ✓
.sdpub              ✓       ✓
stdin (txt / md)    ✓
stdout                      ✓

Requirements: Node >=22.12.0 and a supported LLM provider with credentials. .sdpub input does not require LLM access.

Library Usage

SpineDigest also exposes a programmatic API for embedding the pipeline in your own Node or TypeScript code. See Library Usage.

Related Projects

  • PDF Craft: If your source material is a scanned PDF, PDF Craft can convert it into EPUB or Markdown before you feed it into SpineDigest.
  • EPUB Translator: If your goal is bilingual reading rather than summarization, EPUB Translator turns an EPUB into a bilingual edition while preserving the original layout.

For AI Agents

SpineDigest’s CLI-first design makes it easy to call directly, with no extra integration code.

  • Prefer the CLI. Use the programmatic API only when code-level integration is explicitly required.
  • Use help as the discovery surface. Start with spinedigest --help as the root page, then follow spinedigest help ai, topic pages, or command-specific --help before guessing behavior.
  • Trust --help. Every command in the CLI exposes usage guidance through --help.
  • Use explicit paths. Pass --input and --output for deterministic, repeatable runs.
  • Check exit codes. Success returns 0; failure returns non-zero with a plain-text error on stderr.
  • stdin is narrow. Only txt and md are accepted, and only in non-interactive flows.
  • No LLM needed for .sdpub. Re-exporting an archive never calls an LLM provider.
  • Keep the archive. If the same digest might need re-exporting, treat .sdpub as the intermediate artifact.
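
Putting several of these guidelines together, an agent-side wrapper might look like the sketch below. It uses only the flags documented above (--input, --output, --prompt); the commented-out spawnSync call assumes spinedigest is installed on PATH, and the function names are invented for this example.

```typescript
// Build a deterministic spinedigest invocation with explicit paths,
// using only flags documented in this README. Pure function, easy to test.
function buildDigestArgs(input: string, output: string, prompt?: string): string[] {
  const args = ["--input", input, "--output", output];
  if (prompt !== undefined) args.push("--prompt", prompt);
  return args;
}

// Invoking the CLI and checking the exit code (requires spinedigest
// on PATH; shown as a comment rather than executed here):
//
//   import { spawnSync } from "node:child_process";
//   const res = spawnSync("spinedigest", buildDigestArgs("./book.epub", "./book.sdpub"));
//   if (res.status !== 0) {
//     // failure: non-zero exit, plain-text error on stderr
//     console.error(res.stderr.toString());
//   }

console.log(buildDigestArgs("./book.epub", "./digest.md", "Central argument only"));
```

Keeping the argument construction separate from the spawn makes runs repeatable and lets an agent log the exact invocation before executing it.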

Useful help entry points:

spinedigest help ai
spinedigest help task
spinedigest help config
spinedigest help env
spinedigest help config-file
spinedigest help sdpub

Full agent guidance: AI Agent Guide.
