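@veyhon: Compile a technical book into a Skill that Claude Code / Amp can load: extract the frameworks, principles, and anti-patterns first, then pull per-chapter files into context on demand. https://github.com/virgiliojr94/book-to-skill… book-to…

X AI KOLs Timeline · Tools

Summary

book-to-skill is an open-source tool that compiles technical books (PDF, EPUB, DOCX, and other formats) into a Skill that Claude Code or Amp can load. It automatically extracts frameworks, principles, and anti-patterns, and generates per-chapter context files.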


Cached: 2026/05/25 00:39

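Compile a technical book into a Skill that Claude Code / Amp can load: extract the frameworks, principles, and anti-patterns first, then pull per-chapter files into context on demand.

https://github.com/virgiliojr94/book-to-skill…

book-to-skill first routes each document type through its own extraction chain: technical PDFs go through Docling, which preserves tables and code blocks; text PDFs go through pdftotext → PyPDF2 → pdfminer; and EPUB / DOCX / HTML / RTF / MOBI each have a corresponding fallback. The extracted output is written to full_text.txt and metadata.json, and Claude then identifies the title, author, chapters, table of contents, and core themes.

The generated files land in the Amp or Claude Code skills directory: SKILL.md holds the core mental models and the topic index, each file under chapters/ is kept to 800–1,200 tokens, and glossary.md, patterns.md, and cheatsheet.md store terms, methods, and quick-reference tables respectively. Large books are read in per-chapter slices with grep / sed, so the whole book is not pushed into context over and over.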
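The per-chapter slicing idea can be sketched in Python. This is only an illustration, not the project's code: the tool is described as using grep / sed, and the `Chapter N` heading pattern and both function names here are assumptions.

```python
import re

# Assumed heading convention: lines such as "Chapter 7" or "CHAPTER 7: Title".
CHAPTER_RE = re.compile(r"^\s*chapter\s+(\d+)\b", re.IGNORECASE)


def chapter_spans(lines):
    """Map chapter number -> (start, end) line indexes, end exclusive."""
    starts = [(int(m.group(1)), i)
              for i, line in enumerate(lines)
              if (m := CHAPTER_RE.match(line))]
    spans = {}
    for pos, (number, start) in enumerate(starts):
        # A chapter ends where the next heading starts, or at end of file.
        end = starts[pos + 1][1] if pos + 1 < len(starts) else len(lines)
        spans[number] = (start, end)
    return spans


def read_chapter(lines, number):
    """Return only the lines that belong to one chapter."""
    start, end = chapter_spans(lines)[number]
    return lines[start:end]
```

Reading one slice at a time keeps the cost of each analysis step proportional to a chapter rather than to the whole book.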


virgiliojr94/book-to-skill

Source: https://github.com/virgiliojr94/book-to-skill

📚 book-to-skill

Turn any technical book or document into a Claude Code skill — ready to study, reference, and use while you work.


Why · What it generates · Usage · Requirements · How it works · FAQ · Install


🤔 Why

You buy a great technical book. You read it once. Three months later you can’t remember chapter 7 existed.

The usual workarounds don’t help:

  • 📄 “Let me just search the PDF” → you get a list of pages, not answers
  • 🧠 “I’ll ask Claude about this book” → it either hallucinates or says it doesn’t have the content
  • 📝 “I’ll take notes as I read” → you end up with a 200-line doc you never open again

book-to-skill solves this by turning the book into a structured skill Claude loads on demand.

Once installed, you just type /your-book-slug replication and Claude reads the right chapter and answers from the actual content. No hallucination. No digging through PDFs. The book becomes part of your workflow.


📦 What it generates

Running /book-to-skill your-book.pdf (or .epub) creates a full skill at ~/.claude/skills/<slug>/:

| File | Purpose | Size |
|---|---|---|
| SKILL.md | Core mental models + chapter index | ~4,000 tokens |
| chapters/ch01-*.md | One file per chapter, loaded on-demand | ~1,000 tokens each |
| glossary.md | Every key term, alphabetically sorted with chapter refs | ~1,500 tokens |
| patterns.md | All techniques, algorithms, and design patterns | ~2,000 tokens |
| cheatsheet.md | Decision tables and quick-reference rules | ~1,000 tokens |

Chapter files are loaded on-demand — they don’t count against the skill budget until you ask about that topic.


🚀 Usage

/book-to-skill <path-to-document> [skill-name-slug]

Supported document formats: PDF, EPUB, DOCX, TXT, Markdown, reStructuredText, AsciiDoc, HTML, RTF, MOBI/AZW/AZW3.

Examples:

# PDF — derive skill name from filename
/book-to-skill ~/Downloads/designing-data-intensive-applications.pdf

# EPUB — specify a custom slug
/book-to-skill ~/books/clean-code.epub clean-code

# Full path with explicit name
/book-to-skill /tmp/ddd-evans.pdf domain-driven-design

After the skill is created, use it like any other Claude Code skill:

/designing-data-intensive-apps                  # load core mental models
/designing-data-intensive-apps replication      # find and explain a topic
/designing-data-intensive-apps ch05             # dive into chapter 5
/designing-data-intensive-apps "what chapters do you have?"

🔧 Requirements

The extractor tries tools in order per format and uses the first available. If nothing is installed, it tells you which command to run. Plain text, Markdown, reStructuredText and AsciiDoc need no extra deps.
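The "first available tool wins" behaviour could look roughly like the sketch below. It is not taken from scripts/extract.py: the chain table, `is_available`, and `pick_tool` are hypothetical names, and only the tool order and install commands come from this README.

```python
import importlib.util
import shutil

# Hypothetical preference order for text-heavy PDFs, mirroring the chain
# named in this README: pdftotext, then PyPDF2, then pdfminer.six.
PDF_TEXT_CHAIN = [
    ("pdftotext", "binary", "sudo apt install poppler-utils"),
    ("PyPDF2", "module", "pip3 install PyPDF2"),
    ("pdfminer", "module", "pip3 install pdfminer.six"),
]


def is_available(name, kind):
    """Return True if a CLI binary is on PATH or a Python module is importable."""
    if kind == "binary":
        return shutil.which(name) is not None
    return importlib.util.find_spec(name) is not None


def pick_tool(chain, available=is_available):
    """Return the first usable tool name, or raise with install hints."""
    for name, kind, _hint in chain:
        if available(name, kind):
            return name
    hints = "; ".join(hint for _name, _kind, hint in chain)
    raise RuntimeError(f"No extractor installed. Try one of: {hints}")
```

Passing `available` as a parameter keeps the selection logic testable without installing any of the tools.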

PDF — choose by book type:

| Book type | Tool | Install | Speed |
|---|---|---|---|
| Text-heavy (prose, few tables) | pdftotext (poppler) | sudo apt install poppler-utils | ⚡ instant |
| Text-heavy fallback | PyPDF2 | pip3 install PyPDF2 | ⚡ instant |
| Text-heavy fallback | pdfminer.six | pip3 install pdfminer.six | ⚡ instant |
| Technical (code, tables, formulas) | docling | pip3 install docling | ~1.5s/page |

Before extraction begins, the skill asks you whether the book is technical or text-heavy and picks the right tool automatically. Docling preserves markdown tables and code blocks; pdftotext is faster for prose-only books.

EPUB:

| Tool | Install | Quality |
|---|---|---|
| ebooklib + beautifulsoup4 | pip3 install ebooklib beautifulsoup4 | ⭐⭐⭐ Best |
| stdlib zipfile | built-in, no install needed | ⭐⭐ Always available |
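To show why the stdlib path is always available: an EPUB is a ZIP archive of XHTML files, so `zipfile` plus `html.parser` is enough for a rough text dump. The sketch below is an assumption about how such a fallback might work, not the project's implementation, and `epub_to_text` is a hypothetical name.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def epub_to_text(path_or_file):
    """Concatenate text from every XHTML/HTML member of an EPUB archive.

    Simplification: members are read in sorted name order, not in the
    reading order declared by the EPUB package document.
    """
    chunks = []
    with zipfile.ZipFile(path_or_file) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append("\n".join(parser.parts))
    return "\n\n".join(chunks)
```

A fallback like this loses reading order and structure that a dedicated EPUB library can recover, which fits the lower quality rating in the table.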

Other formats:

| Format | Tool | Install |
|---|---|---|
| DOCX | python-docx (fallback: stdlib ZIP/XML) | pip3 install python-docx |
| HTML | beautifulsoup4 (fallback: stdlib html.parser) | pip3 install beautifulsoup4 |
| RTF | striprtf (fallback: regex) | pip3 install striprtf |
| MOBI / AZW / AZW3 | Calibre ebook-convert (external app, not pip) | https://calibre-ebook.com/download |
| TXT / Markdown / reStructuredText / AsciiDoc | built-in | none needed |
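The "stdlib ZIP/XML" fallback for DOCX can be illustrated the same way. A .docx file is a ZIP archive whose main text lives in `word/document.xml`. The function below is a minimal sketch under that assumption; `docx_to_text` is a hypothetical name, not the project's API.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path_or_file):
    """Return one line of text per non-empty paragraph (<w:p>)."""
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        runs = [node.text or "" for node in paragraph.iter(W_NS + "t")]
        if any(runs):
            lines.append("".join(runs))
    return "\n".join(lines)
```

This ignores styles, tables, headers, and footnotes, so heading detection would have to rely on the text alone.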

⚙️ How it works

PDF or EPUB
     │
     ▼
Step 1.5 — "Technical or text-heavy book?"
     │
     ├── technical → Docling  (tables + code blocks as markdown, ~1.5s/page)
     └── text      → pdftotext → PyPDF2 → pdfminer  (instant)
     │
     ▼
scripts/extract.py --mode <technical|text>
  EPUB → ebooklib → stdlib zipfile
     │
     ├── /tmp/book_skill_work/full_text.txt
     └── /tmp/book_skill_work/metadata.json
               │
               ▼
          Claude analyzes structure
          (title, author, chapters, ToC)
               │
               ▼
          Generates per-chapter summaries  (800–1,200 tokens each)
          technical → includes Code Examples + Reference Tables sections
          Generates glossary, patterns, cheatsheet
          Generates master SKILL.md with core mental models
               │
               ▼
          ~/.claude/skills/<slug>/  ✅ written
          /tmp/book_skill_work/     🗑️  cleaned up

Extraction benchmark (103-page technical book, CPU only):

| Method | Time | Tokens | Tables | Code blocks |
|---|---|---|---|---|
| pdftotext | 0.1s | 27K | 0 | 0 |
| Docling | 164s | 27K (+1.2%) | 48 | 36 |
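As a quick sanity check, the benchmark figures can be turned into per-page rates. The helper names below are made up for illustration; the inputs are the numbers reported for the 103-page book, with 27K treated as an approximate token count.

```python
PAGES = 103               # pages in the benchmarked book
DOCLING_SECONDS = 164.0   # reported Docling extraction time
APPROX_TOKENS = 27_000    # reported token count, rounded


def seconds_per_page(total_seconds, pages):
    """Average extraction time per page."""
    return total_seconds / pages


def tokens_per_page(total_tokens, pages):
    """Average token density per page."""
    return total_tokens / pages


print(f"Docling: {seconds_per_page(DOCLING_SECONDS, PAGES):.2f} s/page")
print(f"Density: {tokens_per_page(APPROX_TOKENS, PAGES):.0f} tokens/page")
```

The Docling rate works out to about 1.6 s/page, in line with the ~1.5s/page estimate quoted elsewhere in this README.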

Design principles:
  1. Density over completeness — a 1,000-token summary beats a 10,000-token excerpt
  2. Practitioner voice — “Use X when Y”, not “The book explains X”
  3. Front-loaded SKILL.md — compaction keeps the first ~5,000 tokens; the most important content comes first
  4. On-demand chapters — the topic index tells Claude which file to read; chapters load only when needed
  5. Never raw text — always synthesize, summarize, extract signal from the source
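Principle 4 can be pictured as a lookup from topic to chapter file. In practice Claude reads the index in SKILL.md and decides for itself, so the dictionary, file names, and `files_to_load` below are purely hypothetical stand-ins for that step.

```python
# Hypothetical topic index; in a generated skill this lives in SKILL.md.
TOPIC_INDEX = {
    "replication": "chapters/ch05-replication.md",
    "partitioning": "chapters/ch06-partitioning.md",
}


def files_to_load(question, index=TOPIC_INDEX):
    """Return the chapter files whose topic keyword appears in the question."""
    text = question.lower()
    return sorted({path for topic, path in index.items() if topic in text})
```

Only the returned files would be read into context; every other chapter stays on disk.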

❓ FAQ

“Can’t I just dump the PDF/EPUB into my Claude project context?”

You can — but every conversation will burn that token budget upfront. A 400-page book is ~200K tokens. With a skill, only the chapters relevant to your question load. The rest stays on disk until you need it.

More importantly: raw text injection is retrieval. A skill is reasoning. When you load a chapter file, Claude isn’t searching for keyword matches — it’s working with pre-extracted named frameworks, principles, and mental models structured for application, not for reading.
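The token arithmetic behind this answer is easy to reproduce. The sketch uses the approximate sizes quoted in this README; the constant and function names are assumptions for illustration.

```python
FULL_BOOK_TOKENS = 200_000  # README's estimate for a 400-page book
SKILL_MD_TOKENS = 4_000     # approximate size of SKILL.md
CHAPTER_TOKENS = 1_000      # approximate size of one chapter file


def skill_cost(chapters_loaded):
    """Tokens used when SKILL.md plus N chapter files are in context."""
    return SKILL_MD_TOKENS + chapters_loaded * CHAPTER_TOKENS


cost = skill_cost(2)
print(f"Skill with 2 chapters: {cost} tokens")
print(f"Whole book is ~{FULL_BOOK_TOKENS / cost:.0f}x larger")
```

With two chapters loaded the skill uses roughly 6,000 tokens, about one thirty-third of the full-book figure.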


“Isn’t this just RAG?”

RAG works at query time: chunk the book → embed everything → find similar vectors → inject into prompt. It’s optimized for “find me the part that talks about X.”

book-to-skill works at compile time: one deep analysis run extracts the author’s actual frameworks, names them, describes when to use each, captures the anti-patterns. The output is structure the author spent years building — not a similarity search over their sentences.

RAG answers: “here are chunks close to your query.”
A skill answers: “here are the 12 frameworks this author built, ready to reason with.”

For searching across 50+ books, RAG wins. For going deep on one book and using its frameworks while you work, a skill wins.


“Popular books are already in Claude’s training data. Why bother?”

For widely known books (Clean Code, DDIA, Pragmatic Programmer), Claude has general knowledge, but that knowledge is compressed and averaged across the entire internet’s discussion of the book, and Claude may hallucinate specific quotes or chapter locations.

book-to-skill works from your actual copy. Every framework name, every anti-pattern list, every chapter number is grounded in the text you provided. No training data drift, no hallucinated chapter titles.

It also shines for books Claude doesn’t know at all: niche technical references, internal company documentation, recent publications, translated works.


“NotebookLM handles multiple books better.”

Absolutely true — if your workflow is “I have 80 books and I want to search across all of them,” NotebookLM is the right tool.

book-to-skill is built for a different job: you want to go deep on one book and have its frameworks embedded in your coding or writing workflow, not in a separate browser tab. It’s less “library search” and more “the author is sitting next to you while you work.”


📥 Install

Copy this into your Claude Code session:

Install book-to-skill: https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/SKILL.md

Or manually:

mkdir -p ~/.claude/skills/book-to-skill/scripts

curl -o ~/.claude/skills/book-to-skill/SKILL.md \
  https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/SKILL.md

curl -o ~/.claude/skills/book-to-skill/scripts/extract.py \
  https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/scripts/extract.py

Then in any Claude Code session:

/book-to-skill ~/path/to/your-book.pdf
# or
/book-to-skill ~/path/to/your-book.epub

📁 Repository structure

book-to-skill/
├── SKILL.md              # Skill definition + step-by-step instructions
├── scripts/
│   └── extract.py        # PDF + EPUB extraction (pdftotext / PyPDF2 / pdfminer / ebooklib / zipfile)
└── README.md             # This file

License

MIT

