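@GitHub_Daily: I recently came across an open-source project, Flipbook Canvas, that is quite interesting: it turns every AI-generated image into a knowledge tree you can click through and explore without limit. Long-press anywhere on an image and the system automatically identifies what you pressed, searches the web for related material, and then generates a brand-new detailed diagram, layer by layer. GitHub…
Summary
Flipbook Canvas is an open-source project that turns AI-generated images into a knowledge tree you can click through without limit, with web search, real-time generation, and offline export.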
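Cached: 2026/06/05 17:18
I recently came across an open-source project, Flipbook Canvas, that is quite interesting: it turns every AI-generated image into a knowledge tree you can click through and explore without limit.
Long-press anywhere on an image and the system automatically identifies what you pressed, searches the web for related material, and then generates a brand-new detailed diagram, drilling deeper layer by layer.
GitHub: http://github.com/imcuttle/flipbook-app…
Every image comes with encyclopedia-style text labels and explanations, and the text on the image can be selected and copied directly.
The generation process is visible in real time; you can even share a link so that others watch the image take shape in sync.
It can also export an offline static site with voice narration that can be browsed without a server.
It turns static AI image generation into a dynamic exploration game, well suited to anyone who likes to organize knowledge visually or give presentations.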
imcuttle/flipbook-app
Source: https://github.com/imcuttle/flipbook-app
🎨 Flipbook Canvas
English · 中文
🔭 Live examples → imcuttle.github.io/flipbook-app
Browse fully-interactive, exported flipbooks right in your browser — click hotspots to drill in, no install needed.
✨ Click anywhere on a generated image. The backend infers what you clicked, searches the web when useful, generates a child diagram, and links it back. A flipbook of explorable knowledge — one click at a time.
💡 Inspired by and a re-implementation of the product idea behind flipbook.page — credit to the original team for the click-to-explore canvas concept.
A long-running web product: Express + SSE backend, Vite + React + TS frontend, a pluggable multi-model image pipeline, web-search augmented planning, per-node concurrency, read-only share links, fullscreen casting and a fully responsive mobile layout.
✨ Why this is fun
Most “AI drawing” demos stop at one image. This one turns each image into a playable knowledge surface:
- 🖱️ Long-press anywhere on a picture → the model reads what’s under your finger, decides whether the topic needs fresh sources, optionally hits the web, then paints a brand new annotated diagram zoomed into that concept.
- 📚 Encyclopedia-style output — every node ships with a 150–220-char caption and 20–40 in-image labels (place names, dates, numbers…), all OCR’d back into a transparent text layer so you can drag-select and copy any fragment straight off the picture.
- 🌳 Infinite tree of canvases — every click spawns a child node; the whole exploration tree is persisted, shareable, and replayable.
- ⏳ Watch it think — a node is saved and linkable the instant you click, then its title / caption / scene prompt type out live; share the link and a friend on another device watches the same stream fill in.
📸 Screenshots
- Click-to-explore: long-press any region to drill in
- End-to-end pipeline: search → planner → ImageGen → drill-down
- Gallery + canvas: every canvas is persisted, shareable, replayable
🚀 Highlights
- 🖱️ Click-to-explore: long-press (1 s) anywhere on a node’s image. The backend infers the label, decides whether to web-search, then generates a child node. Spatial + semantic dedup means clicking the same region again jumps straight in.
- ⏳ Live-streaming, linkable generating nodes: the moment you click, the child node is persisted under its final id and its parent hotspot links to it immediately — so it’s shareable / openable on any device while still generating. Its title, caption and image prompt type out live (token-streamed via SSE), the catalog shows a spinner row, and a refresh or cross-device open resumes the stream from the on-disk snapshot. On failure the half-node is auto-deleted.
- 🌫️ Progressive image loading: every PNG gets blur → thumbnail → medium → full variants (sharp). Gallery cards blur-up, the canvas swaps to full-res when ready — no broken-image flashes, fast first paint.
- 🖼️ Portrait & landscape canvases: pick orientation per canvas (mobile portrait viewports default to portrait); filter the gallery by All / Landscape / Portrait with the choice synced to the URL.
- ⚡ Per-node parallelism: up to 4 different spots in parallel per parent (configurable). Each in-flight click streams a phase chip (Inferring label… → Searching the web… → Generating image…) on the hotspot. Hit the cap and the cursor turns into ⌛.
- 📖 Encyclopedia register: planner produces 150–220 char captions with 20–40 in-image text fragments, like reading a richly annotated diagram in a children’s encyclopedia. Long captions clamp to 2 lines with a 查看更多 / Show more toggle.
- 🌐 Web-search augmented: a “decide-then-search” gate asks the LLM whether a topic benefits from up-to-date sources. When yes, results are fetched and fed into the planner; sources are persisted to disk + DB and rendered as a 📚 hover badge over the canvas.
- 🔁 Resilient SSE: Last-Event-ID replay + per-job snapshot resume — a dropped connection or page refresh mid-generation reconnects and catches up on everything it missed, including the in-flight typewriter.
- 🎬 Scene transitions: drill-in / drill-out / fade animations make navigation feel like a zooming flipbook rather than a page swap.
- 🔗 Share as preview: any canvas → read-only ?s=<token> URL. Viewers can navigate and watch live SSE updates from in-flight generations, but cannot trigger new ones.
- 📺 Fullscreen casting: ⛶ requests browser fullscreen; toggle the chrome (breadcrumb + caption + hint) on/off for a clean projection view.
- 🔤 Selectable in-image text: every label baked into the diagram is OCR’d with Apple Vision (zh-Hans + en-US) and overlaid as invisible HTML, so users can drag-select and Cmd-C copy any text directly off the picture while the painted pixels remain the visual ground truth.
- 🔊 Voice narration: each node’s title + caption is synthesised to speech with Microsoft Edge neural voices (msedge-tts; free, no API key). Pick a character voice per flipbook from the live Edge catalogue (filtered to the UI language); the picker reads “晓晓 · 女声” instead of raw locale IDs. Switching voices re-narrates the whole book and restarts in-flight playback. Auto-narration is on by default (toggleable) and is bundled into exports so the static site speaks offline too.
- 📱 Mobile responsive: sticky top bar that pins on scroll, single-column gallery, pinch-zoom image lightbox, smaller hotspots and pending bubbles.
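The Last-Event-ID replay mentioned under "Resilient SSE" can be illustrated with a small in-memory event log. This is only a sketch of the general technique, not the project's code: `EventLog`, `since`, and `toSseFrame` are hypothetical names, and the per-job on-disk snapshot described above is left out.

```javascript
// Hypothetical replay buffer; names are illustrative, not from the repository.
class EventLog {
  constructor(limit = 1000) {
    this.limit = limit; // maximum number of events kept for replay
    this.events = [];   // [{ id, type, data }]
    this.nextId = 1;
  }

  // Record an event and return it with its assigned, increasing id.
  append(type, data) {
    const event = { id: this.nextId++, type, data };
    this.events.push(event);
    if (this.events.length > this.limit) this.events.shift();
    return event;
  }

  // Events the client has not seen yet, given its Last-Event-ID header value.
  // A missing or unparseable id replays everything still buffered.
  since(lastEventId) {
    const last = Number.parseInt(lastEventId, 10);
    if (Number.isNaN(last)) return [...this.events];
    return this.events.filter((e) => e.id > last);
  }
}

// Serialise one event as a Server-Sent Events frame.
function toSseFrame({ id, type, data }) {
  return `id: ${id}\nevent: ${type}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

On reconnect, a handler would write `log.since(req.headers['last-event-id']).map(toSseFrame)` to the response before attaching the client to the live stream.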
🤖 Multimodal × Mainstream LLMs
Flipbook Canvas is built around a pluggable multimodal pipeline. Five modalities are wired end-to-end:
| Modality | What it does | Pluggable into |
|---|---|---|
| 📝 Text / JSON LLM | planner, click-label inference, decide-then-search verdict | any chat-completion-style model |
| 🖼️ Image generation | turns a structured prompt into a 2752×1536 annotated diagram with baked-in text labels | OpenAI, Nano Banana (Gemini), Seedream/Seeddance, or your own provider |
| 🌐 Web search | rephrased query → top-N normalized results → planner context + 📚 sources panel | any search backend |
| 👁️ OCR (Apple Vision) | zh-Hans + en-US recognition over every generated PNG, projected as a selectable HTML overlay | local, no API keys needed |
| 🔊 TTS (Edge neural voices) | synthesises each node’s title + caption to an mp3, per-flipbook character voice | Microsoft Edge online voices via msedge-tts, no API key |
The image layer is a provider chain (IMAGE_PROVIDER=...,svg) — first
enabled provider wins, svg is always appended last as a placeholder so the
UI never breaks. Adding a new model is a single file:
// server/src/generation/providers/<name>.js
export default {
name: 'my-model',
enabled(config) { return Boolean(config.MY_API_KEY); },
async generate({ imagePrompt, outputDir, size, title, hash, onEvent }) {
// call your model, write <hash>.png into outputDir, push phase events
},
};
Out of the box:
| Provider | Trigger to enable | Status |
|---|---|---|
| openai | OPENAI_API_KEY set | 🔌 stub, implement in providers/openai.js |
| nanobanana | NANOBANANA_API_KEY or GEMINI_API_KEY | 🔌 stub |
| seeddance | SEEDDANCE_API_KEY or ARK_API_KEY | 🔌 stub |
| codebuddy | ENABLE_CODEBUDDY=1 | ✅ reference impl (used in the demo gif) |
| svg | always | ✅ fallback placeholder |
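The "first enabled provider wins, svg appended last" rule can be sketched as follows. This is a minimal illustration assuming the provider shape shown above (`name`, `enabled(config)`); `parseChain`, `pickProvider`, and the sample `registry` are hypothetical and not taken from server/src/generation/image.js.

```javascript
// Parse a chain spec such as "openai,nanobanana" and always end with "svg".
function parseChain(spec) {
  const names = String(spec || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
  return [...names.filter((n) => n !== 'svg'), 'svg'];
}

// Return the first provider in the chain whose enabled(config) is true.
function pickProvider(spec, registry, config) {
  for (const name of parseChain(spec)) {
    const provider = registry[name];
    if (provider && provider.enabled(config)) return provider;
  }
  throw new Error('no provider enabled (svg should always be registered)');
}

// Hypothetical registry with two entries, for illustration only.
const registry = {
  openai: { name: 'openai', enabled: (c) => Boolean(c.OPENAI_API_KEY) },
  svg: { name: 'svg', enabled: () => true },
};
```

With no API key in the config, `pickProvider('openai', registry, {})` falls through to the `svg` placeholder, which is why the UI always has something to render.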
🎯 The reference implementation wires the codebuddy CLI as a subprocess driver for planner / ImageGen / WebSearch. Subprocess lifecycle (concurrency cap, per-call timeouts, single retry, file-size sanity check on generated PNGs, graceful degradation) lives in server/src/codebuddyClient.js and is a useful template if you ever shell out to any CLI-based model.
🐦 Walkthrough — generating a woodpecker flipbook from zero
Type 啄木鸟 (woodpecker) into the top bar and watch the entire pipeline run:
decide-then-search → planner → ImageGen → click to drill into the tongue
anatomy / nest cavity / ant-foraging zones, each spawning its own annotated
diagram with its own sources.
🗂️ Layout
.
├── prompts/                      # system / planner / click-label / image-prompt / decide-search
├── scripts/
│   ├── sync-prompts.mjs
│   ├── serve-preview.mjs         # build + serve one canvas's static preview
│   └── example-doc-publish.mjs   # publish canvases to GitHub Pages
├── server/
│   └── src/
│       ├── routes/               # canvas, click, events (SSE), assets, share
│       ├── export/               # static-site exporter + viewer template
│       │   ├── buildExport.js    # buildCanvasSite / buildCanvasExport (zip)
│       │   └── template/         # self-contained index.html + viewer.js/css
│       ├── lib/zip.js            # dependency-free ZIP writer
│       ├── generation/
│       │   ├── pipeline.js       # generateRoot + expandFromClick + per-node concurrency
│       │   ├── decideSearch.js   # decide-then-search gate
│       │   ├── webSearch.js      # WebSearch subprocess + result normaliser
│       │   ├── queue.js          # PerCanvasQueue / Semaphore / PerKeySemaphore
│       │   ├── planner.js / clickLabel.js
│       │   ├── image.js          # provider-chain orchestrator
│       │   └── providers/        # codebuddy, openai, nanobanana, seeddance, svg
│       ├── db/                   # Sequelize models + hydrateFromDisk
│       ├── store/                # filesystem layer
│       ├── sse/                  # event hub
│       └── codebuddyClient.js    # reference CLI-subprocess wrapper
└── web/                          # Vite + React + TS
💾 Storage
- 📁 Filesystem (source of truth for big artifacts): server/data/canvases/<id>/{data/tree.json, data/nodes/<hash>.json, images/<hash>.{png,svg}, manifest.json}.
- 🗃️ SQLite (server/data/flipbook.sqlite, via Sequelize): metadata index with Canvases / Nodes / Hotspots / ShareLinks / Sources tables. Drives the gallery, spatial dedup, share lookup, and sources hover panel. On boot the server runs hydrateFromDisk() to rebuild this index if it’s missing.
🛠️ Develop
npm install
npm run dev # server on :8787 + Vite on :5173 in parallel
Open http://127.0.0.1:5173.
By default ENABLE_CODEBUDDY=0 (stub mode — fast, SVG placeholders, no LLM).
Set ENABLE_CODEBUDDY=1 to use the reference CLI provider for planner +
ImageGen + WebSearch:
ENABLE_CODEBUDDY=1 npm run dev:server
⏱️ With the reference provider, each node takes ~70–95 s end-to-end (planner ~25 s + ImageGen ~50–60 s including cold start; +5–15 s if web search runs). ImageGen produces 2752×1536 PNG (~6 MB).
Per-node parallelism
Up to 4 click expansions per parent node run in parallel; excess clicks
queue. Different parents and different canvases run independently. A
per-parent write lock serializes only the short read-modify-write of the
parent node JSON. Tunable via MAX_PARALLEL_CLICKS_PER_NODE (default 4).
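A per-parent cap with queueing is commonly built as a keyed semaphore. The sketch below shows that general technique under assumptions: it borrows the name `PerKeySemaphore` from the layout listing, but the body is illustrative and not copied from server/src/generation/queue.js, and the per-parent write lock is not modelled.

```javascript
// Illustrative keyed semaphore: at most `limit` holders per key, extras queue.
class PerKeySemaphore {
  constructor(limit = 4) {
    this.limit = limit;
    this.active = new Map();  // key -> number of running holders
    this.waiting = new Map(); // key -> queue of resolve callbacks
  }

  // Resolves once a slot for `key` is available.
  acquire(key) {
    const running = this.active.get(key) || 0;
    if (running < this.limit) {
      this.active.set(key, running + 1);
      return Promise.resolve();
    }
    return new Promise((resolve) => {
      const queue = this.waiting.get(key) || [];
      queue.push(resolve);
      this.waiting.set(key, queue);
    });
  }

  // Hand the slot to the next waiter for `key`, or free it if nobody waits.
  release(key) {
    const queue = this.waiting.get(key);
    if (queue && queue.length > 0) {
      queue.shift()(); // slot passes straight to the waiter, count unchanged
      return;
    }
    const running = (this.active.get(key) || 0) - 1;
    if (running > 0) this.active.set(key, running);
    else this.active.delete(key);
  }

  // Run an async task under the per-key cap, always releasing the slot.
  async run(key, task) {
    await this.acquire(key);
    try {
      return await task();
    } finally {
      this.release(key);
    }
  }
}
```

Keying by parent node id means a fifth click on one parent waits while clicks on other parents or canvases proceed, which matches the behaviour described above.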
🔍 Web search
A pre-planner gate (decideSearch.js + prompts/decide-search.md) calls the
LLM with the proposed subject and asks: do recent / authoritative sources
materially improve this node? The default leans yes — only clearly
abstract / timeless subjects skip search. When yes:
- The web-search backend runs with the rephrased query.
- Results are normalised into {title, url, snippet, source}.
- Top results are passed into the planner prompt.
- Sources are persisted both into nodes/<hash>.json and into the SQLite Sources table.
- The frontend renders a 📚 badge near the breadcrumb. Hover to see a popover with the source list (220 ms grace period so the popover is reachable with the mouse).
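The normalisation step in the list above might look roughly like this. It is a sketch under assumptions: the raw field names (`link`, `description`), the choice of the hostname as `source`, the URL de-duplication, and the default of five results are all illustrative guesses rather than the behaviour of webSearch.js.

```javascript
// Normalise raw search hits into {title, url, snippet, source}; keep the top N.
function normaliseResults(rawResults, topN = 5) {
  const seen = new Set();
  const out = [];
  for (const raw of rawResults) {
    const url = String(raw.url || raw.link || '').trim();
    let source;
    try {
      // Assumption: "source" is the site hostname without a leading "www.".
      source = new URL(url).hostname.replace(/^www\./, '');
    } catch {
      continue; // skip hits without a parseable absolute URL
    }
    if (seen.has(url)) continue; // drop exact duplicate URLs
    seen.add(url);
    out.push({
      title: String(raw.title || '').trim(),
      url,
      snippet: String(raw.snippet || raw.description || '').trim(),
      source,
    });
    if (out.length === topN) break;
  }
  return out;
}
```

The resulting array is the shape that would be both passed to the planner prompt and persisted alongside the node.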
📦 Export as a standalone static site
Any canvas can be exported as a fully self-contained static site — a
read-only replica of the preview with all data and images inlined, openable
directly from file:// with zero network requests.
- In-app: the ··· More menu → Export preview downloads a .zip (index.html / viewer.js / viewer.css / data.js + images/).
- Serve one locally for quick viewing in a browser:
    npm run serve-preview -- <canvasId> [--lang en] [--port 8088]
  Builds the static site to a temp dir, starts a tiny static HTTP server, prints the URL. Ctrl-C cleans up.
- Publish to GitHub Pages (one or more canvases → a routed gallery landing page at /, each example at /<canvasId>/):
    npm run example:publish -- <canvasId> [<canvasId> ...] [--lang en] [--no-push]
  Builds each canvas, regenerates the landing index, and pushes to the gh-pages branch (accumulating: re-publishing a new id keeps the others). See the result at https://imcuttle.github.io/flipbook-app/.
The exported viewer mirrors the live read-only preview: image stage with collision-avoiding hotspot labels, leader lines, selectable OCR text overlay, caption, breadcrumb, catalog and sources — plus progressive image loading, scene transitions, and next-layer image prefetch. Per-node narration mp3s are bundled too, so the static site auto-narrates offline (toggleable in the top bar). It never calls the server.
🔗 Share / preview links
- POST /api/canvas/:id/share → {token, url}. Reuses an existing token for the same canvas.
- GET /api/share/:token → {canvasId, topic, readOnly:true}.
- Frontend: opening …?s=<token> puts the UI in read-only preview mode: no topic input, no clicks on the image, “👁 Preview” badge in the corner. SSE stays connected, so a viewer watching mid-generation sees images stream in real-time.
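The token-reuse behaviour of the two endpoints can be modelled with a small in-memory store. This is a hedged sketch, not the project's implementation (which keeps share links in a ShareLinks table): `ShareStore` and `makeToken` are hypothetical, the default token generator is a non-secure placeholder, and only the query-string part of the URL is produced because the README elides the base.

```javascript
// Illustrative in-memory model of share links; names are hypothetical.
class ShareStore {
  constructor(makeToken = () => Math.random().toString(36).slice(2, 14)) {
    this.makeToken = makeToken; // placeholder generator, not cryptographically secure
    this.byCanvas = new Map();  // canvasId -> token
    this.byToken = new Map();   // token -> { canvasId, topic }
  }

  // Mirrors POST /api/canvas/:id/share -> { token, url }, reusing any existing token.
  share(canvasId, topic) {
    let token = this.byCanvas.get(canvasId);
    if (!token) {
      token = this.makeToken();
      this.byCanvas.set(canvasId, token);
      this.byToken.set(token, { canvasId, topic });
    }
    return { token, url: `?s=${encodeURIComponent(token)}` };
  }

  // Mirrors GET /api/share/:token -> { canvasId, topic, readOnly: true }.
  resolve(token) {
    const hit = this.byToken.get(token);
    return hit ? { ...hit, readOnly: true } : null;
  }
}
```

Because `share` is idempotent per canvas, a link handed out earlier keeps working no matter how often the share action is triggered again.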
📺 Fullscreen / casting
- ⛶ button in TopBar requests browser fullscreen; uses CSS-only fullscreen on iOS Safari where the API isn’t supported.
- 👁 / 🚫 button (visible while in fullscreen) toggles the breadcrumb + caption + hint. Useful for clean projection.
- Long-press hint is suppressed in fullscreen by default; the press still works.
🧹 Cleaning local state
npm run clean:data # reset server/data (all canvases)
npm run clean:dist # reset web/dist
npm run clean # both
📦 Build for production
npm run build # builds web/dist
npm start # serves web/dist + API from :8787
🌐 LAN access via a fixed domain (macOS)
Give the app a stable hostname (e.g. http://flipbook.lan) reachable from any
device on your LAN — no port number needed. Uses dnsmasq (resolves the
domain → this machine’s LAN IP) + Caddy (reverse-proxies :80 to the app).
npm run lan:up # flipbook.lan → dev :5173 (preferred), falls back to prod :8787
npm run lan:down # tear it down
# custom: scripts/lan-domain-setup.sh <domain> <devPort> <prodPort>
bash scripts/lan-domain-setup.sh studio.lan 5173 8787
The proxy tries the dev port (5173) first and automatically falls back to
the prod port (8787) when dev isn’t running (passive health check, 3s
blacklist). So npm run dev and npm start both work behind the same domain.
lan:up installs dnsmasq/caddy via Homebrew if missing and needs sudo
(dnsmasq binds 53, Caddy binds 80). It only configures this machine; to
reach the domain from other devices, point their DNS at this machine’s LAN IP
(router DHCP DNS, per-device DNS, or a hosts entry — the script prints the
exact options and your IP).
⚙️ Configuration (env)
| Var | Default | Purpose |
|---|---|---|
| PORT | 8787 | server port |
| HOST | 127.0.0.1 | server bind |
| DATA_DIR | server/data | canvas state on disk |
| PROMPTS_DIR | prompts | prompt files |
| DB_PATH | <DATA_DIR>/flipbook.sqlite | SQLite file |
| MAX_PARALLEL_CLICKS_PER_NODE | 4 | concurrent click expansions per parent |
| MAX_PARALLEL_CODEBUDDY | 20 | concurrent planner/LLM subprocesses |
| MAX_PARALLEL_IMAGE | 20 | concurrent image-generation jobs (separate pool from the LLM limit) |
| PLANNER_TIMEOUT_MS | 90000 | per-call planner timeout |
| IMAGE_TIMEOUT_MS | 180000 | per-call ImageGen timeout |
| WEB_SEARCH_TIMEOUT_MS | 60000 | per-call WebSearch timeout |
| IMAGE_PROVIDER | codebuddy | provider chain (e.g. openai,nanobanana,svg) |
| IMAGE_SIZE | 1920x1080 | requested size (provider may pick its own) |
| ENABLE_CODEBUDDY | 0 | flip to 1 to enable the reference CLI provider |
| ENABLE_WEB_SEARCH | follows ENABLE_CODEBUDDY | force-disable with 0 |
| ENABLE_OCR | 1 | run Apple Vision OCR on each generated PNG to produce a selectable text overlay; set to 0 to skip |
| OCR_TIMEOUT_MS | 25000 | per-call OCR timeout |
| OCR_MIN_CONFIDENCE | 0.4 | drop OCR spans below this confidence |
| ENABLE_AUDIO | 1 | synthesise Edge neural-voice narration (mp3) for each node; set to 0 to skip. Non-blocking: failures never stop image generation |
| AUDIO_TIMEOUT_MS | 30000 | per-call TTS synthesis timeout |
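To make the defaults in the table concrete, here is a sketch of how a loader could resolve a subset of them from an environment object. `loadConfig`, `intFrom`, and `flag` are hypothetical helpers, and the handling of ENABLE_WEB_SEARCH (on only when ENABLE_CODEBUDDY is on and the variable is not "0") is one reading of "follows ENABLE_CODEBUDDY", not confirmed behaviour.

```javascript
// Parse an integer env value, falling back when it is missing or invalid.
function intFrom(value, fallback) {
  const n = Number.parseInt(value, 10);
  return Number.isFinite(n) ? n : fallback;
}

// Parse a 0/1 flag, falling back when it is unset.
function flag(value, fallback) {
  if (value === undefined || value === '') return fallback;
  return value === '1';
}

// Resolve a subset of the documented variables with their documented defaults.
function loadConfig(env) {
  const dataDir = env.DATA_DIR || 'server/data';
  const enableCodebuddy = flag(env.ENABLE_CODEBUDDY, false);
  return {
    port: intFrom(env.PORT, 8787),
    host: env.HOST || '127.0.0.1',
    dataDir,
    dbPath: env.DB_PATH || `${dataDir}/flipbook.sqlite`,
    maxParallelClicksPerNode: intFrom(env.MAX_PARALLEL_CLICKS_PER_NODE, 4),
    plannerTimeoutMs: intFrom(env.PLANNER_TIMEOUT_MS, 90000),
    imageTimeoutMs: intFrom(env.IMAGE_TIMEOUT_MS, 180000),
    enableCodebuddy,
    // Assumption: web search follows ENABLE_CODEBUDDY and can only be forced off.
    enableWebSearch: enableCodebuddy && env.ENABLE_WEB_SEARCH !== '0',
    enableOcr: flag(env.ENABLE_OCR, true),
    enableAudio: flag(env.ENABLE_AUDIO, true),
  };
}
```

In a Node server this would typically be called once at boot as `loadConfig(process.env)`.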