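@GitHub_Daily: I recently came across an open-source project called Flipbook Canvas. It's an interesting one: it turns every AI-generated image into a knowledge tree you can keep clicking into and exploring. Long-press anywhere on an image and the system automatically identifies what you pressed, searches the web for related material, and then generates a brand-new detailed diagram, drilling down layer by layer. GitHub…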

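X AI KOLs Timeline · Tools

Summary

Flipbook Canvas is an open-source project that turns AI-generated images into an endlessly clickable, explorable knowledge tree, with web search, real-time generation, and offline export.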


Cached: 2026/06/05 17:18

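I recently came across an open-source project called Flipbook Canvas. It's an interesting one: it turns every AI-generated image into a knowledge tree you can keep clicking into and exploring.

Long-press anywhere on an image and the system automatically identifies what you pressed, searches the web for related material, and then generates a brand-new detailed diagram, drilling down layer by layer.

GitHub: http://github.com/imcuttle/flipbook-app…

Every image comes with encyclopedia-style text labels and explanations, and the text on the image can be drag-selected and copied directly.

Generation is visible in real time, and you can even share a link so others can watch the picture fill in alongside you.

It can also be exported as an offline static site with voice narration that can be browsed without a server.

It turns static AI image generation into a dynamic exploration game, a good fit for anyone who likes to organise knowledge visually or give presentations.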


imcuttle/flipbook-app

Source: https://github.com/imcuttle/flipbook-app

🎨 Flipbook Canvas



🔭 Live examples → imcuttle.github.io/flipbook-app

Browse fully-interactive, exported flipbooks right in your browser — click hotspots to drill in, no install needed.

✨ Click anywhere on a generated image. The backend infers what you clicked, searches the web when useful, generates a child diagram, and links it back. A flipbook of explorable knowledge — one click at a time.

💡 Inspired by and a re-implementation of the product idea behind flipbook.page — credit to the original team for the click-to-explore canvas concept.

A long-running web product: Express + SSE backend, Vite + React + TS frontend, a pluggable multi-model image pipeline, web-search augmented planning, per-node concurrency, read-only share links, fullscreen casting and a fully responsive mobile layout.


✨ Why this is fun

Most “AI drawing” demos stop at one image. This one turns each image into a playable knowledge surface:

  • 🖱️ Long-press anywhere on a picture → the model reads what’s under your finger, decides whether the topic needs fresh sources, optionally hits the web, then paints a brand new annotated diagram zoomed into that concept.
  • 📚 Encyclopedia-style output — every node ships with a 150–220-char caption and 20–40 in-image labels (place names, dates, numbers…), all OCR’d back into a transparent text layer so you can drag-select and copy any fragment straight off the picture.
  • 🌳 Infinite tree of canvases — every click spawns a child node; the whole exploration tree is persisted, shareable, and replayable.
  • Watch it think — a node is saved and linkable the instant you click, then its title / caption / scene prompt type out live; share the link and a friend on another device watches the same stream fill in.
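
The caption and label budgets quoted above lend themselves to a quick sanity check on planner output. The following is a minimal sketch, not the project's code: the node shape `{ caption, labels }`, the function names, and the choice to count Unicode code points are all assumptions made for illustration.

```javascript
// Budgets taken from the numbers quoted above; the constant names are hypothetical.
const CAPTION_RANGE = [150, 220]; // characters per caption
const LABEL_RANGE = [20, 40];     // in-image labels per node

// Count code points rather than UTF-16 units, so characters outside the BMP count once.
function charCount(text) {
  return Array.from(text).length;
}

// Returns a list of human-readable problems; an empty list means the node fits the budgets.
function checkNodeBudget(node) {
  const problems = [];
  const captionLength = charCount(node.caption);
  if (captionLength < CAPTION_RANGE[0] || captionLength > CAPTION_RANGE[1]) {
    problems.push(`caption has ${captionLength} chars, expected ${CAPTION_RANGE[0]}-${CAPTION_RANGE[1]}`);
  }
  const labelCount = node.labels.length;
  if (labelCount < LABEL_RANGE[0] || labelCount > LABEL_RANGE[1]) {
    problems.push(`node has ${labelCount} labels, expected ${LABEL_RANGE[0]}-${LABEL_RANGE[1]}`);
  }
  return problems;
}
```

A caller could retry the planner or log a warning when the returned list is non-empty; which of the two the project does, if either, is not stated in the README.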

📸 Screenshots

  • Click-to-explore demo: long-press any region to drill in
  • Woodpecker walkthrough, end-to-end pipeline: search → planner → ImageGen → drill-down
  • Gallery + canvas: every canvas is persisted, shareable, replayable

🚀 Highlights

  • 🖱️ Click-to-explore: long-press (1 s) anywhere on a node’s image. The backend infers the label, decides whether to web-search, then generates a child node. Spatial + semantic dedup means clicking the same region again jumps straight in.
  • Live-streaming, linkable generating nodes: the moment you click, the child node is persisted under its final id and its parent hotspot links to it immediately — so it’s shareable / openable on any device while still generating. Its title, caption and image prompt type out live (token-streamed via SSE), the catalog shows a spinner row, and a refresh or cross-device open resumes the stream from the on-disk snapshot. On failure the half-node is auto-deleted.
  • 🌫️ Progressive image loading: every PNG gets blur → thumbnail → medium → full variants (sharp). Gallery cards blur-up, the canvas swaps to full-res when ready — no broken-image flashes, fast first paint.
  • 🖼️ Portrait & landscape canvases: pick orientation per canvas (mobile portrait viewports default to portrait); filter the gallery by All / Landscape / Portrait with the choice synced to the URL.
  • Per-node parallelism: up to 4 different spots in parallel per parent (configurable). Each in-flight click streams a phase chip (Inferring label… → Searching the web… → Generating image…) on the hotspot. Hit the cap and the cursor turns into ⌛.
  • 📖 Encyclopedia register: planner produces 150–220 char captions with 20–40 in-image text fragments — like reading a richly annotated diagram in a children’s encyclopedia. Long captions clamp to 2 lines with a 查看更多 / Show more toggle.
  • 🌐 Web-search augmented: a “decide-then-search” gate asks the LLM whether a topic benefits from up-to-date sources. When yes, results are fetched and fed into the planner; sources are persisted to disk + DB and rendered as a 📚 hover badge over the canvas.
  • 🔁 Resilient SSE: Last-Event-ID replay + per-job snapshot resume — a dropped connection or page refresh mid-generation reconnects and catches up on everything it missed, including the in-flight typewriter.
  • 🎬 Scene transitions: drill-in / drill-out / fade animations make navigation feel like a zooming flipbook rather than a page swap.
  • 🔗 Share as preview: any canvas → read-only ?s=<token> URL. Viewers can navigate and watch live SSE updates from in-flight generations, but cannot trigger new ones.
  • 📺 Fullscreen casting: ⛶ requests browser fullscreen; toggle the chrome (breadcrumb + caption + hint) on/off for a clean projection view.
  • 🔤 Selectable in-image text: every label baked into the diagram is OCR’d with Apple Vision (zh-Hans + en-US) and overlaid as invisible HTML, so users can drag-select and Cmd-C copy any text directly off the picture while the painted pixels remain the visual ground truth.
  • 🔊 Voice narration: each node’s title + caption is synthesised to speech with Microsoft Edge neural voices (msedge-tts — free, no API key). Pick a character voice per flipbook from the live Edge catalogue (filtered to the UI language); the picker reads “晓晓 · 女声” instead of raw locale IDs. Switching voices re-narrates the whole book and restarts in-flight playback. Auto-narration is on by default (toggleable) and is bundled into exports so the static site speaks offline too.
  • 📱 Mobile responsive: sticky top bar that pins on scroll, single-column gallery, pinch-zoom image lightbox, smaller hotspots and pending bubbles.
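
The Last-Event-ID replay mentioned under "Resilient SSE" can be pictured as a small per-job event log. This is a sketch under stated assumptions: the class name `ReplayLog`, the buffer limit, and the event types are hypothetical, and the project's actual event hub in server/src/sse/ may be structured differently. Only the `id:` / `event:` / `data:` frame layout comes from the standard text/event-stream format.

```javascript
// Keeps the most recent events for one job so a reconnecting client can catch up.
class ReplayLog {
  constructor(limit = 500) {
    this.limit = limit;
    this.events = [];
    this.nextId = 1;
  }

  // Store an event and return it with its monotonically increasing id.
  append(type, data) {
    const event = { id: this.nextId++, type, data };
    this.events.push(event);
    if (this.events.length > this.limit) this.events.shift();
    return event;
  }

  // Everything the client has not seen yet, given its Last-Event-ID header value.
  // A missing or unparseable header replays the whole buffer.
  replayAfter(lastEventId) {
    const last = Number.parseInt(lastEventId, 10);
    if (Number.isNaN(last)) return [...this.events];
    return this.events.filter((event) => event.id > last);
  }
}

// Serialise one event in the text/event-stream wire format.
function toSseFrame(event) {
  return `id: ${event.id}\nevent: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}
```

On reconnect, a handler would write `replayAfter(req.headers['last-event-id'])` to the response first and then subscribe to live events, so the typewriter text resumes without gaps.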

🤖 Multimodal × Mainstream LLMs

Flipbook Canvas is built around a pluggable multimodal pipeline. Five modalities are wired end-to-end:

| Modality | What it does | Pluggable into |
| --- | --- | --- |
| 📝 Text / JSON LLM | planner, click-label inference, decide-then-search verdict | any chat-completion-style model |
| 🖼️ Image generation | turns a structured prompt into a 2752×1536 annotated diagram with baked-in text labels | OpenAI, Nano Banana (Gemini), Seedream/Seeddance, or your own provider |
| 🌐 Web search | rephrased query → top-N normalized results → planner context + 📚 sources panel | any search backend |
| 👁️ OCR (Apple Vision) | zh-Hans + en-US recognition over every generated PNG, projected as a selectable HTML overlay | local, no API keys needed |
| 🔊 TTS (Edge neural voices) | synthesises each node’s title + caption to an mp3, per-flipbook character voice | Microsoft Edge online voices via msedge-tts, no API key |

The image layer is a provider chain (IMAGE_PROVIDER=...,svg) — first enabled provider wins, svg is always appended last as a placeholder so the UI never breaks. Adding a new model is a single file:

// server/src/generation/providers/<name>.js
export default {
  name: 'my-model',
  enabled(config) { return Boolean(config.MY_API_KEY); },
  async generate({ imagePrompt, outputDir, size, title, hash, onEvent }) {
    // call your model, write <hash>.png into outputDir, push phase events
  },
};

Out of the box:

| Provider | Trigger to enable | Status |
| --- | --- | --- |
| openai | OPENAI_API_KEY set | 🔌 stub (implement in providers/openai.js) |
| nanobanana | NANOBANANA_API_KEY or GEMINI_API_KEY | 🔌 stub |
| seeddance | SEEDDANCE_API_KEY or ARK_API_KEY | 🔌 stub |
| codebuddy | ENABLE_CODEBUDDY=1 | ✅ reference impl (used in the demo gif) |
| svg | always | ✅ fallback placeholder |
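
The "first enabled provider wins, svg always last" rule can be sketched as follows. This is an illustration only: `parseChain`, `pickProvider`, and the `registry` object are hypothetical names, and the real orchestrator in server/src/generation/image.js may resolve the chain differently (for example by falling through to the next provider when generation fails).

```javascript
// Parse a chain such as "openai,nanobanana,svg" and guarantee that svg comes last exactly once.
function parseChain(spec) {
  const names = String(spec || '')
    .split(',')
    .map((name) => name.trim())
    .filter((name) => name && name !== 'svg');
  return [...names, 'svg'];
}

// Pick the first provider in the chain whose enabled(config) returns true.
// registry maps provider names to objects shaped like the provider file shown earlier.
function pickProvider(spec, registry, config) {
  for (const name of parseChain(spec)) {
    const provider = registry[name];
    if (provider && provider.enabled(config)) return provider;
  }
  throw new Error('no enabled image provider (svg should always be registered)');
}
```

With `IMAGE_PROVIDER=openai,nanobanana` and no API keys set, this sketch would land on the svg placeholder, which matches the README's promise that the UI never breaks.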

🎯 The reference implementation wires the codebuddy CLI as a subprocess driver for planner / ImageGen / WebSearch. Subprocess lifecycle (concurrency cap, per-call timeouts, single retry, file-size sanity check on generated PNGs, graceful degradation) lives in server/src/codebuddyClient.js and is a useful template if you ever shell out to any CLI-based model.
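
Two of the lifecycle concerns listed above, per-call timeouts and a single retry, can be sketched without any subprocess at all. The helper names `withTimeout` and `callWithRetry` are hypothetical and this is not the code in server/src/codebuddyClient.js; in particular, a real subprocess wrapper would also need to kill the child process on timeout, which this sketch does not do.

```javascript
// Reject if the promise does not settle within timeoutMs.
// Note: this only stops waiting; it does not cancel the underlying work.
function withTimeout(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run task() with a per-call timeout; on failure, try again up to `retries` more times.
async function callWithRetry(task, { timeoutMs, retries = 1 }) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await withTimeout(task(attempt), timeoutMs);
    } catch (error) {
      lastError = error; // remember the failure and fall through to the next attempt
    }
  }
  throw lastError;
}
```

A planner call might then look like `callWithRetry(() => runPlanner(prompt), { timeoutMs: 90000 })`, where `runPlanner` is a placeholder for whatever actually invokes the model.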


🐦 Walkthrough — generating a woodpecker flipbook from zero

Type 啄木鸟 (woodpecker) into the top bar and watch the entire pipeline run: decide-then-search → planner → ImageGen → click to drill into the tongue anatomy / nest cavity / ant-foraging zones, each spawning its own annotated diagram with its own sources.


🗂️ Layout

.
├── prompts/                        # system / planner / click-label / image-prompt / decide-search
├── scripts/
│   ├── sync-prompts.mjs
│   ├── serve-preview.mjs           # build + serve one canvas's static preview
│   └── example-doc-publish.mjs     # publish canvases to GitHub Pages
├── server/
│   └── src/
│       ├── routes/                 # canvas, click, events (SSE), assets, share
│       ├── export/                 # static-site exporter + viewer template
│       │   ├── buildExport.js      # buildCanvasSite / buildCanvasExport (zip)
│       │   └── template/           # self-contained index.html + viewer.js/css
│       ├── lib/zip.js              # dependency-free ZIP writer
│       ├── generation/
│       │   ├── pipeline.js         # generateRoot + expandFromClick + per-node concurrency
│       │   ├── decideSearch.js     # decide-then-search gate
│       │   ├── webSearch.js        # WebSearch subprocess + result normaliser
│       │   ├── queue.js            # PerCanvasQueue / Semaphore / PerKeySemaphore
│       │   ├── planner.js / clickLabel.js
│       │   ├── image.js            # provider-chain orchestrator
│       │   └── providers/          # codebuddy, openai, nanobanana, seeddance, svg
│       ├── db/                     # Sequelize models + hydrateFromDisk
│       ├── store/                  # filesystem layer
│       ├── sse/                    # event hub
│       └── codebuddyClient.js      # reference CLI-subprocess wrapper
└── web/                            # Vite + React + TS

💾 Storage

  • 📁 Filesystem (source of truth for big artifacts): server/data/canvases/<id>/{data/tree.json, data/nodes/<hash>.json, images/<hash>.{png,svg}, manifest.json}.
  • 🗃️ SQLite (server/data/flipbook.sqlite, via Sequelize): metadata index — Canvases / Nodes / Hotspots / ShareLinks / Sources tables. Drives the gallery, spatial dedup, share lookup, and sources hover panel. On boot the server runs hydrateFromDisk() to rebuild this index if it’s missing.
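
The spatial dedup that the Hotspots index drives can be illustrated with an in-memory lookup. Everything specific here is an assumption: coordinates normalised to the 0..1 range, the 0.04 radius, exact-match label comparison as a stand-in for the "semantic" part, and the function names. The project performs this lookup against SQLite and its actual thresholds are not documented in the README.

```javascript
// Hypothetical tolerance: clicks within this normalised distance count as the same spot.
const SAME_SPOT_RADIUS = 0.04;

// Case- and whitespace-insensitive label form used for comparison.
function normaliseLabel(label) {
  return label.trim().toLowerCase().replace(/\s+/g, ' ');
}

// hotspots: [{ x, y, label, childId }] with x and y normalised to the 0..1 range.
// Returns the existing hotspot to jump to, or null when a new child should be generated.
function findExistingHotspot(hotspots, click, inferredLabel) {
  // Spatial check first: it needs no model call.
  const nearby = hotspots.find(
    (spot) => Math.hypot(spot.x - click.x, spot.y - click.y) <= SAME_SPOT_RADIUS
  );
  if (nearby) return nearby;
  if (!inferredLabel) return null;
  // Label check second: a different region that resolves to the same concept.
  const wanted = normaliseLabel(inferredLabel);
  return hotspots.find((spot) => normaliseLabel(spot.label) === wanted) || null;
}
```

Running the spatial check before label inference is a design choice of this sketch: a repeat click on the same region can then skip the model entirely and jump straight into the existing child.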

🛠️ Develop

npm install
npm run dev           # server on :8787 + Vite on :5173 in parallel

Open http://127.0.0.1:5173.

By default ENABLE_CODEBUDDY=0 (stub mode — fast, SVG placeholders, no LLM). Set ENABLE_CODEBUDDY=1 to use the reference CLI provider for planner + ImageGen + WebSearch:

ENABLE_CODEBUDDY=1 npm run dev:server

⏱️ With the reference provider, each node takes ~70–95 s end-to-end (planner ~25 s + ImageGen ~50–60 s including cold start; +5–15 s if web search runs). ImageGen produces a 2752×1536 PNG (~6 MB).

Per-node parallelism

Up to 4 click expansions per parent node run in parallel; excess clicks queue. Different parents and different canvases run independently. A per-parent write lock serializes only the short read-modify-write of the parent node JSON. Tunable via MAX_PARALLEL_CLICKS_PER_NODE (default 4).
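
A per-key limiter of the kind described above can be sketched in a few lines. The layout lists a `PerKeySemaphore` in queue.js, but its interface is not shown, so the class below (`PerKeyLimiter`, with `acquire` / `release` / `run`) is a hypothetical stand-in rather than the project's implementation.

```javascript
// Limits concurrent work per key (here: per parent node id); extra callers wait in FIFO order.
class PerKeyLimiter {
  constructor(limit = 4) {
    this.limit = limit;
    this.active = new Map();  // key -> number of running jobs
    this.waiting = new Map(); // key -> queue of resolve callbacks
  }

  // Resolves once a slot for this key is free.
  acquire(key) {
    const running = this.active.get(key) || 0;
    if (running < this.limit) {
      this.active.set(key, running + 1);
      return Promise.resolve();
    }
    return new Promise((resolve) => {
      const queue = this.waiting.get(key) || [];
      queue.push(resolve);
      this.waiting.set(key, queue);
    });
  }

  // Hand the slot to the next waiter, or free it when nobody is waiting.
  release(key) {
    const queue = this.waiting.get(key);
    if (queue && queue.length > 0) {
      queue.shift()(); // the slot passes directly to the waiter, so the count is unchanged
      return;
    }
    const running = (this.active.get(key) || 0) - 1;
    if (running > 0) this.active.set(key, running);
    else this.active.delete(key);
  }

  // Convenience wrapper that always releases, even when job() throws.
  async run(key, job) {
    await this.acquire(key);
    try {
      return await job();
    } finally {
      this.release(key);
    }
  }
}
```

Because each key has its own counter, a fifth click on one parent waits while a first click on another parent starts immediately, which is the independence the paragraph above describes.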

🔍 Web search

A pre-planner gate (decideSearch.js + prompts/decide-search.md) calls the LLM with the proposed subject and asks: do recent / authoritative sources materially improve this node? The default leans yes — only clearly abstract / timeless subjects skip search. When yes:

  1. The web-search backend runs with the rephrased query.
  2. Results are normalised into {title, url, snippet, source}.
  3. Top results are passed into the planner prompt.
  4. Sources are persisted both into nodes/<hash>.json and into the SQLite Sources table.
  5. The frontend renders a 📚 badge near the breadcrumb. Hover to see a popover with the source list (220 ms grace period so the popover is reachable with the mouse).
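
Step 2 above, normalising results into `{title, url, snippet, source}`, might look roughly like this. The input field names (`link`, `description`), the de-duplication by URL, the default of five results, and the choice to derive `source` from the hostname are all assumptions; the README does not say what the real normaliser in webSearch.js accepts or how it fills `source`.

```javascript
// Map heterogeneous raw hits onto {title, url, snippet, source}, drop unusable or
// duplicate entries, and keep the top N in their original ranking order.
function normaliseResults(rawResults, topN = 5) {
  const seen = new Set();
  const results = [];
  for (const raw of rawResults) {
    let parsed;
    try {
      parsed = new URL(raw.url || raw.link || '');
    } catch {
      continue; // skip hits without a valid absolute URL
    }
    if (seen.has(parsed.href)) continue;
    seen.add(parsed.href);
    results.push({
      title: String(raw.title || '').trim() || parsed.hostname,
      url: parsed.href,
      snippet: String(raw.snippet || raw.description || '').replace(/\s+/g, ' ').trim(),
      source: parsed.hostname.replace(/^www\./, ''), // assumption: source = site hostname
    });
    if (results.length === topN) break;
  }
  return results;
}
```

The same array can then serve all three later steps: it is small enough to paste into the planner prompt, serialise into the node JSON, and insert row by row into a Sources table.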

📦 Export as a standalone static site

Any canvas can be exported as a fully self-contained static site — a read-only replica of the preview with all data and images inlined, openable directly from file:// with zero network requests.

  • In-app: the ··· More menu → Export preview downloads a .zip (index.html / viewer.js / viewer.css / data.js + images/).

  • Serve one locally for quick viewing in a browser:

    npm run serve-preview -- <canvasId> [--lang en] [--port 8088]
    

    Builds the static site to a temp dir, starts a tiny static HTTP server, prints the URL. Ctrl-C cleans up.

  • Publish to GitHub Pages (one or more canvases → a routed gallery landing page at /, each example at /<canvasId>/):

    npm run example:publish -- <canvasId> [<canvasId> ...] [--lang en] [--no-push]
    

    Builds each canvas, regenerates the landing index, and pushes to the gh-pages branch (accumulating — re-publishing a new id keeps the others). → see the result at https://imcuttle.github.io/flipbook-app/.

The exported viewer mirrors the live read-only preview: image stage with collision-avoiding hotspot labels, leader lines, selectable OCR text overlay, caption, breadcrumb, catalog and sources — plus progressive image loading, scene transitions, and next-layer image prefetch. Per-node narration mp3s are bundled too, so the static site auto-narrates offline (toggleable in the top bar). It never calls the server.
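
One common way to make a static site work from file:// is to ship the data as a script rather than as JSON to be fetched, since many browsers block fetch() on file:// pages. The sketch below shows that idea for a data.js file. The global name `__FLIPBOOK_DATA__` and the function `buildDataJs` are hypothetical; the exporter's real data.js format is not documented in the README.

```javascript
// Hypothetical global name that a viewer script would read on startup.
const GLOBAL_NAME = '__FLIPBOOK_DATA__';

// Serialise the canvas tree as a script, so the viewer can read it without fetch().
function buildDataJs(canvas) {
  const json = JSON.stringify(canvas)
    // U+2028/U+2029 are valid in JSON but were line terminators in older JS engines.
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
  return `window.${GLOBAL_NAME} = ${json};\n`;
}
```

An index.html that loads data.js before viewer.js through plain script tags would then find the whole tree on `window` with no network request.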

🔗 Share / preview links

  • POST /api/canvas/:id/share → {token, url}. Reuses an existing token for the same canvas.
  • GET /api/share/:token → {canvasId, topic, readOnly:true}.
  • Frontend: opening …?s=<token> puts the UI in read-only preview mode — no topic input, no clicks on the image, “👁 Preview” badge in the corner. SSE stays connected, so a viewer watching mid-generation sees images stream in real-time.
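
The token-reuse behaviour of the two endpoints can be sketched with an in-memory stand-in for the ShareLinks table. The class name `ShareRegistry`, the injected `makeToken` function, and the assumption that the share URL is simply the app URL plus `?s=<token>` are illustrative; the real routes live in server/src/routes/ and persist to SQLite.

```javascript
// In-memory stand-in for the ShareLinks table. makeToken is injected so the sketch
// stays dependency-free; a real server should use a cryptographically random token.
class ShareRegistry {
  constructor(makeToken) {
    this.makeToken = makeToken;
    this.byCanvas = new Map(); // canvasId -> token
    this.byToken = new Map();  // token -> { canvasId, topic }
  }

  // POST /api/canvas/:id/share: reuse the canvas's token when one already exists.
  createShare(canvasId, topic, baseUrl) {
    let token = this.byCanvas.get(canvasId);
    if (!token) {
      token = this.makeToken();
      this.byCanvas.set(canvasId, token);
      this.byToken.set(token, { canvasId, topic });
    }
    const url = new URL(baseUrl);
    url.searchParams.set('s', token);
    return { token, url: url.href };
  }

  // GET /api/share/:token: null stands in for a 404 response.
  lookup(token) {
    const entry = this.byToken.get(token);
    return entry ? { ...entry, readOnly: true } : null;
  }
}
```

Keeping one token per canvas means a link that was shared earlier stays valid when the owner presses share again.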

📺 Fullscreen / casting

  • ⛶ button in TopBar requests browser fullscreen; uses CSS-only fullscreen on iOS Safari where the API isn’t supported.
  • 👁 / 🚫 button (visible while in fullscreen) toggles the breadcrumb + caption + hint. Useful for clean projection.
  • Long-press hint is suppressed in fullscreen by default; the press still works.

🧹 Cleaning local state

npm run clean:data    # reset server/data (all canvases)
npm run clean:dist    # reset web/dist
npm run clean         # both

📦 Build for production

npm run build         # builds web/dist
npm start             # serves web/dist + API from :8787

🌐 LAN access via a fixed domain (macOS)

Give the app a stable hostname (e.g. http://flipbook.lan) reachable from any device on your LAN — no port number needed. Uses dnsmasq (resolves the domain → this machine’s LAN IP) + Caddy (reverse-proxies :80 to the app).

npm run lan:up        # flipbook.lan → dev :5173 (preferred), falls back to prod :8787
npm run lan:down      # tear it down

# custom: scripts/lan-domain-setup.sh <domain> <devPort> <prodPort>
bash scripts/lan-domain-setup.sh studio.lan 5173 8787

The proxy tries the dev port (5173) first and automatically falls back to the prod port (8787) when dev isn’t running (passive health check, 3s blacklist). So npm run dev and npm start both work behind the same domain.

lan:up installs dnsmasq/caddy via Homebrew if missing and needs sudo (dnsmasq binds 53, Caddy binds 80). It only configures this machine; to reach the domain from other devices, point their DNS at this machine’s LAN IP (router DHCP DNS, per-device DNS, or a hosts entry — the script prints the exact options and your IP).

⚙️ Configuration (env)

| Var | Default | Purpose |
| --- | --- | --- |
| PORT | 8787 | server port |
| HOST | 127.0.0.1 | server bind |
| DATA_DIR | server/data | canvas state on disk |
| PROMPTS_DIR | prompts | prompt files |
| DB_PATH | <DATA_DIR>/flipbook.sqlite | SQLite file |
| MAX_PARALLEL_CLICKS_PER_NODE | 4 | concurrent click expansions per parent |
| MAX_PARALLEL_CODEBUDDY | 20 | concurrent planner/LLM subprocesses |
| MAX_PARALLEL_IMAGE | 20 | concurrent image-generation jobs (separate pool from the LLM limit) |
| PLANNER_TIMEOUT_MS | 90000 | per-call planner timeout |
| IMAGE_TIMEOUT_MS | 180000 | per-call ImageGen timeout |
| WEB_SEARCH_TIMEOUT_MS | 60000 | per-call WebSearch timeout |
| IMAGE_PROVIDER | codebuddy | provider chain (e.g. openai,nanobanana,svg) |
| IMAGE_SIZE | 1920x1080 | requested size (provider may pick its own) |
| ENABLE_CODEBUDDY | 0 | flip to 1 to enable the reference CLI provider |
| ENABLE_WEB_SEARCH | follows ENABLE_CODEBUDDY | force-disable with 0 |
| ENABLE_OCR | 1 | run Apple Vision OCR on each generated PNG to produce a selectable text overlay; set to 0 to skip |
| OCR_TIMEOUT_MS | 25000 | per-call OCR timeout |
| OCR_MIN_CONFIDENCE | 0.4 | drop OCR spans below this confidence |
| ENABLE_AUDIO | 1 | synthesise Edge neural-voice narration (mp3) for each node; set to 0 to skip. Non-blocking: failures never stop image generation |
| AUDIO_TIMEOUT_MS | 30000 | per-call TTS synthesis timeout |
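
A few of the variables above, including the dependent defaults for DB_PATH and ENABLE_WEB_SEARCH, can be resolved as in the sketch below. The helper names and the returned property names are hypothetical, only a subset of the variables is covered, and the project's actual config module may parse values differently.

```javascript
// Parse an integer variable, falling back when it is unset or not a number.
function intFrom(env, name, fallback) {
  const value = Number.parseInt(env[name], 10);
  return Number.isNaN(value) ? fallback : value;
}

// Treat "1" as on and anything else as off; unset variables use the fallback.
function flagFrom(env, name, fallback) {
  return env[name] === undefined ? fallback : env[name] === '1';
}

// Resolve a subset of the table above from an env-like object.
function loadConfig(env) {
  const enableCodebuddy = flagFrom(env, 'ENABLE_CODEBUDDY', false);
  const dataDir = env.DATA_DIR || 'server/data';
  return {
    port: intFrom(env, 'PORT', 8787),
    host: env.HOST || '127.0.0.1',
    dataDir,
    // DB_PATH defaults to a file inside DATA_DIR.
    dbPath: env.DB_PATH || `${dataDir}/flipbook.sqlite`,
    maxParallelClicksPerNode: intFrom(env, 'MAX_PARALLEL_CLICKS_PER_NODE', 4),
    plannerTimeoutMs: intFrom(env, 'PLANNER_TIMEOUT_MS', 90000),
    enableCodebuddy,
    // ENABLE_WEB_SEARCH follows ENABLE_CODEBUDDY and can only be forced off with 0.
    enableWebSearch: enableCodebuddy && env.ENABLE_WEB_SEARCH !== '0',
  };
}
```

In a Node server this would typically be called once at startup as `loadConfig(process.env)`.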
