@geekbb: Go 编写的 CLI 工具,集成了 Web 搜索(Brave/DDG/SearXNG/Exa)、代码搜索(Grep/Sourcegraph/GitHub)和库文档查询(Context7)三种搜索能力,同时支持网页抓取和站点爬取。 为 AI…
摘要
一个用 Go 编写的极速、无状态的 CLI 工具,集成 Web 搜索、代码搜索和库文档查询功能,支持网页抓取和站点爬取,专为 AI Agent 和终端使用设计。
查看缓存全文
缓存时间: 2026/06/30 15:43
Go 编写的 CLI 工具,集成了 Web 搜索(Brave/DDG/SearXNG/Exa)、代码搜索(Grep/Sourcegraph/GitHub)和库文档查询(Context7)三种搜索能力,同时支持网页抓取和站点爬取。
为 AI Agent 提供极速、无状态的 Web 搜索/代码搜索/库文档与网页抓取终端工具
https://t.co/6Ouv4AfSM4
1broseidon/ketch
Source: https://github.com/1broseidon/ketch
ketch
Fast, stateless CLI for web search, code search, library docs, and scraping. Three search surfaces (web, code, docs), one binary, no daemon. Designed to be called by AI agents or directly from your terminal.
Install
Homebrew:
brew install 1broseidon/tap/ketch
Or with Go:
go install github.com/1broseidon/ketch@latest
Or grab a binary from releases.
Quick start
# Search the web
ketch search "golang error handling"
# Search and fetch full content from each result
ketch search "golang error handling" --scrape
# Search real OSS code (Grep by default, or Sourcegraph / GitHub)
ketch code "http.NewRequestWithContext" --lang go
ketch code "NewRequestWith.*Context" --regex
ketch code "rate limit middleware" --lang go -b github
# Search library docs (Context7)
ketch docs "how to render with word wrap" --library /charmbracelet/glamour
# Scrape a URL to clean markdown
ketch scrape https://go.dev/doc/effective_go
# Scrape multiple URLs concurrently
ketch scrape https://example.com https://go.dev
# Crawl a site
ketch crawl https://example.com --depth 2
# Crawl a sitemap in the background
ketch crawl https://example.com/sitemap.xml --sitemap --background
# JSON output for piping
ketch search "query" --json
ketch code "query" --json
ketch scrape https://example.com --json
Commands
| Command | What it does |
|---|---|
search | Web search via Brave, DuckDuckGo, SearXNG, or Exa, optional --scrape for full content |
code | Code search across OSS via Grep (default), Sourcegraph, or GitHub Code Search |
docs | Library/framework docs via Context7 (curated, version-aware snippets) |
scrape | Fetch URLs and extract clean markdown, concurrent batch support |
crawl | BFS or sitemap crawl with background execution and status tracking |
browser | Manage headless Chrome for JS-rendered pages |
config | Show effective configuration and available backends as JSON |
cache | Show cache stats or clear cached pages |
version | Print version, commit, and build date |
All commands support --json for structured output. --json is the only global flag; -b/--backend is local to search, code, and docs.
Browser rendering
JS-rendered pages (React SPAs, Salesforce Lightning, etc.) are automatically detected and re-fetched via headless Chrome. No extra setup if Chrome is already installed:
# Point ketch to your Chrome installation
ketch config set browser chrome
# Or install Chromium to ketch's cache dir
ketch browser install
# Check browser status
ketch browser status
Once configured, browser rendering is transparent — ketch scrape and ketch crawl automatically detect JS-rendered pages and use the browser when needed. Static pages are always fetched via plain HTTP (fast path).
Detection (extract/detect.go) covers both classic SPAs and modern hydration/streaming frameworks: Next.js App Router (RSC streaming via self.__next_f), React 18 streaming hydration, Vue 3 (data-v-app), SvelteKit, Qwik, and Astro islands, plus empty mount nodes. Pages whose server-rendered chrome carries enough visible text to look “static” but whose actual content streams in client-side are caught by a content-is-client-rendered override (strong framework marker and a script payload that dwarfs the visible text).
For the long tail, add your own substrings with spa_markers — a page whose HTML contains any of them is treated as JS-rendered (matched alongside the built-in markers):
# Force browser rendering for pages carrying these markers
ketch config set spa_markers '["__next_f","data-v-app"]'
# Clear the list
ketch config set spa_markers '[]'
Crawling
Crawl entire sites via BFS link discovery or sitemaps:
# BFS crawl from a seed URL
ketch crawl https://example.com --depth 3
# Sitemap-based crawl
ketch crawl https://example.com/sitemap.xml --sitemap
# Run in background with status tracking
ketch crawl https://example.com/sitemap.xml --sitemap --background
ketch crawl status # list all crawls
ketch crawl status c_a1b2c3d4 # check specific crawl
ketch crawl stop c_a1b2c3d4 # stop a running crawl
Crawled pages are cached — re-running the same crawl returns instantly from cache. Use --no-cache to force re-fetch.
Code search
ketch code searches real source code across open-source repositories. Three backends:
# Grep (default) — zero config, no token, literal/regex over 1M+ public repos
ketch code "http.NewRequestWithContext" --lang go
ketch code "NewRequestWith.*Context" --regex
# Sourcegraph — zero config, ~1M OSS repos, exact line matches
ketch code "http.NewRequestWithContext" --lang go -b sourcegraph
# GitHub Code Search — uses your gh CLI token automatically if installed
ketch code "rate limit middleware" --lang go -b github --limit 10
Each result shows the matched line, repo, file path, star count, and a permalink. Sourcegraph results are filtered to non-archived, non-fork repos by default. --regex interprets the query as a regular expression (Grep and Sourcegraph only).
GitHub auth resolution chain (for -b github): explicit config (ketch config set github_token <tok>) → $GITHUB_TOKEN → $GH_TOKEN → gh auth token (if gh CLI is installed). Run ketch config to see which source is active. Stargazer counts come from a single batched GraphQL call after the REST search.
Library docs
ketch docs fetches curated, version-aware documentation snippets from Context7:
ketch config set context7_api_key ctx7sk_...
# Auto-resolve library from query
ketch docs "middleware authentication"
# Skip resolve, fetch directly from a known library ID
ketch docs "how to render with word wrap" --library /charmbracelet/glamour
# List matching library IDs without fetching docs
ketch docs --resolve "glamour"
Flags
| Flag | Scope | Default | Description |
|---|---|---|---|
--json | global | false | JSON output (the only global flag) |
--backend, -b | search, code, docs | cfg value | Backend for that surface |
--limit, -l | search, code, docs | 5 | Max results |
--scrape | search | false | Fetch full content from each result |
--minimal | search, code, docs | false | One result per line, tab-separated |
--searxng-url | search | http://localhost:8081 | SearXNG instance URL |
--raw | scrape | false | Raw HTML instead of markdown |
--select | scrape | — | CSS selector to extract (skips readability) |
--trim | search, scrape | false | Strip markdown formatting, keep text |
--max-chars | search, scrape | 0 | Truncate markdown to N chars (0 = off) |
--no-llms-txt | scrape | false | Disable /llms.txt detection for bare domains |
--force-browser | scrape | false | Always render via the configured browser (skips JS-shell detection; composes with --raw/--select) |
--concurrency | scrape | 5 | Max concurrent requests (multi-URL scrape) |
--no-cache | scrape, crawl | false | Bypass page cache |
--depth | crawl | 3 | Max BFS depth |
--concurrency | crawl | 8 | Worker pool size |
--sitemap | crawl | false | Treat seed URL as sitemap |
--background | crawl | false | Run in background, return crawl ID |
--allow | crawl | — | Path substring filters (any match passes) |
--deny | crawl | — | Regex deny patterns |
--regex | code | false | Interpret query as regex (grep, sourcegraph) |
--lang | code | — | Language qualifier (appended to query) |
--library | docs | — | Context7 library ID, skips resolve |
--tokens | docs | 4000 | Context7 token budget |
--resolve | docs | false | Resolve library name instead of searching |
Configuration
ketch reads defaults from ~/.config/ketch/config.json. Flags always override config values.
# Create a default config file
ketch config init
# Set a default backend
ketch config set backend searxng
# Set your SearXNG URL
ketch config set searxng_url http://my-searxng:8080
# Configure browser for JS-rendered pages
ketch config set browser chrome
# View effective config + available backends
ketch config
{
"config_path": "/home/user/.config/ketch/config.json",
"backend": "brave",
"searxng_url": "http://localhost:8081",
"limit": 5,
"cache_ttl": "72h",
"code_backend": "grepapp",
"docs_backend": "context7",
"sourcegraph_url": "https://sourcegraph.com",
"github_token_source": "gh-cli",
"available_backends": ["brave", "ddg", "searxng", "exa"],
"available_code_backends": ["grepapp", "sourcegraph", "github"],
"available_doc_backends": ["context7", "local"]
}
Search Backends (ketch search)
| Backend | Setup | Notes |
|---|---|---|
brave (default) | Free API key from brave.com/search/api | Stable JSON API |
ddg | Zero config | Rate-limited by DDG currently |
searxng | Self-hosted instance | Most reliable for heavy use |
exa | Zero config via hosted MCP; optional ketch config set exa_api_key <key> | AI-oriented search with snippets/content from Exa |
Code Backends (ketch code)
| Backend | Setup | Notes |
|---|---|---|
grepapp (default) | Zero config | Grep MCP (mcp.grep.app), no token, literal/regex over 1M+ public repos |
sourcegraph | Zero config | Grep-style, ~1M OSS repos, exact line matches, SSE stream, archived/fork filters |
github | gh auth login or ketch config set github_token <tok> or $GITHUB_TOKEN | REST /search/code + GraphQL stars batch. 30 req/min cap. Token must have repo scope. |
ketch code "http.NewRequestWithContext" --lang go
ketch code "NewRequestWith.*Context" --regex
ketch code "rate limit middleware" --lang go -b github --limit 10
ketch config set sourcegraph_url https://sourcegraph.com # optional, for self-hosted
ketch config set github_token ghp_xxx # explicit token
Docs Backends (ketch docs)
| Backend | Setup | Notes |
|---|---|---|
context7 (default) | Free key: ketch config set context7_api_key <key> | Curated snippets + prose, version-aware |
local | planned | FTS5 SQLite for offline/private docs (not yet implemented) |
ketch config set context7_api_key ctx7sk_...
ketch docs "how to render with word wrap" --library /charmbracelet/glamour
ketch docs "middleware authentication" # context7 auto-resolves library
ketch docs --resolve "glamour" # list matching library IDs
What’s Next
- Local FTS5 SQLite docs backend (
-b local) for offline/private docs
Agent integration
ketch is built to be called by AI agents. The operator configures the backend once; the agent just calls ketch search and ketch scrape without needing to know the infrastructure details.
Add this to your agent’s system prompt (CLAUDE.md, AGENTS.md, or equivalent):
## Web, Code, and Docs Research
Use `ketch` CLI for all external research — web pages, OSS code, library docs.
- Web search: `ketch search "query"` — titles, URLs, snippets
- Web search + full content: `ketch search "query" --scrape`
- Scrape: `ketch scrape <url>` — fetches a URL and returns clean markdown
- Batch scrape: `ketch scrape <url1> <url2> ...` — concurrent fetch
- Crawl: `ketch crawl <url> --sitemap --background` — crawl a site, poll with `ketch crawl status`
- Code search: `ketch code "query" --lang go` — real OSS code with line + repo + stars
- Library docs: `ketch docs "query" --library /org/repo` — version-aware curated snippets
- JS-rendered pages are handled automatically — if a page returns a loading shell, ketch re-fetches it with a headless browser.
- All commands support `--json` for structured output.
- Discovery: `ketch config` — returns effective config and available backends as JSON.
- The operator has already configured the search/code/docs backends and browser. Do not override unless you have a specific reason.
Why this works
An agent calling a web search API typically needs to know which provider to use, manage API keys, and handle provider-specific response formats. ketch collapses that: the operator runs ketch config set backend searxng (or ketch config set code_backend github, ketch config set docs_backend context7) once, and every agent invocation uses the right backend automatically. The agent’s system prompt doesn’t mention backends at all — it just says “use ketch.”
ketch config returns the full discovery payload as JSON — including which search, code, and docs backends are active and which token source is in effect — so an agent that needs to inspect capabilities can do so in one call without parsing help text.
License
相似文章
@QingQ77: 一句话告诉 Agent 要操控哪个网站,它自动帮你生成 CLI 工具,直接走你已登录的 Chrome 浏览器,不用调 API 或搞 token。 https://github.com/better-world-ai/x-cli… x-cl…
x-cli 是一个开源项目,通过 AI agent 自动生成操控网页的 CLI 工具,利用 Chrome 已登录会话,无需 API 或 token。支持安装 Kimi WebBridge 插件后一句话生成 CLI,已内置百度搜索、Google 搜索等示例工具。
@0xQiYan: 谷歌刚发布了官方 CLI,专为 AI 代理开发打造。 它不是新代理,而是给 Claude Code、Codex 等任何编码代理加一层“技能包”。装完后,你的代理自动学会从项目搭建、编码(ADK)、评估、部署到可观测的全套流程。 7 项官方…
谷歌发布了官方 CLI(google-agents-cli),专为 AI 代理开发打造,可增强 Claude Code、Codex 等编码代理的全流程能力,覆盖项目搭建、编码、评估、部署和可观测,开源且一行命令安装。
@gkxspace: 你的 AI Agent 现在能搜推特了。 Grok CLI 终于把 Twitter 的搜索能力给内置了。 X 官方搜索 API 很贵,Grok 这次加了 4 个搜索工具: 1、x_keyword_search:最强的一个,完整支持所有高级…
Grok CLI 最新版本内置了四个 Twitter 搜索工具(关键词搜索、语义搜索、用户搜索、线程抓取),允许本地 AI Agent 直接调用 Twitter 实时数据,大幅降低了 X 官方 API 的高昂成本。
@geekbb: 一个专为 AI 编码设计的 macOS 终端,把工作区管理、分屏和 AI agent 启动流程整合到一起。支持横竖分屏,一键启动 Claude Code、Codex、Gemini CLI 等七个 AI agent,右键选中内容直接提交给 …
kooky 是一个专为 AI 编码设计的 macOS 终端,整合工作区管理、分屏和 AI agent 启动流程,支持一键启动多个 AI agent 和右键提交内容。
@daweifs: AI Agent 必备神器:CLI-Anything Hub 一句话:任何 GUI/SaaS/桌面软件,都能一键变成 AI 原生 CLI! Agent 再也不用“帮我点按钮”了,直接一条命令搞定! 核心亮点: • 76+ 现成 CLI,覆…
CLI-Anything Hub 是一个开源工具,能将任意 GUI 或 SaaS 软件一键转化为 AI 原生 CLI,让 AI Agent 直接通过命令行操作,无需人工点击。目前已提供 76+ 现成 CLI,覆盖 31 类软件,并兼容 Claude Code、Cursor 等主流 Agent 框架。