@geekbb: A CLI tool written in Go that integrates three search capabilities: Web search (Brave/DDG/SearXNG/Exa), code search (Grep/Sourcegraph/GitHub), and library documentation query (Context7). It also supports web scraping and site crawling. For AI...

Summary

A blazing-fast, stateless CLI tool written in Go that integrates Web search, code search, and library documentation query. It supports web scraping and site crawling, designed for AI agents and terminal use.

A CLI tool written in Go that integrates three search capabilities: Web search (Brave/DDG/SearXNG/Exa), code search (Grep/Sourcegraph/GitHub), and library documentation query (Context7). It also supports web scraping and site crawling. Provides blazing-fast, stateless terminal tools for AI agents: Web search / code search / library documentation and web scraping. https://t.co/6Ouv4AfSM4

Cached at: 06/30/26, 03:43 PM

# 1broseidon/ketch

Source: https://github.com/1broseidon/ketch

ketch

Fast, stateless CLI for web search, code search, library docs, and scraping.

Three search surfaces (web, code, docs), one binary, no daemon. Designed to be called by AI agents or directly from your terminal.

Install

Homebrew:

brew install 1broseidon/tap/ketch

Or with Go:

go install github.com/1broseidon/ketch@latest

Or grab a binary from releases.

Quick start

# Search the web
ketch search "golang error handling"

# Search and fetch full content from each result
ketch search "golang error handling" --scrape

# Search real OSS code (Grep by default, or Sourcegraph / GitHub)
ketch code "http.NewRequestWithContext" --lang go
ketch code "NewRequestWith.*Context" --regex
ketch code "rate limit middleware" --lang go -b github

# Search library docs (Context7)
ketch docs "how to render with word wrap" --library /charmbracelet/glamour

# Scrape a URL to clean markdown
ketch scrape https://go.dev/doc/effective_go

# Scrape multiple URLs concurrently
ketch scrape https://example.com https://go.dev

# Crawl a site
ketch crawl https://example.com --depth 2

# Crawl a sitemap in the background
ketch crawl https://example.com/sitemap.xml --sitemap --background

# JSON output for piping
ketch search "query" --json
ketch code "query" --json
ketch scrape https://example.com --json

Commands

| Command | What it does |
| --- | --- |
| search | Web search via Brave, DuckDuckGo, SearXNG, or Exa, optional --scrape for full content |
| code | Code search across OSS via Grep (default), Sourcegraph, or GitHub Code Search |
| docs | Library/framework docs via Context7 (curated, version-aware snippets) |
| scrape | Fetch URLs and extract clean markdown, concurrent batch support |
| crawl | BFS or sitemap crawl with background execution and status tracking |
| browser | Manage headless Chrome for JS-rendered pages |
| config | Show effective configuration and available backends as JSON |
| cache | Show cache stats or clear cached pages |
| version | Print version, commit, and build date |

All commands support --json for structured output. --json is the only global flag; -b/--backend is local to search, code, and docs.

Browser rendering

JS-rendered pages (React SPAs, Salesforce Lightning, etc.) are automatically detected and re-fetched via headless Chrome. If Chrome is already installed, nothing extra needs to be downloaded; point ketch at it, or have ketch install Chromium:

# Point ketch to your Chrome installation
ketch config set browser chrome

# Or install Chromium to ketch's cache dir
ketch browser install

# Check browser status
ketch browser status

Once configured, browser rendering is transparent — ketch scrape and ketch crawl automatically detect JS-rendered pages and use the browser when needed. Static pages are always fetched via plain HTTP (fast path).

Detection (extract/detect.go) covers both classic SPAs and modern hydration/streaming frameworks: Next.js App Router (RSC streaming via self.__next_f), React 18 streaming hydration, Vue 3 (data-v-app), SvelteKit, Qwik, and Astro islands, plus empty mount nodes. Pages whose server-rendered chrome carries enough visible text to look “static” but whose actual content streams in client-side are caught by a content-is-client-rendered override (strong framework marker and a script payload that dwarfs the visible text).

For the long tail, add your own substrings with spa_markers — a page whose HTML contains any of them is treated as JS-rendered (matched alongside the built-in markers):

# Force browser rendering for pages carrying these markers
ketch config set spa_markers '["__next_f","data-v-app"]'

# Clear the list
ketch config set spa_markers '[]'
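
The substring rule for spa_markers can be sketched in Go as follows. This is a minimal illustration and not the code in extract/detect.go: the function name looksJSRendered and the two-entry built-in list are assumptions made for the example, and the real detector applies further heuristics (such as comparing script payload to visible text) that are left out here.

```go
package main

import (
	"fmt"
	"strings"
)

// builtinMarkers is a hypothetical stand-in for the built-in list;
// only two markers named in the README are included.
var builtinMarkers = []string{"self.__next_f", "data-v-app"}

// looksJSRendered reports whether the HTML contains any built-in or
// user-supplied marker substring. Empty user markers are ignored so
// that a stray "" does not match every page.
func looksJSRendered(html string, userMarkers []string) bool {
	for _, m := range builtinMarkers {
		if strings.Contains(html, m) {
			return true
		}
	}
	for _, m := range userMarkers {
		if m != "" && strings.Contains(html, m) {
			return true
		}
	}
	return false
}

func main() {
	page := `<div id="root" data-my-shell></div>`
	fmt.Println(looksJSRendered(page, nil))                       // false
	fmt.Println(looksJSRendered(page, []string{"data-my-shell"})) // true
}
```

User markers are checked alongside the built-in ones rather than replacing them, which matches the "matched alongside the built-in markers" wording above.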

Crawling

Crawl entire sites via BFS link discovery or sitemaps:

# BFS crawl from a seed URL
ketch crawl https://example.com --depth 3

# Sitemap-based crawl
ketch crawl https://example.com/sitemap.xml --sitemap

# Run in background with status tracking
ketch crawl https://example.com/sitemap.xml --sitemap --background
ketch crawl status # list all crawls
ketch crawl status c_a1b2c3d4 # check specific crawl
ketch crawl stop c_a1b2c3d4 # stop a running crawl

Crawled pages are cached — re-running the same crawl returns instantly from cache. Use --no-cache to force re-fetch.
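
To make the --depth behaviour concrete, here is a small Go sketch of a depth-limited BFS over an in-memory link graph. It is not ketch's crawler: the links map stands in for "fetch the page and extract its links", and treating the seed as depth 0 is an assumption of this example.

```go
package main

import "fmt"

// bfsCrawl visits pages breadth-first from seed and returns them in visit
// order. Assumption: the seed is depth 0 and links are not followed from
// pages that already sit at maxDepth.
func bfsCrawl(seed string, links map[string][]string, maxDepth int) []string {
	type item struct {
		url   string
		depth int
	}
	seen := map[string]bool{seed: true}
	queue := []item{{seed, 0}}
	var order []string

	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		order = append(order, cur.url)
		if cur.depth >= maxDepth {
			continue // depth limit reached, do not enqueue children
		}
		for _, next := range links[cur.url] {
			if !seen[next] {
				seen[next] = true // mark on enqueue so each URL is queued once
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return order
}

func main() {
	site := map[string][]string{
		"/":       {"/docs", "/blog"},
		"/docs":   {"/docs/a", "/"},
		"/blog":   {"/blog/1"},
		"/docs/a": {"/docs/a/deep"},
	}
	fmt.Println(bfsCrawl("/", site, 2)) // [/ /docs /blog /docs/a /blog/1]
}
```

Marking a URL as seen when it is enqueued, rather than when it is visited, keeps cyclic links (such as /docs linking back to /) from being queued twice.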

Code search

ketch code searches real source code across open-source repositories. Three backends:

# Grep (default) — zero config, no token, literal/regex over 1M+ public repos
ketch code "http.NewRequestWithContext" --lang go
ketch code "NewRequestWith.*Context" --regex

# Sourcegraph — zero config, ~1M OSS repos, exact line matches
ketch code "http.NewRequestWithContext" --lang go -b sourcegraph

# GitHub Code Search — uses your gh CLI token automatically if installed
ketch code "rate limit middleware" --lang go -b github --limit 10

Each result shows the matched line, repo, file path, star count, and a permalink. Sourcegraph results are filtered to non-archived, non-fork repos by default. --regex interprets the query as a regular expression (Grep and Sourcegraph only).

GitHub auth resolution chain (for -b github): explicit config (ketch config set github_token <token>) → $GITHUB_TOKEN → $GH_TOKEN → gh auth token (if gh CLI is installed). Run ketch config to see which source is active.
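
The resolution order (explicit config, then $GITHUB_TOKEN, then $GH_TOKEN, then the gh CLI) can be sketched as a first-non-empty-wins chain. This is an illustration only: the function name, its injected parameters, and every source label except "gh-cli" (which appears in the sample ketch config output) are assumptions, not ketch's actual code.

```go
package main

import (
	"fmt"
	"os"
)

// resolveGitHubToken returns the first non-empty token in the chain and a
// label for where it came from. getenv and ghCLI are injected so the chain
// can be exercised without touching the real environment or running gh.
func resolveGitHubToken(configToken string, getenv func(string) string, ghCLI func() string) (token, source string) {
	if configToken != "" {
		return configToken, "config" // hypothetical label
	}
	for _, name := range []string{"GITHUB_TOKEN", "GH_TOKEN"} {
		if v := getenv(name); v != "" {
			return v, "env:" + name // hypothetical label
		}
	}
	if v := ghCLI(); v != "" {
		return v, "gh-cli"
	}
	return "", "none" // hypothetical label
}

func main() {
	noCLI := func() string { return "" } // pretend the gh CLI is not installed
	_, source := resolveGitHubToken("", os.Getenv, noCLI)
	fmt.Println("token source:", source)
}
```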

With the GitHub backend, stargazer counts come from a single batched GraphQL call made after the REST search.

Library docs

ketch docs fetches curated, version-aware documentation snippets from Context7:

ketch config set context7_api_key ctx7sk_...

# Auto-resolve library from query
ketch docs "middleware authentication"

# Skip resolve, fetch directly from a known library ID
ketch docs "how to render with word wrap" --library /charmbracelet/glamour

# List matching library IDs without fetching docs
ketch docs --resolve "glamour"

Flags

| Flag | Scope | Default | Description |
| --- | --- | --- | --- |
| --json | global | false | JSON output (the only global flag) |
| --backend, -b | search, code, docs | cfg value | Backend for that surface |
| --limit, -l | search, code, docs | 5 | Max results |
| --scrape | search | false | Fetch full content from each result |
| --minimal | search, code, docs | false | One result per line, tab-separated |
| --searxng-url | search | http://localhost:8081 | SearXNG instance URL |
| --raw | scrape | false | Raw HTML instead of markdown |
| --select | scrape | | CSS selector to extract (skips readability) |
| --trim | search, scrape | false | Strip markdown formatting, keep text |
| --max-chars | search, scrape | 0 | Truncate markdown to N chars (0 = off) |
| --no-llms-txt | scrape | false | Disable /llms.txt detection for bare domains |
| --force-browser | scrape | false | Always render via the configured browser (skips JS-shell detection; composes with --raw/--select) |
| --concurrency | scrape | 5 | Max concurrent requests (multi-URL scrape) |
| --no-cache | scrape, crawl | false | Bypass page cache |
| --depth | crawl | 3 | Max BFS depth |
| --concurrency | crawl | 8 | Worker pool size |
| --sitemap | crawl | false | Treat seed URL as sitemap |
| --background | crawl | false | Run in background, return crawl ID |
| --allow | crawl | | Path substring filters (any match passes) |
| --deny | crawl | | Regex deny patterns |
| --regex | code | false | Interpret query as regex (grep, sourcegraph) |
| --lang | code | | Language qualifier (appended to query) |
| --library | docs | | Context7 library ID, skips resolve |
| --tokens | docs | 4000 | Context7 token budget |
| --resolve | docs | false | Resolve library name instead of searching |
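
The crawl filters --allow (path substrings, any match passes) and --deny (regex patterns) could combine as in the Go sketch below. The README does not say how the two interact, so two points here are assumptions of this example rather than documented behaviour: an empty allow list passes everything, and a deny match wins over an allow match. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileDeny turns deny patterns into compiled regexes, failing on the
// first invalid pattern.
func compileDeny(patterns []string) ([]*regexp.Regexp, error) {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("bad deny pattern %q: %w", p, err)
		}
		out = append(out, re)
	}
	return out, nil
}

// shouldCrawl applies the filters to a URL path.
// Assumed semantics: deny is checked first; an empty allow list allows all.
func shouldCrawl(path string, allow []string, deny []*regexp.Regexp) bool {
	for _, re := range deny {
		if re.MatchString(path) {
			return false
		}
	}
	if len(allow) == 0 {
		return true
	}
	for _, sub := range allow {
		if strings.Contains(path, sub) {
			return true
		}
	}
	return false
}

func main() {
	deny, err := compileDeny([]string{`\.pdf$`})
	if err != nil {
		fmt.Println(err)
		return
	}
	allow := []string{"/docs/", "/blog/"}
	for _, p := range []string{"/docs/intro", "/docs/spec.pdf", "/pricing"} {
		fmt.Println(p, shouldCrawl(p, allow, deny))
	}
}
```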

Configuration

ketch reads defaults from ~/.config/ketch/config.json. Flags always override config values.

# Create a default config file
ketch config init

# Set a default backend
ketch config set backend searxng

# Set your SearXNG URL
ketch config set searxng_url http://my-searxng:8080

# Configure browser for JS-rendered pages
ketch config set browser chrome

# View effective config + available backends
ketch config
{
  "config_path": "/home/user/.config/ketch/config.json",
  "backend": "brave",
  "searxng_url": "http://localhost:8081",
  "limit": 5,
  "cache_ttl": "72h",
  "code_backend": "grepapp",
  "docs_backend": "context7",
  "sourcegraph_url": "https://sourcegraph.com",
  "github_token_source": "gh-cli",
  "available_backends": ["brave", "ddg", "searxng", "exa"],
  "available_code_backends": ["grepapp", "sourcegraph", "github"],
  "available_doc_backends": ["context7", "local"]
}
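
A caller that wants to inspect this payload programmatically could decode it as in the Go sketch below. The struct covers only a subset of the fields in the sample output above, and the type and function names are hypothetical. The sample is embedded as a constant so the example runs without ketch installed; in practice the bytes would come from running ketch config.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// discovery mirrors a subset of the fields in the sample payload.
type discovery struct {
	Backend               string   `json:"backend"`
	CodeBackend           string   `json:"code_backend"`
	DocsBackend           string   `json:"docs_backend"`
	GitHubTokenSource     string   `json:"github_token_source"`
	AvailableBackends     []string `json:"available_backends"`
	AvailableCodeBackends []string `json:"available_code_backends"`
}

// parseDiscovery decodes the JSON printed by `ketch config`.
func parseDiscovery(raw []byte) (discovery, error) {
	var d discovery
	err := json.Unmarshal(raw, &d)
	return d, err
}

// has reports whether name appears in list.
func has(list []string, name string) bool {
	for _, v := range list {
		if v == name {
			return true
		}
	}
	return false
}

// sample is a trimmed copy of the output shown in the README.
const sample = `{
  "backend": "brave",
  "code_backend": "grepapp",
  "docs_backend": "context7",
  "github_token_source": "gh-cli",
  "available_backends": ["brave", "ddg", "searxng", "exa"],
  "available_code_backends": ["grepapp", "sourcegraph", "github"]
}`

func main() {
	d, err := parseDiscovery([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("web backend:", d.Backend)
	fmt.Println("github code search available:", has(d.AvailableCodeBackends, "github"))
}
```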

Search Backends (ketch search)

| Backend | Setup | Notes |
| --- | --- | --- |
| brave (default) | Free API key from brave.com/search/api | Stable JSON API |
| ddg | Zero config | Rate-limited by DDG currently |
| searxng | Self-hosted instance | Most reliable for heavy use |
| exa | Zero config via hosted MCP; optional ketch config set exa_api_key <key> | AI-oriented search with snippets/content from Exa |

Code Backends (ketch code)

| Backend | Setup | Notes |
| --- | --- | --- |
| grepapp (default) | Zero config | Grep MCP (mcp.grep.app), no token, literal/regex over 1M+ public repos |
| sourcegraph | Zero config | Grep-style, ~1M OSS repos, exact line matches, SSE stream, archived/fork filters |
| github | gh auth login or ketch config set github_token <token> or $GITHUB_TOKEN | REST /search/code + GraphQL stars batch. 30 req/min cap. Token must have repo scope. |

ketch code "http.NewRequestWithContext" --lang go
ketch code "NewRequestWith.*Context" --regex
ketch code "rate limit middleware" --lang go -b github --limit 10
ketch config set sourcegraph_url https://sourcegraph.com # optional, for self-hosted
ketch config set github_token ghp_xxx # explicit token

Docs Backends (ketch docs)

| Backend | Setup | Notes |
| --- | --- | --- |
| context7 (default) | Free key: ketch config set context7_api_key <key> | Curated snippets + prose, version-aware |
| local | planned | FTS5 SQLite for offline/private docs (not yet implemented) |

ketch config set context7_api_key ctx7sk_...
ketch docs "how to render with word wrap" --library /charmbracelet/glamour
ketch docs "middleware authentication" # context7 auto-resolves library
ketch docs --resolve "glamour" # list matching library IDs

What’s Next

  1. Local FTS5 SQLite docs backend (-b local) for offline/private docs

Agent integration

ketch is built to be called by AI agents. The operator configures the backend once; the agent just calls ketch search and ketch scrape without needing to know the infrastructure details.

Add this to your agent’s system prompt (CLAUDE.md, AGENTS.md, or equivalent):

## Web, Code, and Docs Research

Use `ketch` CLI for all external research — web pages, OSS code, library docs.

- Web search: `ketch search "query"` — titles, URLs, snippets
- Web search + full content: `ketch search "query" --scrape`
- Scrape: `ketch scrape <url>` — fetches a URL and returns clean markdown
- Batch scrape: `ketch scrape <url1> <url2> ...` — concurrent fetch
- Crawl: `ketch crawl <url> --sitemap --background` — crawl a site, poll with `ketch crawl status`
- Code search: `ketch code "query" --lang go` — real OSS code with line + repo + stars
- Library docs: `ketch docs "query" --library /org/repo` — version-aware curated snippets
- JS-rendered pages are handled automatically — if a page returns a loading shell, ketch re-fetches it with a headless browser.
- All commands support `--json` for structured output.
- Discovery: `ketch config` — returns effective config and available backends as JSON.
- The operator has already configured the search/code/docs backends and browser. Do not override unless you have a specific reason.

Why this works

An agent calling a web search API typically needs to know which provider to use, manage API keys, and handle provider-specific response formats. ketch collapses that: the operator runs ketch config set backend searxng (or ketch config set code_backend github, ketch config set docs_backend context7) once, and every agent invocation uses the right backend automatically. The agent’s system prompt doesn’t mention backends at all — it just says “use ketch.”

ketch config returns the full discovery payload as JSON — including which search, code, and docs backends are active and which token source is in effect — so an agent that needs to inspect capabilities can do so in one call without parsing help text.
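
The "configure once, call everywhere" idea can be pictured as a registry keyed by the configured backend name. The Go sketch below is purely illustrative: the Result and Searcher types, the stub backend, and the registry are all hypothetical and say nothing about how ketch is structured internally.

```go
package main

import (
	"errors"
	"fmt"
)

// Result and Searcher are hypothetical types for this sketch.
type Result struct{ Title, URL string }

type Searcher interface {
	Search(query string, limit int) ([]Result, error)
}

// stub is a fake backend that returns one canned result tagged with its name.
type stub struct{ name string }

func (s stub) Search(query string, limit int) ([]Result, error) {
	if limit < 1 {
		return nil, nil
	}
	return []Result{{Title: s.name + ": " + query, URL: "https://example.com"}}, nil
}

// registry maps a configured backend name to an implementation.
var registry = map[string]Searcher{
	"brave":   stub{"brave"},
	"searxng": stub{"searxng"},
}

// search dispatches to whichever backend the operator configured.
// The configured name comes from the config file, not from the agent.
func search(configured, query string, limit int) ([]Result, error) {
	b, ok := registry[configured]
	if !ok {
		return nil, errors.New("unknown backend: " + configured)
	}
	return b.Search(query, limit)
}

func main() {
	res, err := search("searxng", "golang error handling", 5)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(res[0].Title)
}
```

Because every backend satisfies the same interface, swapping providers changes one configuration value and nothing in the caller.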

License

MIT
