Tag
A blazing-fast, stateless CLI tool written in Go that combines web search, code search, and library documentation lookup. It also supports web scraping and site crawling, and is designed for AI agents and terminal use.
cocoindex-code is an AST-based semantic code search tool that can be quickly integrated into coding agents, saving up to 70% of tokens and improving search efficiency.
This paper benchmarks 17 deep learning models for first-stage recall in large-scale code-to-code retrieval, evaluating their precision, efficiency, and scalability across multiple programming languages and datasets. It introduces LLM-based code normalization and query rewriting schemes that improve precision for lower-performing models.
Headroom is an open-source tool that compresses token usage in code search results and AI conversations by up to 92% (e.g., from 17k to 1,400 tokens) while maintaining answer quality. It supports multiple platforms and runs locally for free.
The author built Nice Coding Agent, an open-source coding workbench with a visible and editable context stack, allowing users to curate exactly what the LLM sees. It features local-first retrieval, sandboxed execution, and hybrid code search, aiming to give developers control and visibility over context assembly.
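Semble is an efficient code search library for AI agents that uses Model2Vec or BM25 for fast indexing and retrieval, saves about 98% of tokens compared with grep+read, and supports MCP server and CLI integration.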
Semble is an agent-oriented code search tool that supports natural language queries and accurately returns semantically complete code snippets. It uses 98% fewer tokens than traditional grep+read methods and features intelligent chunking, dual-path retrieval, and code-aware re-ranking.
Argyph is an open-source MCP server that provides AI coding agents with structured codebase understanding via a symbol graph and semantic search, running entirely locally with no cloud dependencies.
Semble is a fast code search library for AI agents that uses ~98% fewer tokens than grep+read, runs on CPU with no external dependencies, and integrates via MCP or CLI.
This paper introduces CoREB, a contamination-limited multitask benchmark for code search that evaluates text-to-code, code-to-text, and code-to-code retrieval with fine-tuned reranking capabilities.
OpenAI released text-embedding-ada-002, a unified embedding model that consolidates five previous models into one with superior performance, 4x longer context (8192 tokens), smaller dimensionality (1536), and 99.8% lower pricing than previous Davinci embeddings.
OpenAI introduces a new embeddings API endpoint that converts text and code into numerical vector representations for semantic search, clustering, and classification tasks. The models achieve state-of-the-art results on standard benchmarks including a 20% relative improvement in code search performance.
OpenAI presents a contrastive pre-training approach for generating high-quality text and code embeddings at scale without supervision, achieving state-of-the-art results on linear-probe classification, semantic search, and code search benchmarks.
Sourcebot has launched an open-source MCP (Model Context Protocol) server that connects AI coding agents like Cursor, Claude Code, and Copilot to an entire codebase for search, file reading, and reference resolution. It supports OAuth 2.0 and API key authorization with a quick 1-minute install.