Semble: CPU-only code search MCP server cuts agent token use by 98%

Semble is an open-source code search library aimed at coding agents like Claude Code, Cursor, and Codex. Instead of having an agent grep for keywords and read entire files, Semble accepts natural-language queries and returns only the relevant chunks, which the project claims reduces token consumption by roughly 98% compared to grep-and-read workflows. It runs entirely on CPU with no API keys, GPUs, or external services, and ships as an MCP server, a bash CLI for AGENTS.md/CLAUDE.md integration, and a Python library.

Under the hood, Semble chunks files with Chonkie, then combines static Model2Vec embeddings from the code-specialized potion-code-16M model with BM25 lexical scoring, fusing the two rankings via Reciprocal Rank Fusion. A reranking stage then applies code-aware signals: queries that look like symbols are weighted toward lexical matches, chunks that define a queried symbol rank above chunks that merely reference it, and identifier stems are matched against query stems. Indexing a typical repo takes around 250 ms and queries return in roughly 1.5 ms. The project reports this as about 200x faster indexing and 10x faster querying than transformer-based alternatives, while retaining about 99% of their retrieval quality.
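
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over a semantic ranking and a lexical ranking. It is not Semble's code: the function name, the chunk ids, the optional per-list weights, and the constant k=60 (a value commonly used for RRF) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, weights=None):
    """Fuse several best-first ranked lists of chunk ids into one ranking.

    Each chunk scores sum(weight / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Chunks missing from a list add nothing for it.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk ids, ranked best-first by each retriever.
semantic_ranking = ["auth.py:10", "session.py:42", "utils.py:7"]
lexical_ranking = ["session.py:42", "tokens.py:3", "auth.py:10"]

# Equal weighting of the embedding-based and BM25-style rankings.
fused = reciprocal_rank_fusion([semantic_ranking, lexical_ranking])

# Assumed illustration of leaning toward lexical matches for a
# symbol-like query: give the lexical list twice the weight.
fused_lexical = reciprocal_rank_fusion(
    [semantic_ranking, lexical_ranking], weights=[1.0, 2.0]
)

print(fused)
print(fused_lexical)
```

Because RRF uses only rank positions, the embedding similarities and BM25 scores never need to be put on a common scale, which is one reason it is a popular choice for hybrid retrieval. A chunk that ranks well in both lists, such as session.py:42 here, ends up ahead of one that tops only a single list.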

The project frames itself less as a developer search tool and more as agent infrastructure: an attempt to push back on the runaway context costs of agentic coding by replacing brute-force file reads with targeted retrieval. A built-in semble savings command tracks cumulative tokens avoided, reinforcing the pitch that retrieval quality and cost-per-call are now first-class metrics in the agent tooling stack.