Semble

Semble is a code search library built for agents. It returns the exact code snippets they need instantly, cutting both token usage and waiting time on every step. Indexing and searching a full codebase end-to-end takes under a second, with ~200x faster indexing and ~10x faster queries than a code-specialized transformer, at 99% of its retrieval quality (see benchmarks). Everything runs on CPU with no API keys, GPU, or external services.

Run it as an MCP server and any agent (Claude Code, Cursor, Codex, OpenCode, etc.) gets instant access to any repo, cloned and indexed on demand.

Quick Start

Install Semble:

pip install semble # Install with pip
uv add semble # Install with uv

Index a repo and search it:

from semble import SembleIndex
# Index a local directory
index = SembleIndex.from_path("./my-project")
# Index a remote git repository
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")
# Search with a natural-language or code query
results = index.search("save model to disk", top_k=3)
# Find code similar to a specific result
related = index.find_related(results[0], top_k=3)
# Each result exposes the matched chunk
result = results[0]
result.chunk.file_path # "model2vec/model.py"
result.chunk.start_line # 127
result.chunk.end_line # 150
result.chunk.content # "def save_pretrained(self, path: PathLike, ..."

Main Features

  • Fast: indexes a repo in ~250 ms and answers queries in ~1.5 ms, all on CPU.
  • Accurate: NDCG@10 of 0.854 on the benchmarks, on par with code-specialized transformer models at a fraction of the size and cost.
  • Local and remote: pass a local path or a git URL; indexes are cached for the session.
  • MCP server: drop-in tool for Claude Code, Cursor, Codex, OpenCode, and any other MCP-compatible agent.
  • Zero setup: runs on CPU with no API keys, GPU, or external services required.

How It Works

Semble splits each file into code-aware chunks using Chonkie, then scores every query with two complementary retrievers (a short sketch follows the list below):

  • Semantic: static Model2Vec embeddings from the code-specialized potion-code-16M model.
  • Lexical: BM25 for exact matches on identifiers and API names.
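
To make the two-retriever step concrete, here is a minimal sketch using the off-the-shelf rank_bm25 and model2vec packages, with potion-base-8M standing in for the potion-code-16M checkpoint named above; the toy chunks, whitespace tokenization, and cosine scoring are illustrative assumptions, not Semble's internals.

import numpy as np
from model2vec import StaticModel
from rank_bm25 import BM25Okapi

# Toy stand-ins for the code-aware chunks Chonkie would produce.
chunks = [
    "def save_pretrained(self, path): ...",
    "def load_pretrained(cls, path): ...",
    "class ConfigParser: ...",
]

# Lexical retriever: BM25 over whitespace-tokenized chunks.
bm25 = BM25Okapi([c.split() for c in chunks])

# Semantic retriever: static embeddings (a per-token lookup plus mean
# pooling, no transformer forward pass at query time).
model = StaticModel.from_pretrained("minishlab/potion-base-8M")
chunk_emb = model.encode(chunks)

query = "save model to disk"
lexical_scores = bm25.get_scores(query.split())
q = model.encode([query])[0]
semantic_scores = chunk_emb @ q / (
    np.linalg.norm(chunk_emb, axis=1) * np.linalg.norm(q) + 1e-9
)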

The two score lists are fused with Reciprocal Rank Fusion (RRF) and then reranked with a set of code-aware signals (a fusion sketch follows the list):

  • Adaptive weighting — symbol-like queries (Foo::bar, getUserById) get more lexical weight; natural-language queries stay balanced.
  • Definition boosts — a chunk that defines the queried symbol (class, def, func) ranks above chunks that merely reference it.
  • Identifier stems — query tokens are stemmed and matched against identifier stems, so a query like parse config boosts chunks containing parseConfig, ConfigParser, or config_parser.
  • File coherence — when multiple chunks from the same file match, the file is boosted so the top result reflects broad file-level relevance.
  • Noise penalties — test files, compat/legacy shims, example code, and .d.ts stubs are down-ranked so canonical implementations surface first.
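
The snippet below sketches that fusion step over the two score lists from the previous example, including an illustrative version of the adaptive weighting; the RRF constant k=60 and the symbol-detection regex are assumptions for the sketch, not Semble's actual parameters.

import re

def rrf_fuse(lexical_scores, semantic_scores, query, k=60):
    # RRF: fused(i) = sum over retrievers of w / (k + rank(i)), rank 0-based.
    n = len(lexical_scores)
    lex_rank = {i: r for r, i in enumerate(
        sorted(range(n), key=lambda i: -lexical_scores[i]))}
    sem_rank = {i: r for r, i in enumerate(
        sorted(range(n), key=lambda i: -semantic_scores[i]))}
    # Symbol-like queries (Foo::bar, getUserById, snake_case) lean lexical.
    symbol_like = bool(re.search(r"::|[a-z][A-Z]|_", query))
    w_lex = 1.5 if symbol_like else 1.0
    fused = [w_lex / (k + lex_rank[i]) + 1.0 / (k + sem_rank[i])
             for i in range(n)]
    # Chunk indices, best first.
    return sorted(range(n), key=lambda i: -fused[i])

# Natural-language query: balanced fusion.
ranking = rrf_fuse(lexical_scores, semantic_scores, "save model to disk")
# Symbol-like query: lexical scores get extra weight.
ranking = rrf_fuse(lexical_scores, semantic_scores, "getUserById")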

Because the embedding model is static with no transformer forward pass at query time, all of this runs in milliseconds on CPU.
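
A static model's encode() reduces to a per-token vector lookup plus mean pooling, so the query-time cost is easy to verify directly; this timing sketch again substitutes potion-base-8M for the shipped checkpoint, and exact numbers will vary by machine.

import time
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-8M")  # one-time load
start = time.perf_counter()
model.encode(["save model to disk"])  # lookup + mean pool, no transformer layers
print(f"query embedded in {(time.perf_counter() - start) * 1000:.2f} ms")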