
Usage

Indexing

Create an index from a local directory or a remote git repository:

```python
from semble import SembleIndex

# Index a local directory
index = SembleIndex.from_path("./my-project")

# Index a remote git repository (cloned and cached locally)
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")
```

Indexing a full repo typically takes under 300 ms. Remote repos are cloned on first use and cached for the lifetime of the process.

Searching

Search the index with a natural-language description or a code snippet:

```python
results = index.search("save model to disk", top_k=5)
for result in results:
    print(result.chunk.file_path, result.chunk.start_line)
    print(result.chunk.content)
    print()
```

Given any search result, find other chunks that are semantically similar to it:

```python
results = index.search("tokenizer encode", top_k=1)
related = index.find_related(results[0], top_k=5)
for r in related:
    print(r.chunk.file_path, r.chunk.start_line)
```

This is useful for exploring implementations: start from one function and surface the code that uses or resembles it.
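One common way to implement "find related" is nearest-neighbour search over chunk embeddings: embed every chunk once, then rank the other chunks by cosine similarity to the starting chunk. The sketch below assumes that approach; semble's internals may differ. The chunk names and the toy 3-dimensional vectors are made up for illustration, whereas real embeddings come from a model and have many more dimensions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_related(
    query_id: str, embeddings: dict[str, list[float]], top_k: int = 5
) -> list[tuple[str, float]]:
    """Rank every other chunk by cosine similarity to `query_id`."""
    query_vec = embeddings[query_id]
    scored = [
        (chunk_id, cosine(query_vec, vec))
        for chunk_id, vec in embeddings.items()
        if chunk_id != query_id  # exclude the starting chunk itself
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings for hypothetical chunks.
embeddings = {
    "tokenizer.py:encode": [0.9, 0.1, 0.0],
    "tokenizer.py:decode": [0.8, 0.2, 0.1],
    "train.py:fit": [0.1, 0.9, 0.2],
    "io.py:save": [0.0, 0.2, 0.9],
}
for chunk_id, score in find_related("tokenizer.py:encode", embeddings, top_k=2):
    print(f"{chunk_id}: {score:.3f}")
```

With these toy vectors the decode chunk ranks first, because its vector points in nearly the same direction as the encode chunk's.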

Search Modes

The mode parameter controls the retrieval strategy:

```python
# Default: hybrid (BM25 + semantic, recommended)
results = index.search("parse config", mode="hybrid")

# Semantic only (best for natural-language queries)
results = index.search("parse config", mode="semantic")

# Lexical only (best for exact identifier lookups)
results = index.search("parse_config", mode="bm25")
```
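To make the difference between the modes concrete, here is a small Okapi BM25 scorer (lexical) whose ranking is merged with a semantic ranking using reciprocal rank fusion (RRF). RRF is one common way to build a hybrid retriever; using it here is an assumption for illustration, and semble's actual fusion method may differ. The documents, the whitespace tokenizer, and the semantic_ranking list are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(
    query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75
) -> dict[str, float]:
    """Okapi BM25 score of each document for `query` (whitespace tokenization)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores: dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) per document."""
    fused: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


docs = {
    "config.py": "def parse_config path load yaml config",
    "cli.py": "def main args parse_args run",
    "loader.py": "def read_settings file load settings dict",
}
lexical = bm25_scores("parse_config", docs)
lexical_ranking = sorted(docs, key=lambda d: lexical[d], reverse=True)
# Hypothetical ranking from an embedding model for the query "parse config".
semantic_ranking = ["loader.py", "config.py", "cli.py"]
print(lexical_ranking)
print(rrf([lexical_ranking, semantic_ranking]))
```

Because the whitespace tokenizer keeps parse_config as a single token, BM25 only rewards the document containing that exact identifier, which is why lexical search suits identifier lookups. The fused ranking keeps that exact match first while still lifting loader.py, which only the semantic side found relevant.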

Result Fields

Each result object exposes:

```python
result = results[0]
result.score             # float: relevance score
result.chunk.file_path   # e.g. "src/config.py"
result.chunk.start_line  # e.g. 42
result.chunk.end_line    # e.g. 67
result.chunk.content     # raw source code of the chunk
```
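These fields are enough for simple post-processing, such as printing compact path:start-end locations or grouping hits by file. The sketch below models them with hypothetical Chunk and Result dataclasses that only mirror the documented attributes; they are not semble's own classes, and the sample values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical stand-in mirroring the documented chunk fields.
    file_path: str
    start_line: int
    end_line: int
    content: str


@dataclass
class Result:
    # Hypothetical stand-in mirroring the documented result fields.
    score: float
    chunk: Chunk


def location(result: Result) -> str:
    """Format a result as "path:start-end"."""
    c = result.chunk
    return f"{c.file_path}:{c.start_line}-{c.end_line}"


def group_by_file(results: list[Result]) -> dict[str, list[Result]]:
    """Group results by file, keeping each file's hits in descending score order."""
    groups: dict[str, list[Result]] = defaultdict(list)
    for r in sorted(results, key=lambda r: r.score, reverse=True):
        groups[r.chunk.file_path].append(r)
    return dict(groups)


results = [
    Result(0.82, Chunk("src/config.py", 42, 67, "def parse_config(...): ...")),
    Result(0.55, Chunk("src/cli.py", 10, 25, "def main(): ...")),
    Result(0.71, Chunk("src/config.py", 80, 95, "def load_defaults(): ...")),
]
print([location(r) for r in results])
print({path: len(hits) for path, hits in group_by_file(results).items()})
```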