Usage
Indexing
Create an index from a local directory or a remote git repository:
from semble import SembleIndex
# Index a local directory
index = SembleIndex.from_path("./my-project")

# Index a remote git repository (cloned and cached locally)
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")

Indexing a full repo typically takes under 300 ms. Remote repos are cloned on first use and cached for the lifetime of the process.
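
To check the indexing time on your own repository, a small timing wrapper is enough. The sketch below is not part of semble: timed is a hypothetical helper that accepts any zero-argument callable, so it can wrap SembleIndex.from_path or SembleIndex.from_git without changes.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(build: Callable[[], T]) -> Tuple[T, float]:
    """Run `build` once and return (result, elapsed time in milliseconds).

    Hypothetical helper for illustration; it is not provided by semble.
    """
    start = time.perf_counter()
    result = build()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Usage with semble (assumes semble is installed and ./my-project exists):
# index, ms = timed(lambda: SembleIndex.from_path("./my-project"))
# print(f"indexed in {ms:.0f} ms")
```

For remote repositories, the first call also includes the clone, so time a second call in the same process to measure indexing alone.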
Searching
Search the index with a natural-language description or a code snippet:
results = index.search("save model to disk", top_k=5)
for result in results:
    print(result.chunk.file_path, result.chunk.start_line)
    print(result.chunk.content)
    print()

Finding Related Code
Given any search result, find other chunks that are semantically similar to it:
results = index.search("tokenizer encode", top_k=1)
related = index.find_related(results[0], top_k=5)
for r in related:
    print(r.chunk.file_path, r.chunk.start_line)

This is useful for exploring implementations. Start from one function and surface the code that uses or resembles it.
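
Semantic similarity between chunks is commonly measured as the cosine similarity of their embedding vectors. How find_related scores chunks internally is not stated here, so the following is only a generic illustration of the idea. The cosine_similarity and most_similar helpers and the toy two-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Rank candidate vectors by cosine similarity to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (made up for illustration, not produced by any model).
toy = {"encode": [1.0, 0.0], "decode": [0.8, 0.6], "save": [0.0, 1.0]}
ranked = most_similar([1.0, 0.0], toy, top_k=2)
print([name for name, _ in ranked])  # ['encode', 'decode']
```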
Search Modes
The mode parameter controls the retrieval strategy:
# Default: hybrid (BM25 + semantic, recommended)
results = index.search("parse config", mode="hybrid")

# Semantic only (best for natural-language queries)
results = index.search("parse config", mode="semantic")

# Lexical only (best for exact identifier lookups)
results = index.search("parse_config", mode="bm25")

Result Fields
Each result object exposes:
result = results[0]
result.score             # float: relevance score
result.chunk.file_path   # "src/config.py"
result.chunk.start_line  # 42
result.chunk.end_line    # 67
result.chunk.content     # raw source code of the chunk
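
These fields are enough to print a compact, editor-friendly location for each hit. The sketch below relies only on the attributes listed above. FakeChunk and FakeResult are hypothetical stand-ins so the example runs without an index, and format_location is an illustrative helper, not part of semble.

```python
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Stand-in exposing the chunk fields listed above (not a semble class)."""
    file_path: str
    start_line: int
    end_line: int
    content: str


@dataclass
class FakeResult:
    """Stand-in exposing `score` and `chunk` (not a semble class)."""
    score: float
    chunk: FakeChunk


def format_location(result) -> str:
    """Render a result as 'path:start-end (score x.xx)'."""
    chunk = result.chunk
    return f"{chunk.file_path}:{chunk.start_line}-{chunk.end_line} (score {result.score:.2f})"


example = FakeResult(
    score=0.8731,
    chunk=FakeChunk("src/config.py", 42, 67, "def parse_config(): ..."),
)
print(format_location(example))  # src/config.py:42-67 (score 0.87)
```

Because format_location only reads attributes, it should also work on real search results, provided they expose the fields shown above.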