CLI
Semble ships as a standalone CLI. This is the best fit for sub-agents (which typically cannot call MCP tools directly), scripts, and anywhere you want search results without an MCP session.
Install Semble first:
uv tool install semble  # with uv (recommended)
pip install semble      # with pip

Commands
# Search a local repo (index is built and cached automatically)
semble search "authentication flow" ./my-project

# Search for a symbol or identifier
semble search "save_pretrained" ./my-project

# Search a remote repo (cloned on demand)
semble search "save model to disk" https://github.com/MinishLab/model2vec

# Limit results
semble search "save model to disk" ./my-project --top-k 10

# Search docs and prose (markdown, rst, etc.) instead of code
semble search "deployment guide" ./my-project --content docs

# Search config files (yaml, toml, terraform, etc.)
semble search "database host port" ./my-project --content config

# Search everything (code, docs, and config)
semble search "authentication" ./my-project --content all

# Find code similar to a known location
semble find-related src/auth.py 42 ./my-project

# Clear cached indexes
semble clear index

# Clear saved token savings stats
semble clear savings

# Clear everything
semble clear all

--content accepts code (default), docs, config, or all. The path argument defaults to the current directory when omitted, and a git URL can be passed in place of a local path. If semble is not on $PATH, use uvx --from "semble[mcp]" semble in its place.
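For scripts or sub-agents that shell out to Semble, the commands above can be wrapped in a small helper. This is a minimal Python sketch, not part of Semble: the function names are hypothetical, it uses only the subcommands and flags shown above (including the uvx fallback), and it returns Semble's stdout unparsed because the output format is not documented here.

```python
import shutil
import subprocess
from typing import List, Optional

CONTENT_VALUES = {"code", "docs", "config", "all"}  # values listed above


def semble_prefix() -> List[str]:
    """Use `semble` from $PATH if present, else the documented uvx fallback."""
    exe = shutil.which("semble")
    if exe is not None:
        return [exe]
    return ["uvx", "--from", "semble[mcp]", "semble"]


def build_search_cmd(
    query: str,
    path: Optional[str] = None,
    top_k: Optional[int] = None,
    content: Optional[str] = None,
    prefix: Optional[List[str]] = None,
) -> List[str]:
    """Build argv for `semble search` using only the flags documented above."""
    if content is not None and content not in CONTENT_VALUES:
        raise ValueError(f"unsupported --content value: {content!r}")
    cmd = list(prefix) if prefix is not None else semble_prefix()
    cmd += ["search", query]
    if path is not None:  # omitting the path searches the current directory
        cmd.append(path)
    if top_k is not None:
        cmd += ["--top-k", str(top_k)]
    if content is not None:
        cmd += ["--content", content]
    return cmd


def build_find_related_cmd(
    file: str, line: int, path: Optional[str] = None, prefix: Optional[List[str]] = None
) -> List[str]:
    """Build argv for `semble find-related FILE LINE [PATH]`."""
    cmd = list(prefix) if prefix is not None else semble_prefix()
    cmd += ["find-related", file, str(line)]
    if path is not None:
        cmd.append(path)
    return cmd


def run(cmd: List[str]) -> str:
    """Run a Semble command and return its stdout as text, unparsed."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout
```

For example, run(build_search_cmd("authentication flow", "./my-project", top_k=10)) runs the same search as the third command above; check=True turns a non-zero exit status into a CalledProcessError.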
Controlling which files are indexed
Semble reads .gitignore and .sembleignore files to determine which files to index. Both use standard gitignore syntax and their patterns are merged. .sembleignore lets you add Semble-specific rules without touching .gitignore.
Excluding files:

# exclude the generated directory
generated/
# exclude Go protobuf files
*.pb.go

To include non-default extensions, prefix the pattern with ! to force-include files that Semble wouldn't index by default:

# include Protobuf files
!*.proto
# include COBOL files
!*.cob

Semble also always skips well-known non-source directories (node_modules/, .venv/, dist/, build/, __pycache__/, and similar) regardless of ignore files.
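Put together, the rules above amount to this: always-skipped directories are dropped first, files with a default extension start out included, and ignore patterns are then applied in order, with the last matching pattern winning as in gitignore (a leading ! re-includes). The Python sketch below illustrates that logic under loudly stated assumptions. It is not Semble's implementation: the extension and directory sets are placeholders, the merge order (.gitignore first, then .sembleignore) is assumed, and the matcher ignores anchoring, ** and the gitignore rule that a file inside an excluded directory cannot be re-included.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import List

# Placeholder sets for illustration; Semble's real defaults are not listed here.
DEFAULT_EXTENSIONS = {".py", ".go", ".ts", ".md"}
ALWAYS_SKIPPED_DIRS = {"node_modules", ".venv", "dist", "build", "__pycache__"}


def parse_ignore(text: str) -> List[str]:
    """Keep pattern lines; drop blank lines and lines starting with #."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def matches(pattern: str, path: str) -> bool:
    """Simplified gitignore match: no anchoring, no ** handling."""
    p = PurePosixPath(path)
    if pattern.endswith("/"):
        # Directory pattern: match against each parent directory name.
        name = pattern.rstrip("/")
        return any(fnmatch(part, name) for part in p.parts[:-1])
    return fnmatch(p.name, pattern)


def should_index(path: str, patterns: List[str]) -> bool:
    p = PurePosixPath(path)
    if any(part in ALWAYS_SKIPPED_DIRS for part in p.parts[:-1]):
        return False  # skipped regardless of ignore files
    included = p.suffix in DEFAULT_EXTENSIONS
    for pattern in patterns:  # later patterns override earlier ones
        negated = pattern.startswith("!")
        body = pattern[1:] if negated else pattern
        if matches(body, path):
            included = negated  # "!" force-includes; otherwise exclude
    return included


# Assumed merge order: .gitignore first, then .sembleignore.
gitignore = "generated/\n*.pb.go\n"
sembleignore = "# Semble-specific rules\n!*.proto\n"
patterns = parse_ignore(gitignore) + parse_ignore(sembleignore)
```

With these patterns, api/user.pb.go is excluded even though .go is a default extension, while api/user.proto is indexed only because of the !*.proto rule.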
Storage
Indexes and token savings statistics are stored in the OS cache folder by default:
- macOS: ~/Library/Caches/semble/
- Linux: ~/.cache/semble/
- Windows: %LOCALAPPDATA%\semble\Cache\
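Taken together with the SEMBLE_CACHE_LOCATION override, the per-OS defaults above amount to a simple lookup. The Python sketch below models only those stated rules, assuming the override takes precedence whenever it is set; resolve_cache_dir is a hypothetical helper, not a Semble function, and any other environment variables Semble might consult are not modeled.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(
    env: Optional[Mapping[str, str]] = None, platform: str = sys.platform
) -> Path:
    """Return the cache directory implied by the rules in this section."""
    env = os.environ if env is None else env
    override = env.get("SEMBLE_CACHE_LOCATION")
    if override:
        # An explicit location wins; expand ~ in case the shell did not.
        return Path(override).expanduser()
    if platform == "darwin":
        return Path.home() / "Library" / "Caches" / "semble"
    if platform.startswith("win"):
        return Path(env["LOCALAPPDATA"]) / "semble" / "Cache"
    return Path.home() / ".cache" / "semble"  # Linux and other POSIX systems
```

Passing env and platform explicitly keeps the function easy to test without touching the real environment.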
To override the location, set SEMBLE_CACHE_LOCATION to a full path:
export SEMBLE_CACHE_LOCATION=~/my-folder/semble

Token savings
semble savings shows how many tokens Semble has saved across all your searches:
semble savings            # summary by period
semble savings --verbose  # also show breakdown by call type

 Semble Token Savings
 ════════════════════════════════════════════════════════════════
 Period         Calls   Savings
 ────────────────────────────────────────────────────────────────
 Today             42   [███████████████░] ~58.4k tokens (95%)
 Last 7 days      287   [██████████████░░] ~312.4k tokens (90%)
 All time        1.4k   [██████████████░░] ~1.2M tokens (89%)

For each call, Semble records two numbers: the total character count of the unique files containing the returned chunks, and the character count of the snippets actually returned. The estimated number of tokens saved is (file chars − snippet chars) / 4, using a rule of thumb of about 4 characters per token. This is a conservative estimate: the baseline is reading the matched files in full, which is how coding agents often explore unfamiliar code.
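A worked example makes the formula concrete. The Python sketch below reproduces only the arithmetic described above; the Hit type and the estimated_tokens_saved function are hypothetical, and Semble's own bookkeeping may differ in detail. Note that a file is counted once no matter how many of its chunks are returned.

```python
from dataclasses import dataclass
from typing import Dict, List

CHARS_PER_TOKEN = 4  # the heuristic stated above


@dataclass
class Hit:
    file_path: str
    snippet: str


def estimated_tokens_saved(hits: List[Hit], file_chars: Dict[str, int]) -> float:
    """(chars of unique files containing hits - chars of returned snippets) / 4."""
    unique_files = {hit.file_path for hit in hits}
    total_file_chars = sum(file_chars[path] for path in unique_files)
    total_snippet_chars = sum(len(hit.snippet) for hit in hits)
    return (total_file_chars - total_snippet_chars) / CHARS_PER_TOKEN


hits = [Hit("auth.py", "a" * 400), Hit("auth.py", "b" * 400), Hit("db.py", "c" * 800)]
saved = estimated_tokens_saved(hits, {"auth.py": 10_000, "db.py": 6_000})
```

Here auth.py (10,000 characters) and db.py (6,000 characters) give 16,000 file characters, the three snippets total 1,600 characters, and the estimate is (16,000 − 1,600) / 4 = 3,600 tokens.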
Library usage
Semble can also be used as a Python library for programmatic access, useful when building custom tooling or integrating search directly into your own code.
from semble import ContentType, SembleIndex

# Index a local directory (code only, the default)
index = SembleIndex.from_path("./my-project")

# Index docs and prose (markdown, rst, etc.)
index = SembleIndex.from_path("./my-project", content=ContentType.DOCS)

# Index everything (code, docs, and config)
index = SembleIndex.from_path(
    "./my-project",
    content=[ContentType.CODE, ContentType.DOCS, ContentType.CONFIG],
)

# Index a remote git repository
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")

# Search the index with a natural-language or code query
results = index.search("save model to disk", top_k=3)

# Find code similar to a specific result
related = index.find_related(results[0], top_k=3)

# Each result exposes the matched chunk
result = results[0]
result.chunk.file_path   # "model2vec/model.py"
result.chunk.start_line  # 127
result.chunk.end_line    # 150
result.chunk.content     # "def save_pretrained(self, path: PathLike, ..."
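Because each result exposes file_path, start_line, end_line and content on result.chunk, results are easy to turn into compact locations for an agent prompt or a terminal listing. The Python sketch below relies only on those four attributes and uses structural typing, so it does not import semble; format_results and its output format are hypothetical.

```python
from typing import Iterable, List, Protocol


class ChunkLike(Protocol):
    file_path: str
    start_line: int
    end_line: int
    content: str


class ResultLike(Protocol):
    chunk: ChunkLike


def format_results(results: Iterable[ResultLike], preview_chars: int = 60) -> List[str]:
    """Render each result as 'path:start-end  first line of the chunk'."""
    lines = []
    for result in results:
        chunk = result.chunk
        # Show only the first line of the chunk, truncated, as a preview.
        first_line = chunk.content.splitlines()[0] if chunk.content else ""
        lines.append(
            f"{chunk.file_path}:{chunk.start_line}-{chunk.end_line}  {first_line[:preview_chars]}"
        )
    return lines
```

With an index built as shown above, the output of index.search(...) or index.find_related(...) can be passed straight to format_results and joined with newlines for display.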