CLI

Semble ships as a standalone CLI. This is the best fit for sub-agents (which typically cannot call MCP tools directly), scripts, and anywhere you want search results without an MCP session.

Install Semble first:

uv tool install semble   # with uv (recommended)
pip install semble       # with pip

Commands

# Search a local repo (index is built and cached automatically)
semble search "authentication flow" ./my-project

# Search for a symbol or identifier
semble search "save_pretrained" ./my-project

# Search a remote repo (cloned on demand)
semble search "save model to disk" https://github.com/MinishLab/model2vec

# Limit results
semble search "save model to disk" ./my-project --top-k 10

# Search docs and prose (markdown, rst, etc.) instead of code
semble search "deployment guide" ./my-project --content docs

# Search config files (yaml, toml, terraform, etc.)
semble search "database host port" ./my-project --content config

# Search everything (code, docs, and config)
semble search "authentication" ./my-project --content all

# Find code similar to a known location
semble find-related src/auth.py 42 ./my-project

# Clear cached indexes
semble clear index

# Clear saved token savings stats
semble clear savings

# Clear everything
semble clear all

--content accepts code (default), docs, config, or all. The path argument defaults to the current directory when omitted, and it can also be a git URL. If semble is not on your $PATH, use uvx --from "semble[mcp]" semble in its place.
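Because the CLI is the recommended route for scripts, here is a minimal Python sketch that shells out to semble search. The helper names build_search_cmd and run_search are hypothetical. The sketch uses only the positional query and path and the --top-k and --content flags documented above, and it falls back to the uvx invocation when semble is not on $PATH.

```python
import shutil
import subprocess
from typing import Optional

# The --content values documented above.
VALID_CONTENT = {"code", "docs", "config", "all"}


def build_search_cmd(query: str, path: str = ".", top_k: Optional[int] = None,
                     content: str = "code") -> list[str]:
    """Build an argv list for `semble search` (hypothetical helper)."""
    if content not in VALID_CONTENT:
        raise ValueError(f"unsupported --content value: {content!r}")
    # Prefer a semble binary on PATH; otherwise fall back to uvx.
    if shutil.which("semble"):
        base = ["semble"]
    else:
        base = ["uvx", "--from", "semble[mcp]", "semble"]
    cmd = base + ["search", query, path]
    if top_k is not None:
        cmd += ["--top-k", str(top_k)]
    if content != "code":  # code is the default, so the flag can be omitted
        cmd += ["--content", content]
    return cmd


def run_search(query: str, path: str = ".", **kwargs) -> str:
    """Run the search and return its standard output as text."""
    result = subprocess.run(build_search_cmd(query, path, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_search("deployment guide", "./my-project", content="docs") runs the same search as the docs example above and returns whatever the CLI prints.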

Controlling which files are indexed

Semble reads .gitignore and .sembleignore files to determine which files to index. Both use standard gitignore syntax, and their patterns are merged. .sembleignore lets you add Semble-specific rules without touching .gitignore.

Excluding files:

# exclude the generated directory
generated/
# exclude Go protobuf files
*.pb.go

To include non-default extensions, prefix the pattern with ! to force-include files that Semble would not index by default:

# include Protobuf files
!*.proto
# include COBOL files
!*.cob

Semble also always skips well-known non-source directories (node_modules/, .venv/, dist/, build/, __pycache__/, and similar) regardless of ignore files.
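The rules above can be modeled roughly as follows. This is a simplified sketch, not Semble's own matcher. It uses fnmatch instead of full gitignore semantics: there are no anchored paths and no ** support, and a re-include still works under an excluded directory, which real gitignore forbids. It assumes that .gitignore patterns come before .sembleignore patterns when the two are merged, and it treats the always-skipped directories as an illustrative subset. The names parse_patterns, matches, and should_index are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative subset of the directories Semble always skips (see above).
ALWAYS_SKIP = {"node_modules", ".venv", "dist", "build", "__pycache__"}


def parse_patterns(text: str) -> list[str]:
    """Keep non-blank lines that are not whole-line # comments."""
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line and not line.startswith("#")]


def matches(pattern: str, path: str) -> bool:
    """Simplified gitignore match: no anchoring, no ** support."""
    p = PurePosixPath(path)
    if pattern.endswith("/"):
        # Directory pattern: matches if any parent directory has that name.
        name = pattern.rstrip("/")
        return any(fnmatchcase(part, name) for part in p.parts[:-1])
    # Otherwise match against the file name only.
    return fnmatchcase(p.name, pattern)


def should_index(path: str, patterns: list[str], indexed_by_default: bool) -> bool:
    """Decide whether a repo-relative POSIX path would be indexed."""
    if any(part in ALWAYS_SKIP for part in PurePosixPath(path).parts[:-1]):
        return False  # skipped regardless of ignore files
    included = indexed_by_default
    # Patterns are applied in order, so a later rule overrides an earlier one.
    for pattern in patterns:
        if pattern.startswith("!"):
            if matches(pattern[1:], path):
                included = True  # force-include
        elif matches(pattern, path):
            included = False  # exclude
    return included
```

Under the ordering assumption, the merged rule list would be parse_patterns(gitignore_text) + parse_patterns(sembleignore_text). With the example patterns, api/service.proto is indexed and src/a.pb.go is not.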

Storage

Indexes and token savings statistics are stored in the OS cache folder by default:

macOS: ~/Library/Caches/semble/
Linux: ~/.cache/semble/
Windows: %LOCALAPPDATA%\semble\Cache\

To override the location, set SEMBLE_CACHE_LOCATION to a full path:

export SEMBLE_CACHE_LOCATION=~/my-folder/semble
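The lookup order described above can be sketched as follows. This assumes that the override takes precedence over the per-OS defaults, and it uses only the paths listed here. It does not model any other environment variables that the real implementation may consult. semble_cache_dir is a hypothetical helper, not part of Semble's API.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def semble_cache_dir(platform: str = sys.platform,
                     env: Optional[Mapping[str, str]] = None) -> Path:
    """Sketch of the documented cache-location rules (not Semble's code)."""
    env = os.environ if env is None else env
    override = env.get("SEMBLE_CACHE_LOCATION")
    if override:
        # Expand a literal ~ in case the shell did not expand it.
        return Path(override).expanduser()
    home = Path(env["HOME"]) if "HOME" in env else Path.home()
    if platform == "darwin":
        return home / "Library" / "Caches" / "semble"
    if platform.startswith("win"):
        return Path(env["LOCALAPPDATA"]) / "semble" / "Cache"
    # Linux default as documented above.
    return home / ".cache" / "semble"
```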

Token savings

semble savings shows how many tokens Semble has saved across all your searches:

semble savings           # summary by period
semble savings --verbose # also show breakdown by call type

  Semble Token Savings
  ════════════════════════════════════════════════════════════════
  Period        Calls   Savings
  ────────────────────────────────────────────────────────────────
  Today         42      [███████████████░]  ~58.4k tokens (95%)
  Last 7 days   287     [██████████████░░]  ~312.4k tokens (90%)
  All time      1.4k    [██████████████░░]  ~1.2M tokens (89%)

For each call, Semble records two numbers: the total character count of the unique files that contain returned chunks, and the character count of the returned snippets. The estimated number of tokens saved is (file chars − snippet chars) / 4, assuming roughly 4 characters per token. This is a conservative estimate: the baseline is reading each matched file in full, which is how coding agents often explore unfamiliar code.
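As a worked example of the formula, the sketch below computes the estimate for a single call. Keying file sizes by path counts each unique file once, even when several returned chunks come from it. The function name and its inputs are illustrative, not Semble's internal code.

```python
from typing import Iterable, Mapping

CHARS_PER_TOKEN = 4  # the rough ratio stated above


def estimate_tokens_saved(file_sizes: Mapping[str, int],
                          snippets: Iterable[str]) -> float:
    """Estimate tokens saved by one search call.

    file_sizes maps each unique file that contributed a returned chunk to
    its total character count; snippets are the returned chunk texts.
    """
    file_chars = sum(file_sizes.values())
    snippet_chars = sum(len(s) for s in snippets)
    return (file_chars - snippet_chars) / CHARS_PER_TOKEN
```

For instance, three snippets of 600, 400, and 500 characters drawn from two files of 8,000 and 4,000 characters give (12,000 − 1,500) / 4 = 2,625 tokens saved.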

Library usage

Semble can also be used as a Python library for programmatic access. This is useful when building custom tooling or integrating search directly into your own code.

from semble import ContentType, SembleIndex

# Index a local directory (code only, the default)
index = SembleIndex.from_path("./my-project")

# Index docs and prose (markdown, rst, etc.)
index = SembleIndex.from_path("./my-project", content=ContentType.DOCS)

# Index everything (code, docs, and config)
index = SembleIndex.from_path(
    "./my-project",
    content=[ContentType.CODE, ContentType.DOCS, ContentType.CONFIG],
)

# Index a remote git repository
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")

# Search the index with a natural-language or code query
results = index.search("save model to disk", top_k=3)

# Find code similar to a specific result
related = index.find_related(results[0], top_k=3)

# Each result exposes the matched chunk
result = results[0]
result.chunk.file_path   # "model2vec/model.py"
result.chunk.start_line  # 127
result.chunk.end_line    # 150
result.chunk.content     # "def save_pretrained(self, path: PathLike, ..."
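Results can be turned into compact path:line references, for example when passing them to an agent prompt. This sketch relies only on the chunk attributes shown above (file_path, start_line, end_line, content). format_results is a hypothetical helper, not part of the semble package.

```python
from typing import Any, Iterable


def format_results(results: Iterable[Any], max_lines: int = 3) -> str:
    """Render each result as a `path:start-end` header plus a short preview."""
    blocks = []
    for result in results:
        chunk = result.chunk
        # Keep only the first few lines of the chunk as a preview.
        preview = "\n".join(chunk.content.splitlines()[:max_lines])
        blocks.append(f"{chunk.file_path}:{chunk.start_line}-{chunk.end_line}\n{preview}")
    return "\n\n".join(blocks)
```

With an index built as above, print(format_results(index.search("save model to disk", top_k=3))) prints one header and preview per result.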