Benchmarks

We benchmark quality and speed across all methods on ~1,250 queries over 63 repositories in 19 languages.

Main Results

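| Method | NDCG@10 | Index time | Query p50 |
| --- | --- | --- | --- |
| CodeRankEmbed Hybrid | 0.862 | 57 s | 16 ms |
| semble | 0.854 | 263 ms | 1.5 ms |
| CodeRankEmbed | 0.765 | 57 s | 16 ms |
| ColGREP | 0.693 | 5.8 s | 124 ms |
| BM25 | 0.673 | 263 ms | 0.02 ms |
| ripgrep | 0.126 | | 12 ms |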

Semble reaches 99% of the NDCG@10 of the 137M-parameter CodeRankEmbed Hybrid (0.854 vs 0.862) while indexing 218× faster and answering queries 11× faster, entirely on CPU.
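The headline ratios can be re-derived from the Main Results table. The sketch below uses the rounded values as published, so the indexing speedup comes out slightly under the quoted 218×; the quoted figure was presumably computed from unrounded timings, which is an assumption on our part. The dictionary names are illustrative.

```python
# Values copied from the Main Results table (rounded as published).
semble = {"ndcg": 0.854, "index_s": 0.263, "query_ms": 1.5}
cre_hybrid = {"ndcg": 0.862, "index_s": 57.0, "query_ms": 16.0}

# Share of CRE Hybrid's quality that semble retains.
quality_ratio = semble["ndcg"] / cre_hybrid["ndcg"]
# How many times faster semble indexes and answers queries.
index_speedup = cre_hybrid["index_s"] / semble["index_s"]
query_speedup = cre_hybrid["query_ms"] / semble["query_ms"]

print(f"quality retained: {quality_ratio:.1%}")   # 99.1%
print(f"index speedup:    {index_speedup:.0f}x")  # 217x
print(f"query speedup:    {query_speedup:.1f}x")  # 10.7x
```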

The charts below plot latency against NDCG@10. Marker size reflects model parameter count.

Figure: Speed vs quality (cold start). Time to first result (index + query) vs NDCG@10.

Figure: Speed vs quality (warm). Query latency on a warm index vs NDCG@10.

By Language

NDCG@10 per language. Best score per row is bolded.

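| Language | semble | CRE Hybrid | CRE | ColGREP | ripgrep |
| --- | --- | --- | --- | --- | --- |
| scala | 0.909 | **0.922** | 0.845 | 0.765 | 0.180 |
| cpp | **0.915** | 0.913 | 0.846 | 0.626 | 0.126 |
| ruby | **0.909** | **0.909** | 0.769 | 0.708 | 0.230 |
| elixir | 0.894 | **0.905** | 0.869 | 0.808 | 0.134 |
| javascript | 0.917 | 0.903 | **0.920** | 0.823 | 0.176 |
| zig | **0.913** | 0.901 | 0.807 | 0.474 | 0.000 |
| csharp | 0.885 | **0.889** | 0.743 | 0.614 | 0.117 |
| go | **0.895** | 0.884 | 0.676 | 0.785 | 0.133 |
| python | 0.867 | **0.880** | 0.794 | 0.777 | 0.202 |
| php | 0.858 | **0.874** | 0.758 | 0.663 | 0.123 |
| swift | 0.860 | **0.873** | 0.721 | 0.710 | 0.160 |
| bash | 0.825 | 0.852 | **0.892** | 0.706 | 0.000 |
| lua | 0.823 | **0.847** | 0.803 | 0.798 | 0.000 |
| java | **0.849** | 0.841 | 0.706 | 0.641 | 0.198 |
| kotlin | 0.821 | **0.830** | 0.670 | 0.637 | 0.166 |
| rust | **0.856** | 0.827 | 0.627 | 0.662 | 0.162 |
| c | 0.741 | **0.806** | 0.706 | 0.676 | 0.000 |
| haskell | 0.765 | 0.771 | **0.776** | 0.683 | 0.000 |
| typescript | 0.706 | **0.708** | 0.545 | 0.430 | 0.128 |
| overall | 0.854 | **0.862** | 0.765 | 0.693 | 0.126 |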
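For readers unfamiliar with the metric, NDCG@10 rewards rankings that place relevant results near the top of the first ten hits. The following is a minimal sketch of the standard formulation with linear gain. The benchmark's exact gain function and relevance labels are not stated on this page, so the function names and the example labels are illustrative assumptions rather than the benchmark's own code.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the returned ranking divided by the DCG of an ideal ranking.

    ranked_relevances: relevance label of each returned result, in rank order.
    all_relevances: labels of every relevant item for the query (any order).
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant items exist for this query
    return dcg_at_k(ranked_relevances, k) / ideal

# Hypothetical query with two relevant files, found at ranks 1 and 3.
score = ndcg_at_k([1, 0, 1, 0, 0], all_relevances=[1, 1])
print(round(score, 3))  # 0.92
```

A per-language or overall figure like those in the table would then be an average of this per-query score, although the averaging scheme used by the benchmark is not specified here.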

Ablations

"Raw" ranks results by the retrieval scores directly; "+ ranking" feeds those scores through semble’s hybrid reranker.

| Retrieval | Raw | + ranking |
| --- | --- | --- |
| BM25 | 0.675 | 0.834 |
| potion-code-16M | 0.650 | 0.821 |
| BM25 + potion-code-16M | | 0.854 |

By query category:

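| Mode | Architecture | Semantic | Symbol |
| --- | --- | --- | --- |
| BM25 raw | 0.628 | 0.676 | 0.719 |
| potion-code-16M raw | 0.626 | 0.666 | 0.629 |
| semble BM25 (+ ranking) | 0.770 | 0.819 | 0.957 |
| semble potion-code-16M (+ ranking) | 0.757 | 0.808 | 0.943 |
| semble hybrid | 0.802 | 0.846 | 0.958 |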

Dataset

~1,250 queries over 63 repositories in 19 languages, grouped into three categories:

| Category | Queries | What it tests |
| --- | --- | --- |
| semantic | 711 | Code that implements a specific behavior or concept |
| architecture | 343 | Design decisions, module boundaries, structural patterns |
| symbol | 204 | Named entity lookup (function, class, type, variable) |

Languages covered: Bash, C, C++, C#, Elixir, Go, Haskell, Java, JavaScript, Kotlin, Lua, PHP, Python, Ruby, Rust, Scala, Swift, TypeScript, Zig.

Methods

  • ripgrep: fast regex search, included as a raw keyword-match baseline.
  • ColGREP: late-interaction code retrieval with the LateOn-Code-edge model.
  • CodeRankEmbed (CRE): 137M-parameter transformer embedding model. CRE Hybrid fuses its dense scores with BM25.
  • semble: potion-code-16M static embeddings + BM25 + the semble reranking stack.
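
The list above says CRE Hybrid fuses dense scores with BM25 but does not say how. One common way to combine two rankers is reciprocal rank fusion (RRF), which works on ranks rather than raw scores. The sketch below illustrates that technique only and is not claimed to be what CRE Hybrid or semble does. The function name, the constant `k=60`, and the file names are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical top-3 results from a lexical and a dense retriever.
bm25_ranking = ["utils.py", "auth.py", "db.py"]
dense_ranking = ["auth.py", "db.py", "utils.py"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['auth.py', 'utils.py', 'db.py']
```

Rank-based fusion avoids having to normalize BM25 scores and embedding similarities onto a common scale, which is one reason it is a frequent default for hybrid search.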