Benchmarks

We benchmark quality and speed across all methods on ~1,250 queries over 63 repositories in 19 languages.

Main Results

Method	NDCG@10	Index time	Query p50
CodeRankEmbed Hybrid	0.862	57 s	16 ms
semble	0.854	263 ms	1.5 ms
CodeRankEmbed	0.765	57 s	16 ms
ColGREP	0.693	5.8 s	124 ms
BM25	0.673	263 ms	0.02 ms
ripgrep	0.126	—	12 ms

Semble achieves 99% of the retrieval quality of the 137M-parameter CodeRankEmbed Hybrid, while indexing 218× faster and answering queries 11× faster — entirely on CPU.

The charts below plot latency against NDCG@10. Marker size reflects model parameter count.

Speed vs quality (cold start) Time to first result (index + query) vs NDCG@10

Speed vs quality (warm) Query latency on a warm index vs NDCG@10

By Language

NDCG@10 per language. Best score per row is bolded.

Language	semble	CRE Hybrid	CRE	ColGREP	ripgrep
scala	0.909	0.922	0.845	0.765	0.180
cpp	0.915	0.913	0.846	0.626	0.126
ruby	0.909	0.909	0.769	0.708	0.230
elixir	0.894	0.905	0.869	0.808	0.134
javascript	0.917	0.903	0.920	0.823	0.176
zig	0.913	0.901	0.807	0.474	0.000
csharp	0.885	0.889	0.743	0.614	0.117
go	0.895	0.884	0.676	0.785	0.133
python	0.867	0.880	0.794	0.777	0.202
php	0.858	0.874	0.758	0.663	0.123
swift	0.860	0.873	0.721	0.710	0.160
bash	0.825	0.852	0.892	0.706	0.000
lua	0.823	0.847	0.803	0.798	0.000
java	0.849	0.841	0.706	0.641	0.198
kotlin	0.821	0.830	0.670	0.637	0.166
rust	0.856	0.827	0.627	0.662	0.162
c	0.741	0.806	0.706	0.676	0.000
haskell	0.765	0.771	0.776	0.683	0.000
typescript	0.706	0.708	0.545	0.430	0.128
overall	0.854	0.862	0.765	0.693	0.126

Ablations

raw returns retrieval scores directly; + ranking feeds them through semble’s hybrid reranker.

Retrieval	Raw	+ ranking
BM25	0.675	0.834
potion-code-16M	0.650	0.821
BM25 + potion-code-16M	—	0.854

By query category:

Mode	Architecture	Semantic	Symbol
BM25 raw	0.628	0.676	0.719
potion-code-16M raw	0.626	0.666	0.629
semble BM25 (+ ranking)	0.770	0.819	0.957
semble potion-code-16M (+ ranking)	0.757	0.808	0.943
semble hybrid	0.802	0.846	0.958

Dataset

~1,250 queries over 63 repositories in 19 languages, grouped into three categories:

Category	Queries	What it tests
semantic	711	Code that implements a specific behavior or concept
architecture	343	Design decisions, module boundaries, structural patterns
symbol	204	Named entity lookup (function, class, type, variable)

Languages covered: bash, C, C++, C#, Elixir, Go, Haskell, Java, JavaScript, Kotlin, Lua, PHP, Python, Ruby, Rust, Scala, Swift, TypeScript, Zig.

Methods

ripgrep — fast regex search, included as a raw keyword-match baseline.
ColGREP — late-interaction code retrieval with the LateOn-Code-edge model.
CodeRankEmbed — 137M-param transformer embedding model. CRE Hybrid fuses its dense scores with BM25.
semble — potion-code-16M static embeddings + BM25 + the semble reranking stack.