  • Model2Vec (docs, repo): Create state-of-the-art static embedding models by distilling Sentence Transformers.

  • SemHash (docs, repo): Fast semantic text deduplication, outlier detection, and representative sampling.

  • Vicinity (docs, repo): A lightweight library for efficient nearest neighbor search that supports various backends.

  • Tokenlearn (docs, repo): Our method to pre-train static embedding models.

  • Model2Vec-rs (docs, repo): A Rust-native implementation of Model2Vec, built for high performance.
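Several of the projects above build on two ideas: static embeddings, where each token has a fixed vector and a text is encoded without running a transformer, and nearest neighbor search over those embeddings. The sketch below is a toy illustration of both ideas, not the API of Model2Vec or Vicinity. The vocabulary, vector values, and the helper names `embed`, `cosine`, and `nearest` are hypothetical. Encoding is assumed to be a plain mean of token vectors, and search is a brute-force cosine scan rather than one of Vicinity's backends.

```python
import math

# Hypothetical toy vocabulary: each token maps to a fixed ("static") vector.
# A real distilled model learns these vectors; these values are made up.
TOY_VOCAB = {
    "fast": [0.9, 0.1, 0.0],
    "quick": [0.8, 0.2, 0.0],
    "search": [0.1, 0.9, 0.1],
    "lookup": [0.2, 0.8, 0.0],
    "rust": [0.0, 0.1, 0.9],
}
DIM = 3


def embed(text: str) -> list[float]:
    """Encode text by mean-pooling the static vectors of known tokens."""
    vectors = [TOY_VOCAB[t] for t in text.lower().split() if t in TOY_VOCAB]
    if not vectors:
        # No known tokens: fall back to a zero vector.
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(query: str, corpus: list[str]) -> str:
    """Brute-force nearest neighbor: score every document against the query."""
    q = embed(query)
    return max(corpus, key=lambda doc: cosine(q, embed(doc)))


corpus = ["fast search", "rust", "quick lookup"]
print(nearest("quick search", corpus))
```

A brute-force scan compares the query with every document, which is fine for a handful of texts. Approximate index backends exist to avoid exactly this linear cost on large corpora.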