  • Model2Vec (docs, repo): Create state-of-the-art static embedding models by distilling Sentence Transformers.
  • SemHash (docs, repo): Fast semantic text deduplication, outlier detection, and representative sampling.
  • Vicinity (docs, repo): A lightweight library for efficient nearest-neighbor search, with support for multiple backends.
  • Tokenlearn (docs, repo): Our method for pre-training static embedding models.
  • Model2Vec-rs (docs, repo): Rust-native implementation of Model2Vec for high performance.