Model2Vec-rs

Model2Vec-rs is a Rust crate for efficient inference with Model2Vec static embedding models. It is roughly 1.7x faster than the Python version and is designed for high-performance inference in Rust applications.

Quickstart

You can use model2vec-rs in two ways:

As a library in your Rust projects
As a standalone command-line interface (CLI) tool for quick inference from the terminal

Using `model2vec-rs` as a Library

First, add model2vec-rs as a dependency. The example below also uses anyhow for error handling, so add it as well:

cargo add model2vec-rs anyhow

Then, you can use it like this:

use anyhow::Result;
use model2vec_rs::model::StaticModel;

fn main() -> Result<()> {
    // Load a model from the Hugging Face Hub or a local path.
    // Arguments: (repo_or_path, hf_token, normalize_embeddings, subfolder_in_repo)
    let model = StaticModel::from_pretrained(
        "minishlab/potion-base-8M", // Model ID from Hugging Face or local path to model directory
        None,                       // Optional: Hugging Face API token for private models
        None,                       // Optional: bool to override model's default normalization. `None` uses model's config.
        None                        // Optional: subfolder if model files are not at the root of the repo/path
    )?;

    let sentences = vec![
        "Hello world".to_string(),
        "Rust is awesome".to_string(),
    ];

    // Generate embeddings using default parameters
    // (Default max_length: Some(512), Default batch_size: 1024)
    let embeddings = model.encode(&sentences);
    // `embeddings` is a Vec<Vec<f32>>
    println!("Generated {} embeddings.", embeddings.len());

    // To generate embeddings with custom arguments:
    let custom_embeddings = model.encode_with_args(
        &sentences,
        Some(256), // Optional: custom max token length for truncation
        512,       // Custom batch size for processing
    );
    println!("Generated {} custom embeddings.", custom_embeddings.len());

    Ok(())
}
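
A common next step is to compare the returned vectors. The sketch below is not part of the model2vec-rs API: `dot` and `cosine_similarity` are hypothetical helper names, written in plain Rust, that operate on the `Vec<f32>` rows returned by `encode`. It assumes both vectors come from the same model, so they have the same length.

```rust
/// Dot product of two equal-length vectors.
/// (Hypothetical helper, not provided by model2vec-rs.)
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

/// Cosine similarity between two embeddings.
/// Returns `None` if the lengths differ or either vector has zero norm.
/// (Hypothetical helper, not provided by model2vec-rs.)
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let norm_a = dot(a, a).sqrt();
    let norm_b = dot(b, b).sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot(a, b) / (norm_a * norm_b))
}
```

With the `embeddings` from the example above, `cosine_similarity(&embeddings[0], &embeddings[1])` would give the similarity between the two sentences. If the model produces normalized embeddings, the plain `dot` product should give the same value, since both norms are then 1.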

Using the `model2vec-rs` CLI

Install model2vec-rs:

cargo install model2vec-rs

CLI Usage

# Encode a single sentence
model2vec-rs encode-single "Hello world" "minishlab/potion-base-8M"

# Encode multiple lines from a file and save the embeddings to an output file
echo -e "This is the first sentence.\nThis is another sentence." > my_texts.txt
model2vec-rs encode my_texts.txt "minishlab/potion-base-8M" --output embeddings_output.json

Note: ensure ~/.cargo/bin/ is in your system’s PATH to run model2vec-rs from any directory.
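
To do the equivalent of the file-based CLI command from library code, the input file first has to be turned into a `Vec<String>` that can be passed to `encode`. The following is a minimal sketch using only the standard library. `read_sentences` is a hypothetical helper name, and treating each non-empty line as one input text is an assumption made for illustration, not a description of how the CLI parses its input.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Read a text file into one `String` per non-empty line.
/// (Hypothetical helper; skipping blank lines is an assumption of this sketch.)
fn read_sentences<P: AsRef<Path>>(path: P) -> io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    let mut sentences = Vec::new();
    for line in reader.lines() {
        // Propagate I/O errors (for example invalid UTF-8) to the caller.
        let line = line?;
        let trimmed = line.trim();
        if !trimmed.is_empty() {
            sentences.push(trimmed.to_string());
        }
    }
    Ok(sentences)
}
```

The resulting vector can then be passed to `model.encode(&sentences)` as in the library example.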
