Vicinity can be used to build and query a vector store. It provides a unified interface for various approximate nearest neighbor (ANN) backends, allowing you to easily switch between them and compare their performance.

Building a Vector Store

To build a vector store with Vicinity, you can use the Vicinity.from_vectors_and_items method. This method allows you to create a vector store from a set of vectors and their corresponding items. For this example, we will use some dummy data, but you can replace it with your own vectors and items.

import numpy as np
from vicinity import Vicinity

# Create some dummy data as strings or other serializable objects
items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
vectors = np.random.rand(len(items), 128)

# Initialize the Vicinity instance (using the basic backend and cosine metric)
vicinity = Vicinity.from_vectors_and_items(
    vectors=vectors,
    items=items,
)

There are two important parameters here, both left at their default values in the example above:

  • backend_type: This specifies the type of backend to use. You can choose from various backends like Backend.BASIC (default), Backend.FAISS, etc. Each backend has its own strengths and weaknesses. View the full list of backends in the supported backends documentation.
  • metric: This specifies the distance metric to use for querying. You can choose from various metrics like Metric.COSINE (default), Metric.EUCLIDEAN, etc. Which metrics are supported depends on the backend you choose.

Note that most backends have their own parameters that can be configured. These can be passed as additional keyword arguments. For a full list of parameters for each backend, see the supported backends documentation.

For example, if you want to use the FAISS backend with the HNSW algorithm and Euclidean metric, you can do it like this:

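# Import the Backend and Metric enums alongside Vicinity
from vicinity import Backend, Metric, Vicinity

# Build the store with the FAISS backend, an HNSW index, and the Euclidean metric
vicinity = Vicinity.from_vectors_and_items(
    vectors=vectors,
    items=items,
    backend_type=Backend.FAISS,
    index_type="hnsw",  # backend-specific keyword argument
    metric=Metric.EUCLIDEAN,
)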
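
The choice of metric matters because different metrics can rank the same neighbors differently. The following is a minimal, library-independent sketch of that effect; it does not use Vicinity, and the names cosine_distance, candidates, nearest_cosine, and nearest_euclidean are hypothetical and chosen only for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Two hypothetical candidates: one points almost the same way as the query
# but is far away, the other is close by but points in a different direction.
candidates = {
    "far_same_direction": [10.0, 1.0],
    "near_other_direction": [0.5, 0.5],
}
query = [1.0, 0.0]

# Cosine distance ignores vector length, so the aligned vector wins.
nearest_cosine = min(candidates, key=lambda name: cosine_distance(query, candidates[name]))

# Euclidean distance measures straight-line distance, so the nearby vector wins.
nearest_euclidean = min(candidates, key=lambda name: math.dist(query, candidates[name]))

print(nearest_cosine, nearest_euclidean)
```

If your vectors are normalized to unit length, the two metrics produce the same ranking, so the difference mainly matters for unnormalized embeddings.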

Querying a Vector Store

To query a vector store, use the query or query_threshold method. Both find the nearest neighbors of a query vector, and both accept either a single vector or a batch of vectors.

Query Top-K Nearest Neighbors

# Create a query vector
query_vector = np.random.rand(128)

# Query for nearest neighbors with a top-k search
results = vicinity.query(query_vector, k=3)
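
To make the top-k search concrete, here is a minimal sketch of an exact (brute-force) cosine search in plain Python. It illustrates the idea only and is not Vicinity's implementation; brute_force_top_k and the toy two-dimensional vectors are assumptions made for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_top_k(query, vectors, items, k):
    """Hypothetical helper: score every stored vector and keep the k closest."""
    scored = [(item, cosine_distance(query, vec)) for item, vec in zip(items, vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy data: three items with hand-picked 2-dimensional vectors
toy_items = ["triforce", "master sword", "hylian shield"]
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

top2 = brute_force_top_k([1.0, 0.1], toy_vectors, toy_items, k=2)
for item, distance in top2:
    print(f"{item}: {distance:.3f}")
```

An exact scan like this compares the query against every stored vector, which is the cost that approximate backends trade a little accuracy to avoid.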

Query with a Threshold

# Create a query vector
query_vector = np.random.rand(128)

# Query for nearest neighbors with a threshold search
results = vicinity.query_threshold(query_vector, threshold=0.9)
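
A threshold search differs from a top-k search in that the number of results is not fixed in advance. The sketch below shows the idea in plain Python, assuming the threshold is a maximum cosine distance and that matches at exactly the threshold are included; both are assumptions of this example, not statements about Vicinity's behavior, and brute_force_threshold is a hypothetical helper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_threshold(query, vectors, items, threshold):
    """Hypothetical helper: keep every item whose distance is at most `threshold`."""
    hits = []
    for item, vec in zip(items, vectors):
        distance = cosine_distance(query, vec)
        if distance <= threshold:  # assumption: threshold is a maximum distance
            hits.append((item, distance))
    hits.sort(key=lambda pair: pair[1])  # closest match first
    return hits


# Toy data: three items with hand-picked 2-dimensional vectors
toy_items = ["triforce", "master sword", "hylian shield"]
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.1]

tight_hits = brute_force_threshold(toy_query, toy_vectors, toy_items, threshold=0.5)
loose_hits = brute_force_threshold(toy_query, toy_vectors, toy_items, threshold=0.95)
print(len(tight_hits), len(loose_hits))
```

A tighter threshold returns fewer items, and a threshold that nothing satisfies returns an empty list, so callers should handle that case.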

Saving and Loading a Vector Store

You can save and load a vector store using the save and load methods. This allows you to persist your vector store to disk and load it later.

vicinity.save('my_vector_store')
vicinity = Vicinity.load('my_vector_store')

Pushing and Loading to/from Hugging Face Hub

You can push a vector store to the Hugging Face Hub with the push_to_hub method and load it back with the load_from_hub method. This makes it easy to share your vector store with others.

vicinity.push_to_hub(repo_id='minishlab/my-vicinity-repo')
vicinity = Vicinity.load_from_hub(repo_id='minishlab/my-vicinity-repo')

Evaluating a Vector Store

You can evaluate a vector store using the evaluate method. Given the full set of vectors and a set of query vectors, it computes the queries per second and the recall of the vector store.

# Use up to the first 1000 vectors as query vectors (the 5-item dummy data above yields only 5)
query_vectors = vectors[:1000]

# Evaluate the Vicinity instance by measuring the queries per second and recall
qps, recall = vicinity.evaluate(
    full_vectors=vectors,
    query_vectors=query_vectors,
)
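
For intuition about the two numbers, the sketch below shows one common way to compute recall and queries per second in plain Python. It is an illustration under stated assumptions, not Vicinity's evaluation code: recall_at_k, queries_per_second, and the toy result lists are hypothetical, and recall is taken here to mean the average fraction of exact neighbors that the approximate search also returned.

```python
import time


def recall_at_k(approx_results, exact_results):
    """Mean fraction of exact neighbors that also appear in the approximate results."""
    fractions = []
    for approx, exact in zip(approx_results, exact_results):
        exact_set = set(exact)
        fractions.append(len(exact_set & set(approx)) / len(exact_set))
    return sum(fractions) / len(fractions)


def queries_per_second(query_fn, queries):
    """Time `query_fn` over all queries and return the throughput."""
    start = time.perf_counter()
    for query in queries:
        query_fn(query)
    elapsed = time.perf_counter() - start
    return len(queries) / max(elapsed, 1e-12)  # guard against a zero reading


# Toy results for two queries: the first approximate result misses one neighbor
exact = [["a", "b", "c"], ["d", "e", "f"]]
approx = [["a", "b", "x"], ["d", "e", "f"]]
toy_recall = recall_at_k(approx, exact)

# Stand-in query function so the timing helper has something to call
toy_qps = queries_per_second(lambda q: sorted(q), [[3, 1, 2]] * 1000)
print(f"recall={toy_recall:.3f}")
```

Recall and throughput usually pull in opposite directions, so it helps to compare backends on both numbers together rather than on either one alone.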