Usage

Vicinity can be used to build and query a vector store. It provides a unified interface for various approximate nearest neighbor (ANN) backends, allowing you to easily switch between them and compare their performance.

Building a Vector Store

To build a vector store with Vicinity, you can use the Vicinity.from_vectors_and_items method. This method allows you to create a vector store from a set of vectors and their corresponding items. For this example, we will use some dummy data, but you can replace it with your own vectors and items.

import numpy as np
from vicinity import Vicinity

# Create some dummy data as strings or other serializable objects
items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
vectors = np.random.rand(len(items), 128)

# Initialize the Vicinity instance (using the basic backend and cosine metric)
vicinity = Vicinity.from_vectors_and_items(
    vectors=vectors,
    items=items,
)

Parameters

Parameter	Type	Default	Description
`vectors`	`npt.NDArray`		The vectors to use.
`items`	`Sequence[Any]`		The items to use.
`backend_type`	`Backend \| str`	`Backend.BASIC`	The type of backend to use.
`store_vectors`	`bool`	`False`	Whether to store the raw vectors in the backend.
`kwargs`	`Any`		Additional arguments to pass to the backend.

There are two important parameters here:

backend_type: This specifies the type of backend to use. You can choose from various backends like Backend.BASIC (default), Backend.FAISS, etc. Each backend has its own strengths and weaknesses. View the full list of backends in the supported backends documentation.
metric: This specifies the distance metric to use for querying. You can choose from various metrics like Metric.COSINE (default), Metric.EUCLIDEAN, etc. Which metrics are supported depends on the backend you choose.

Note that most backends have their own parameters that can be configured. These can be passed as additional keyword arguments. For a full list of parameters for each backend, see the supported backends documentation.

For example, if you want to use the FAISS backend with the HNSW algorithm and Euclidean metric, you can do it like this:

from vicinity import Vicinity, Backend, Metric
vicinity = Vicinity.from_vectors_and_items(
    vectors=vectors,
    items=items,
    backend_type=Backend.FAISS,
    index_type="hnsw",
    metric=Metric.EUCLIDEAN
)

Querying a Vector Store

To query a vector store, you can use the query and query_threshold methods. This allows you to find the nearest neighbors of a given query vector or a list of query vectors. Both methods support both single and batch queries.

Query Top-K Nearest Neighbors

# Create a query vector
query_vector = np.random.rand(128)

# Query for nearest neighbors with a top-k search
results = vicinity.query(query_vector, k=3)

Parameters

Parameter	Type	Default	Description
`vectors`	`npt.NDArray`		The vectors to find the nearest neighbors to.
`k`	`int`	`10`	The number of most similar items to retrieve.

Query with a Threshold

# Create a query vector
query_vector = np.random.rand(128)

# Query for nearest neighbors with a threshold search
results = vicinity.query_threshold(query_vector, threshold=0.9)

Parameters

Parameter	Type	Default	Description
`vectors`	`npt.NDArray`		The vectors to find the most similar vectors to.
`threshold`	`float`	`0.5`	The threshold to use.
`max_k`	`int`	`100`	The maximum number of neighbors to consider for the threshold query.

Saving and Loading a Vector Store

You can save and load a vector store using the save and load methods. This allows you to persist your vector store to disk and load it later.

vicinity.save('my_vector_store')
vicinity = Vicinity.load('my_vector_store')

Pushing and Loading to/from Hugging Face Hub

You can push and load a vector store to/from the Hugging Face Hub using the push_to_hub and from_pretrained methods. This allows you to easily share your vector store with others.

vicinity.push_to_hub(repo_id='minishlab/my-vicinity-repo')
vicinity = Vicinity.load_from_hub(repo_id='minishlab/my-vicinity-repo')

push_to_hub parameters

Parameter	Type	Default	Description
`repo_id`	`str`		The repository ID on the Hugging Face Hub.
`token`	`str \| None`	`None`	Optional authentication token for private repositories. If not provided, your default Hugging Face credentials are used.
`private`	`bool`	`False`	Whether to create the repository as private.
`kwargs`	`Any`		Additional arguments passed to `Dataset.push_to_hub()`.

load_from_hub parameters

Parameter	Type	Default	Description
`repo_id`	`str`		The repository ID on the Hugging Face Hub.
`token`	`str \| None`	`None`	Optional authentication token for private repositories. If not provided, your default Hugging Face credentials are used.
`kwargs`	`Any`		Additional arguments passed to `Dataset.load_from_hub()`.

Evaluate a Vector Store

You can evaluate a vector store using the evaluate method. This method computes the recall and queries per second of the vector store based on a set of queries and their expected results.

# Use the first 1000 vectors as query vectors
query_vectors = vectors[:1000]

# Evaluate the Vicinity instance by measuring the queries per second and recall
qps, recall = vicinity.evaluate(
    full_vectors=vectors,
    query_vectors=query_vectors,
)

Parameters

Parameter	Type	Default	Description
`full_vectors`	`npt.NDArray`		The full dataset vectors used to build the index.
`query_vectors`	`npt.NDArray`		The query vectors to evaluate.
`k`	`int`	`10`	The number of nearest neighbors to retrieve.
`epsilon`	`float`	`1e-3`	The epsilon threshold for recall calculation.