Usage
Vicinity can be used to build and query a vector store. It provides a unified interface for various approximate nearest neighbor (ANN) backends, allowing you to easily switch between them and compare their performance.
Building a Vector Store
To build a vector store, use the Vicinity.from_vectors_and_items method, which creates a store from a set of vectors and their corresponding items.
For this example, we will use some dummy data, but you can replace it with your own vectors and items.

    import numpy as np
    from vicinity import Vicinity

    # Create some dummy data as strings or other serializable objects
    items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
    vectors = np.random.rand(len(items), 128)

    # Initialize the Vicinity instance (using the basic backend and cosine metric)
    vicinity = Vicinity.from_vectors_and_items(
        vectors=vectors,
        items=items,
    )

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| vectors | npt.NDArray | | The vectors to use. |
| items | Sequence[Any] | | The items to use. |
| backend_type | Backend \| str | Backend.BASIC | The type of backend to use. |
| store_vectors | bool | False | Whether to store the raw vectors in the backend. |
| kwargs | Any | | Additional arguments to pass to the backend. |
There are two important parameters here:
- backend_type: This specifies the type of backend to use. You can choose from various backends like Backend.BASIC (default), Backend.FAISS, etc. Each backend has its own strengths and weaknesses. View the full list of backends in the supported backends documentation.
- metric: This specifies the distance metric to use for querying. You can choose from various metrics like Metric.COSINE (default), Metric.EUCLIDEAN, etc. Which metrics are supported depends on the backend you choose.
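The metric determines what "nearest" means, so the same query can return different neighbors under different metrics. The following is a minimal, dependency-free sketch in plain Python rather than Vicinity's API. The helper names `cosine_distance` and `euclidean_distance` are hypothetical and exist only for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query = [1.0, 0.0]
candidates = {
    "same direction, far away": [10.0, 0.0],
    "different direction, close by": [1.0, 1.0],
}

# Cosine distance ignores vector length and compares direction only.
nearest_cosine = min(
    candidates, key=lambda name: cosine_distance(query, candidates[name])
)
# Euclidean distance depends on the absolute positions of the vectors.
nearest_euclidean = min(
    candidates, key=lambda name: euclidean_distance(query, candidates[name])
)

print(nearest_cosine)
print(nearest_euclidean)
```

Here the two metrics disagree on the nearest neighbor, which is why the metric should match how your embeddings were trained or normalized.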
Note that most backends have their own parameters that can be configured. These can be passed as additional keyword arguments. For a full list of parameters for each backend, see the supported backends documentation.
For example, if you want to use the FAISS backend with the HNSW algorithm and Euclidean metric, you can do it like this:

    from vicinity import Vicinity, Backend, Metric

    vicinity = Vicinity.from_vectors_and_items(
        vectors=vectors,
        items=items,
        backend_type=Backend.FAISS,
        index_type="hnsw",
        metric=Metric.EUCLIDEAN,
    )

Querying a Vector Store
To query a vector store, use the query and query_threshold methods. Both find the nearest neighbors of a given query vector, and both accept either a single query vector or a batch of query vectors.
Query Top-K Nearest Neighbors

    # Create a query vector
    query_vector = np.random.rand(128)

    # Query for nearest neighbors with a top-k search
    results = vicinity.query(query_vector, k=3)

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| vectors | npt.NDArray | | The vectors to find the nearest neighbors to. |
| k | int | 10 | The number of most similar items to retrieve. |
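To make the semantics of a top-k query concrete, the sketch below performs an exact, brute-force search with cosine distance in plain Python. It is an illustration of the idea and not Vicinity's implementation: the function names `top_k_cosine` and `top_k_cosine_batch` are hypothetical, and returning (item, distance) pairs is an assumption made for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_cosine(query, vectors, items, k=10):
    """Score every stored vector against the query and keep the k closest."""
    scored = [
        (item, cosine_distance(query, vec)) for item, vec in zip(items, vectors)
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


def top_k_cosine_batch(queries, vectors, items, k=10):
    """Batch variant: one result list per query vector."""
    return [top_k_cosine(query, vectors, items, k) for query in queries]


items = ["triforce", "master sword", "hylian shield"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

results = top_k_cosine([1.0, 0.1], vectors, items, k=2)
for item, distance in results:
    print(f"{item}: {distance:.3f}")
```

Approximate backends trade some of this exactness for speed, which is what makes comparing them against a brute-force baseline useful.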
Query with a Threshold

    # Create a query vector
    query_vector = np.random.rand(128)

    # Query for nearest neighbors with a threshold search
    results = vicinity.query_threshold(query_vector, threshold=0.9)

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| vectors | npt.NDArray | | The vectors to find the most similar vectors to. |
| threshold | float | 0.5 | The threshold to use. |
| max_k | int | 100 | The maximum number of neighbors to consider for the threshold query. |
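The interaction between threshold and max_k can be sketched as a two-step filter: first take the max_k closest candidates, then keep only those within the threshold. The code below is a plain-Python illustration under stated assumptions, not Vicinity's implementation. It assumes the threshold is an upper bound on cosine distance and that the comparison is strict; the function name `query_within_threshold` is hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def query_within_threshold(query, vectors, items, threshold=0.5, max_k=100):
    """Keep candidates closer than `threshold`, looking at no more than max_k."""
    scored = sorted(
        ((item, cosine_distance(query, vec)) for item, vec in zip(items, vectors)),
        key=lambda pair: pair[1],
    )
    # Assumption: only the max_k closest candidates are ever considered,
    # so max_k caps the result size even when more items pass the threshold.
    return [(item, dist) for item, dist in scored[:max_k] if dist < threshold]


items = ["triforce", "master sword", "hylian shield"]
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.1]

within = query_within_threshold(query, vectors, items, threshold=0.5)
capped = query_within_threshold(query, vectors, items, threshold=0.5, max_k=1)
print([item for item, _ in within])
print([item for item, _ in capped])
```

Unlike a top-k query, the number of results varies per query, so an empty list is a valid outcome when nothing lies within the threshold.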
Saving and Loading a Vector Store
You can save and load a vector store using the save and load methods. This allows you to persist your vector store to disk and load it later.

    vicinity.save('my_vector_store')
    vicinity = Vicinity.load('my_vector_store')

Pushing and Loading to/from Hugging Face Hub
You can push a vector store to the Hugging Face Hub with the push_to_hub method and load it back with the load_from_hub method. This allows you to easily share your vector store with others.

    vicinity.push_to_hub(repo_id='minishlab/my-vicinity-repo')
    vicinity = Vicinity.load_from_hub(repo_id='minishlab/my-vicinity-repo')

push_to_hub parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| repo_id | str | | The repository ID on the Hugging Face Hub. |
| token | str \| None | None | Optional authentication token for private repositories. If not provided, your default Hugging Face credentials are used. |
| private | bool | False | Whether to create the repository as private. |
| kwargs | Any | | Additional arguments passed to Dataset.push_to_hub(). |
load_from_hub parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| repo_id | str | | The repository ID on the Hugging Face Hub. |
| token | str \| None | None | Optional authentication token for private repositories. If not provided, your default Hugging Face credentials are used. |
| kwargs | Any | | Additional arguments passed to Dataset.load_from_hub(). |
Evaluate a Vector Store
You can evaluate a vector store using the evaluate method. This method computes the recall and queries per second of the vector store based on a set of queries and their expected results.

    # Use the first 1000 vectors as query vectors
    # (the slice returns fewer if the dataset holds fewer than 1000 vectors)
    query_vectors = vectors[:1000]

    # Evaluate the Vicinity instance by measuring the queries per second and recall
    qps, recall = vicinity.evaluate(
        full_vectors=vectors,
        query_vectors=query_vectors,
    )

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| full_vectors | npt.NDArray | | The full dataset vectors used to build the index. |
| query_vectors | npt.NDArray | | The query vectors to evaluate. |
| k | int | 10 | The number of nearest neighbors to retrieve. |
| epsilon | float | 1e-3 | The epsilon threshold for recall calculation. |
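For intuition about the two reported numbers, here is a plain-Python sketch of how recall and queries per second are commonly computed in ANN benchmarks. It is not Vicinity's implementation. In particular, the role of epsilon shown here (a returned neighbor counts as a hit if its distance is within epsilon of the k-th exact distance) is an assumption, and the function names `recall_at_k` and `queries_per_second` are hypothetical.

```python
import time


def recall_at_k(approx_distances, exact_distances, k=10, epsilon=1e-3):
    """Fraction of returned neighbors that are as close as the true top-k.

    Both arguments hold one ascending list of distances per query.
    """
    hits = 0
    total = 0
    for approx, exact in zip(approx_distances, exact_distances):
        true_top = exact[:k]
        # Assumption: a result is a hit if it is no farther than the
        # k-th exact neighbor, with epsilon absorbing rounding error.
        cutoff = true_top[-1] + epsilon
        hits += sum(1 for dist in approx[:k] if dist <= cutoff)
        total += len(true_top)
    return hits / total


def queries_per_second(search_fn, queries):
    """Time search_fn over all queries and return the throughput."""
    start = time.perf_counter()
    for query in queries:
        search_fn(query)
    elapsed = time.perf_counter() - start
    return len(queries) / elapsed if elapsed > 0 else float("inf")


# Two queries, k=3: the first approximate result misses one true neighbor.
approx = [[0.1, 0.2, 0.5], [0.0, 0.1, 0.2]]
exact = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.2]]
recall = recall_at_k(approx, exact, k=3)

qps = queries_per_second(lambda q: sorted(q), [[3, 1, 2]] * 100)
print(f"recall={recall:.3f}")
```

Recall measures accuracy against an exact search and queries per second measures speed, so the two together describe the trade-off a given backend and its parameters make.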