Building a Vector Store
To build a vector store with Vicinity, use the Vicinity.from_vectors_and_items method. It creates a vector store from a set of vectors and their corresponding items.
For this example, we will use some dummy data, but you can replace it with your own vectors and items.
Parameters
backend_type: Specifies the type of backend to use. You can choose from various backends such as Backend.BASIC (default), Backend.FAISS, etc. Each backend has its own strengths and weaknesses. View the full list of backends in the supported backends documentation.
metric: Specifies the distance metric to use for querying. You can choose from various metrics such as Metric.COSINE (default), Metric.EUCLIDEAN, etc. Which metrics are supported depends on the backend you choose.
Querying a Vector Store
To query a vector store, use the query and query_threshold methods, which find the nearest neighbors of a query vector.
Both methods accept either a single query vector or a batch of query vectors.
Query Top-K Nearest Neighbors
Parameters
Query with a Threshold
Parameters
Saving and Loading a Vector Store
You can save and load a vector store using the save and load methods. This allows you to persist your vector store to disk and load it later.
Pushing to and Loading from the Hugging Face Hub
You can push a vector store to the Hugging Face Hub with the push_to_hub method and load it back with the load_from_hub method. This allows you to easily share your vector store with others.
push_to_hub parameters
The repository ID on the Hugging Face Hub.
Optional authentication token for private repositories. If not provided, your default Hugging Face credentials are used.
Whether to create the repository as private.
Additional arguments passed to Dataset.push_to_hub().
load_from_hub parameters
Evaluate a Vector Store
You can evaluate a vector store using the evaluate method. This method computes the recall and queries per second of the vector store based on a set of queries and their expected results.