Representative Sampling from a Single Dataset
To perform representative sampling from a single dataset, you can use the self_find_representative
method. This method will select a subset of samples that best represent the entire dataset based on their semantic similarity.
Parameters
Number of representatives to select.
Number of top candidates to consider for MMR reranking. Defaults to “auto”, which calculates the limit based on the total number of records.
Trade-off parameter between relevance (1.0) and diversity (0.0). Must be between 0 and 1.
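The interaction between the candidate limit and the trade-off parameter can be illustrated with a small NumPy sketch of MMR (maximal marginal relevance) reranking. This is not the library's implementation: the function name mmr_representatives, the argument names, the choice of "cosine similarity to the dataset centroid" as the relevance score, and the use of limit=None to mean "consider every record" are all assumptions made for illustration. It also assumes no embedding is the zero vector.

```python
import numpy as np

def mmr_representatives(embeddings, num_representatives, limit=None, lambda_param=0.5):
    """Hypothetical helper: return row indices chosen by MMR reranking."""
    X = np.asarray(embeddings, dtype=float)
    # Normalize rows so that dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Assumed relevance score: similarity of each record to the dataset centroid.
    centroid = X.mean(axis=0)
    centroid = centroid / np.linalg.norm(centroid)
    relevance = X @ centroid

    # Keep only the top `limit` candidates by relevance (all records if None).
    if limit is None:
        limit = len(X)
    candidates = list(np.argsort(-relevance)[:limit])

    selected = []
    while candidates and len(selected) < num_representatives:
        if selected:
            # Redundancy: highest similarity to any record already selected.
            redundancy = (X[candidates] @ X[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        # 1.0 favors relevance only; 0.0 favors diversity only.
        scores = lambda_param * relevance[candidates] - (1 - lambda_param) * redundancy
        best = candidates.pop(int(np.argmax(scores)))
        selected.append(int(best))
    return selected
```

With the trade-off set to 1.0 the sketch simply returns the most central records; lowering it pushes later picks away from records already chosen, so near-duplicates are less likely to be selected together.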
Representative Sampling Across Multiple Datasets
To perform representative sampling across multiple datasets, you can use the find_representative
method. This method allows you to select a subset of samples from one dataset that best represents another dataset.
Parameters
The new set of records (e.g., a test set) to compare against the fitted dataset when finding representative samples.
Number of representatives to select.
Number of top candidates to consider for MMR reranking. Defaults to “auto”, which calculates the limit based on the total number of records.
Trade-off parameter between relevance (1.0) and diversity (0.0). Must be between 0 and 1.
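The cross-dataset case can be sketched in the same spirit. The example below is an illustration under stated assumptions, not the library's code: the function name cross_dataset_representatives and its argument names are hypothetical, relevance is assumed to be the mean cosine similarity between a fitted record and the new records, and the sketch assumes that representatives are drawn from the fitted dataset (the direction of selection may differ in the actual method). Embeddings are assumed to be non-zero vectors.

```python
import numpy as np

def _unit_rows(a):
    """Scale each row to unit length so dot products are cosine similarities."""
    a = np.asarray(a, dtype=float)
    return a / np.linalg.norm(a, axis=1, keepdims=True)

def cross_dataset_representatives(fitted, new_records, num_representatives,
                                  limit=None, lambda_param=0.5):
    """Hypothetical helper: indices of fitted rows that best represent new_records."""
    F = _unit_rows(fitted)
    Q = _unit_rows(new_records)

    # Assumed relevance: mean similarity of each fitted record to the new records.
    relevance = (F @ Q.T).mean(axis=1)

    # Restrict MMR to the top `limit` candidates (all fitted records if None).
    if limit is None:
        limit = len(F)
    candidates = list(np.argsort(-relevance)[:limit])

    selected = []
    while candidates and len(selected) < num_representatives:
        if selected:
            # Penalize candidates that resemble records already selected.
            redundancy = (F[candidates] @ F[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lambda_param * relevance[candidates] - (1 - lambda_param) * redundancy
        best = candidates.pop(int(np.argmax(scores)))
        selected.append(int(best))
    return selected
```

The only difference from single-dataset sampling in this sketch is where relevance comes from: it is measured against the new records rather than against the dataset's own centroid, while the diversity penalty is still computed among the selected records.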