Filter Outliers from a Single Dataset
To filter outliers from a single dataset, you can use the self_filter_outliers method. This method removes samples that are considered outliers based on their semantic similarity to the rest of the dataset.
Parameters
The proportion of records to consider as outliers, expressed as a fraction between 0 and 1.
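The idea behind single-dataset outlier filtering can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: the bag-of-words `embed` function stands in for a real semantic embedding model, scoring each record against the dataset centroid is one possible way to measure similarity to "the rest of the dataset", and the names `sketch_self_filter_outliers` and `outlier_fraction` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a semantic embedding: bag-of-words counts (assumption).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sketch_self_filter_outliers(records, outlier_fraction=0.1):
    # Hypothetical helper: score each record against the dataset centroid
    # and drop the lowest-scoring fraction of records.
    vectors = [embed(r) for r in records]
    center = Counter()
    for v in vectors:
        center.update(v)
    scores = [cosine(v, center) for v in vectors]
    n_outliers = int(outlier_fraction * len(records))
    ranked = sorted(range(len(records)), key=lambda i: scores[i])
    outliers = set(ranked[:n_outliers])
    selected = [r for i, r in enumerate(records) if i not in outliers]
    filtered = [r for i, r in enumerate(records) if i in outliers]
    return selected, filtered


records = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "the dog sat on the mat",
    "quarterly revenue exceeded forecasts",
]
selected, filtered = sketch_self_filter_outliers(records, outlier_fraction=0.25)
print(filtered)
```

With a fraction of 0.25 and four records, exactly one record is removed: the one that shares the least vocabulary with the others.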
Filter Outliers Across Multiple Datasets
To filter outliers across multiple datasets, you can use the filter_outliers method. This method removes outliers from one dataset based on another dataset, which is useful for ensuring that your test set does not contain samples that are significantly different from your training set.
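The cross-dataset case can be sketched the same way. This is a minimal illustration, not the library's code: it assumes a toy bag-of-words embedding and scores each test record against the centroid of the training set, and the names `sketch_filter_outliers`, `reference`, and `outlier_fraction` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a semantic embedding: bag-of-words counts (assumption).
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sketch_filter_outliers(records, reference, outlier_fraction=0.1):
    # Hypothetical helper: score each record against the centroid of the
    # reference dataset and drop the lowest-scoring fraction of records.
    center = Counter()
    for ref in reference:
        center.update(embed(ref))
    scores = [cosine(embed(r), center) for r in records]
    n_outliers = int(outlier_fraction * len(records))
    ranked = sorted(range(len(records)), key=lambda i: scores[i])
    outliers = set(ranked[:n_outliers])
    kept = [r for i, r in enumerate(records) if i not in outliers]
    removed = [r for i, r in enumerate(records) if i in outliers]
    return kept, removed


train = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat lay on the mat",
]
test = [
    "the cat sat on the rug",
    "the dog lay on the mat",
    "stock prices fell sharply today",
    "a cat sat on a mat",
]
kept, removed = sketch_filter_outliers(test, reference=train, outlier_fraction=0.25)
print(removed)
```

Only the test set is modified here; the training set serves purely as the reference that defines what counts as typical.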
Parameters
Filter Outliers from a Multi-Column Dataset
If you have a multi-column dataset, you can filter outliers from it by specifying the columns to use for outlier detection and filtering.
FilterResult Functionality
The FilterResult returned by the outlier filtering methods provides several useful attributes:
selected: The records that were not considered outliers.
filtered: The records that were considered outliers.
scores_selected: The similarity scores for the selected records.
scores_filtered: The similarity scores for the filtered records.
filter_ratio: The ratio of records that were filtered out as outliers.
selected_ratio: The ratio of records that were selected as non-outliers.
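To make the relationship between these attributes concrete, here is a small stand-in container. It is a sketch, not the library's FilterResult class: the name `FilterResultSketch` is hypothetical, and computing both ratios over the total number of records (selected plus filtered) is an assumption based on the descriptions above.

```python
from dataclasses import dataclass, field


@dataclass
class FilterResultSketch:
    # Hypothetical stand-in that mirrors the attributes described above.
    selected: list = field(default_factory=list)
    filtered: list = field(default_factory=list)
    scores_selected: list = field(default_factory=list)
    scores_filtered: list = field(default_factory=list)

    @property
    def filter_ratio(self):
        # Share of all records that were filtered out (assumed definition).
        total = len(self.selected) + len(self.filtered)
        return len(self.filtered) / total if total else 0.0

    @property
    def selected_ratio(self):
        # Share of all records that were kept (assumed definition).
        total = len(self.selected) + len(self.filtered)
        return len(self.selected) / total if total else 0.0


result = FilterResultSketch(
    selected=["rec a", "rec b", "rec c"],
    filtered=["rec d"],
    scores_selected=[0.93, 0.88, 0.84],  # illustrative values only
    scores_filtered=[0.25],
)
print(result.filter_ratio, result.selected_ratio)
```

Under these assumptions the two ratios always sum to 1 for a non-empty result, so either one is enough to check how aggressive the filtering was.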