mathematical.outliers
Outlier detection functions.
Functions:
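mad_outliers | Identifies outlier values using the Median Absolute Deviation. |
quartile_outliers | Identifies outlier values that are more than 3× the inter-quartile range from the upper or lower quartile. |
spss_outliers | Identifies outlier values using the IBM SPSS method. |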
|
Identifies outlier values that are greater than |
|
Identifies outlier values that are greater than |
mad_outliers(dataset, strip_zero=True, threshold=3)
Identifies outlier values using the Median Absolute Deviation.
- Parameters
dataset (Sequence)
threshold (int) – The multiple of MAD above which values are considered to be outliers. Default 3.
Leys et al. (2013) make the following recommendations:
In univariate statistics, the Median Absolute Deviation is the most robust dispersion/scale measure in presence of outliers, and hence we strongly recommend the median plus or minus 2.5 times the MAD method for outlier detection.
The threshold should be justified and the justification should clearly state that other concerns than cherry-picking degrees of freedom guided the selection. By default, we suggest a threshold of 2.5 as a reasonable choice.
We encourage researchers to report information about outliers, namely: the number of outliers removed and their value (or at least the distance between outliers and the selected threshold)
- Return type
- Returns
A list of the outlier values, and the remaining data points.
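The median plus or minus a multiple of the MAD rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the name mad_outliers_sketch and the scale parameter are hypothetical, the 1.4826 consistency constant (appropriate for normally distributed data) is an assumption about how the MAD is scaled, and strip_zero is not modelled.

```python
from statistics import median


def mad_outliers_sketch(dataset, threshold=3, scale=1.4826):
    """Split ``dataset`` into (outliers, remaining) using median +/- threshold * MAD.

    Hypothetical helper for illustration only. ``scale`` is an assumed
    consistency constant; pass ``scale=1`` to use the raw MAD.
    """
    values = list(dataset)
    med = median(values)
    # MAD: the median of the absolute deviations from the median.
    mad = scale * median(abs(x - med) for x in values)

    outliers, remaining = [], []
    for x in values:
        if abs(x - med) > threshold * mad:
            outliers.append(x)
        else:
            remaining.append(x)
    return outliers, remaining


data = [1, 3, 3, 6, 8, 10, 10, 1000]
outliers, remaining = mad_outliers_sketch(data, threshold=2.5)
print(outliers)    # [1000]
print(remaining)   # [1, 3, 3, 6, 8, 10, 10]
```

Here the median is 7 and the scaled MAD is about 5.19, so with a threshold of 2.5 only values more than roughly 12.97 from the median are flagged.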
quartile_outliers(dataset, strip_zero=True)
Identifies outlier values that are more than 3× the inter-quartile range from the upper or lower quartile.
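As a rough sketch of the 3× inter-quartile range rule, the fences can be computed with the standard library. The name quartile_outliers_sketch and the multiplier parameter are hypothetical, the (outliers, remaining) return shape is assumed by analogy with mad_outliers, and the quartiles use the default method of statistics.quantiles (Python 3.8+), which may differ from the method the library uses. strip_zero is not modelled.

```python
from statistics import quantiles


def quartile_outliers_sketch(dataset, multiplier=3):
    """Split ``dataset`` into (outliers, remaining) using Q1/Q3 -/+ multiplier * IQR.

    Hypothetical helper for illustration only.
    """
    values = list(dataset)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr

    outliers = [x for x in values if x < lower or x > upper]
    remaining = [x for x in values if lower <= x <= upper]
    return outliers, remaining


quartile_data = [2, 4, 4, 5, 6, 7, 8, 9, 60]
q_outliers, q_remaining = quartile_outliers_sketch(quartile_data)
print(q_outliers)  # [60]
```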
spss_outliers(dataset, strip_zero=True, mode='all')
Identifies outlier values using the IBM SPSS method.
Outlier values are more than 1.5 × IQR from Q1 or Q3.
“Extreme values” are more than 3 × IQR from Q1 or Q3.
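The two thresholds can be illustrated with a small classifier. This is a sketch under stated assumptions rather than the library's behaviour: the name spss_classify_sketch and its three-way return value are hypothetical, values beyond 3 × IQR are reported only as extreme values (not also as outliers), the quartiles come from the default method of statistics.quantiles, and neither strip_zero nor mode is modelled.

```python
from statistics import quantiles


def spss_classify_sketch(dataset):
    """Return (extremes, outliers, remaining) using the 1.5 x IQR and 3 x IQR rules.

    Hypothetical helper for illustration only.
    """
    values = list(dataset)
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1

    extremes, outliers, remaining = [], [], []
    for x in values:
        # Distance outside the Q1..Q3 box; zero for values inside it.
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * iqr:
            extremes.append(x)
        elif distance > 1.5 * iqr:
            outliers.append(x)
        else:
            remaining.append(x)
    return extremes, outliers, remaining


spss_data = [10, 11, 12, 12, 13, 13, 14, 14, 15, 16, 17, 25, 40]
extremes, mild_outliers, rest = spss_classify_sketch(spss_data)
print(extremes)       # [40]
print(mild_outliers)  # [25]
```

For this data Q1 is 12.0 and Q3 is 16.5, so the IQR is 4.5: 25 lies 8.5 above Q3 (more than 6.75 but less than 13.5), while 40 lies 23.5 above Q3.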