mathematical.outliers

Outlier detection functions.

Functions:

mad_outliers(dataset[, strip_zero, threshold])

Identifies outlier values using the Median Absolute Deviation.

quartile_outliers(dataset[, strip_zero])

Identifies outlier values that are more than one inter-quartile range (IQR) below the lower quartile or above the upper quartile.

spss_outliers(dataset[, strip_zero, mode])

Identifies outlier values using the IBM SPSS method.

stdev_outlier(dataset[, strip_zero, rng])

Identifies outlier values that are more than rng × stdev away from the mean.

two_stdev(dataset[, strip_zero])

Identifies outlier values that are more than 2 × stdev away from the mean.

mad_outliers(dataset, strip_zero=True, threshold=3)

Identifies outlier values using the Median Absolute Deviation.

Parameters
  • dataset (Sequence)

  • strip_zero (bool) – Whether to remove zero values from the dataset before analysis. Default True.

  • threshold (int) –

    The multiple of MAD above which values are considered to be outliers. Default 3.

    Leys et al. (2013) make the following recommendations:

    1. In univariate statistics, the Median Absolute Deviation is the most robust dispersion/scale measure in presence of outliers, and hence we strongly recommend the median plus or minus 2.5 times the MAD method for outlier detection.

    2. The threshold should be justified and the justification should clearly state that other concerns than cherry-picking degrees of freedom guided the selection. By default, we suggest a threshold of 2.5 as a reasonable choice.

    3. We encourage researchers to report information about outliers, namely: the number of outliers removed and their value (or at least the distance between outliers and the selected threshold).

Return type

Tuple[List[float], List[float]]

Returns

A list of the outlier values, and the remaining data points.
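
The following is a minimal sketch of MAD-based outlier detection, for illustration only; it is not this module's source. The name mad_outliers_sketch is hypothetical, and the sketch assumes the raw (unscaled) MAD, a strict ">" comparison against threshold × MAD, and that strip_zero drops values equal to zero before any statistic is computed. Leys et al. (2013) scale the MAD by a constant b = 1.4826 (consistent with the standard deviation under normality); multiply mad by that constant if the threshold is meant on their scale.

```python
from statistics import median
from typing import List, Sequence, Tuple


def mad_outliers_sketch(
    dataset: Sequence[float],
    strip_zero: bool = True,
    threshold: float = 3,
) -> Tuple[List[float], List[float]]:
    """Return (outliers, remaining) using a Median Absolute Deviation rule.

    Hypothetical sketch: uses the raw (unscaled) MAD and flags values whose
    distance from the median is strictly greater than threshold * MAD.
    """
    # Assumption: strip_zero drops values equal to zero before computing statistics.
    values = [x for x in dataset if not (strip_zero and x == 0)]
    centre = median(values)
    # MAD: the median of the absolute deviations from the median.
    mad = median([abs(x - centre) for x in values])

    outliers: List[float] = []
    remaining: List[float] = []
    for x in values:
        # Note: if mad == 0, every value that differs from the median is flagged.
        if abs(x - centre) > threshold * mad:
            outliers.append(x)
        else:
            remaining.append(x)
    return outliers, remaining


data = [0, 2, 3, 3, 4, 4, 5, 40]
print(mad_outliers_sketch(data))                    # ([40], [2, 3, 3, 4, 4, 5])
print(mad_outliers_sketch(data, strip_zero=False))  # ([0, 40], [2, 3, 3, 4, 4, 5])
```

Keeping the zero in the second call moves the median to 3.5 and the cut-off to 3 × 1.0, so the zero itself is reported as an outlier.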

quartile_outliers(dataset, strip_zero=True)

Identifies outlier values that are more than one inter-quartile range (IQR) below the lower quartile or above the upper quartile.

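Parameters
  • dataset (Sequence)

  • strip_zero (bool) – Whether to remove zero values from the dataset before analysis. Default True.
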
Return type

Tuple[List[float], List[float]]

Returns

A list of the outlier values, and the remaining data points.
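
A minimal sketch of the quartile rule, assuming quartiles computed with statistics.quantiles(..., method="inclusive"), i.e. linear interpolation, the same as NumPy's default percentile. The module may use a different quartile definition, which can shift the fences for small samples; quartile_outliers_sketch is a hypothetical name.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def quartile_outliers_sketch(
    dataset: Sequence[float],
    strip_zero: bool = True,
) -> Tuple[List[float], List[float]]:
    """Return (outliers, remaining) using fences one IQR outside the quartiles.

    Hypothetical sketch: quartiles use linear interpolation
    (statistics.quantiles with method="inclusive").
    """
    # Assumption: strip_zero drops values equal to zero before computing quartiles.
    values = [x for x in dataset if not (strip_zero and x == 0)]
    # With n=4, quantiles() returns the three cut points [Q1, Q2, Q3]; needs >= 2 values.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - iqr, q3 + iqr

    outliers: List[float] = []
    remaining: List[float] = []
    for x in values:
        if x < lower or x > upper:
            outliers.append(x)
        else:
            remaining.append(x)
    return outliers, remaining


data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 30]
print(quartile_outliers_sketch(data))
# ([30], [1, 2, 3, 4, 5, 6, 7, 8])  (Q1 = 3.0, Q3 = 7.0, fences at -1.0 and 11.0)
```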

spss_outliers(dataset, strip_zero=True, mode='all')

Identifies outlier values using the IBM SPSS method.

Outlier values are more than 1.5 × IQR below Q1 or above Q3.

“Extreme values” are more than 3 × IQR below Q1 or above Q3.

Parameters
  • dataset (Sequence)

  • strip_zero (bool) – Whether to remove zero values from the dataset before analysis. Default True.

  • mode (str) – Default 'all'.

Return type

Tuple[List[float], List[float], List[float]]

Returns

A list of extreme outliers, a list of other outliers, and the remaining data points.
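
The two fences can be sketched as follows. This is a hypothetical illustration (spss_outliers_sketch), not the module's implementation: it does not model the mode parameter, whose accepted values are not documented here, and it computes quartiles with linear interpolation, which may differ from the quartile definition SPSS itself uses.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def spss_outliers_sketch(
    dataset: Sequence[float],
    strip_zero: bool = True,
) -> Tuple[List[float], List[float], List[float]]:
    """Return (extremes, outliers, remaining) using 1.5 x IQR and 3 x IQR fences.

    Hypothetical sketch: the mode parameter is not modelled, and quartiles
    use linear interpolation (statistics.quantiles, method="inclusive").
    """
    # Assumption: strip_zero drops values equal to zero before computing quartiles.
    values = [x for x in dataset if not (strip_zero and x == 0)]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    extremes: List[float] = []
    outliers: List[float] = []
    remaining: List[float] = []
    for x in values:
        # How far x lies outside the box [Q1, Q3]; 0 for values inside it.
        distance = max(q1 - x, x - q3, 0)
        if distance > 3 * iqr:
            extremes.append(x)
        elif distance > 1.5 * iqr:
            outliers.append(x)
        else:
            remaining.append(x)
    return extremes, outliers, remaining


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 17, 30]
print(spss_outliers_sketch(data))
# ([30], [17], [1, 2, 3, 4, 5, 6, 7, 8, 9])  (Q1 = 3.5, Q3 = 8.5, IQR = 5.0)
```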

stdev_outlier(dataset, strip_zero=True, rng=2)

Identifies outlier values that are more than rng × stdev away from the mean.

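Parameters
  • dataset (Sequence)

  • strip_zero (bool) – Whether to remove zero values from the dataset before analysis. Default True.

  • rng – The number of standard deviations from the mean beyond which values are considered to be outliers. Default 2.
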
Return type

Tuple[List[float], List[float]]

Returns

A list of the outlier values, and the remaining data points.
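
A minimal sketch of the rng × stdev rule, assuming the population standard deviation (statistics.pstdev) and a strict ">" comparison; whether the module uses the population or the sample standard deviation is not stated here. stdev_outlier_sketch is a hypothetical name.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def stdev_outlier_sketch(
    dataset: Sequence[float],
    strip_zero: bool = True,
    rng: float = 2,
) -> Tuple[List[float], List[float]]:
    """Return (outliers, remaining) for values more than rng standard deviations from the mean.

    Hypothetical sketch: uses the population standard deviation (pstdev).
    """
    # Assumption: strip_zero drops values equal to zero before computing statistics.
    values = [x for x in dataset if not (strip_zero and x == 0)]
    centre = mean(values)
    cutoff = rng * pstdev(values)

    outliers: List[float] = []
    remaining: List[float] = []
    for x in values:
        if abs(x - centre) > cutoff:
            outliers.append(x)
        else:
            remaining.append(x)
    return outliers, remaining


data = [10, 11, 9, 10, 12, 8, 10, 30]
print(stdev_outlier_sketch(data))         # ([30], [10, 11, 9, 10, 12, 8, 10])
print(stdev_outlier_sketch(data, rng=3))  # ([], [10, 11, 9, 10, 12, 8, 10, 30])
```

Because one extreme value inflates both the mean and the standard deviation, it can escape a wider cut-off: here 30 lies 17.5 from the mean of 12.5, outside 2 × pstdev ≈ 13.4 but inside 3 × pstdev ≈ 20.1.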

two_stdev(dataset, strip_zero=True)

Identifies outlier values that are more than 2 × stdev away from the mean.

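Parameters
  • dataset (Sequence)

  • strip_zero (bool) – Whether to remove zero values from the dataset before analysis. Default True.
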
Return type

Tuple[List[float], List[float]]

Returns

A list of the outlier values, and the remaining data points.