# `mathematical.stats`

Functions for calculating statistics.

Functions:

| Function | Description |
| --- | --- |
| `absolute_deviation`(x[, axis, center, nan_policy]) | Compute the absolute deviations from the median of the data along the given axis. |
| `absolute_deviation_from_median`(x[, axis, …]) | Compute the absolute deviation from the median of each point in the data along the given axis, given in terms of the MAD. |
| `d_cohen`(sample1, sample2[, which, tail, pooled]) | Calculates and returns Cohen’s effect size index d. |
| `g_durlak_bias`(g, n) | Application of Durlak’s bias correction to the Hedges’ g statistic. |
| `g_hedge`(sample1, sample2) | Calculates and returns Hedges’ g statistic. |
| `interpret_d`(d_or_g) | Interpret Cohen’s d or Hedges’ g values using Table 1 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/ |
| `iqr_none`(dataset) | Calculate the interquartile range, excluding NaN, strings, boolean values, and zeros. |
| `mean_none`(dataset) | Calculate the mean, excluding NaN, strings, boolean values, and zeros. |
| `median_absolute_deviation`(x[, axis, center, …]) | Compute the median absolute deviation of the data along the given axis. |
| `median_none`(dataset) | Calculate the median, excluding NaN, strings, boolean values, and zeros. |
| `percentile_none`(dataset, percentage) | Calculate the given percentile, excluding NaN, strings, boolean values, and zeros. |
| `pooled_sd`(sample1, sample2[, weighted]) | Returns the pooled standard deviation. |
| `std_none`(dataset[, ddof]) | Calculate the standard deviation, excluding NaN, strings, boolean values, and zeros. |
| `within1min`(value1, value2) | Returns whether `value2` is within one minute of `value1`. |

`absolute_deviation`(x, axis=0, center=<function 'median'>, nan_policy='propagate')

Compute the absolute deviations from the median of the data along the given axis.

Parameters
• x (array_like) – Input array or object that can be converted to an array.

• axis (`Optional`[`int`]) – Axis along which the absolute deviations are computed. If None, compute them over the entire array. Default `0`.

• center (`Callable`) – A function that will return the central value. The default is to use numpy.median. Any user defined function used will need to have the function signature `func(arr, axis)`. Default `numpy.median()`.

• nan_policy (`Literal`[`'propagate'`, `'raise'`, `'omit'`]) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default `'propagate'`.

Returns

If `axis=None`, a scalar is returned. If the input contains integers or floats of smaller precision than `numpy.float64`, then the output data-type is `numpy.float64`. Otherwise, the output data-type is the same as that of the input.

Return type

scalar or ndarray

Note

The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in `center=numpy.mean` will calculate the MAD around the mean - it will not calculate the mean absolute deviation.
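As a rough illustration of the note above, the following pure-Python sketch computes absolute deviations for a 1-D list. The helper name `absolute_deviation_sketch` is hypothetical, and for brevity its `center` callable takes a single argument rather than the `func(arr, axis)` signature the library expects; it is not the library's implementation.

```python
import statistics


def absolute_deviation_sketch(values, center=statistics.median):
    """Return |v - center(values)| for every value in a 1-D sequence."""
    c = center(values)
    return [abs(v - c) for v in values]


data = [1, 2, 3, 4, 100]

# Default: deviations are measured from the median (3).
around_median = absolute_deviation_sketch(data)

# Changing the center only moves the reference point to the mean (22);
# the result is still a list of per-point absolute deviations.
around_mean = absolute_deviation_sketch(data, center=statistics.mean)

print(around_median)  # [2, 1, 0, 1, 97]
print(around_mean)    # [21, 20, 19, 18, 78]
```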

`absolute_deviation_from_median`(x, axis=0, center=<function 'median'>, nan_policy='propagate')

Compute the absolute deviation from the median of each point in the data along the given axis, given in terms of the MAD.

Parameters
• x (array_like) – Input array or object that can be converted to an array.

• axis (`Optional`[`int`]) – Axis along which the absolute deviations are computed. If None, compute them over the entire array. Default `0`.

• center (`Callable`) – A function that will return the central value. The default is to use numpy.median. Any user defined function used will need to have the function signature `func(arr, axis)`. Default `numpy.median()`.

• nan_policy (`Literal`[`'propagate'`, `'raise'`, `'omit'`]) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default `'propagate'`.

Returns

If `axis=None`, a scalar is returned. If the input contains integers or floats of smaller precision than `numpy.float64`, then the output data-type is `numpy.float64`. Otherwise, the output data-type is the same as that of the input.

Return type

scalar or ndarray

Note

The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in `center=numpy.mean` will calculate the MAD around the mean - it will not calculate the mean absolute deviation.
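One way to read "given in terms of the MAD" is that each point's absolute deviation is divided by the MAD. The sketch below assumes exactly that, using an unscaled MAD on a 1-D list; whether the library applies a scale factor here is not stated on this page, so treat the helper `deviation_in_mad_units` as a hypothetical illustration only.

```python
import statistics


def deviation_in_mad_units(values):
    """Express each point's distance from the median as a multiple of the MAD."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = statistics.median(abs_dev)  # assumption: unscaled MAD
    # Note: a MAD of zero (more than half the values identical) would
    # raise ZeroDivisionError in this simplified version.
    return [d / mad for d in abs_dev]


scores = deviation_in_mad_units([1, 2, 3, 4, 100])
print(scores)  # [2.0, 1.0, 0.0, 1.0, 97.0]
```

Under this reading the outlier (100) stands out clearly: it lies 97 MADs from the median.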

`d_cohen`(sample1, sample2, which=1, tail=1, pooled=False)

Calculates and returns Cohen’s effect size index d.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Edition). Hillsdale, NJ: Lawrence Erlbaum Associates

Parameters
• sample1

• sample2

• which – Default `1`.

• tail – Default `1`.

• pooled – Default `False`.

Return type

`float`
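For orientation, a textbook form of Cohen's d divides the difference of the sample means by a standard deviation. The sketch below uses the root mean square of the two sample standard deviations as that divisor. It does not model the `which`, `tail`, or `pooled` options, whose exact behaviour is not described on this page, and the helper name `cohens_d_sketch` is hypothetical.

```python
import math
import statistics


def cohens_d_sketch(sample1, sample2):
    """Mean difference divided by the root mean square of the two sample SDs."""
    s1 = statistics.stdev(sample1)
    s2 = statistics.stdev(sample2)
    sd = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return (statistics.mean(sample1) - statistics.mean(sample2)) / sd


# Both samples have the same spread; their means differ by 1.
d = cohens_d_sketch([2, 4, 6, 8], [1, 3, 5, 7])
print(round(d, 4))  # 0.3873
```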

`g_durlak_bias`(g, n)

Applies Durlak’s bias correction to the Hedges’ g statistic.

Here `n` is the combined sample size, n = n1 + n2.

Parameters
• g

• n

Return type

`float`
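The correction is commonly quoted as g × ((n − 3) / (n − 2.25)) × √((n − 2) / n). The sketch below implements that formula directly; that the library uses exactly this expression is an assumption, and `durlak_correction_sketch` is a hypothetical name.

```python
import math


def durlak_correction_sketch(g, n):
    """Shrink g by the small-sample factor; n is the combined size n1 + n2."""
    return g * ((n - 3) / (n - 2.25)) * math.sqrt((n - 2) / n)


corrected = durlak_correction_sketch(0.5, 20)
print(round(corrected, 4))  # 0.4543
```

The factor is below 1 for small samples and approaches 1 as `n` grows, so the correction matters most when the two groups are small.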

`g_hedge`(sample1, sample2)

Calculates and returns Hedges’ g statistic.

Parameters
• sample1

• sample2

Return type

`float`

`interpret_d`(d_or_g)

Interpret Cohen’s d or Hedges’ g values using Table 1 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/

Parameters

d_or_g (`float`)

Return type

`str`

`iqr_none`(dataset)

Calculate the interquartile range, excluding NaN, strings, boolean values, and zeros.

Parameters

dataset (`Sequence`[`Union`[`float`, `bool`, `None`]]) – A list to calculate the interquartile range from.

Return type

`float`

Returns

The interquartile range.

`mean_none`(dataset)

Calculate the mean, excluding NaN, strings, boolean values, and zeros.

Parameters

dataset (`Sequence`[`Union`[`float`, `bool`, `None`]]) – A list to calculate the mean from.

Return type

`float`

Returns

The mean.
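The `*_none` functions all describe the same filtering step before the statistic is computed. A minimal sketch of that step, assuming `None` is dropped along with NaN, strings, booleans and zeros (the helper `keep_numeric` is hypothetical and not part of the library):

```python
import math
import statistics


def keep_numeric(dataset):
    """Drop None, NaN, strings, booleans and zeros; keep the remaining numbers."""
    kept = []
    for item in dataset:
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            continue
        if math.isnan(item) or item == 0:
            continue
        kept.append(item)
    return kept


raw = [1.5, None, "n/a", True, 0, float("nan"), 2.5, 5]
cleaned = keep_numeric(raw)
mean_of_cleaned = statistics.mean(cleaned)

print(cleaned)          # [1.5, 2.5, 5]
print(mean_of_cleaned)  # 3.0
```

Note that excluding zeros changes the result for data where zero is a legitimate measurement, so this family of functions suits datasets in which zero means "missing".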

`median_absolute_deviation`(x, axis=0, center=<function 'median'>, scale=1.4826, nan_policy='propagate')

Compute the median absolute deviation of the data along the given axis. The median absolute deviation (MAD, [1]) computes the median over the absolute deviations from the median. It is a measure of dispersion similar to the standard deviation, but is more robust to outliers [2]. The MAD of an empty array is `numpy.nan`.

Parameters
• x (array_like) – Input array or object that can be converted to an array.

• axis (`Optional`[`int`]) – Axis along which the MAD is computed. If None, compute the MAD over the entire array. Default `0`.

• center (`Callable`) – A function that will return the central value. The default is to use numpy.median. Any user defined function used will need to have the function signature `func(arr, axis)`. Default `numpy.median()`.

• scale (`float`) – The scaling factor applied to the MAD. The default scale (1.4826) ensures consistency with the standard deviation for normally distributed data. Default `1.4826`.

• nan_policy (`Literal`[`'propagate'`, `'raise'`, `'omit'`]) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default `'propagate'`.

Returns

If `axis=None`, a scalar is returned. If the input contains integers or floats of smaller precision than `numpy.float64`, then the output data-type is `numpy.float64`. Otherwise, the output data-type is the same as that of the input.

Return type

scalar or ndarray

Note

The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in `center=numpy.mean` will calculate the MAD around the mean - it will not calculate the mean absolute deviation.

References

[1] “Median absolute deviation” https://en.wikipedia.org/wiki/Median_absolute_deviation

[2] “Robust measures of scale” https://en.wikipedia.org/wiki/Robust_measures_of_scale

Examples

When comparing the behavior of `median_absolute_deviation` with `numpy.std`, the latter is strongly affected when a single value of an array is changed to an outlier, while the MAD hardly changes:

```python
>>> import scipy.stats
>>> import mathematical.stats
>>> x = scipy.stats.norm.rvs(size=100, scale=1, random_state=123456)
>>> x.std()
0.9973906394005013
>>> mathematical.stats.median_absolute_deviation(x)
1.2280762773108278
>>> x[0] = 345.6
>>> x.std()
34.42304872314415
>>> mathematical.stats.median_absolute_deviation(x)
1.2340335571164334
```

Axis handling example:

```python
>>> import numpy
>>> x = numpy.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> mathematical.stats.median_absolute_deviation(x)
array([5.1891, 3.7065, 2.2239])
>>> mathematical.stats.median_absolute_deviation(x, axis=None)
2.9652
```
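The figures in the axis handling example can be reproduced by hand. The following standard-library sketch (the helper `mad_sketch` is hypothetical, not the library's code) takes the median of the absolute deviations from the median and multiplies it by the default scale of 1.4826:

```python
import statistics


def mad_sketch(values, scale=1.4826):
    """Scaled median absolute deviation of a 1-D sequence."""
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    return scale * statistics.median(abs_dev)


rows = [[10, 7, 4], [3, 2, 1]]

# Analogue of axis=None: flatten everything into one sequence.
flat = [v for row in rows for v in row]
mad_all = mad_sketch(flat)

# Analogue of axis=0: one MAD per column.
mad_columns = [mad_sketch(col) for col in zip(*rows)]

print(round(mad_all, 4))                   # 2.9652
print([round(m, 4) for m in mad_columns])  # [5.1891, 3.7065, 2.2239]
```

For the flattened data the median is 3.5 and the median absolute deviation is 2.0, so the scaled result is 2.0 × 1.4826 = 2.9652.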
`median_none`(dataset)

Calculate the median, excluding NaN, strings, boolean values, and zeros.

Parameters

dataset (`Sequence`[`Union`[`float`, `bool`, `None`]]) – A list to calculate the median from.

Return type

`float`

Returns

The median.

`percentile_none`(dataset, percentage)

Calculate the given percentile, excluding NaN, strings, boolean values, and zeros.

Parameters
• dataset

• percentage

Raises

`ValueError` if `dataset` contains fewer than two values

Return type

`float`

Returns

The given percentile.

`pooled_sd`(sample1, sample2, weighted=False)

Returns the pooled standard deviation.

Parameters
• sample1

• sample2

• weighted – Default `False`.

Return type

`float`
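Two common definitions of a pooled standard deviation differ in whether the sample variances are weighted by their degrees of freedom. The sketch below shows both, on the assumption that the `weighted` flag switches between them; the library's actual formulas are not given on this page, and `pooled_sd_sketch` is a hypothetical name.

```python
import math
import statistics


def pooled_sd_sketch(sample1, sample2, weighted=False):
    """Pool two sample variances, optionally weighting by degrees of freedom."""
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    if weighted:
        n1, n2 = len(sample1), len(sample2)
        return math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Unweighted: plain average of the two variances.
    return math.sqrt((var1 + var2) / 2)


a = [2, 4, 6, 8]  # variance 20/3
b = [1, 2, 3]     # variance 1
unweighted = pooled_sd_sketch(a, b)
weighted = pooled_sd_sketch(a, b, weighted=True)

print(round(unweighted, 4))  # 1.9579
print(round(weighted, 4))    # 2.0976
```

The two forms coincide when both samples have the same size; with unequal sizes the weighted form leans towards the larger sample's variance.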

`std_none`(dataset, ddof=1)

Calculate the standard deviation, excluding NaN, strings, boolean values, and zeros.

Parameters
• dataset

• ddof – Default `1`.

Return type

`float`

Returns

standard deviation
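Assuming `ddof` carries NumPy's "delta degrees of freedom" meaning (the sum of squared deviations is divided by `n - ddof`), the default of `1` would give the sample standard deviation and `0` the population one. A small sketch of that arithmetic, with the hypothetical helper `std_with_ddof`:

```python
import math


def std_with_ddof(values, ddof=1):
    """Standard deviation with divisor len(values) - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared / (n - ddof))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
population_sd = std_with_ddof(values, ddof=0)
sample_sd = std_with_ddof(values, ddof=1)

print(population_sd)        # 2.0
print(round(sample_sd, 4))  # 2.1381
```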

`within1min`(value1, value2)

Returns whether `value2` is within one minute of `value1`.

Parameters
• value1

• value2

Return type

`bool`