mathematical.stats
Functions for calculating statistics.
Functions:
Compute the absolute deviations from the median of the data along the given axis.
Compute the absolute deviation from the median of each point in the data along the given axis, given in terms of the MAD.
Calculates and returns Cohen’s effect size index d.
Application of Durlak’s bias correction to the Hedges’ g statistic.
Calculates and returns Hedges’ g statistic.
Interpret Cohen’s d or Hedges’ g values using Table 1 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/
Calculate the interquartile range, excluding NaN, strings, boolean values, and zeros.
Calculate the mean, excluding NaN, strings, boolean values, and zeros.
Compute the median absolute deviation of the data along the given axis.
Calculate the median, excluding NaN, strings, boolean values, and zeros.
Calculate the given percentile, excluding NaN, strings, boolean values, and zeros.
Returns the pooled standard deviation.
Calculate the standard deviation, excluding NaN, strings, boolean values, and zeros.
Returns whether |
-
absolute_deviation(x, axis=0, center=numpy.median, nan_policy='propagate')
Compute the absolute deviations from the median of the data along the given axis.
- Parameters
x (array_like) – Input array or object that can be converted to an array.
axis (Optional[int]) – Axis along which the deviations are computed. If None, compute over the entire array. Default 0.
center (Callable) – A function that returns the central value. Any user-defined function must have the signature func(arr, axis). Default numpy.median.
nan_policy (Literal['propagate', 'raise', 'omit']) – Defines how to handle input that contains NaN. ‘propagate’ returns NaN, ‘raise’ throws an error, and ‘omit’ performs the calculations ignoring NaN values. Default 'propagate'.
- Returns
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than numpy.float64, then the output data-type is numpy.float64. Otherwise, the output data-type is the same as that of the input.
- Return type
scalar or ndarray
- Overloads
absolute_deviation(x, axis: None, center = …, nan_policy = …) -> float
absolute_deviation(x, axis: int = …, center = …, nan_policy = …) -> ndarray
Note
The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=numpy.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation.
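The following is a minimal NumPy sketch of the idea, not the library's implementation. The helper name abs_dev is hypothetical, NaN handling is omitted, and it assumes an element-wise reading in which each value is replaced by its distance from the central value taken along the chosen axis.

```python
import numpy as np


def abs_dev(x, axis=0, center=np.median):
    """Hypothetical helper: |x - center(x)| along `axis` (no NaN handling)."""
    x = np.asarray(x, dtype=float)
    if axis is None:
        # One central value for the whole array
        return np.abs(x - center(x, axis=None))
    # Re-insert the reduced axis so the central values broadcast against x
    c = np.expand_dims(center(x, axis=axis), axis)
    return np.abs(x - c)


x = np.array([[10, 7, 4], [3, 2, 1]])
by_column = abs_dev(x)            # column medians are 6.5, 4.5 and 2.5
overall = abs_dev(x, axis=None)   # the median of all six values is 3.5
print(by_column)
print(overall)
```

Any callable that accepts (arr, axis) can be passed as center, which mirrors the signature requirement stated for the center parameter.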
-
absolute_deviation_from_median(x, axis=0, center=numpy.median, nan_policy='propagate')
Compute the absolute deviation from the median of each point in the data along the given axis, given in terms of the MAD.
- Parameters
x (array_like) – Input array or object that can be converted to an array.
axis (Optional[int]) – Axis along which the deviations are computed. If None, compute over the entire array. Default 0.
center (Callable) – A function that returns the central value. Any user-defined function must have the signature func(arr, axis). Default numpy.median.
nan_policy (Literal['propagate', 'raise', 'omit']) – Defines how to handle input that contains NaN. ‘propagate’ returns NaN, ‘raise’ throws an error, and ‘omit’ performs the calculations ignoring NaN values. Default 'propagate'.
- Returns
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than numpy.float64, then the output data-type is numpy.float64. Otherwise, the output data-type is the same as that of the input.
- Return type
scalar or ndarray
- Overloads
absolute_deviation_from_median(x, axis: None, center = …, nan_policy = …) -> float
absolute_deviation_from_median(x, axis: int = …, center = …, nan_policy = …) -> ndarray
Note
The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=numpy.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation.
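Expressing a deviation "in terms of the MAD" can be pictured as dividing each absolute deviation by the MAD of the same data. The sketch below is an illustration under assumptions: the helper name deviation_in_mads is hypothetical, the MAD is assumed to be scaled by 1.4826 (the default scale documented for median_absolute_deviation), and the cut-off of 3 is purely illustrative.

```python
import numpy as np

SCALE = 1.4826  # assumption: same default scale as median_absolute_deviation


def deviation_in_mads(x, scale=SCALE):
    """Hypothetical helper: distance of each point from the median, in MADs."""
    x = np.asarray(x, dtype=float)
    dev = np.abs(x - np.median(x))   # absolute deviation of every point
    mad = scale * np.median(dev)     # scaled median absolute deviation
    return dev / mad                 # undefined (division by zero) if mad == 0


x = [1, 2, 3, 4, 100]
score = deviation_in_mads(x)
outliers = score > 3                 # illustrative threshold, not a library default
print(score)
print(outliers)
```

With this data the median is 3 and the unscaled MAD is 1, so only the value 100 lies more than three MADs from the median.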
-
d_cohen(sample1, sample2, which=1, tail=1, pooled=False)
Calculates and returns Cohen’s effect size index d.
See also
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Edition). Hillsdale, NJ: Lawrence Erlbaum Associates.
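For orientation, a textbook form of Cohen's d divides the difference in means by a standard deviation. The sketch below uses the root mean square of the two sample standard deviations as that denominator. It is an assumption-laden illustration: the helper name cohen_d is hypothetical, and it does not model the which, tail, or pooled parameters, whose semantics are not described on this page.

```python
import math
import statistics


def cohen_d(sample1, sample2):
    """Hypothetical helper: mean difference over the RMS of the two sample SDs."""
    s1 = statistics.stdev(sample1)   # sample SD (n - 1 denominator)
    s2 = statistics.stdev(sample2)
    rms_sd = math.sqrt((s1 ** 2 + s2 ** 2) / 2)
    return (statistics.mean(sample1) - statistics.mean(sample2)) / rms_sd


a = [2, 4, 6, 8, 10]   # mean 6, sample variance 10
b = [1, 3, 5, 7, 9]    # mean 5, sample variance 10
d = cohen_d(a, b)
print(d)               # 1 / sqrt(10)
```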
-
g_durlak_bias(g, n)
Apply Durlak’s bias correction to the Hedges’ g statistic.
Here n = n1 + n2, the combined size of the two samples.
- Parameters
- Return type
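The correction usually attributed to Durlak (2009) multiplies g by ((n - 3) / (n - 2.25)) * sqrt((n - 2) / n). The sketch below applies that formula. Whether g_durlak_bias uses exactly this expression is an assumption, and the helper name durlak_correct is hypothetical.

```python
import math


def durlak_correct(g, n):
    """Hypothetical helper: small-sample correction of g for total size n."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the correction to be meaningful")
    return g * ((n - 3) / (n - 2.25)) * math.sqrt((n - 2) / n)


small = durlak_correct(0.5, 10)      # noticeable shrinkage for a small sample
large = durlak_correct(0.5, 10**6)   # practically unchanged for a large one
print(small, large)
```

The factor is below 1 for every n > 3 and tends to 1 as n grows, so the correction matters mainly for small samples.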
-
g_hedge(sample1, sample2)
Calculates and returns Hedges’ g statistic.
Formula from https://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/hedgeg.htm
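As a rough guide to the cited formula, Hedges' g is the difference in sample means divided by the pooled standard deviation, where the pooled variance weights each sample variance by its degrees of freedom. The sketch below follows that definition. The helper name hedges_g is hypothetical and this is not presented as the body of g_hedge.

```python
import math
import statistics


def hedges_g(sample1, sample2):
    """Hypothetical helper: mean difference over the df-weighted pooled SD."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1)   # sample variance (n - 1 denominator)
    v2 = statistics.variance(sample2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(pooled_var)


a = [2, 4, 6, 8, 10]   # n = 5, mean 6, variance 10
b = [1, 2, 3]          # n = 3, mean 2, variance 1
g = hedges_g(a, b)     # pooled variance = (4*10 + 2*1) / 6 = 7
print(g)
```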
-
interpret_d(d_or_g)
Interpret Cohen’s d or Hedges’ g values using Table 1 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/
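An interpretation of this kind amounts to a threshold lookup. The sketch below uses Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large). The helper name, the label strings, and the treatment of values under 0.2 are assumptions, and the cited table may define further categories that interpret_d reports.

```python
def effect_size_label(d_or_g):
    """Hypothetical helper: map an effect size to Cohen's conventional labels."""
    magnitude = abs(d_or_g)   # the sign only indicates direction
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "below small"      # assumption: no conventional label under 0.2


for value in (0.1, 0.35, 0.5, -0.9):
    print(value, effect_size_label(value))
```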
-
iqr_none(dataset)
Calculate the interquartile range, excluding NaN, strings, boolean values, and zeros.
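The exclusion rule shared by the *_none functions can be pictured as a filtering pass followed by an ordinary statistic. The sketch below uses only the standard library. The helper name keep_valid is hypothetical, and the exact type checks and quantile method used by the library are assumptions.

```python
import math
import statistics


def keep_valid(dataset):
    """Hypothetical filter: drop NaN, strings, booleans, zeros and other non-numbers."""
    kept = []
    for value in dataset:
        # bool is a subclass of int, so it has to be rejected explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if value == 0 or math.isnan(value):
            continue
        kept.append(value)
    return kept


data = [1, 2, "n/a", None, True, 0, float("nan"), 3, 4, 5]
values = keep_valid(data)
# Quartiles with linear interpolation between data points
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
print(values, iqr, statistics.median(values))
```

The same filtered list can be fed to a mean, median, percentile, or standard deviation, which is the pattern the descriptions of the *_none functions suggest.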
-
median_absolute_deviation(x, axis=0, center=numpy.median, scale=1.4826, nan_policy='propagate')
Compute the median absolute deviation of the data along the given axis.
The median absolute deviation (MAD) [1] is the median of the absolute deviations from the median. It is a measure of dispersion similar to the standard deviation, but more robust to outliers [2]. The MAD of an empty array is numpy.nan.
- Parameters
x (array_like) – Input array or object that can be converted to an array.
axis (Optional[int]) – Axis along which the MAD is computed. If None, compute the MAD over the entire array. Default 0.
center (Callable) – A function that returns the central value. Any user-defined function must have the signature func(arr, axis). Default numpy.median.
scale (float) – The scaling factor applied to the MAD. The default scale (1.4826) ensures consistency with the standard deviation for normally distributed data. Default 1.4826.
nan_policy (Literal['propagate', 'raise', 'omit']) – Defines how to handle input that contains NaN. ‘propagate’ returns NaN, ‘raise’ throws an error, and ‘omit’ performs the calculations ignoring NaN values. Default 'propagate'.
- Returns
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than numpy.float64, then the output data-type is numpy.float64. Otherwise, the output data-type is the same as that of the input.
- Return type
scalar or ndarray
- Overloads
median_absolute_deviation(x, axis: None, center = …, scale = …, nan_policy = …) -> float
median_absolute_deviation(x, axis: int = …, center = …, scale = …, nan_policy = …) -> ndarray
Note
The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=numpy.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation.
References
[1] “Median absolute deviation” https://en.wikipedia.org/wiki/Median_absolute_deviation
[2] “Robust measures of scale” https://en.wikipedia.org/wiki/Robust_measures_of_scale
Examples
When comparing the behavior of median_absolute_deviation with numpy.std, the latter changes drastically when a single value of the array is replaced by an outlier, while the MAD hardly changes:
>>> import numpy
>>> import scipy.stats
>>> import mathematical.stats
>>> x = scipy.stats.norm.rvs(size=100, scale=1, random_state=123456)
>>> x.std()
0.9973906394005013
>>> mathematical.stats.median_absolute_deviation(x)
1.2280762773108278
>>> x[0] = 345.6
>>> x.std()
34.42304872314415
>>> mathematical.stats.median_absolute_deviation(x)
1.2340335571164334
Axis handling example:
>>> x = numpy.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> mathematical.stats.median_absolute_deviation(x)
array([5.1891, 3.7065, 2.2239])
>>> mathematical.stats.median_absolute_deviation(x, axis=None)
2.9652
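To make the roles of scale and center concrete, the following NumPy sketch recomputes the axis example by hand and contrasts the MAD around the mean with the mean absolute deviation. It is a minimal illustration, not the library's code: the helper name mad is hypothetical and NaN handling is left out.

```python
from statistics import NormalDist

import numpy as np

# 1.4826 is approximately 1 / Phi^-1(3/4), with Phi the standard normal CDF
normal_consistency = 1 / NormalDist().inv_cdf(0.75)


def mad(x, axis=0, center=np.median, scale=1.4826):
    """Hypothetical helper: scale * median(|x - center(x)|), no NaN handling."""
    x = np.asarray(x, dtype=float)
    if axis is None:
        return scale * np.median(np.abs(x - center(x, axis=None)))
    # Re-insert the reduced axis so the central values broadcast against x
    c = np.expand_dims(center(x, axis=axis), axis)
    return scale * np.median(np.abs(x - c), axis=axis)


x = np.array([[10, 7, 4], [3, 2, 1]])
per_column = mad(x)             # matches the axis=0 values in the example above
overall = mad(x, axis=None)     # matches the axis=None value in the example above

y = np.array([1, 2, 3, 4, 100])
mad_around_mean = mad(y, center=np.mean, scale=1.0)   # median of |y - mean(y)|
mean_abs_dev = np.mean(np.abs(y - y.mean()))          # a different statistic
print(per_column, overall, mad_around_mean, mean_abs_dev)
```

For y the mean is 22, so the MAD around the mean is 20 while the mean absolute deviation is 31.2, which is the distinction the note on the center argument draws.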
-
median_none(dataset)
Calculate the median, excluding NaN, strings, boolean values, and zeros.
-
percentile_none(dataset, percentage)
Calculate the given percentile, excluding NaN, strings, boolean values, and zeros.
-
pooled_sd(sample1, sample2, weighted=False)
Returns the pooled standard deviation.
- Parameters
- Return type