mathematical.stats
Functions for calculating statistics.
Functions:
| absolute_deviation | Compute the absolute deviations from the median of the data along the given axis. |
| absolute_deviation_from_median | Compute the absolute deviation from the median of each point in the data along the given axis, given in terms of the MAD. |
| d_cohen | Calculates and returns Cohen’s effect size index d. |
| g_durlak_bias | Application of Durlak’s bias correction to the Hedges’ g statistic. |
| g_hedge | Calculates and returns Hedges’ g statistic. |
| interpret_d | Interpret Cohen’s d or Hedges’ g values using Table 1 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/ |
| iqr_none | Calculate the interquartile range, excluding NaN, strings, boolean values, and zeros. |
| | Calculate the mean, excluding NaN, strings, boolean values, and zeros. |
| median_absolute_deviation | Compute the median absolute deviation of the data along the given axis. |
| median_none | Calculate the median, excluding NaN, strings, boolean values, and zeros. |
| percentile_none | Calculate the given percentile, excluding NaN, strings, boolean values, and zeros. |
| pooled_sd | Returns the pooled standard deviation. |
| | Calculate the standard deviation, excluding NaN, strings, boolean values, and zeros. |
| | Returns whether |
-
absolute_deviation(x, axis=0, center=<function median>, nan_policy='propagate')
Compute the absolute deviations from the median of the data along the given axis.
- Parameters
x (array_like) – Input array or object that can be converted to an array.
axis (Optional[int]) – Axis along which the deviations are computed. If None, compute over the entire array. Default 0.
center (Callable) – A function that will return the central value. Any user-defined function must have the signature func(arr, axis). Default numpy.median.
nan_policy (Literal['propagate', 'raise', 'omit']) – Defines how to handle input that contains NaN. ‘propagate’ returns NaN, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring NaN values. Default 'propagate'.
- Returns
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than numpy.float64, then the output data-type is numpy.float64. Otherwise, the output data-type is the same as that of the input.
- Return type
scalar or ndarray
- Overloads
absolute_deviation(x, axis: None, center=…, nan_policy=…) -> float
absolute_deviation(x, axis: int = …, center=…, nan_policy=…) -> ndarray
Note
The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=numpy.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation.
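As a rough illustration of the quantity described above, the sketch below computes absolute deviations from a central value with plain NumPy. It is not the library's implementation: the helper name abs_dev_sketch is hypothetical, only an integer axis is supported, and NaN handling (nan_policy) is left out.

```python
import numpy as np


def abs_dev_sketch(x, axis=0, center=np.median):
    """Hypothetical helper: |x - center(x)| along an integer `axis`."""
    x = np.asarray(x, dtype=float)
    # Central value of each slice, reshaped so it broadcasts against x.
    c = np.expand_dims(center(x, axis=axis), axis)
    return np.abs(x - c)


data = np.array([[10, 7, 4], [3, 2, 1]])
# Column medians are 6.5, 4.5 and 2.5, so both rows deviate by 3.5, 2.5, 1.5.
print(abs_dev_sketch(data))
# Passing center=np.mean measures deviations from the column means instead.
print(abs_dev_sketch(data, center=np.mean))
```

Any callable with the signature func(arr, axis) can be passed as center in this sketch, mirroring the requirement stated for the documented parameter.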
-
absolute_deviation_from_median(x, axis=0, center=<function median>, nan_policy='propagate')
Compute the absolute deviation from the median of each point in the data along the given axis, given in terms of the MAD.
- Parameters
x (array_like) – Input array or object that can be converted to an array.
axis (Optional[int]) – Axis along which the deviations are computed. If None, compute over the entire array. Default 0.
center (Callable) – A function that will return the central value. Any user-defined function must have the signature func(arr, axis). Default numpy.median.
nan_policy (Literal['propagate', 'raise', 'omit']) – Defines how to handle input that contains NaN. ‘propagate’ returns NaN, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring NaN values. Default 'propagate'.
- Returns
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than numpy.float64, then the output data-type is numpy.float64. Otherwise, the output data-type is the same as that of the input.
- Return type
scalar or ndarray
- Overloads
absolute_deviation_from_median(x, axis: None, center=…, nan_policy=…) -> float
absolute_deviation_from_median(x, axis: int = …, center=…, nan_policy=…) -> ndarray
Note
The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=numpy.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation.
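To make "given in terms of the MAD" concrete, here is a minimal sketch for a one-dimensional sample. The helper name deviation_in_mads_sketch is hypothetical, and dividing by the unscaled MAD is an assumption; the library may apply a scale factor or handle a zero MAD differently.

```python
import numpy as np


def deviation_in_mads_sketch(x):
    """Hypothetical helper: |x - median| expressed in multiples of the MAD."""
    x = np.asarray(x, dtype=float)
    abs_dev = np.abs(x - np.median(x))
    # Unscaled MAD (assumption); this is zero if over half the values are equal.
    mad = np.median(abs_dev)
    return abs_dev / mad


sample = [1, 2, 3, 4, 100]
# Median is 3 and the MAD is 1, so the last point sits 97 MADs from the median.
print(deviation_in_mads_sketch(sample))
```

Expressing each point this way gives a simple outlier score: values far above a chosen cut-off (a judgment call, not something this page specifies) stand out immediately.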
-
d_cohen(sample1, sample2, which=1, tail=1, pooled=False)
Calculates and returns Cohen’s effect size index d.
See also
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Edition). Hillsdale, NJ: Lawrence Erlbaum Associates
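For orientation, the textbook form of Cohen's d is the difference between two sample means divided by a pooled standard deviation. The sketch below uses only the standard library; the helper name cohens_d_sketch is hypothetical, and because this page does not describe the which, tail, and pooled arguments, the sketch ignores them and always pools with n1 + n2 - 2 degrees of freedom.

```python
import math
import statistics


def cohens_d_sketch(sample1, sample2):
    """Hypothetical helper: (mean1 - mean2) / pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variance, n - 1 divisor
    var2 = statistics.variance(sample2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(sample1) - statistics.mean(sample2)) / pooled_sd


# Means differ by 1 and both samples share the same spread.
print(cohens_d_sketch([2, 4, 6, 8], [1, 3, 5, 7]))
```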
-
g_durlak_bias(g, n)
Application of Durlak’s bias correction to the Hedges’ g statistic.
Here n = n1 + n2, the combined size of the two samples.
- Parameters
- Return type
-
g_hedge(sample1, sample2)
Calculates and returns Hedges’ g statistic.
Formula from https://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/hedgeg.htm
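The small-sample correction that g_durlak_bias is described as applying to a g value like the one returned here is commonly stated as g × (N − 3)/(N − 2.25) × sqrt((N − 2)/N), with N = n1 + n2. The sketch below implements that expression; the helper name durlak_correction_sketch is hypothetical, and whether the library uses exactly this form is an assumption.

```python
import math


def durlak_correction_sketch(g, n):
    """Hypothetical helper: shrink g for small samples; n is n1 + n2."""
    if n <= 3:
        raise ValueError("the correction is only meaningful for n > 3")
    return g * ((n - 3) / (n - 2.25)) * math.sqrt((n - 2) / n)


# With 20 observations in total the estimate shrinks by roughly 9 percent;
# with 2000 observations the correction is almost invisible.
print(durlak_correction_sketch(0.5, 20))
print(durlak_correction_sketch(0.5, 2000))
```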
-
interpret_d(d_or_g)
Interpret Cohen’s d or Hedges’ g values using Table 1 from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444174/
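A lookup of this kind can be sketched with Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large). The helper name, the cut-offs, and the returned labels below are assumptions for illustration; the table cited above may define further categories, and the strings interpret_d actually returns are not documented on this page.

```python
def interpret_effect_size_sketch(d_or_g):
    """Hypothetical helper: label an effect size by its magnitude."""
    size = abs(d_or_g)  # the sign only encodes the direction of the effect
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


for value in (0.1, -0.3, 0.65, 1.2):
    print(value, interpret_effect_size_sketch(value))
```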
-
iqr_none(dataset)
Calculate the interquartile range, excluding NaN, strings, boolean values, and zeros.
-
median_absolute_deviation(x, axis=0, center=<function median>, scale=1.4826, nan_policy='propagate')
Compute the median absolute deviation of the data along the given axis.
The median absolute deviation (MAD, [1]) computes the median over the absolute deviations from the median. It is a measure of dispersion similar to the standard deviation, but is more robust to outliers [2]. The MAD of an empty array is numpy.nan.
- Parameters
x (array_like) – Input array or object that can be converted to an array.
axis (Optional[int]) – Axis along which the MAD is computed. If None, compute the MAD over the entire array. Default 0.
center (Callable) – A function that will return the central value. Any user-defined function must have the signature func(arr, axis). Default numpy.median.
scale (float) – The scaling factor applied to the MAD. The default scale (1.4826) ensures consistency with the standard deviation for normally distributed data. Default 1.4826.
nan_policy (Literal['propagate', 'raise', 'omit']) – Defines how to handle input that contains NaN. ‘propagate’ returns NaN, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring NaN values. Default 'propagate'.
- Returns
If axis=None, a scalar is returned. If the input contains integers or floats of smaller precision than numpy.float64, then the output data-type is numpy.float64. Otherwise, the output data-type is the same as that of the input.
- Return type
scalar or ndarray
- Overloads
median_absolute_deviation(x, axis: None, center=…, scale=…, nan_policy=…) -> float
median_absolute_deviation(x, axis: int = …, center=…, scale=…, nan_policy=…) -> ndarray
Note
The center argument only affects the calculation of the central value around which the MAD is calculated. That is, passing in center=numpy.mean will calculate the MAD around the mean; it will not calculate the mean absolute deviation.
References
[1] “Median absolute deviation” https://en.wikipedia.org/wiki/Median_absolute_deviation
[2] “Robust measures of scale” https://en.wikipedia.org/wiki/Robust_measures_of_scale
Examples
When comparing the behavior of median_absolute_deviation with numpy.std, the latter is affected when we change a single value of an array to an outlier, while the MAD hardly changes:
>>> import numpy
>>> import scipy.stats
>>> import mathematical.stats
>>> x = scipy.stats.norm.rvs(size=100, scale=1, random_state=123456)
>>> x.std()
0.9973906394005013
>>> mathematical.stats.median_absolute_deviation(x)
1.2280762773108278
>>> x[0] = 345.6
>>> x.std()
34.42304872314415
>>> mathematical.stats.median_absolute_deviation(x)
1.2340335571164334
Axis handling example:
>>> x = numpy.array([[10, 7, 4], [3, 2, 1]])
>>> x
array([[10,  7,  4],
       [ 3,  2,  1]])
>>> mathematical.stats.median_absolute_deviation(x)
array([5.1891, 3.7065, 2.2239])
>>> mathematical.stats.median_absolute_deviation(x, axis=None)
2.9652
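The default scale of 1.4826 mentioned in the parameter description is not arbitrary: for normally distributed data the MAD equals the standard deviation times the 75th percentile of the standard normal distribution, so the reciprocal of that percentile turns the MAD into an estimate of the standard deviation. A short standard-library check of that constant (independent of this package):

```python
from statistics import NormalDist

# For a standard normal variable X, the median of |X| is the 75th percentile of X.
mad_of_standard_normal = NormalDist().inv_cdf(0.75)

# Dividing by that value rescales the MAD so that it estimates sigma.
scale = 1 / mad_of_standard_normal
print(round(scale, 4))  # 1.4826
```

Passing scale=1.0 to a MAD routine that multiplies by this factor would therefore return the raw, unscaled median absolute deviation.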
-
median_none(dataset)
Calculate the median, excluding NaN, strings, boolean values, and zeros.
-
percentile_none(dataset, percentage)
Calculate the given percentile, excluding NaN, strings, boolean values, and zeros.
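The functions ending in _none share one idea: discard NaN, strings, booleans, and zeros, then compute the statistic on what remains. A minimal sketch of that filtering step follows; the helper name clean_values_sketch is hypothetical and the library's own filtering rules may differ in detail.

```python
import math
import statistics


def clean_values_sketch(dataset):
    """Hypothetical helper: keep only non-zero, non-NaN numeric entries."""
    cleaned = []
    for value in dataset:
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if math.isnan(value) or value == 0:
            continue
        cleaned.append(value)
    return cleaned


raw = [1, 2, 0, "n/a", True, float("nan"), 3, 10]
values = clean_values_sketch(raw)
print(values)                     # [1, 2, 3, 10]
print(statistics.median(values))  # 2.5
```

Dropping zeros changes the result noticeably: the median of the purely numeric entries including the zero would be 2 rather than 2.5, so this family of functions suits data where zero means "no measurement".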
-
pooled_sd(sample1, sample2, weighted=False)
Returns the pooled standard deviation.
- Parameters
- Return type