mathematical.data_frames

Mathematical operations for Data Frames.

Data:

ColumnLabelList

Type hint for the column_label_list parameter in the df_*() functions.

Functions:

df_count(row[, column_label_list])

Count the number of occurrences of a non-NaN value in the specified columns of a data frame.

df_data_points(row, column_label_list)

Compile the values for the specified columns in each row into a list.

df_delta(row, left_column, right_column)

Calculate the difference between values in the two columns for each row of a data frame.

df_delta_relative(row, left_column, right_column)

Calculate the relative difference between values in the two columns for each row of a data frame.

df_log(row, column_label_list[, base])

Calculate the logarithm of the values in each row for the specified columns of a data frame.

df_log_stdev(row[, column_label_list])

Calculate the standard deviation of the log10 values in each row for the specified columns of a data frame.

df_mean(row[, column_label_list])

Calculate the mean of each row for the specified columns of a data frame.

df_median(row[, column_label_list])

Calculate the median of each row for the specified columns of a data frame.

df_outliers(row[, column_label_list, …])

Identify outliers in each row.

df_percentage(row, column_label, total)

Returns the value of the specified column as a percentage of the given total.

df_stdev(row[, column_label_list])

Calculate the standard deviation of each row for the specified columns of a data frame.

set_display_options([desired_width, …])

Set the display options for numpy and pandas.

ColumnLabelList

Type hint for the column_label_list parameter in the df_*() functions.

Alias of Optional[Sequence[str]]

df_count(row, column_label_list=None)

Count the number of occurrences of a non-NaN value in the specified columns of a data frame.

Do not call this function directly; use it with df.apply() instead:

data_frame["Count"] = data_frame.apply(
        func=df_count,
        args=[["Bob", "Alice"]],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label_list (Optional[Sequence[str]]) – List of column labels to count occurrences in. Default None.

Return type

int

Returns

Count of the occurrences of non-NaN values.
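The apply() pattern above hands each row to the function as a Series and forwards the contents of args after it. The following is a minimal, self-contained sketch of that pattern. It uses a hypothetical stand-in, count_non_nan, rather than the library's own df_count, and it assumes that a column_label_list of None means "use every column", which the text above does not confirm.

```python
import numpy as np
import pandas as pd


def count_non_nan(row, column_label_list=None):
    """Hypothetical stand-in for df_count: count non-NaN values in a row."""
    # Assumption: None means "use every column in the row".
    values = row if column_label_list is None else row[list(column_label_list)]
    return int(values.count())  # Series.count() skips NaN


data_frame = pd.DataFrame({
    "Bob": [1.0, np.nan, 3.0],
    "Alice": [4.0, np.nan, np.nan],
})

# Everything in args is passed to the function after the row itself.
data_frame["Count"] = data_frame.apply(
    func=count_non_nan,
    args=(["Bob", "Alice"],),
    axis=1,
)
counts = data_frame["Count"].tolist()
print(counts)
```

Note that args wraps the list of column labels in an outer sequence, so the whole list arrives as the single column_label_list argument.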

df_data_points(row, column_label_list)

Compile the values for the specified columns in each row into a list.

Do not call this function directly; use it with df.apply() instead:

data_frame["Data Points"] = data_frame.apply(
        func=df_data_points,
        args=[["Bob", "Alice"]],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label_list (Sequence[str]) – List of column labels to compile the values from.

Return type

List

Returns

The list of data points.

df_delta(row, left_column, right_column)

Calculate the difference between values in the two columns for each row of a data frame.

Do not call this function directly; use it with df.apply() instead:

data_frame["Delta"] = data_frame.apply(
        func=df_delta,
        args=["Bob", "Alice"],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • left_column (str)

  • right_column (str)

Return type

float

Returns

The difference between left_column and right_column.

New in version 0.4.0.

df_delta_relative(row, left_column, right_column)

Calculate the relative difference between values in the two columns for each row of a data frame:

(left - right) / right

Do not call this function directly; use it with df.apply() instead:

data_frame["Rel. Delta"] = data_frame.apply(
        func=df_delta_relative,
        args=["Bob", "Alice"],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • left_column (str)

  • right_column (str)

Return type

float

Returns

The relative difference between left_column and right_column.

New in version 0.4.0.
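As a worked illustration of the (left - right) / right formula, the sketch below applies a hypothetical stand-in, relative_delta, to a small data frame. It is not the library's implementation of df_delta_relative, and it makes no attempt to handle a zero or NaN in the right-hand column.

```python
import pandas as pd


def relative_delta(row, left_column, right_column):
    """Hypothetical stand-in for df_delta_relative: (left - right) / right."""
    return (row[left_column] - row[right_column]) / row[right_column]


data_frame = pd.DataFrame({"Bob": [12.0, 9.0], "Alice": [10.0, 12.0]})

# "Bob" is the left column and "Alice" the right column.
data_frame["Rel. Delta"] = data_frame.apply(
    func=relative_delta,
    args=("Bob", "Alice"),
    axis=1,
)
rel = data_frame["Rel. Delta"].tolist()
print(rel)  # first row: (12 - 10) / 10, second row: (9 - 12) / 12
```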

df_log(row, column_label_list, base=10)

Calculate the logarithm of the values in each row for the specified columns of a data frame.

Do not call this function directly; use it with df.apply() instead:

data_frame["Bob Log10"] = data_frame.apply(
        func=df_log,
        args=[["Bob"], 10],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label_list (Sequence[str]) – List of column labels to calculate log for.

  • base (float) – The logarithmic base. Default 10.

Return type

float

Returns

The logarithmic value.

df_log_stdev(row, column_label_list=None)

Calculate the standard deviation of the log10 values in each row for the specified columns of a data frame.

Do not call this function directly; use it with df.apply() instead:

data_frame["Log Stdev"] = data_frame.apply(
        func=df_log_stdev,
        args=[["Bob", "Alice"]],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label_list (Optional[Sequence[str]]) – List of column labels to calculate standard deviation for. Default None.

Return type

float

Returns

The standard deviation of the log10 values.
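To make the calculation concrete, here is a minimal sketch that takes the log10 of the selected values in a row and then their standard deviation. The function log10_stdev is a hypothetical stand-in. Two details are assumptions not stated above: that None selects all columns, and that the sample standard deviation (ddof=1) is used.

```python
import numpy as np
import pandas as pd


def log10_stdev(row, column_label_list=None):
    """Hypothetical stand-in for df_log_stdev."""
    # Assumption: None means "use every column in the row".
    values = row if column_label_list is None else row[list(column_label_list)]
    logs = np.log10(values.to_numpy(dtype=float))
    # Assumption: sample standard deviation (ddof=1).
    return float(np.std(logs, ddof=1))


data_frame = pd.DataFrame({"Bob": [10.0, 1.0], "Alice": [1000.0, 1.0]})

data_frame["Log Stdev"] = data_frame.apply(
    func=log10_stdev,
    args=(["Bob", "Alice"],),
    axis=1,
)
log_stdevs = data_frame["Log Stdev"].tolist()
print(log_stdevs)
```

In the first row the log10 values are 1 and 3, so the sample standard deviation is the square root of 2. In the second row both values are equal, so it is 0.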

df_mean(row, column_label_list=None)

Calculate the mean of each row for the specified columns of a data frame.

Do not call this function directly; use it with df.apply() instead:

data_frame["Mean"] = data_frame.apply(
        func=df_mean,
        args=[["Bob", "Alice"]],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label_list (Optional[Sequence[str]]) – List of column labels to calculate the mean for. Default None.

Return type

float

Returns

The mean.

df_median(row, column_label_list=None)

Calculate the median of each row for the specified columns of a data frame.

Do not call this function directly; use it with df.apply() instead:

data_frame["Median"] = data_frame.apply(
        func=df_median,
        args=[["Bob", "Alice"]],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label_list (Optional[Sequence[str]]) – List of column labels to calculate median for. Default None.

Return type

float

Returns

The median.
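The row-wise mean and median can differ noticeably when one column holds an extreme value. The sketch below shows both side by side using hypothetical stand-ins (row_mean, row_median) built on the standard library statistics module. The "Carol" column is invented for illustration, and the sample data contains no NaN values, so NaN handling is not shown.

```python
import statistics

import pandas as pd


def row_mean(row, column_label_list):
    """Hypothetical stand-in for df_mean."""
    return statistics.mean(row[list(column_label_list)].tolist())


def row_median(row, column_label_list):
    """Hypothetical stand-in for df_median."""
    return statistics.median(row[list(column_label_list)].tolist())


data_frame = pd.DataFrame({
    "Bob": [1.0, 2.0],
    "Alice": [2.0, 4.0],
    "Carol": [9.0, 6.0],  # hypothetical third column
})
labels = ["Bob", "Alice", "Carol"]

data_frame["Mean"] = data_frame.apply(func=row_mean, args=(labels,), axis=1)
data_frame["Median"] = data_frame.apply(func=row_median, args=(labels,), axis=1)

means = data_frame["Mean"].tolist()
medians = data_frame["Median"].tolist()
print(means, medians)
```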

df_outliers(row, column_label_list=None, outlier_mode=1)

Identify outliers in each row.

This function only returns the list of outliers (if any). If you want the list of values without the outliers see the functions in mathematical.outliers.

Do not call this function directly; use it with df.apply() instead:

data_frame["Outliers"] = data_frame.apply(
        func=df_outliers,
        args=[["Bob", "Alice"]],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label_list (Optional[Sequence[str]]) – List of column labels to determine outliers for. Default None.

  • outlier_mode (int) – Outlier detection method to use. Default 1.

The supported outlier modes are:

  • 1 or mathematical.data_frames.MAD – Use the Median Absolute Deviation.

  • 2 or mathematical.data_frames.QUARTILES – Treat values more than the inter-quartile range away from the upper or lower quartile as outliers.

  • 3 or mathematical.data_frames.STDEV2 – Treat values more than rng × stdev away from the mean as outliers.

Return type

List

Returns

The outliers.
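The quartile rule described above can be sketched as follows. This is an illustrative stand-in, quartile_outliers, and not the library's code: the quartiles are computed with numpy.percentile using its default linear interpolation, which is an assumption, and the extra column names are invented for the example.

```python
import numpy as np
import pandas as pd


def quartile_outliers(row, column_label_list=None):
    """Hypothetical stand-in for df_outliers in quartile mode."""
    # Assumption: None means "use every column in the row".
    values = row if column_label_list is None else row[list(column_label_list)]
    data = values.to_numpy(dtype=float)
    # Assumption: quartiles via numpy's default (linear) interpolation.
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    # Outliers lie more than one inter-quartile range beyond either quartile.
    return [float(x) for x in data if x < q1 - iqr or x > q3 + iqr]


labels = ["Bob", "Alice", "Carol", "Dave", "Eve"]  # hypothetical columns
data_frame = pd.DataFrame(
    [[1, 2, 3, 4, 100], [5, 6, 7, 8, 9]],
    columns=labels,
)

data_frame["Outliers"] = data_frame.apply(
    func=quartile_outliers,
    args=(labels,),
    axis=1,
)
outliers = data_frame["Outliers"].tolist()
print(outliers)
```

For the first row the quartiles are 2 and 4, so anything below 0 or above 6 is flagged and only 100 qualifies. The second row has no values outside its bounds of 4 and 10, so its list is empty.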

df_percentage(row, column_label, total)

Returns the value of the specified column as a percentage of the given total.

The total is usually the sum of the specified column.

Do not call this function directly; use it with df.apply() instead:

data_frame["Bob Percentage"] = data_frame.apply(
        func=df_percentage,
        args=["Bob", 13],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label (str) – The column to calculate percentage for.

  • total (float) – The total value.

Return type

float

Returns

The value of the specified column as a percentage of the total.
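Since the total is usually the sum of the column, a typical call computes that sum first and passes it through args. The sketch below uses a hypothetical stand-in, percentage_of_total, which assumes the result is value / total × 100. Following the signature, the column label comes before the total in args.

```python
import pandas as pd


def percentage_of_total(row, column_label, total):
    """Hypothetical stand-in for df_percentage."""
    # Assumption: the percentage is value / total * 100.
    return row[column_label] / total * 100


data_frame = pd.DataFrame({"Bob": [1.0, 3.0, 4.0]})
total = data_frame["Bob"].sum()  # sum of the column, here 8.0

data_frame["Bob Percentage"] = data_frame.apply(
    func=percentage_of_total,
    args=("Bob", total),
    axis=1,
)
percentages = data_frame["Bob Percentage"].tolist()
print(percentages)
```

When the total is the column sum, as here, the percentages add up to 100.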

df_stdev(row, column_label_list=None)

Calculate the standard deviation of each row for the specified columns of a data frame.

Do not call this function directly; use it with df.apply() instead:

data_frame["Stdev"] = data_frame.apply(
        func=df_stdev,
        args=[["Bob", "Alice"]],
        axis=1,
        )
Parameters
  • row (Series) – Row of the data frame.

  • column_label_list (Optional[Sequence[str]]) – List of column labels to calculate standard deviation for. Default None.

Return type

float

Returns

The standard deviation.

set_display_options(desired_width=300, max_columns=15, max_rows=20)

Set the display options for numpy and pandas.

Parameters
  • desired_width (int) – The desired maximum output width, in characters. Default 300.

  • max_columns (int) – The maximum number of columns to display in a pandas.DataFrame. Default 15.

  • max_rows (int) – The maximum number of rows to display in a pandas.DataFrame. Default 20.

New in version 0.3.0.
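For readers who want to see what a helper like this might do, the sketch below sets the corresponding pandas and numpy options directly. The mapping of parameters to options is an assumption based on the parameter descriptions above, not a copy of the library's code, and set_display_options_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def set_display_options_sketch(desired_width=300, max_columns=15, max_rows=20):
    """Hypothetical equivalent of set_display_options."""
    # Assumption: these are the options the parameters correspond to.
    pd.set_option("display.width", desired_width)
    pd.set_option("display.max_columns", max_columns)
    pd.set_option("display.max_rows", max_rows)
    np.set_printoptions(linewidth=desired_width)


set_display_options_sketch(desired_width=200, max_columns=10, max_rows=50)

width = pd.get_option("display.width")
columns = pd.get_option("display.max_columns")
rows = pd.get_option("display.max_rows")
linewidth = np.get_printoptions()["linewidth"]
print(width, columns, rows, linewidth)
```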