Scipy Stats Zscore + Examples

In this Python tutorial, we will learn about “Scipy Stats Zscore” and cover the following topics.

  • Scipy Stats Zscore
  • Scipy Stats Zscore nan
  • Scipy Stats Zscore axis
  • Scipy Stats Zscore log
  • Scipy Stats Modified Zscore

Scipy Stats Zscore

Python SciPy provides the method zscore() in the module scipy.stats, which calculates the z-score of each data point relative to the mean of the sample. In other words, it measures how far each observation lies from the sample mean, expressed in units of the sample standard deviation: z = (x - mean) / standard_deviation.

The syntax is given below.

scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')

Where parameters are:

  • a (array_like): The array containing the sample observations.
  • axis (int or None): The axis along which to compute the z-score. By default, it is 0. If it is None, the z-score is computed over the whole array.
  • ddof (int): The degrees-of-freedom correction used when calculating the standard deviation. By default, it is 0.
  • nan_policy: It determines how nan values within the array are handled. The available options are omit, propagate and raise.
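The effect of ddof can be seen with a minimal sketch like the one below. The array sample is a hypothetical set of values chosen only for illustration, not data from this tutorial. With ddof=0 the standard deviation divides by N, and with ddof=1 it divides by N - 1, so the two results differ only by a constant factor.

```python
import numpy as np
from scipy.stats import zscore

# Hypothetical sample values, chosen only for illustration.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = sample.size

# ddof=0 (the default): the standard deviation divides by N.
z_population = zscore(sample, ddof=0)

# ddof=1: the standard deviation divides by N - 1.
z_sample = zscore(sample, ddof=1)

# The two results differ only by the factor sqrt((N - 1) / N).
scaled = z_population * np.sqrt((n - 1) / n)

print(z_population)
print(z_sample)
print(np.allclose(z_sample, scaled))
```

Because the divisor N - 1 gives a slightly larger standard deviation, the z-scores computed with ddof=1 are slightly smaller in magnitude.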

The method zscore() returns an array of the same shape as the input, containing the z-score of each element.

Let’s understand with an example by following the below steps:

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import zscore

Create an array containing data points using the below code.

array_obs = np.array([ 0.2797,  0.7670,  0.3834,  0.6687,  0.1908,
               0.4591,  0.7036,  0.9956,  0.5601,  0.8050])

Now pass the above-created array to the method zscore() using the below code.

zscore(array_obs)

This is how to compute the z-score of a given array of data.
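As a quick sanity check, the result can be compared with the formula applied directly in NumPy. The sketch below reuses the array from this section; the variable names z and manual are introduced here only for illustration. It also confirms that the z-scores have a mean of about 0 and a standard deviation of about 1.

```python
import numpy as np
from scipy.stats import zscore

array_obs = np.array([0.2797, 0.7670, 0.3834, 0.6687, 0.1908,
                      0.4591, 0.7036, 0.9956, 0.5601, 0.8050])

z = zscore(array_obs)

# The same calculation written out: (x - mean) / standard deviation.
manual = (array_obs - array_obs.mean()) / array_obs.std()

print(np.allclose(z, manual))       # both approaches should agree
print(np.isclose(z.mean(), 0.0))    # z-scores are centered on 0
print(np.isclose(z.std(), 1.0))     # and have unit standard deviation
```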


Scipy Stats Zscore nan

In the above subsection of Python Scipy, we computed the z-score of a given array and were introduced to the parameters of the method zscore(). One of these parameters, nan_policy, controls how nan values in the given array are handled. Here nan stands for “Not a Number”.

So in this subsection, we will learn how to use the nan_policy parameter to handle the nan values in an array while computing the z-score.

The parameter nan_policy accepts the following options, each of which handles the nan values in a different way.

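  • omit: This option ignores the nan values when computing the mean and standard deviation. The nan values stay nan in the output, but they do not affect the z-scores of the other values.
  • propagate: This is the default. The nan values propagate through the calculation, so for the one-dimensional array used below every z-score in the output is nan.
  • raise: It raises a ValueError if the given array contains nan values.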

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import zscore

Create an array containing data points with nan values using the below code.

array_obs = np.array([ 0.2797,  0.7670,  np.nan,  0.6687,  0.1908,
               0.4591,  0.7036,  np.nan,  0.5601,  np.nan])

Here in the above code, np.nan represents the nan values in an array.

Pass the above array to the method with the parameter nan_policy set to omit using the below code.

zscore(array_obs,nan_policy='omit')

Again pass the above array to the method, this time with the parameter nan_policy set to raise, using the below code.

zscore(array_obs,nan_policy='raise')

Now pass the above array to the method once more with the parameter nan_policy set to propagate using the below code.

zscore(array_obs,nan_policy='propagate')

This is how to use the parameter nan_policy to handle the nan values in the given array.
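The three options can also be compared side by side. The following is a minimal sketch using the same array as above; the variable names z_omit, z_propagate and raised are introduced here only for illustration.

```python
import numpy as np
from scipy.stats import zscore

array_obs = np.array([0.2797, 0.7670, np.nan, 0.6687, 0.1908,
                      0.4591, 0.7036, np.nan, 0.5601, np.nan])

# omit: nan values are skipped when computing the mean and standard
# deviation, and they stay nan in the output.
z_omit = zscore(array_obs, nan_policy='omit')
print("omit:", z_omit)

# propagate: the mean of the array is nan, so every z-score is nan.
z_propagate = zscore(array_obs, nan_policy='propagate')
print("propagate:", z_propagate)

# raise: a ValueError is raised because the array contains nan values.
try:
    zscore(array_obs, nan_policy='raise')
    raised = False
except ValueError as err:
    raised = True
    print("raise:", err)
```

Wrapping the call in try/except is one way to keep a script running when raise is used as a guard against unexpected nan values.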


Scipy Stats Zscore axis

In the above subsection of Python Scipy, we have used the parameter nan_policy of the method zscore() to handle the nan values in a given array. Here we will use another parameter axis to compute the z-score along the specified axis of a given array.

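The parameter axis accepts an integer that selects the axis of the given array along which the z-score is computed. For a two-dimensional array, the values are 0 (down each column) and 1 (across each row). It also accepts None, which computes the z-score over the whole array. By default, the method computes the z-score along axis 0.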

Let’s understand with an example by following the below steps:

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import zscore

Create an array containing data points using the below code.

array_obs = np.array([[ 0.8413,  0.8740,  0.3426,  0.8064],
              [ 0.9417,  0.5770,  0.2706,  0.6569],
              [ 0.1436,  0.3041,  0.9579,  0.4604],
              [ 0.8195,  0.8496,  0.409 ,  0.1273],
              [ 0.1290,  0.1842,  0.8811,  0.6631]])

Pass the array to the method zscore() without specifying the axis, so that the z-score is computed along the default axis 0, using the below code.

zscore(array_obs)

Pass the same array again, this time with the parameter axis set to 1, using the below code.

zscore(array_obs,axis=1)

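Compare the two results: with the default axis 0, each column is standardized using its own mean and standard deviation, whereas with axis equal to 1 each row is standardized.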
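One way to confirm which direction was standardized is to check where the means of the result are zero. The sketch below reuses the array from this section and also includes axis=None, which treats the whole array as a single sample; the names z_cols, z_rows and z_all are chosen here only for illustration.

```python
import numpy as np
from scipy.stats import zscore

array_obs = np.array([[0.8413, 0.8740, 0.3426, 0.8064],
                      [0.9417, 0.5770, 0.2706, 0.6569],
                      [0.1436, 0.3041, 0.9579, 0.4604],
                      [0.8195, 0.8496, 0.4090, 0.1273],
                      [0.1290, 0.1842, 0.8811, 0.6631]])

z_cols = zscore(array_obs, axis=0)     # each column standardized
z_rows = zscore(array_obs, axis=1)     # each row standardized
z_all = zscore(array_obs, axis=None)   # whole array as one sample

print(z_cols.mean(axis=0))   # close to 0 for every column
print(z_rows.mean(axis=1))   # close to 0 for every row
print(z_all.mean())          # close to 0 overall
```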


Scipy Stats Zscore log

Here in this subsection of python Scipy, we will transform the array using the log and compute the z-score of that transformed array.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import zscore

Create an array containing data points using the below code.

array_obs = np.array([ 0.2797,  0.7670,  0.3834,  0.6687,  0.1908,
               0.4591,  0.7036,  0.9956,  0.5601,  0.8050])

Transform the array values by applying the log function of NumPy, which computes the natural logarithm of each element, using the below code.

log_array = np.log(array_obs)
log_array

Now pass the transformed array to the method zscore() using the below code.

zscore(log_array)

This is how to apply log on the array and then apply the method zscore() to compute the z-score of that array.
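Note that np.log returns -inf for 0 and nan for negative values, which would then spoil the z-scores. The following is a minimal sketch of one way to guard against this; the helper log_zscore is a hypothetical name introduced here for illustration and is not part of SciPy.

```python
import numpy as np
from scipy.stats import zscore


def log_zscore(values):
    """Hypothetical helper: z-score of the natural log of positive data."""
    values = np.asarray(values, dtype=float)
    # Reject zeros and negative values before taking the logarithm.
    if np.any(values <= 0):
        raise ValueError("all values must be strictly positive to apply np.log")
    return zscore(np.log(values))


array_obs = np.array([0.2797, 0.7670, 0.3834, 0.6687, 0.1908,
                      0.4591, 0.7036, 0.9956, 0.5601, 0.8050])

print(log_zscore(array_obs))
```

Raising an error is only one possible design choice; depending on the data, shifting or filtering the values before the transform may be more appropriate.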


Scipy Stats Modified Zscore

Python SciPy does not provide a dedicated method to compute the modified z-score. The modified z-score uses the median instead of the mean, and the median absolute deviation instead of the standard deviation. The formula is modified_zscore = 0.6745 * (x_data - median) / median_absolute_deviation.

So here we will compute the modified z-score manually by following the below steps:

Import the required libraries using the below python code.

import numpy as np

Create array data as sample data points using the below code.

array_data = np.array([3,5,6,5,3,6,8,8,4,2])

Compute the median of the array data using the below code.

array_median = np.median(array_data)
array_median

Compute the absolute difference between each data point and the median.

abs_diff = np.abs(array_data-array_median)
abs_diff

Compute the median absolute deviation by finding the median of the above absolute differences using the below code.

median_abs_diff = np.median(abs_diff)
median_abs_diff

Now apply the formula that we have learned above to compute the modified z-score using the below code.

modified_zscore = 0.6745 * (array_data - array_median) / median_abs_diff
modified_zscore

This is how to compute the modified z-score.
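The steps above can also be wrapped in a small function. The sketch below is one possible way to do it and assumes SciPy 1.5.0 or later, where scipy.stats.median_abs_deviation is available. The function name modified_zscore and the check for a zero deviation are additions made here for illustration.

```python
import numpy as np
from scipy.stats import median_abs_deviation


def modified_zscore(data):
    """Hypothetical helper: 0.6745 * (x - median) / median absolute deviation."""
    data = np.asarray(data, dtype=float)
    median = np.median(data)
    # With the default scale=1.0 this is the plain median of |x - median|.
    mad = median_abs_deviation(data)
    if mad == 0:
        raise ValueError("median absolute deviation is zero")
    return 0.6745 * (data - median) / mad


array_data = np.array([3, 5, 6, 5, 3, 6, 8, 8, 4, 2])
scores = modified_zscore(array_data)
print(scores)
```

For this array the median is 5 and the median absolute deviation is 1.5, so the result should match the manual calculation above.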

So, in this tutorial, we have learned about the “Scipy Stats Zscore” and covered the following topics.

  • Scipy Stats Zscore
  • Scipy Stats Zscore nan
  • Scipy Stats Zscore axis
  • Scipy Stats Zscore log
  • Scipy Stats Modified Zscore