In this Python tutorial, we will learn about “Scipy Stats Zscore” and cover the following topics.
- Scipy Stats Zscore
- Scipy Stats Zscore nan
- Scipy Stats Zscore axis
- Scipy Stats Zscore log
- Scipy Stats Modified Zscore
Scipy Stats Zscore
SciPy provides the method zscore() in the module scipy.stats, which calculates the z-score of each data point in a sample relative to the sample mean. The z-score measures how far an observation lies from the mean of the sample, expressed in units of the sample's standard deviation.
The syntax is given below.
scipy.stats.zscore(a, axis=0, ddof=0, nan_policy='propagate')
Where parameters are:
- a(array_data): It is array data containing sample observations.
- axis(int): It is used to specify on which axis to compute the z-score. By default, it is 0.
- ddof(int): The degrees-of-freedom correction used when calculating the standard deviation. By default, it is 0.
- nan_policy: It is used to handle the nan values that exist within the array. It accepts one of three options that handle nan values in different ways: omit, propagate (the default) and raise.
The method zscore() returns an array of the same shape as the input array, containing the z-score of each element.
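To make the role of ddof more concrete, here is a minimal sketch that compares zscore() with the manual formula (x - mean) / std. The sample values below are illustrative and not taken from this tutorial.

```python
import numpy as np
from scipy.stats import zscore

# Illustrative sample values (an assumption, chosen so the mean is 5 and the std is 2).
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Default ddof=0: the deviation from the mean is divided by the population standard deviation.
z_population = zscore(sample)
manual_population = (sample - sample.mean()) / sample.std(ddof=0)

# ddof=1: the deviation is divided by the sample standard deviation instead.
z_sample = zscore(sample, ddof=1)
manual_sample = (sample - sample.mean()) / sample.std(ddof=1)

print(z_population)
print(np.allclose(z_population, manual_population))
print(np.allclose(z_sample, manual_sample))
```

With ddof=1 the standard deviation is slightly larger, so the resulting z-scores are slightly smaller in magnitude.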
Let’s understand with an example by following the below steps:
Import the required libraries using the below python code.
import numpy as np
from scipy.stats import zscore
Create an array containing data points using the below code.
array_obs = np.array([ 0.2797, 0.7670, 0.3834, 0.6687, 0.1908,
0.4591, 0.7036, 0.9956, 0.5601, 0.8050])
Now pass the above-created array to the method zscore() using the below code.
zscore(array_obs)
This is how to compute the z-score of a given array of data.
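The steps above can be combined into one short script. The sketch below also checks two properties that standardized data should have (a mean of about 0 and a standard deviation of about 1) and looks up the observation that lies farthest from the mean. The variable names z and farthest are only illustrative.

```python
import numpy as np
from scipy.stats import zscore

array_obs = np.array([0.2797, 0.7670, 0.3834, 0.6687, 0.1908,
                      0.4591, 0.7036, 0.9956, 0.5601, 0.8050])

# Standardize the observations with the default settings (axis=0, ddof=0).
z = zscore(array_obs)
print(z.round(4))

# After standardization the values have mean ~0 and standard deviation ~1.
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))

# Index of the observation with the largest absolute z-score.
farthest = int(np.argmax(np.abs(z)))
print(array_obs[farthest])
```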
Scipy Stats Zscore nan
In the above subsection, we computed the z-score of a given array and also learned about the parameters of the method zscore(). One of these parameters, nan_policy, controls how nan values in the given array are handled. Here nan stands for Not a Number.
So in this subsection, we will learn how to use the nan_policy parameter to handle the nan values in an array while computing the z-score.
The parameter nan_policy accepts the following options, which handle the nan values in different ways.
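- omit: This option calculates the z-score while ignoring the nan values in a given array. The nan positions remain nan in the output.
- propagate: It lets the nan values propagate into the result. Because the mean and standard deviation of an array containing nan are themselves nan, every z-score in the output becomes nan. This is the default.
- raise: It throws a ValueError if the given array contains nan values.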
Let’s take an example by following the below steps:
Import the required libraries using the below python code.
import numpy as np
from scipy.stats import zscore
Create an array containing data points with nan values using the below code.
array_obs = np.array([ 0.2797, 0.7670, np.nan, 0.6687, 0.1908,
0.4591, 0.7036, np.nan, 0.5601, np.nan])
Here in the above code, np.nan represents the nan values in the array.
Pass the above array to the method with the parameter nan_policy set to omit using the below code.
zscore(array_obs,nan_policy='omit')
Again pass the above array to the method with the parameter nan_policy set to raise using the below code. Because the array contains nan values, this call raises a ValueError.
zscore(array_obs,nan_policy='raise')
Now pass the above array to the method with the parameter nan_policy set to propagate using the below code.
zscore(array_obs,nan_policy='propagate')
This is how to use the parameter nan_policy to handle the nan values in the given array.
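The three policies can be compared side by side. This is a sketch of the behavior expected on recent SciPy versions. The names omitted, propagated and raised are illustrative, and the try/except block is only there so that the script keeps running after the error.

```python
import numpy as np
from scipy.stats import zscore

array_obs = np.array([0.2797, 0.7670, np.nan, 0.6687, 0.1908,
                      0.4591, 0.7036, np.nan, 0.5601, np.nan])

# 'omit': the mean and std come from the non-nan values only; nan positions stay nan.
omitted = zscore(array_obs, nan_policy='omit')

# 'propagate' (the default): the nan mean and std turn every z-score into nan.
propagated = zscore(array_obs, nan_policy='propagate')

# 'raise': the call fails with a ValueError when any nan is present.
try:
    zscore(array_obs, nan_policy='raise')
    raised = False
except ValueError:
    raised = True

print(omitted)
print(propagated)
print(raised)
```

The z-scores returned with omit for the valid entries should match what zscore() gives after the nan values are filtered out by hand, for example with array_obs[~np.isnan(array_obs)].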
Scipy Stats Zscore axis
In the above subsection, we used the parameter nan_policy of the method zscore() to handle the nan values in a given array. Here we will use another parameter, axis, to compute the z-score along a specified axis of a given array.
For a two-dimensional array, the parameter axis accepts the values 0 and 1, which refer to the two axes of the array: axis=0 standardizes each column and axis=1 standardizes each row. Passing axis=None computes the z-score over the whole array. By default, the method computes the z-score along axis 0.
Let’s understand with an example by following the below steps:
Import the required libraries using the below python code.
import numpy as np
from scipy.stats import zscore
Create an array containing data points using the below code.
array_obs = np.array([[ 0.8413, 0.8740, 0.3426, 0.8064],
[ 0.9417, 0.5770, 0.2706, 0.6569],
[ 0.1436, 0.3041, 0.9579, 0.4604],
[ 0.8195, 0.8496, 0.409 , 0.1273],
[ 0.1290, 0.1842, 0.8811, 0.6631]])
Pass the array to the method zscore() without specifying the axis, so that the default axis value of 0 is used, using the below code.
zscore(array_obs)
Again pass the same array with the parameter axis set to 1 using the below code.
zscore(array_obs,axis=1)
Look at the z-score value of the array based on the specified axis.
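One way to see what the axis parameter does is to check which slices end up standardized. The sketch below uses the same array as above. The names z_cols, z_rows and z_all are illustrative.

```python
import numpy as np
from scipy.stats import zscore

array_obs = np.array([[0.8413, 0.8740, 0.3426, 0.8064],
                      [0.9417, 0.5770, 0.2706, 0.6569],
                      [0.1436, 0.3041, 0.9579, 0.4604],
                      [0.8195, 0.8496, 0.4090, 0.1273],
                      [0.1290, 0.1842, 0.8811, 0.6631]])

# axis=0 (default): each column is standardized with its own mean and std.
z_cols = zscore(array_obs, axis=0)

# axis=1: each row is standardized with its own mean and std.
z_rows = zscore(array_obs, axis=1)

# axis=None: one mean and one std are computed over all 20 values.
z_all = zscore(array_obs, axis=None)

print(np.allclose(z_cols.mean(axis=0), 0.0))  # column means are ~0
print(np.allclose(z_rows.mean(axis=1), 0.0))  # row means are ~0
print(np.isclose(z_all.mean(), 0.0))          # overall mean is ~0
```

In every case the output keeps the shape of the input. Only the grouping used for the mean and standard deviation changes.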
Scipy Stats Zscore log
In this subsection, we will transform the array using the natural logarithm and then compute the z-score of the transformed array.
Let’s take an example by following the below steps:
Import the required libraries using the below python code.
import numpy as np
from scipy.stats import zscore
Create an array containing data points using the below code.
array_obs = np.array([ 0.2797, 0.7670, 0.3834, 0.6687, 0.1908,
0.4591, 0.7036, 0.9956, 0.5601, 0.8050])
Transform the array values by applying the log() method of NumPy to the array using the below code.
log_array = np.log(array_obs)
log_array
Now pass the transformed array to the method zscore() using the below code.
zscore(log_array)
This is how to apply log to the array and then apply the method zscore() to compute the z-score of the transformed array.
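As a sketch of the whole procedure, the script below compares the z-scores of the log-transformed array with those of the raw array. It assumes that all observations are strictly positive, since np.log() is not defined for zero or negative values. The names z_log and z_raw are illustrative.

```python
import numpy as np
from scipy.stats import zscore

array_obs = np.array([0.2797, 0.7670, 0.3834, 0.6687, 0.1908,
                      0.4591, 0.7036, 0.9956, 0.5601, 0.8050])

# Natural logarithm of every observation (requires strictly positive values).
log_array = np.log(array_obs)

# Standardize the transformed values and, for comparison, the raw values.
z_log = zscore(log_array)
z_raw = zscore(array_obs)

print(z_log.round(4))
print(z_raw.round(4))

# The log is a non-linear transform, so the two sets of z-scores differ.
print(np.allclose(z_log, z_raw))
```

The log transform stretches small values apart and pulls large values together, so observations near the low end of the sample receive more extreme z-scores than before.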
Scipy Stats Modified Zscore
SciPy does not provide a dedicated method to compute the modified z-score. The modified z-score uses the median instead of the mean and the median absolute deviation instead of the standard deviation. The formula for the modified z-score is modified_zscore = 0.6745 * (x_data - median) / median_absolute_deviation.
So here we will compute the modified z-score manually by following the below steps:
Import the required libraries using the below python code.
import numpy as np
Create array data as sample data points using the below code.
array_data = np.array([3,5,6,5,3,6,8,8,4,2])
Compute the median of the array data using the below code.
array_median = np.median(array_data)
array_median
Compute the absolute difference between each data point and the median.
abs_diff = np.abs(array_data-array_median)
abs_diff
Compute the median absolute deviation by finding the median of the above absolute differences using the below code.
median_abs_diff = np.median(abs_diff)
median_abs_diff
Now apply the formula that we have learned above to compute the modified z-score using the below code.
modified_zscore = 0.6745* (array_data - array_median)/ median_abs_diff
modified_zscore
This is how to compute the modified z-score.
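The manual steps above can be wrapped in a small helper. The function modified_zscore() below is a hypothetical helper written for this tutorial, not part of SciPy. It also guards against a median absolute deviation of zero, which would otherwise cause a division by zero.

```python
import numpy as np

def modified_zscore(data):
    """Hypothetical helper: modified z-score based on the median and the MAD."""
    data = np.asarray(data, dtype=float)
    median = np.median(data)
    # Median absolute deviation: median of the absolute differences from the median.
    mad = np.median(np.abs(data - median))
    if mad == 0:
        raise ValueError("median absolute deviation is zero; modified z-score is undefined")
    return 0.6745 * (data - median) / mad

array_data = np.array([3, 5, 6, 5, 3, 6, 8, 8, 4, 2])
scores = modified_zscore(array_data)
print(scores)
```

For this array the median is 5 and the median absolute deviation is 1.5, so a data point equal to the median gets a modified z-score of 0. On SciPy 1.5 or later, scipy.stats.median_abs_deviation() with its default settings should give the same median absolute deviation and could replace the manual computation inside the helper.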
So, in this tutorial, we have learned about “Scipy Stats Zscore” and covered the following topics.
- Scipy Stats Zscore
- Scipy Stats Zscore nan
- Scipy Stats Zscore axis
- Scipy Stats Zscore log
- Scipy Stats Modified Zscore