Kurtosis is a statistical measure, and in Python we can compute it with scipy.stats.kurtosis. In this Python tutorial, we will learn about “Python Scipy Stats Kurtosis” with the help of multiple examples. We will cover the following topics.
- What is Kurtosis
- Python Scipy Stats Kurtosis
- Python Scipy Stats Kurtosis Test
- Python Scipy Stats Kurtosis Fisher
- Python Scipy Stats Kurtosis Nan_policy
What is Kurtosis
Kurtosis is a statistical metric used to characterize how much data clusters in a frequency distribution’s tails or peak. The tails are the endpoints of the distribution, while the peak is its highest point.
Kurtosis can be classified as mesokurtic, leptokurtic, or platykurtic. Each class is described below.
- Mesokurtic: The distribution has a medium peak and a moderate breadth, like the normal distribution.
- Leptokurtic: More values near the mean and more values in the tails of the distribution.
- Platykurtic: Fewer values near the mean and fewer values in the tails.
Python Scipy Stats Kurtosis
Kurtosis is a statistical measure that describes how much the tails of a distribution differ from the tails of a normal distribution.
SciPy provides the method kurtosis(), which calculates the kurtosis of a given data set. Kurtosis is defined as the fourth central moment divided by the square of the variance.
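The definition above (fourth central moment divided by the squared variance) can be checked by hand. The following is a minimal sketch, assuming a small made-up data set and the default biased moments; the variable names are only illustrative.

```python
import numpy as np
from scipy import stats

# Small illustrative data set (values chosen arbitrarily for this sketch)
data = np.array([2.0, 4.0, 5.0, 6.0, 2.0, 6.0, 8.0, 5.0, 5.0, 8.0, 8.0])

deviations = data - data.mean()
m2 = np.mean(deviations ** 2)  # second central moment (biased variance)
m4 = np.mean(deviations ** 4)  # fourth central moment

pearson_manual = m4 / m2 ** 2        # Pearson's kurtosis
fisher_manual = pearson_manual - 3   # Fisher's (excess) kurtosis

# Each pair of printed values should agree
print(pearson_manual, stats.kurtosis(data, fisher=False))
print(fisher_manual, stats.kurtosis(data))
```

The manual values should match what kurtosis() returns with fisher=False and with the default fisher=True, respectively.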
The syntax is given below.
scipy.stats.kurtosis(a, axis=0, fisher=True, bias=True, nan_policy='propagate')
Where parameters are:
- a(array_data): It is array data whose kurtosis we want to calculate.
- axis(int): The axis along which the kurtosis is calculated. The default is 0; if None, the kurtosis is calculated over the whole array.
- fisher(boolean): If True, Fisher's definition is used (a normal distribution gives 0.0); otherwise Pearson's definition is used (a normal distribution gives 3.0).
- bias(boolean): If False, the calculations are corrected for statistical bias.
- nan_policy: It defines how to handle nan values and accepts three values:
  - omit: Calculate the kurtosis while ignoring the nan values.
  - propagate: Return nan.
  - raise: Throw an error if nan values are present.
The method kurtosis() returns the kurtosis of the values along the given axis. Under Fisher's definition, 3.0 is subtracted from the result so that a normal distribution gives 0.0; under Pearson's definition, a normal distribution gives 3.0.
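Let’s take an example: import the required library, generate 2000 normally distributed values, and compute their kurtosis.
from scipy import stats
# Draw 2000 samples from a standard normal distribution (seeded for reproducibility)
array_data = stats.norm.rvs(size=2000, random_state=2)
# Fisher's definition is the default, so a normal sample should give a value near 0
stats.kurtosis(array_data)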
The value of kurtosis is close to zero, which is what we expect for normally distributed data under Fisher's definition.
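The axis and bias parameters can be illustrated with a two-dimensional array. This is a rough sketch under the assumption of a made-up 500 x 3 table of normal samples; the names table, per_column, overall, and unbiased are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical 2-D data: 500 rows and 3 columns of normal samples
rng = np.random.default_rng(0)
table = rng.normal(size=(500, 3))

per_column = stats.kurtosis(table)           # axis=0 (default): one value per column
overall = stats.kurtosis(table, axis=None)   # array flattened: a single value
unbiased = stats.kurtosis(table, axis=None, bias=False)  # bias-corrected estimate

print(per_column.shape)   # one kurtosis value for each of the 3 columns
print(overall, unbiased)  # the two estimates differ slightly
```

With the default axis=0 the result has one entry per column, while axis=None treats the whole array as a single sample.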
Python Scipy Stats Kurtosis Test
In SciPy, the method kurtosistest() is used to check whether the given data set has normal kurtosis or not.
The syntax is given below.
scipy.stats.kurtosistest(a, axis=0, nan_policy='propagate', alternative='two-sided')
Where parameters are:
- a(array_data): The array of sample data to be tested.
- axis(int): The axis along which the test is computed. The default is 0; if None, the test is computed over the whole array.
- nan_policy: It defines how to handle nan values and accepts three values:
  - omit: Perform the test while ignoring the nan values.
  - propagate: Return nan.
  - raise: Throw an error if nan values are present.
- alternative: It defines the alternative hypothesis: 'two-sided' (the default), 'less', or 'greater'.
The method kurtosistest() returns two values of type float: the test statistic and the p-value.
Let’s take an example using the below code.
from scipy.stats import kurtosistest
kurtosistest(list(range(30)))
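From the output, we can conclude whether the given data has normal kurtosis or not: a small p-value (for example, below 0.05) suggests that the kurtosis differs from that of a normal distribution.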
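To make the interpretation of the p-value more concrete, here is a small sketch that runs the test on a normal sample and on a uniform sample. The sample sizes, the seed, and the 0.05 threshold are assumptions made for illustration, and the alternative parameter requires a SciPy version that supports it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=1000)    # drawn from a normal distribution
uniform_sample = rng.uniform(size=1000)  # flatter than normal (platykurtic)

normal_result = stats.kurtosistest(normal_sample)
uniform_result = stats.kurtosistest(uniform_sample)
# One-sided test: is the kurtosis lower than that of a normal distribution?
uniform_less = stats.kurtosistest(uniform_sample, alternative='less')

alpha = 0.05  # illustrative significance level
for name, result in [("normal", normal_result), ("uniform", uniform_result)]:
    statistic, pvalue = result
    verdict = "reject" if pvalue < alpha else "fail to reject"
    print(f"{name}: statistic={statistic:.3f}, p-value={pvalue:.4g} -> {verdict} normal kurtosis")
```

For the uniform sample the statistic should be clearly negative and the p-value very small, while the normal sample will usually (though not always) fail to reject the null hypothesis.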
Python Scipy Stats Kurtosis Fisher
We have already learned about the method kurtosis() of Python SciPy. One of its parameters is fisher, of type boolean. Fisher’s kurtosis measures a distribution’s tail-heaviness in relation to a normal distribution.
A positive Fisher’s kurtosis indicates heavier tails than a normal distribution, and therefore more outliers. A negative Fisher’s kurtosis indicates lighter tails than a normal distribution, with a flatter, more uniform density.
- Mesokurtic distributions are those that have a Fisher’s kurtosis of zero or extremely close to zero. The normal distribution belongs to this category.
- Platykurtic distributions are those that have a negative Fisher’s kurtosis and are flat-topped or uniform, e.g. a uniform distribution.
- Leptokurtic distributions are those that have a high positive Fisher’s kurtosis. They are “tail-heavy” distributions whose outliers may need to be handled or processed, depending on the use case. Examples of heavy-tailed distributions include the Levy distribution and the Laplace distribution.
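The three categories above can be illustrated with the theoretical Fisher's kurtosis of a few SciPy distributions. This is a minimal sketch that assumes the stats() method of the distribution objects with moments='k', which reports Fisher's (excess) kurtosis; the dictionary name is only illustrative.

```python
from scipy import stats

# Theoretical Fisher (excess) kurtosis of three reference distributions
theoretical = {
    "normal (mesokurtic)": float(stats.norm.stats(moments='k')),
    "uniform (platykurtic)": float(stats.uniform.stats(moments='k')),
    "Laplace (leptokurtic)": float(stats.laplace.stats(moments='k')),
}

for name, value in theoretical.items():
    print(f"{name}: {value}")
```

The normal distribution gives zero, the uniform distribution gives a negative value, and the Laplace distribution gives a positive value.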
So here we will change the parameter fisher of the method kurtosis() through an example to see how the result changes.
Import the required libraries using the below Python code.
from scipy import stats
Generate an array of data containing 3000 values using the method norm.rvs() and calculate the kurtosis with the parameter fisher equal to True.
array_data = stats.norm.rvs(size=3000, random_state=3)
stats.kurtosis(array_data,fisher = True)
Now again calculate the kurtosis of the same data with the parameter fisher equal to False.
array_data = stats.norm.rvs(size=3000, random_state=3)
stats.kurtosis(array_data,fisher = False)
When we set fisher equal to True, the kurtosis for the data is about -0.06, and when fisher is False, the result is about 2.94. When fisher is equal to False, Pearson's kurtosis is calculated, which does not subtract 3 from the result. That is why the two results differ by exactly 3.
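The relationship between the two definitions can be verified directly. The sketch below reuses the same seeded sample as the example above and only checks the difference between the two results; the variable names are illustrative.

```python
from scipy import stats

# Same seeded sample as in the example above
array_data = stats.norm.rvs(size=3000, random_state=3)

fisher_value = stats.kurtosis(array_data, fisher=True)    # excess kurtosis
pearson_value = stats.kurtosis(array_data, fisher=False)  # "raw" kurtosis

# The two definitions differ only by the constant 3
print(pearson_value - fisher_value)
```

Whatever data set is used, the Pearson result should equal the Fisher result plus 3, up to floating-point rounding.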
Python Scipy Stats Kurtosis Nan_policy
The method kurtosis() of Python SciPy accepts a parameter nan_policy to handle the nan values within the array. If we calculate the kurtosis of an array containing nan values, then the method kurtosis() returns nan as the result by default.
To handle these nan values within the array, we will use different values for the parameter nan_policy of the method kurtosis(). The nan_policy parameter accepts three values to deal with nan values:
- omit: Calculate the kurtosis while ignoring the nan values.
- propagate: Return nan. This is the default value.
- raise: Throw an error if nan values are present.
Let’s understand with an example by following the below steps:
Import the required libraries using the below python code.
from scipy.stats import kurtosis
import numpy as np
Create an array containing nan values using the below code. To include the nan values within the array, we have used np.nan from NumPy.
array_data = [2,4,5,6,2,6,8,5,np.nan,5,8,8]
Compute the kurtosis of the above-created array without the parameter nan_policy using the below code.
kurtosis(array_data)
Now, specify the parameter nan_policy with a value equal to omit using the below code.
kurtosis(array_data,nan_policy = 'omit')
Again change the parameter nan_policy to a value equal to propagate using the below code.
kurtosis(array_data,nan_policy = 'propagate')
Finally, change the parameter nan_policy to a value equal to raise using the below code.
kurtosis(array_data,nan_policy = 'raise')
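Compare the outputs to see how the nan value is handled: with omit, the kurtosis is computed from the non-nan values; with propagate (the default), the result is nan; and with raise, a ValueError is thrown.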
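The three policies can also be compared side by side in one script. This is a small sketch that reuses the array from the steps above and catches the error raised by the raise policy so that the script runs to completion; the results dictionary and the clean list are hypothetical helpers.

```python
import numpy as np
from scipy.stats import kurtosis

array_data = [2, 4, 5, 6, 2, 6, 8, 5, np.nan, 5, 8, 8]

results = {}
for policy in ("propagate", "omit", "raise"):
    try:
        results[policy] = float(kurtosis(array_data, nan_policy=policy))
    except ValueError as error:
        # 'raise' refuses to compute when nan values are present
        results[policy] = f"ValueError: {error}"
    print(policy, "->", results[policy])

# 'omit' should match the kurtosis of the data with the nan removed by hand
clean = [value for value in array_data if not np.isnan(value)]
print("manual omit ->", kurtosis(clean))
```

The propagate policy yields nan, the omit policy yields the same value as the manually cleaned data, and the raise policy ends up in the except branch.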
So, in this tutorial, we have learned about the “Python Scipy Stats Kurtosis” and covered the following topics.
- What is Kurtosis
- Python Scipy Stats Kurtosis
- Python Scipy Stats Kurtosis Test
- Python Scipy Stats Kurtosis Fisher
- Python Scipy Stats Kurtosis Nan_policy
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, Scikit-Learn, etc. for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc.