In this Python tutorial, we will learn how to use the “Python Scipy Normal Test” to check the normality of a sample. We will also cover the following topics with examples.
- What is Normal test
- Python Scipy Normal Test
- Python Scipy Normal Test Interpretation
- Python Scipy Normal Test Nan Policy
- Python Scipy Normal Test Axis
What is Normal test
A normality test checks whether data follow a normal distribution. But what does that actually mean? The term “normality” refers to a particular statistical distribution known as the “normal distribution”, also called the “Gaussian distribution” or the “bell-shaped curve”.
The normal distribution is a continuous, symmetric distribution defined by its mean and standard deviation. Its bell shape is the same whatever those two values are. The mean only moves the center of the curve, and the standard deviation only stretches or compresses it. What matters is how the data are distributed under the curve.
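The following sketch checks the claim that the shape does not change. It uses scipy.stats.norm.cdf to compute the share of probability within one, two, and three standard deviations of the mean. It does this for two arbitrarily chosen (mean, standard deviation) pairs, which are illustrative assumptions and not values from this tutorial.

```python
from scipy.stats import norm

coverages = {}
for mean, std in [(0, 1), (50, 7)]:  # arbitrary (mean, standard deviation) pairs
    # Probability mass between mean - k*std and mean + k*std for k = 1, 2, 3
    coverages[(mean, std)] = [
        round(float(norm.cdf(mean + k * std, loc=mean, scale=std)
                    - norm.cdf(mean - k * std, loc=mean, scale=std)), 4)
        for k in (1, 2, 3)
    ]
    print(f"mean={mean}, std={std}: {coverages[(mean, std)]}")
# Both lines print [0.6827, 0.9545, 0.9973]
```

Both pairs give the same coverages, about 68%, 95%, and 99.7%. This is because the mean only shifts the curve and the standard deviation only rescales it.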
- The normal distribution is an idealization. With a normality test, you are not asking whether your data follow a normal distribution exactly. You are asking whether they are close enough to normal to let you use your statistical tool without worry.
- In some circumstances, you may be able to use a statistical tool without worrying about whether your data are normal, because the tool is robust to the normality assumption. In other words, the tool still gives reliable results when the normality assumption is violated to some extent.
Python Scipy Normal Test
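Python Scipy has a method normaltest() within the module scipy.stats to check whether a sample deviates from a normal distribution. It is based on D’Agostino and Pearson’s test, which combines skewness and kurtosis into a single omnibus test of normality.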
The syntax is given below.
scipy.stats.normaltest(a, axis=0, nan_policy='propagate')
Where parameters are:
- a (array_like): The array containing the sample to be tested.
- axis (int or None): The axis along which to compute the test. By default, it is 0. If None, the test is computed over the whole (flattened) array.
- nan_policy: How to handle nan values in the array. The options are 'propagate' (the default), 'omit', and 'raise'.
The method normaltest() returns two values: the test statistic and the p-value. Each is a float, or an array for multi-dimensional input.
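The statistic is worth unpacking. SciPy’s documentation describes it as s² + k². Here s is the z-score returned by skewtest() and k is the z-score returned by kurtosistest(). The p-value comes from a chi-squared distribution with 2 degrees of freedom. The sketch below rebuilds both numbers by hand for a seeded normal sample and checks them against normaltest(). The seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2, kurtosistest, normaltest, skewtest

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily for reproducibility
sample = rng.normal(loc=0, scale=1, size=500)

stat, p_value = normaltest(sample)

# Rebuild the statistic from its two components: the z-scores of the
# skewness test and the kurtosis test.
z_skew, _ = skewtest(sample)
z_kurt, _ = kurtosistest(sample)
k2 = z_skew**2 + z_kurt**2

# Under the null hypothesis, k2 approximately follows a chi-squared
# distribution with 2 degrees of freedom; the p-value is its upper tail.
p_manual = chi2.sf(k2, df=2)

print(np.isclose(stat, k2), np.isclose(p_value, p_manual))  # True True
```

Because the statistic is a sum of squared z-scores, it is not itself a z-score. Large values mean the skewness, the kurtosis, or both are far from what a normal sample would show.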
Let’s test a sample by following the steps below:
Import the required libraries using the Python code below.
from scipy.stats import normaltest
import numpy as np
Create a random number generator and use it to generate two arrays of normally distributed data using the code below.
rand_numgen = np.random.default_rng()
points = 1000
a_data = rand_numgen.normal(1, 2, size=points)
b_data = rand_numgen.normal(3, 2, size=points)
Combine both arrays into one array of data using the code below.
norm_array_data = np.concatenate((a_data,b_data))
norm_array_data
Perform the normal test on the combined sample using the code below.
stat, p_value = normaltest(norm_array_data)
print("Statistic", stat)
print("p-value", p_value)
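The random number generator is not seeded, so the exact numbers differ on every run. To interpret them, compare the p-value with a significance level such as 0.05. A p-value greater than 0.05 means we cannot reject the hypothesis that the sample comes from a normal distribution. A p-value of 0.05 or less means the sample is unlikely to come from a normal distribution.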
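The next sketch shows what a clear rejection looks like. It repeats the same two-sample construction but moves the component means far apart, so the combined sample is bimodal. The means are -3 and 3, each with standard deviation 1, values chosen here purely for illustration. A seed is added so the run is repeatable.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(seed=0)  # seeded so the run is repeatable
points = 1000

# Same construction as above, but the component means (-3 and 3)
# are six standard deviations apart, so the combined sample is bimodal.
a_data = rng.normal(-3, 1, size=points)
b_data = rng.normal(3, 1, size=points)
bimodal = np.concatenate((a_data, b_data))

stat, p_value = normaltest(bimodal)
print("statistic:", stat)
print("p-value:", p_value)
if p_value <= 0.05:
    print("Reject normality at the 5% level")
else:
    print("No evidence against normality at the 5% level")
```

With the peaks this far apart, the p-value is far below 0.05. When the two means are close relative to the standard deviation, as in the example above, the mixture can look nearly normal and the test may not reject.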
This is how to check the normality of the sample using the Python Scipy library.
Python Scipy Normal Test Interpretation
To interpret the normal test, we need to understand one of the values returned by normaltest(): the p-value.
Each normality test provides a P value, and to understand any P value we must know the null hypothesis. For a normality test, the null hypothesis is that all the values were drawn from a population with a Gaussian distribution.
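The P value answers this question: if the null hypothesis were true, what is the probability that a random sample would depart from the Gaussian ideal at least as much as these data do? Take the common significance level of 0.05. If the P value is greater than 0.05, the data are consistent with a Gaussian distribution and we fail to reject the null hypothesis. If the P value is 0.05 or less, we reject the null hypothesis and conclude that the data are unlikely to come from a Gaussian population.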
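The decision rule can be wrapped in a small helper, sketched below. Here check_normality is a hypothetical helper written for illustration, not part of SciPy. The 0.05 threshold is the conventional choice, not something normaltest() enforces. The sketch compares a seeded normal sample with a clearly skewed exponential sample.

```python
import numpy as np
from scipy.stats import normaltest

ALPHA = 0.05  # conventional significance level; an assumption, not a SciPy default

def check_normality(sample, alpha=ALPHA):
    """Hypothetical helper: return the p-value and a plain-language reading."""
    _, p_value = normaltest(sample)
    if p_value > alpha:
        verdict = "consistent with a normal distribution (fail to reject H0)"
    else:
        verdict = "unlikely to be normal (reject H0)"
    return p_value, verdict

rng = np.random.default_rng(seed=1)
normal_sample = rng.normal(loc=0, scale=1, size=1000)
skewed_sample = rng.exponential(scale=1, size=1000)

for name, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    p, verdict = check_normality(sample)
    print(f"{name}: p = {p:.4g} -> {verdict}")
```

The exponential sample is rejected decisively. The normal sample usually passes, but a truly normal sample is still rejected about 5% of the time at this threshold.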
What conclusion should we draw if the normality test’s P value is high? We can only state that the data are consistent with a Gaussian distribution. A normality test cannot prove that the data were sampled from a Gaussian distribution.
All the normality test can show is that the deviation from the Gaussian ideal is no greater than you would expect from chance alone. With large data sets, this is reassuring. With smaller data sets, normality tests have little power to detect even modest deviations from the Gaussian ideal.
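The effect of sample size can be shown with a small simulation. The sketch below draws many samples from a uniform distribution, which is symmetric but clearly not Gaussian. It records how often normaltest() rejects normality at the 0.05 level for n = 20 and for n = 2000. The distribution, sample sizes, seed, and number of trials are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(seed=7)
ALPHA = 0.05
N_TRIALS = 200  # number of simulated samples per sample size

def rejection_rate(sample_size):
    """Fraction of uniform samples for which normaltest() rejects normality."""
    rejections = 0
    for _ in range(N_TRIALS):
        sample = rng.uniform(0, 1, size=sample_size)
        _, p_value = normaltest(sample)
        rejections += p_value <= ALPHA  # counts True as 1
    return rejections / N_TRIALS

small_rate = rejection_rate(20)
large_rate = rejection_rate(2000)
print(f"n = 20:   rejected {small_rate:.0%} of the time")
print(f"n = 2000: rejected {large_rate:.0%} of the time")
```

Expect the rejection rate for n = 20 to be much lower than for n = 2000, where it is close to 100%, even though the underlying distribution is the same.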
What should we conclude if the normality test’s P value is low? The null hypothesis is that the data are sampled from a Gaussian distribution. If the P value is low enough, we reject the null hypothesis. We then accept the alternative hypothesis, which is that the data are not sampled from a Gaussian population.
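A low P value does not tell you how far the data depart from a Gaussian distribution. The distribution may be fairly close to Gaussian or extremely far from it, and with big data sets even a small departure can produce a low P value. The normality test also tells you nothing about which alternative distribution the data follow.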
If the P value is low enough to deem the departure from a Gaussian distribution “statistically significant”, we have four options:
- The data may come from another identifiable distribution. In that case, we may be able to transform the values so that they follow a Gaussian distribution. For instance, if the data come from a lognormal distribution, transform all values to their logarithms.
- The normality test may fail because of one or more outliers. Run an outlier test and consider removing the outlier(s).
- If the deviation from normality is slight, you may decide to take no action. Statistical tests are usually robust to mild violations of the Gaussian assumption.
- Use nonparametric tests, which don’t assume a Gaussian distribution. However, choosing to use nonparametric tests is a big decision, as is choosing not to. That decision shouldn’t be automated, and it shouldn’t rely on a single normality test.
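The first option, transforming the data, is sketched below. The sample is drawn from a lognormal distribution, with parameters chosen for illustration, and normaltest() rejects it. Taking the logarithm of every value turns it back into normally distributed data.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(seed=3)
# Lognormal data: the logarithm of each value is normally distributed.
raw = rng.lognormal(mean=0.0, sigma=1.0, size=500)

_, p_raw = normaltest(raw)
_, p_log = normaltest(np.log(raw))  # transform every value to its logarithm

print(f"raw data:        p = {p_raw:.3g}")
print(f"log-transformed: p = {p_log:.3g}")
```

The log of a lognormal variable is exactly normal, so the transformed sample will usually pass the test. As with any truly normal sample, it can still fail about 5% of the time.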
Python Scipy Normal Test Nan Policy
To handle nan values that can occur in a sample or array of data, the Python Scipy method normaltest() supports the argument nan_policy. The three options nan_policy offers for dealing with nan values are listed below.
- omit: NaN values won’t be included in the calculation. The relevant entry in the output will be NaN if the axis slice along which the statistic is computed still has insufficient data.
- raise: A ValueError is raised if a NaN is present.
- propagate (the default): The corresponding element of the output will be NaN if there is a NaN in the axis slice (for example, the row) along which the statistic is computed.
Let’s use the following steps to understand, using an example, how to handle nan values when performing the normal test:
Import the required libraries using the Python code below.
from scipy.stats import normaltest
import numpy as np
Create a list of data containing nan values using the code below.
data = [2,4,6,np.nan,9,30,14,16,18,np.nan]
Perform the normal test on the above data, which contains nan values, using the code below.
normaltest(data)
With the default nan_policy='propagate', the output is nan for both the statistic and the p-value.
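Next, perform the test with nan_policy equal to omit. This option ignores the nan values within the data and performs the test on the remaining eight values. Because so few values remain, SciPy also warns that the kurtosis test may be inaccurate with fewer than 20 observations.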
normaltest(data, nan_policy='omit')
Finally, perform the test with nan_policy equal to raise. Because the data contain nan values, this option raises a ValueError instead of returning a result.
normaltest(data, nan_policy='raise')
This is how to handle nan values in the data using the nan_policy parameter of the Python Scipy method normaltest().
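All three policies can be compared in one loop. This sketch uses a seeded sample of 50 normal values with two values replaced by nan, so that 'omit' still leaves plenty of data for the test. The sizes and positions are arbitrary. Since 'raise' stops with a ValueError, the call is wrapped in try/except.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(seed=5)
data = rng.normal(loc=10, scale=2, size=50)
data[[3, 17]] = np.nan  # inject two missing values

results = {}
for policy in ("propagate", "omit", "raise"):
    try:
        results[policy] = normaltest(data, nan_policy=policy)
        print(policy, "->", results[policy])
    except ValueError as err:
        # 'raise' refuses to run when the input contains NaN
        results[policy] = None
        print(policy, "-> ValueError:", err)
```

'propagate' returns nan for both values, 'omit' tests the 48 remaining numbers, and 'raise' produces the error message instead of a result.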
Python Scipy Normal Test Axis
The Python Scipy normaltest() function accepts the axis parameter, introduced in the section “Python Scipy Normal Test” above, to test the data’s normality along a particular axis.
A two-dimensional array has two axes: axis 0 runs vertically down the rows, and axis 1 runs horizontally across the columns. With axis=0, normaltest() tests each column separately; with axis=1, it tests each row.
Let’s check the normality of the data along a specific axis by following the steps below:
Import the required libraries using the Python code below.
from scipy.stats import normaltest
import numpy as np
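Create a two-dimensional data set of 8 rows with 8 values each using the code below. Every row must have the same length; otherwise NumPy cannot build a 2-D array from the nested list.
data = [[2,4,6,7,3,8,10,13],[7,3,15,9,1,6,5,12],
[2,8,10,4,6,7,3,13],[7,5,3,15,9,1,6,12],
[2,4,6,7,3,8,10,13],[7,3,15,9,1,6,5,12],
[2,4,6,7,3,8,10,13],[7,3,15,9,1,6,5,12]]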
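Apply the normal test to the data using the code below. With the default axis=0, the test runs down each column, so it returns eight statistics and eight p-values. Since each column holds only eight values, SciPy also warns that the kurtosis test may be inaccurate with fewer than 20 observations.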
normaltest(data)
Perform the test again with axis=1 to test each row instead, using the code below.
normaltest(data,axis=1)
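This sketch shows what each axis setting returns. It builds a seeded 300-by-3 array whose columns come from three different distributions, an illustrative choice. axis=0 runs one test per column, and the same tests can be run on the transposed array with axis=1. axis=None pools all values into a single sample.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(seed=11)
n_rows = 300
# Each column holds a different distribution (an illustrative choice):
# column 0 normal, column 1 exponential, column 2 uniform.
data = np.column_stack((
    rng.normal(size=n_rows),
    rng.exponential(size=n_rows),
    rng.uniform(size=n_rows),
))

by_column = normaltest(data, axis=0)            # one test per column
by_row_of_transpose = normaltest(data.T, axis=1)  # same tests, transposed layout
pooled = normaltest(data, axis=None)            # flattens all 900 values into one sample

print("per-column p-values:", by_column.pvalue)
print("same result via data.T, axis=1:",
      np.allclose(by_column.pvalue, by_row_of_transpose.pvalue))
print("pooled p-value:", pooled.pvalue)
```

The exponential column gets a p-value far below 0.05. The pooled test answers a different question: whether all 900 values together look normal. It should not replace the per-column tests.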
So, in this tutorial, we have learned about the “Python Scipy Normal Test” and covered the following topics.
- What is Normal test
- Python Scipy Normal Test
- Python Scipy Normal Test Interpretation
- Python Scipy Normal Test Nan Policy
- Python Scipy Normal Test Axis
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time, I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, and Scikit-Learn, for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere.