Python Scipy Normal Test [With Examples]

In this Python tutorial, we will learn about the “Python Scipy Normal Test” to check the normality of a sample. We will also cover the following topics with examples.

  • What is Normal test
  • Python Scipy Normal Test
  • Python Scipy Normal Test Interpretation
  • Python Scipy Normal Test Nan Policy
  • Python Scipy Normal Test Axis

Also, check the related tutorial: Python Scipy Stats Norm

What is Normal test

A normal test checks the normality of the data. But what does that actually mean? The term “normality” describes a particular type of statistical distribution known as the “normal distribution,” also known as the “Gaussian distribution” or “bell-shaped curve.”

The normal distribution is a continuous, symmetric distribution that is fully defined by its mean and standard deviation. Its shape never changes: the mean only shifts the curve and the standard deviation only stretches it. The key property is how the data are distributed under the curve.
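One way to see that the shape does not depend on the mean or standard deviation is to measure how much probability lies within one standard deviation of the mean for several normal distributions. The sketch below uses scipy.stats.norm; the (mean, standard deviation) pairs are arbitrary values chosen only for illustration.

```python
from scipy.stats import norm

# Arbitrary (mean, standard deviation) pairs, chosen only for illustration.
params = [(0, 1), (1, 2), (50, 10)]

fractions = []
for mean, sd in params:
    dist = norm(loc=mean, scale=sd)
    # Probability mass between mean - sd and mean + sd.
    within_one_sd = dist.cdf(mean + sd) - dist.cdf(mean - sd)
    fractions.append(within_one_sd)
    print(f"mean={mean}, sd={sd}: {within_one_sd:.4f}")
```

Every line reports roughly 0.6827, so about 68% of the values fall within one standard deviation of the mean whatever the parameters are.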

  • A normal distribution is an idealized distribution. With the normality test, you are essentially determining whether your data is close enough to normal to allow you to use your statistical tool without worry, rather than whether it matches a normal distribution exactly.
  • In some circumstances you can use a statistical tool without worrying about whether your data is normal, because the tool is robust to the normality assumption. In other words, the tool's results do not change much when the normality assumption is violated to some extent.

Read: Python Scipy Stats Poisson

Python Scipy Normal Test

Python Scipy has a method normaltest() within the module scipy.stats to check if a sample deviates from a normal distribution.

The syntax is given below.

scipy.stats.normaltest(a, axis=0, nan_policy='propagate')

Where parameters are:

  • a (array_like): The array containing the sample that we want to test.
  • axis (int or None): It specifies the axis along which to compute the test. By default, it is 0. If it is None, the test is computed over the whole array.
  • nan_policy: It defines how to handle the nan values that exist within the array. The accepted values are propagate (the default), omit and raise.

The method normaltest() returns two values, the test statistic and the p-value, each of type float or array.
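The returned object also exposes the two values as the attributes statistic and pvalue. normaltest() is based on D'Agostino and Pearson's test, which combines skewness and kurtosis, so the statistic can be rebuilt from skewtest() and kurtosistest(). The following is a minimal sketch of that relationship, assuming a synthetic sample and an arbitrary seed; it is meant as a check of the idea, not a replacement for normaltest().

```python
import numpy as np
from scipy.stats import chi2, kurtosistest, normaltest, skewtest

rng = np.random.default_rng(0)  # arbitrary seed, used only for repeatability
sample = rng.normal(loc=0, scale=1, size=500)

result = normaltest(sample)
print("statistic:", result.statistic)
print("p-value:", result.pvalue)

# Rebuild the statistic from the z-scores of the skewness and kurtosis tests.
skew_z = skewtest(sample).statistic
kurt_z = kurtosistest(sample).statistic
k2 = skew_z**2 + kurt_z**2

# The p-value is the upper tail of a chi-squared distribution
# with 2 degrees of freedom, evaluated at the statistic.
p_from_k2 = chi2.sf(k2, df=2)

print(np.isclose(k2, result.statistic), np.isclose(p_from_k2, result.pvalue))
```

Because the statistic is a sum of two squared z-scores, it is never negative, and larger values indicate a stronger departure from normality.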

Let’s take an example to test the sample by following the below steps:

Import the required libraries using the below Python code.

from scipy.stats import normaltest
import numpy as np

Create a random number generator and generate the normal array data with the help of a generator using the below code.

rand_numgen = np.random.default_rng()
points = 1000
a_data = rand_numgen.normal(1, 2, size=points)
b_data = rand_numgen.normal(3, 2, size=points)

Combine both the data into one array of data using the below code.

norm_array_data = np.concatenate((a_data,b_data))
norm_array_data

Perform the normal test on that array of data which is a sample using the below code.

# normaltest() returns the test statistic (not a z-score) and the p-value
statistic, p_value = normaltest(norm_array_data)
print("Test statistic", statistic)
print("p-value", p_value)

Figure: Scipy Normal test

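From the output, we can see the p-value is greater than 0.05, which means we cannot reject the null hypothesis that the sample comes from a normal distribution. Because the random number generator is not seeded, the exact values change on every run.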

This is how to check the normality of the sample using the Python Scipy library.
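To see how the result changes when the data are clearly not normal, the sketch below compares a normal sample with an exponential one. The seed, the sample sizes and the distribution parameters are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(42)  # arbitrary seed, used only for repeatability

normal_sample = rng.normal(loc=1, scale=2, size=1000)
skewed_sample = rng.exponential(scale=2, size=1000)  # strongly right-skewed

for label, sample in [("normal", normal_sample), ("exponential", skewed_sample)]:
    statistic, p_value = normaltest(sample)
    print(f"{label}: statistic={statistic:.3f}, p-value={p_value:.3g}")

p_skewed = normaltest(skewed_sample).pvalue
```

The exponential sample should give a p-value far below 0.05. The normal sample usually gives a larger p-value, but about 5% of truly normal samples still fall below 0.05 by chance.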

Read: Python Scipy Kdtree

Python Scipy Normal Test Interpretation

To interpret the normal test, we need to understand one of the values that the method normaltest() returns, which is the p-value.

Each normality test provides a P value. We must be aware of the null hypothesis in order to comprehend any P value. Here, the null hypothesis is that all the values were drawn from a population with a Gaussian distribution.

The P value answers the question: if that null hypothesis were true, what is the probability that a random sample of data would depart from the Gaussian ideal at least as much as these data do? If the P value is greater than 0.05, the data are consistent with a Gaussian distribution at the usual 5% significance level. If the P value is less than or equal to 0.05, the null hypothesis is rejected.

If the normality test’s P value is high, what conclusion should we draw? We are only able to state that the data are compatible with a Gaussian distribution. A normality test cannot establish that the data were sampled from a Gaussian distribution.

The only thing the normality test can do is show that the divergence from the Gaussian ideal is not greater than you would expect to observe from pure chance. This is reassuring with large data sets. With smaller data sets, normality tests have little power to detect even modest departures from the Gaussian ideal.

If the normality test’s P value is low, what conclusion should we draw? According to the null hypothesis, the data are sampled from a Gaussian distribution. If the P value is low enough, the null hypothesis is rejected and, as a result, the alternative hypothesis (that the data are not sampled from a Gaussian population) is accepted.

The distribution may be fairly close to Gaussian (with big data sets, even a small departure can be statistically significant) or extremely far from it. You cannot learn anything about the alternative distributions from the normality test.
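The decision rule described above can be sketched as a small helper. The function name interpret_normaltest and the default threshold alpha=0.05 are hypothetical choices made for this illustration; they are not part of SciPy.

```python
import numpy as np
from scipy.stats import normaltest


def interpret_normaltest(sample, alpha=0.05):
    """Hypothetical helper: run normaltest() and describe the outcome in words."""
    statistic, p_value = normaltest(sample)
    if p_value <= alpha:
        verdict = "reject the null hypothesis: the sample departs from normality"
    else:
        verdict = "fail to reject the null hypothesis: the sample is compatible with normality"
    return p_value, verdict


rng = np.random.default_rng(7)  # arbitrary seed
uniform_sample = rng.uniform(0, 1, size=2000)  # flat, clearly not bell-shaped

p_value, verdict = interpret_normaltest(uniform_sample)
print(f"p-value={p_value:.3g} -> {verdict}")
```

Note that the helper never says the data "are normal". A high p-value only means the test found no evidence against normality.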

We have four options if the P value is low enough to deem the departures from the Gaussian distribution “statistically significant”:

  • The data may come from a different, identifiable distribution. In that case, we might be able to transform the values so that they follow a Gaussian distribution. For instance, if the data come from a lognormal distribution, transform all values to their logarithms.
  • The failure of the normality test may be due to one or more outliers. Perform an outlier test and consider removing the outlier(s).
  • If the deviation from normality is slight, one could decide to take no action. Statistical tests tend to be quite robust to mild violations of the Gaussian assumption.
  • Use nonparametric tests that don’t make the Gaussian distribution assumption. However, using nonparametric tests is a big decision, as is not using them. It shouldn’t be automated and shouldn’t rest on a single normality test.
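The first option, transforming the data, can be sketched as follows. The sample is synthetic lognormal data with an arbitrary seed and arbitrary parameters, so this only illustrates the idea.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(3)  # arbitrary seed
lognormal_sample = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# Test the raw values, then test their logarithms.
p_raw = normaltest(lognormal_sample).pvalue
p_log = normaltest(np.log(lognormal_sample)).pvalue

print(f"raw data:        p-value={p_raw:.3g}")
print(f"log-transformed: p-value={p_log:.3g}")
```

The raw values are strongly right-skewed, so their p-value should be very small. The logarithm of lognormal data is normally distributed by construction, so the second p-value is usually much larger.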

Read: Python Scipy Stats Kurtosis

Python Scipy Normal Test Nan Policy

To handle nan values that can occur in a sample or array of data, the Python Scipy method normaltest() supports the argument nan_policy. It accepts the three options listed below, and propagate is the default.

  • omit: NaN values won’t be included in the calculation. The relevant entry in the output will be NaN if the axis slice along which the statistic is computed still has insufficient data.
  • raise: A ValueError will be generated if a NaN is present.
  • propagate: The corresponding element of the output will be NaN if there is a NaN in the axis slice (for example, the row) along which the statistic is computed.

Let’s use the following steps to understand, using an example, how to handle nan values when performing the normal test:

Import the required libraries using the below Python code.

from scipy.stats import normaltest
import numpy as np

Create an array containing nan values using the below code.

data = [2,4,6,np.nan,9,30,14,16,18,np.nan]

Perform normal tests on the above data containing nan values using the below code.

normaltest(data)

The above code returns nan for both the statistic and the p-value, because nan_policy defaults to propagate.

Figure: Python Scipy Normal Test Nan

Again, perform the test with nan_policy equal to omit. This option ignores the nan values within the data and performs the test on the remaining values.

normaltest(data, nan_policy='omit')

Figure: Python Scipy Normal Test Nan Example

Now, perform the test again with nan_policy equal to raise. This option raises a ValueError when the data contains nan values.

normaltest(data, nan_policy='raise')

Figure: Python Scipy Normal Test Nan Example Raise

This is how to handle the nan values within the given data using the parameter nan_policy of method normaltest() of Python Scipy.
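The three policies can also be compared side by side. The sketch below reuses the same data and catches the ValueError so that all three outcomes are printed. The warning filter is an assumption added to keep the output short, since very small samples can trigger a warning from the kurtosis test.

```python
import warnings

import numpy as np
from scipy.stats import normaltest

data = [2, 4, 6, np.nan, 9, 30, 14, 16, 18, np.nan]

outcomes = {}
for policy in ("propagate", "omit", "raise"):
    try:
        with warnings.catch_warnings():
            # Hide small-sample warnings; only 8 non-nan values remain.
            warnings.simplefilter("ignore")
            result = normaltest(data, nan_policy=policy)
        outcomes[policy] = (float(result.statistic), float(result.pvalue))
    except ValueError as error:
        outcomes[policy] = f"ValueError: {error}"
    print(policy, "->", outcomes[policy])
```

With only eight usable values the omit result should be read with caution, because the test has little power on such a small sample.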

Read: Python Scipy Stats Multivariate_Normal

Python Scipy Normal Test Axis

The Python Scipy normaltest() function accepts axis as a parameter to test the data’s normality along a particular axis, as we learned in the preceding subsection, “Python Scipy Normal Test.”

A two-dimensional array has two corresponding axes, one running horizontally across columns (axis 1) and the other vertically across rows (axis 0).
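The effect of axis is easiest to see from the shape of the result. The following sketch assumes a synthetic array with 30 rows and 20 columns and an arbitrary seed.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(1)  # arbitrary seed
table = rng.normal(size=(30, 20))  # 30 rows, 20 columns

per_column = normaltest(table, axis=0)  # one test per column (the default)
per_row = normaltest(table, axis=1)     # one test per row
overall = normaltest(table, axis=None)  # one test over all 600 values

print(per_column.pvalue.shape)
print(per_row.pvalue.shape)
print(np.ndim(overall.pvalue))
```

With axis=0 the result holds one p-value per column, with axis=1 one p-value per row, and with axis=None a single p-value for the whole array.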

Let’s check the normality of the data along the specific axis by following the below steps:

Import the required libraries using the below Python code.

from scipy.stats import normaltest
import numpy as np

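Create a two-dimensional array of data using the below code. Every row must contain the same number of values, otherwise the data cannot be converted to a rectangular array.

data = [[2,4,6,7,3,8,10,13],[7,3,15,9,1,6,5,12],
        [2,8,10,4,6,7,3,13],[7,5,3,15,9,1,6,12],
        [2,4,6,7,3,8,10,13],[7,3,15,9,1,6,5,12],
        [2,4,6,7,3,8,10,13],[7,3,15,9,1,6,5,12]]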

Apply the normal test on the data using the below code. With the default axis=0, each column is tested separately.

normaltest(data)

Perform the test again with axis=1 to test each row separately using the below code.

normaltest(data, axis=1)

Figure: Python Scipy Normal Test Axis


So, in this tutorial, we have learned about the “Python Scipy Normal Test” and covered the following topics.

  • What is Normal test
  • Python Scipy Normal Test
  • Python Scipy Normal Test Interpretation
  • Python Scipy Normal Test Nan Policy
  • Python Scipy Normal Test Axis