Python Scipy Mann Whitneyu – Helpful Tutorial

In this Python tutorial, we will learn about “Python Scipy Mann Whitneyu”, which compares two independent samples to check whether they come from the same population, and cover the following topics.

  • What is Mann Whitney
  • Python Scipy Mann Whitneyu
  • Python Scipy Mann Whitneyu Nan Policy
  • Python Scipy Mann Whitneyu Example
  • Python Scipy Mann Whitneyu Axis

Also, check the latest related tutorial: Python Scipy Stats Norm

What is Mann Whitney

The Mann-Whitney U test, also known as the Wilcoxon rank-sum test, is a non-parametric statistical hypothesis test used to compare two independent samples of ordinal data. Two samples are drawn at random, and the test is used to determine whether they come from the same population.

The Mann-Whitney U test is non-parametric, so it makes no assumptions about how the scores are distributed. It does, however, make the following assumptions.

  1. The samples are drawn from the population at random.
  2. Observations are independent both within each sample and between the samples. Consequently, each observation belongs to one group or the other (it cannot be in both).
  3. The measurements are made on at least an ordinal scale.
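To make the idea concrete, the U statistic for the first sample can be obtained by counting, over all pairs of observations, how often a value from the first sample exceeds a value from the second, with ties counted as one half. The sketch below is only an illustration: the helper name u_statistic and the two groups of scores are hypothetical, and the result is cross-checked against SciPy's mannwhitneyu(), which is covered in the next section.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def u_statistic(x, y):
    """Hypothetical helper: count pairs (xi, yj) with xi > yj, ties score 0.5."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    greater = (x[:, None] > y[None, :]).sum()  # pairs where x wins
    ties = (x[:, None] == y[None, :]).sum()    # pairs with equal values
    return greater + 0.5 * ties


# Made-up ordinal scores for two independent groups (illustration only)
group_a = [3, 5, 6, 8, 9]
group_b = [1, 2, 4, 5, 7]

u_manual = u_statistic(group_a, group_b)
u_scipy = mannwhitneyu(group_a, group_b).statistic

print(u_manual, u_scipy)
```

Both values should agree, since the rank-based formula used by SciPy is equivalent to this pairwise count.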

Read: Python Scipy Stats Skew

Scipy Mann Whitneyu

Python Scipy provides the method mannwhitneyu() in the module scipy.stats for this kind of test.

The syntax is given below.

scipy.stats.mannwhitneyu(x, y, use_continuity=True, alternative='two-sided', axis=0, method='auto', nan_policy='propagate')

Where parameters are:

  • x, y (array_like): N-d arrays of samples. The arrays must be broadcastable except along the dimension given by axis.
  • alternative: Defines the alternative hypothesis. The default is ‘two-sided’. Let F(u) and G(u) be the cumulative distribution functions of the distributions underlying x and y, respectively. The following alternatives are available.
  1. ‘two-sided’: the distributions are not equal, i.e. F(u) ≠ G(u) for at least one u.
  2. ‘less’: the distribution underlying x is stochastically less than the distribution underlying y, i.e. F(u) > G(u) for all u.
  3. ‘greater’: the distribution underlying x is stochastically greater than the distribution underlying y, i.e. F(u) < G(u) for all u.
  • use_continuity (boolean): Whether to apply a continuity correction (1/2). The default is True. The correction is used only when method is ‘asymptotic’; otherwise it has no effect.
  • axis (int or None): If an int, the axis of the input along which to compute the statistic. The statistic of each axis slice (for example, each row) of the input appears in the corresponding element of the output. If None, the input is raveled before computing the statistic.
  • nan_policy: Specifies how to handle NaNs in the input.
  1. propagate: If a NaN is present in the axis slice (for example, the row) along which the statistic is computed, the corresponding element of the output will be NaN.
  2. omit: NaN values are left out of the calculation. If too little data remains in the axis slice along which the statistic is computed, the corresponding element of the output will be NaN.
  3. raise: A ValueError is raised if a NaN is present.
  • method: Determines how the p-value is calculated. The default is ‘auto’. The following options are available.
  1. ‘asymptotic’: compares the standardized test statistic against the normal distribution, correcting for ties.
  2. ‘exact’: computes the exact p-value by comparing the observed statistic u against the exact distribution of the u statistic under the null hypothesis. No correction is made for ties.
  3. ‘auto’: chooses ‘exact’ when the size of one of the samples is less than or equal to 8 and there are no ties; chooses ‘asymptotic’ otherwise.
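As a quick illustration of the alternative and method parameters, the sketch below runs the test on two small illustrative samples with each alternative in turn, and then once more with the asymptotic method. The variable names and data are chosen only for this example.

```python
from scipy.stats import mannwhitneyu

# Small illustrative samples without ties, so the exact method is applicable
x = [24, 19, 16, 22, 29]
y = [12, 20, 17, 11]

# Exact p-values under each alternative hypothesis
p_values = {}
for alt in ("two-sided", "less", "greater"):
    res = mannwhitneyu(x, y, alternative=alt, method="exact")
    p_values[alt] = res.pvalue
    print(alt, res.statistic, res.pvalue)

# Normal approximation with the continuity correction switched on
res_asym = mannwhitneyu(x, y, alternative="two-sided",
                        method="asymptotic", use_continuity=True)
print("asymptotic", res_asym.statistic, res_asym.pvalue)
```

The statistic is the same in every call; only the p-value changes with the chosen alternative and method.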

The method mannwhitneyu() returns a result object res that contains the statistic (the Mann-Whitney U statistic of the sample x) and the p-value, both of type float.
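The returned object can be read either through its attributes or by unpacking it like a tuple. A minimal sketch, using made-up samples named sample_a and sample_b:

```python
from scipy.stats import mannwhitneyu

# Made-up measurements (illustration only)
sample_a = [1.2, 3.4, 2.2, 5.1]
sample_b = [0.8, 2.9, 4.4, 6.0, 7.3]

res = mannwhitneyu(sample_a, sample_b)

# Access the values by attribute name
print(res.statistic, res.pvalue)

# Or unpack the result into two variables
stat, p_value = res
print(stat, p_value)
```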

Let’s take an example by following the below steps:

Suppose type II diabetes was discovered in nine randomly selected young adults at the ages shown below.

men = [24,19,16,22,29]
women = [12,20,17,11]

To determine whether there is a statistically significant difference between the ages at diagnosis of males and females, we use the Mann-Whitney U test. The null hypothesis is that the distribution of male diagnosis ages is the same as the distribution of female diagnosis ages.

We decide that a confidence level of 95% is required to reject the null hypothesis in favor of the alternative that the distributions are different. Because there are very few samples and no ties in the data, we can compare the observed test statistic against the exact distribution of the test statistic under the null hypothesis.

Import the required module and run the test using the below Python code.

from scipy import stats

# mannwhitneyu returns the U statistic of the first sample and the p-value
stat, p_value = stats.mannwhitneyu(men, women)
print(stat)
print(p_value)

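mannwhitneyu() always reports the statistic associated with the first sample, in this case the men sample. The printed statistic is 17.0, that is, U for men is 17. The statistic associated with the women sample follows as 5 × 4 − 17 = 3.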
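The two U statistics always add up to the product of the sample sizes, so the statistic of the second sample can be derived from the reported one. The sketch below shows this for the example data; the variable names u_men and u_women are chosen only for illustration, and method='exact' is passed explicitly to match the reasoning above.

```python
from scipy.stats import mannwhitneyu

men = [24, 19, 16, 22, 29]
women = [12, 20, 17, 11]

# U statistic of the first sample (men) and the exact two-sided p-value
u_men, p_value = mannwhitneyu(men, women, method="exact")

# U statistic of the second sample: n1 * n2 minus the first statistic
u_women = len(men) * len(women) - u_men

print(u_men, u_women, p_value)
```

Swapping the order of the arguments gives the other statistic directly, while the two-sided p-value stays the same.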

Read: Python Scipy Stats Mode

Python Scipy Mann Whitneyu Nan Policy

The method mannwhitneyu() of Python Scipy accepts a parameter nan_policy to deal with NaN values in a sample or array of data. nan_policy has the three options given below.

  • omit: NaN values won’t be included in the calculation. The relevant entry in the output will be NaN if the axis slice along which the statistic is computed still has insufficient data.
  • raise: A ValueError will be generated if a NaN is present.
  • propagate: The corresponding element of the output will be NaN if there is a NaN in the axis slice (for example, the row) along which the statistic is computed.

Let’s see how to deal with NaN values while performing the Mann Whitneyu test by following the steps below.

Here we will use the same example that we used in the above subsection “Python Scipy Mann Whitneyu”.

Import the required libraries using the below Python code.

from scipy import stats
import numpy as np

Suppose the ages at which type II diabetes was discovered in the randomly selected young adults are as shown below, this time with some values missing.

men = [24, 19, np.nan, 22, 29]
women = [12, 20, np.nan, 11]

Note that both of the above arrays contain NaN values.

Now pass the parameter nan_policy with the value ‘omit’ to the method mannwhitneyu() using the below code.

stat, p_value = stats.mannwhitneyu(men, women, nan_policy='omit')
print(stat)

In the above code, the value ‘omit’ makes the test ignore the NaN values in each sample.
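To compare the three policies side by side, the following sketch runs the test on the same data with each value of nan_policy. It also checks that ‘omit’ gives the same statistic as removing the NaN values by hand; the names clean_men and clean_women are introduced only for this comparison.

```python
import numpy as np
from scipy import stats

men = [24, 19, np.nan, 22, 29]
women = [12, 20, np.nan, 11]

# propagate (default): a NaN in the input yields NaN in the output
res_propagate = stats.mannwhitneyu(men, women, nan_policy="propagate")
print("propagate:", res_propagate.statistic)

# raise: a NaN in the input triggers a ValueError
raised = False
try:
    stats.mannwhitneyu(men, women, nan_policy="raise")
except ValueError as exc:
    raised = True
    print("raise:", exc)

# omit: NaN values are dropped before the test is performed
res_omit = stats.mannwhitneyu(men, women, nan_policy="omit")
print("omit:", res_omit.statistic)

# Dropping the NaN values manually should give the same statistic
clean_men = [v for v in men if not np.isnan(v)]
clean_women = [v for v in women if not np.isnan(v)]
res_clean = stats.mannwhitneyu(clean_men, clean_women)
print("manual:", res_clean.statistic)
```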

Read: Python Scipy Freqz

Python Scipy Mann Whitneyu Example

The Mann-Whitney U test is used to compare differences between two samples when the sample sizes are small (fewer than 30) and the sample distributions are non-normal.

It is regarded as the nonparametric equivalent of the two-sample t-test.

Suppose researchers want to find out whether a fuel treatment affects a vehicle’s average mpg. As a test, mpg is measured for fifteen vehicles with the fuel treatment and fifteen without it.

Because the sample sizes are small and the researchers believe that the sample distributions are not normal, they decide to perform a Mann-Whitney U test to find out whether there is a statistically significant difference in mpg between the two groups.

Import the required libraries using the below python code.

from scipy.stats import mannwhitneyu

To start, we’ll make two arrays to store the mpg values for each category of vehicles using the below code.

vehicle_grp1 = [18, 17, 18, 24, 20, 23, 21, 25, 20, 24, 23, 19, 16, 14, 12]
vehicle_grp2 = [24,17, 28, 24, 25, 21, 22, 23, 18, 27, 21, 23, 29, 28, 18]

Apply the Mann-Whitney U Test using the below code.

mannwhitneyu(vehicle_grp1, vehicle_grp2, alternative='two-sided')

The null and alternate hypotheses used in this case of the Mann-Whitney U test are as follows:

  • H0: The mpg is equal between the two groups.
  • HA: The mpg is not equal between the two groups.

For the data above, the test returns a U statistic of 60.0 and a two-sided p-value of about 0.03. Since this p-value is less than 0.05, we reject the null hypothesis.

This indicates that there is sufficient evidence to conclude that mpg differs between the two groups.
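The decision can also be made in code by comparing the p-value with the chosen significance level. A minimal sketch, assuming a significance level of 0.05 stored in a variable named alpha (a name chosen for this example):

```python
from scipy.stats import mannwhitneyu

vehicle_grp1 = [18, 17, 18, 24, 20, 23, 21, 25, 20, 24, 23, 19, 16, 14, 12]
vehicle_grp2 = [24, 17, 28, 24, 25, 21, 22, 23, 18, 27, 21, 23, 29, 28, 18]

alpha = 0.05  # assumed significance level

res = mannwhitneyu(vehicle_grp1, vehicle_grp2, alternative="two-sided")
print(f"U = {res.statistic}, p-value = {res.pvalue:.4f}")

# Compare the p-value with the significance level to reach a decision
if res.pvalue < alpha:
    print("Reject H0: the mpg differs between the two groups.")
else:
    print("Fail to reject H0: no significant difference in mpg was found.")
```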

Read: Python Scipy Minimize

Python Scipy Mann Whitneyu Axis

The method mannwhitneyu() of Python Scipy has a parameter axis that, when given as an integer, specifies the axis of the input along which the statistic is computed.

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

from scipy.stats import mannwhitneyu

Create the two-dimensional data x_data and y_data using the below code.

x_data = [[3, 4, 5, 3, 4, 8, 1], [1, 3, 8, 4, 9, 5, 7]]
y_data = [[7, 5, 9, 4, 8, 3, 1], [1, 8, 4, 3, 5, 4, 3]]

Perform the Mann Whitneyu test on the above data with axis = 0, which is the default value, using the below code.

mannwhitneyu(x_data, y_data)

Now again, perform the test with axis = 1 using the below code.

mannwhitneyu(x_data, y_data,axis = 1)

Comparing the outputs, we can see that axis = 0 and axis = 1 give different results: with axis = 0 the test is performed for each column, and with axis = 1 it is performed for each row.

This is how to perform the Mann Whitneyu test on given data along a chosen axis in Python Scipy.
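One way to see what the axis parameter does is to look at the shape of the returned statistic. The sketch below converts the example data to NumPy arrays (an assumption made here so that rows can be indexed easily) and also tries axis=None, which ravels the input first.

```python
import numpy as np
from scipy.stats import mannwhitneyu

x_data = np.array([[3, 4, 5, 3, 4, 8, 1], [1, 3, 8, 4, 9, 5, 7]])
y_data = np.array([[7, 5, 9, 4, 8, 3, 1], [1, 8, 4, 3, 5, 4, 3]])

# axis=0: one test per column, so seven statistics are returned
res_axis0 = mannwhitneyu(x_data, y_data, axis=0)
print(res_axis0.statistic.shape)

# axis=1: one test per row, so two statistics are returned
res_axis1 = mannwhitneyu(x_data, y_data, axis=1)
print(res_axis1.statistic.shape)

# axis=None: both inputs are flattened and a single test is performed
res_flat = mannwhitneyu(x_data, y_data, axis=None)
print(res_flat.statistic)

# The first entry for axis=1 matches a test on the first rows alone
res_row0 = mannwhitneyu(x_data[0], y_data[0])
print(res_axis1.statistic[0], res_row0.statistic)
```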


So, in this tutorial, we have learned about “Python Scipy Mann Whitneyu” and covered the following topics.

  • What is Mann Whitney
  • Python Scipy Mann Whitneyu
  • Python Scipy Mann Whitneyu Nan Policy
  • Python Scipy Mann Whitneyu Example
  • Python Scipy Mann Whitneyu Axis