# Python Scipy ttest_ind – Complete Guide

In this Python tutorial, we will learn how to use the “Python Scipy ttest_ind” to compare the means of two populations through hypothesis testing, and how to implement the test using Python Scipy. We will also cover the following topics.

• What is a T-test in Statistics
• Python Scipy ttest_ind
• Python Scipy ttest_ind alternative
• Python Scipy ttest_ind nan
• Python Scipy ttest_ind output
• Python Scipy ttest_ind equal_var
• Python Scipy ttest_ind axis
• Python Scipy ttest_ind statistic
• Python Scipy ttest_ind degrees of freedom

## What is a T-test in Statistics

A t-test is an inferential statistic used to assess whether the means of two groups differ significantly. T-tests are used when the data sets are approximately normally distributed and the population variances are unknown.

When evaluating a hypothesis, the t-test uses the t-statistic, the t-distribution, and the degrees of freedom to assess statistical significance. The test takes a sample from each of the two groups and states the problem mathematically, with the null hypothesis being that the two means are equal.

Three essential data values are needed to calculate a t-test: the difference between the mean values of the two data sets, the standard deviation of each group, and the number of data values in each group.

The comparison determines how likely the observed difference is to arise by chance and whether it falls outside that range of chance. In other words, the t-test investigates whether the difference between the groups is a genuine difference or merely a chance difference.

In this tutorial, we will compute the T-test for independent samples using Python Scipy.
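As a rough sketch of how the three quantities described above (mean difference, group standard deviations, and group sizes) combine into a t-statistic, the pooled-variance formula can be written out by hand and compared with SciPy. The data values and the variable names `group_a` and `group_b` are made up for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

# Illustrative data (hypothetical values, not taken from the tutorial)
group_a = np.array([5.1, 4.9, 6.2, 5.8, 6.0])
group_b = np.array([4.2, 4.8, 5.0, 4.4, 5.1])

n_a, n_b = group_a.size, group_b.size

# 1. Difference between the two sample means
mean_diff = group_a.mean() - group_b.mean()

# 2. Pooled variance built from each group's sample variance (ddof=1)
pooled_var = (
    (n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)
) / (n_a + n_b - 2)

# 3. Standard error of the difference, which depends on the group sizes
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = mean_diff / std_err
t_scipy = ttest_ind(group_a, group_b, equal_var=True).statistic

print(t_manual, t_scipy)  # the two values should agree
```

The hand calculation corresponds to the equal-variance form of the test; Welch's variant uses a different standard error.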

## Python Scipy ttest_ind

The Python Scipy module `scipy.stats` has a method `ttest_ind()` that computes the T-test for the means of two independent samples of scores. This is a test of the null hypothesis that the two independent samples have the same average (expected) value. By default, the test assumes that the populations have identical variances.

The syntax is given below.

``scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate', permutations=None, random_state=None, alternative='two-sided', trim=0)``

Where the parameters are:

• a, b (array_like): The arrays must have the same shape, except in the dimension corresponding to axis.
• axis (int or None): The axis along which the test is computed (the default is 0). If None, the test is computed over the entire arrays a and b.
• nan_policy: Defines how to handle input that contains nan. The following choices are available (‘propagate’ is the default):
1. ‘propagate’: nan is returned.
2. ‘raise’: an error is raised.
3. ‘omit’: the calculation is performed ignoring nan values.
• equal_var (boolean): If True (the default), perform a standard independent two-sample test that assumes equal population variances. If False, perform Welch’s t-test, which does not assume equal population variances.
• permutations: If 0 or None (the default), p-values are calculated using the t-distribution. Otherwise, permutations is the number of random permutations used to estimate the p-values with a permutation test. If the number of permutations equals or exceeds the number of distinct partitions of the pooled data, an exact test is performed instead.
• random_state (None, int, numpy Generator or RandomState): The state of the pseudorandom number generator used to produce permutations. If None (or np.random), the numpy.random.RandomState singleton is used. If an integer, a new RandomState instance is created and seeded with it. If it is already a Generator or RandomState instance, that instance is used.
• trim (float): If non-zero, performs a trimmed (Yuen’s) t-test. Defines the fraction of elements to be trimmed from each end of the input samples. If 0 (the default), no elements are trimmed from either side. The number of trimmed elements from each tail is the floor of trim multiplied by the number of elements. The allowed range is [0, 0.5).
• alternative: Defines the alternative hypothesis. The following choices are available (the default is ‘two-sided’):
1. ‘two-sided’: the means of the distributions underlying the samples are unequal.
2. ‘less’: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
3. ‘greater’: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.
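To illustrate the `trim` parameter described above, here is a minimal sketch that compares an ordinary test with a trimmed (Yuen's) test on data containing a few extreme values. The seed, the sample sizes, and the injected outliers are arbitrary choices for illustration, and the example assumes a SciPy version that supports `trim` (1.7 or later).

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)  # arbitrary seed, used only for repeatability
a = rng.normal(loc=0.0, scale=1.0, size=50)
b = rng.normal(loc=0.0, scale=1.0, size=50)
b[:3] = [25.0, 30.0, -28.0]  # inject a few outliers into the second sample

plain = ttest_ind(a, b)               # ordinary two-sample test
trimmed = ttest_ind(a, b, trim=0.2)   # trims floor(0.2 * 50) = 10 values per tail

print("plain:  ", plain.statistic, plain.pvalue)
print("trimmed:", trimmed.statistic, trimmed.pvalue)
```

Trimming reduces the influence of the extreme values on both the mean difference and the variance estimate, so the two results generally differ.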

The method `ttest_ind()` returns the `statistic` and the `pvalue`, each of which is a float or an array.

Let’s take an example and compute the T-test of the independent samples by following the below steps:

Import the required libraries using the below python code.

``````import numpy as np
from scipy.stats import norm, ttest_ind``````

Define a random number generator using `np.random.default_rng()` and generate two samples from a normal distribution with the same mean using the method `norm.rvs()`.

``````rnd_num_gen = np.random.default_rng()
samp1 = norm.rvs(loc=3, scale=7, size=250, random_state=rnd_num_gen)
samp2 = norm.rvs(loc=3, scale=7, size=250, random_state=rnd_num_gen)
``````

Now perform the T-test on the samples with the same means using the below code.

``ttest_ind(samp1,samp2)``

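Here `ttest_ind()` returns two values; in the original run they were statistic = 0.295 and pvalue = 0.76. Because the random number generator is not seeded, the exact values differ from run to run.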
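For a repeatable run, the generator can be seeded. The sketch below repeats the example with an arbitrary seed and shows one common way to read the result; the seed value and the significance level `alpha = 0.05` are assumptions, not part of the original example.

```python
import numpy as np
from scipy.stats import norm, ttest_ind

rng = np.random.default_rng(12345)  # arbitrary fixed seed for repeatability
samp1 = norm.rvs(loc=3, scale=7, size=250, random_state=rng)
samp2 = norm.rvs(loc=3, scale=7, size=250, random_state=rng)

result = ttest_ind(samp1, samp2)
print(result.statistic, result.pvalue)

alpha = 0.05  # conventional significance level (an assumption)
if result.pvalue < alpha:
    print("Reject the null hypothesis of equal means")
else:
    print("Fail to reject the null hypothesis of equal means")
```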

## Python Scipy ttest_ind alternative

The parameter `alternative` of the method `ttest_ind()` is used to describe the alternative hypothesis.

The alternative parameter accepts the following options.

1. “two-sided”: the means of the distributions underlying the samples are unequal.
2. “less”: the mean of the distribution underlying the first sample is less than the mean of the distribution underlying the second sample.
3. “greater”: the mean of the distribution underlying the first sample is greater than the mean of the distribution underlying the second sample.

Let’s understand with an example how to perform the T-test with an alternative hypothesis by following the below steps:

Import the required libraries or methods using the below python code.

``from scipy.stats import ttest_ind``

Create a sample using the below code.

``````samp_1 = [[1.2,2.1,5.6,1.3],[3.4,2.1,1.6,4.8]]
samp_2 = [[2.4,1.1,3.6,5.8],[0.2,4.1,2.6,6.3]]``````

Apply the T-test with the alternative hypothesis set to `two-sided`.

``ttest_ind(samp_1,samp_2,axis =1,alternative='two-sided')``

Again apply the T-test with an alternative hypothesis equal to `less`.

Now again, perform the T-test with an alternative hypothesis equal to `greater`.
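The three calls can be sketched together in a loop over the accepted values of `alternative`. The dictionary `results` and the variable `one_sided_sum` are names introduced here for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

samp_1 = [[1.2, 2.1, 5.6, 1.3], [3.4, 2.1, 1.6, 4.8]]
samp_2 = [[2.4, 1.1, 3.6, 5.8], [0.2, 4.1, 2.6, 6.3]]

# Run the same row-wise test under each alternative hypothesis
results = {}
for alt in ("two-sided", "less", "greater"):
    results[alt] = ttest_ind(samp_1, samp_2, axis=1, alternative=alt)
    print(alt, results[alt].statistic, results[alt].pvalue)

# For the t-distribution based test, the two one-sided p-values sum to 1
one_sided_sum = np.asarray(results["less"].pvalue) + np.asarray(results["greater"].pvalue)
print(one_sided_sum)
```

The t-statistic is the same in all three cases; only the p-value changes with the chosen alternative.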

This is how to use the alternative hypothesis with the help of Python SciPy ttest_ind.

## Python Scipy ttest_ind nan

The method `ttest_ind()` accepts the parameter `nan_policy` to handle nan values within the arrays or samples, as we have learned in the subsection above.

• nan_policy: Defines how to handle input that contains nan. The following choices are available (‘propagate’ is the default):
1. ‘propagate’: nan is returned.
2. ‘raise’: an error is raised.
3. ‘omit’: the calculation is performed ignoring nan values.

Let’s see with examples how to handle the nan values in arrays or samples while performing the T-test.

Import the required methods or libraries using the below python code.

``````from scipy.stats import ttest_ind
import numpy as np``````

Generate data with nan values using the below code.

``````data1 = np.random.randn(30)
data2 = np.random.randn(30)
mask_nan = np.random.choice([1, 0], data1.shape, p=[.1, .9]).astype(bool)
# set the masked positions of data1 to nan
data1[mask_nan] = np.nan``````

Perform the T-test on the data with nan_policy equal to `raise` using the below code.

``ttest_ind(data1,data2, nan_policy='raise')``

Again perform the T-test with nan_policy equal to `omit` using the below code.

``ttest_ind(data1,data2, nan_policy='omit')``

At last, perform the T-test with nan_policy equal to `propagate` using the below code.

``ttest_ind(data1,data2, nan_policy='propagate')``
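The behaviour of the three policies can be compared side by side on a small fixed data set. This is a minimal sketch with hand-picked values (one nan in `data1`) so that the outcome does not depend on random masking; the variable names `propagated`, `omitted`, `dropped`, and `raised` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hand-picked illustrative data with a single nan in the first sample
data1 = np.array([1.2, np.nan, 0.7, 1.9, 1.1, 0.4])
data2 = np.array([0.9, 1.4, 1.0, 0.2, 1.6, 0.8])

# 'propagate': the nan flows through to the result
propagated = ttest_ind(data1, data2, nan_policy="propagate")
print(propagated.statistic, propagated.pvalue)

# 'omit': nan values are ignored, as if they had been dropped beforehand
omitted = ttest_ind(data1, data2, nan_policy="omit")
dropped = ttest_ind(data1[~np.isnan(data1)], data2)
print(omitted.statistic, dropped.statistic)

# 'raise': an exception is raised instead of returning a result
try:
    ttest_ind(data1, data2, nan_policy="raise")
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```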

This is how to handle the nan values within the sample while computing the T-test using the method `ttest_ind()` of Python Scipy with parameter `nan_policy`.

## Python Scipy ttest_ind output

The method `ttest_ind()` of Python Scipy returns two values after performing the T-test on the samples. The first value is the `statistic` and the second is the `pvalue`.

Using these two values, we determine whether the difference between the means of the two samples is significant. To learn more about the method `ttest_ind()`, refer to the subsection “Python Scipy ttest_ind” above.

Let’s see with an example and compute the T-test by following the below steps:

Import the required libraries or methods using the below python code.

``from scipy.stats import ttest_ind``

Generate two sample data using the below code.

``````sample_1 = [2.4,5.1,2.6,1.8]
sample_2 = [1.4,2.1,5.6,3.8]``````

Perform the T-test to get the two values that we have discussed above.

``ttest_ind(sample_1,sample_2)``
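As a brief sketch of how the returned object can be consumed, the two values can be read either by attribute name or by tuple-style unpacking. The names `t_stat` and `p_val` are chosen here for illustration.

```python
from scipy.stats import ttest_ind

sample_1 = [2.4, 5.1, 2.6, 1.8]
sample_2 = [1.4, 2.1, 5.6, 3.8]

result = ttest_ind(sample_1, sample_2)

# Access by attribute name
print(result.statistic, result.pvalue)

# Or unpack like a two-element tuple
t_stat, p_val = result
print(t_stat, p_val)
```

The statistic is negative here because the mean of `sample_1` (2.975) is smaller than the mean of `sample_2` (3.225).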

This is how to perform the T-test on the sample and get the output to determine the significance of the sample.

## Python Scipy ttest_ind axis

The `axis` parameter of the method `ttest_ind()` of Python Scipy allows us to compute the T-test along the specified axis of the given array or sample.

A 2-dimensional array has two axes: axis 0 runs vertically down the rows, and axis 1 runs horizontally across the columns. With `axis=0` the test is computed separately for each column, and with `axis=1` it is computed separately for each row.

Here we will see an example of how to compute the T-test along the specified axis of data by following the below steps:

Import the required libraries or methods using the below python code.

``from scipy.stats import ttest_ind``

Generate sample data using the below code.

``````samp_1 = [[1.2,2.1,5.6,1.3],[2.4,1.1,3.6,5.8]]
samp_2 = [[2.4,1.1,3.6,5.8],[1.2,2.1,5.6,1.3]]``````

Perform the T-test with the default `axis=0`, which computes a separate test for each column.

``ttest_ind(samp_1,samp_2)``

Now compute the T-test on the specified axis of the data using the below code.

``ttest_ind(samp_1,samp_2,axis =1)``
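One way to see what `axis` does is to look at the shape of the returned statistic. The sketch below uses the same data and additionally shows `axis=None`, which flattens both arrays into a single test; the variable names are introduced here for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

samp_1 = np.array([[1.2, 2.1, 5.6, 1.3], [2.4, 1.1, 3.6, 5.8]])
samp_2 = np.array([[2.4, 1.1, 3.6, 5.8], [1.2, 2.1, 5.6, 1.3]])

per_column = ttest_ind(samp_1, samp_2)             # default axis=0: one test per column
per_row = ttest_ind(samp_1, samp_2, axis=1)        # one test per row
whole = ttest_ind(samp_1, samp_2, axis=None)       # one test over all values

print(np.shape(per_column.statistic))  # four columns
print(np.shape(per_row.statistic))     # two rows
print(np.shape(whole.statistic))       # a single scalar
print(whole.statistic, whole.pvalue)
```

In this particular data set `samp_2` is `samp_1` with its rows swapped, so the flattened samples contain the same values and the `axis=None` statistic is essentially zero.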

This is how to compute the T-test along the specified axis of the given array or sample using the method `ttest_ind()` with parameter `axis`.

## Python Scipy ttest_ind equal_var

If we have data samples with equal variances, what do we do? We use the boolean parameter `equal_var` of the method `ttest_ind()` of Python Scipy.

The equal-variance t-test, an independent t-test, is used when each group has the same number of samples or when the variances of the two data sets are similar.

The parameter accepts two values, `True` (the default) or `False`. Let’s see with an example by following the below steps:

Import the required libraries or methods using the below code.

``````import numpy as np
from scipy.stats import norm, ttest_ind``````

Generate data with equal variance using the below code.

``````rnd_num_gen = np.random.default_rng()
samp1 = norm.rvs(loc=4, scale=5, size=100, random_state=rnd_num_gen)
samp2 = norm.rvs(loc=4, scale=5, size=200, random_state=rnd_num_gen)``````

Compute the T-test on the above sample with equal variances using the below code.

``ttest_ind(samp1,samp2)``
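For contrast, the following sketch runs both settings of `equal_var` on samples whose variances and sizes differ markedly. The seed, the scales, and the sample sizes are arbitrary illustrative choices, and the names `narrow` and `wide` are introduced here.

```python
import numpy as np
from scipy.stats import norm, ttest_ind

rng = np.random.default_rng(7)  # arbitrary seed for repeatability
narrow = norm.rvs(loc=4, scale=1, size=100, random_state=rng)   # small variance
wide = norm.rvs(loc=4, scale=10, size=20, random_state=rng)     # large variance

student = ttest_ind(narrow, wide, equal_var=True)    # pooled-variance test
welch = ttest_ind(narrow, wide, equal_var=False)     # Welch's t-test

print("equal_var=True: ", student.statistic, student.pvalue)
print("equal_var=False:", welch.statistic, welch.pvalue)
```

When the group sizes and variances are both unequal, the two settings use different standard errors and degrees of freedom, so the results can diverge noticeably.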

This is how to compute the T-test of samples with equal variances using the method `ttest_ind()` with parameter `equal_var`.

## Python Scipy ttest_ind statistic

The method `ttest_ind()` of Python Scipy returns the t-statistic, which we have already seen in the subsection “Python Scipy ttest_ind output”. The t-statistic measures how far the estimated value of a parameter departs from its hypothesized value, relative to its standard error.

Let’s do an example by following the below steps:

Import the required libraries or methods using the below python code.

``from scipy.stats import ttest_ind``

Generate sample data using the below code.

``````samp_data1 = [[0.2,5.1,1.6,1.3],[2.4,1.1,3.6,5.8]]
samp_data2 = [[1.4,2.1,5.6,3.8],[2.2,5.1,1.6,5.3]]``````

Compute the T-test and get the `t-statistic` value using the below code.

``ttest_ind(samp_data1,samp_data2)``

In the output, `statistic=array([-0.42717883, -0.2, ...])` is the t-statistic value, with one entry per column because the default `axis=0` is used.
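The sign of the t-statistic depends on the order of the arguments. As a small sketch, swapping the two samples negates the statistic while the two-sided p-value stays the same; the names `forward` and `reverse` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

samp_data1 = [[0.2, 5.1, 1.6, 1.3], [2.4, 1.1, 3.6, 5.8]]
samp_data2 = [[1.4, 2.1, 5.6, 3.8], [2.2, 5.1, 1.6, 5.3]]

forward = ttest_ind(samp_data1, samp_data2)   # first sample minus second
reverse = ttest_ind(samp_data2, samp_data1)   # second sample minus first

print(forward.statistic)
print(reverse.statistic)
print(np.allclose(forward.pvalue, reverse.pvalue))
```

A negative entry therefore means that, for that column, the first sample has the smaller mean.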

## Python Scipy ttest_ind degrees of freedom

First, what are degrees of freedom? The degrees of freedom of an estimate are the number of independent data points used to calculate it.

This is not the same as the sample size. To obtain the degrees of freedom for an estimate of a single mean, we subtract 1 from the number of items. For the standard two-sample t-test computed by `ttest_ind()`, one is subtracted for each group, giving n1 + n2 - 2.

Imagine we were looking for the average weight loss on a diet. We could use 50 people, giving df = 49, or 10 people, giving 9 degrees of freedom (10 – 1 = 9).

Another way to think about degrees of freedom is as the number of values in a data set that are free to vary. What does “free to vary” mean? The following example uses the mean (average):

Choose a set of three numbers with a mean (average) of 10, for example 7, 9, 14, or 2, 10, 18, or 4, 8, 18.

Once we have selected the first two numbers, the third is fixed. In other words, we cannot choose the third item in the set freely. Only the first two numbers can vary. We can choose 7 and 9, or 2 and 10, but once we have made that choice, the third number must be the specific value that yields the desired mean. Therefore, a set of three numbers has two degrees of freedom.
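To connect degrees of freedom back to `ttest_ind()`, the sketch below computes them by hand and uses them to reproduce the p-value from the t-distribution. It assumes the usual formulas: n1 + n2 - 2 for the equal-variance test and the Welch-Satterthwaite approximation for Welch's test. The data values and variable names are illustrative.

```python
import numpy as np
from scipy.stats import t, ttest_ind

# Illustrative samples of unequal size (hypothetical values)
a = np.array([2.4, 5.1, 2.6, 1.8, 3.3])
b = np.array([1.4, 2.1, 5.6, 3.8])

# Equal-variance test: one degree of freedom is lost per group
df_pooled = a.size + b.size - 2
res = ttest_ind(a, b, equal_var=True)
p_manual = 2 * t.sf(abs(res.statistic), df_pooled)  # two-sided p-value

# Welch's test: Welch-Satterthwaite approximation, usually not an integer
va = a.var(ddof=1) / a.size
vb = b.var(ddof=1) / b.size
df_welch = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
res_w = ttest_ind(a, b, equal_var=False)
p_manual_w = 2 * t.sf(abs(res_w.statistic), df_welch)

print(df_pooled, res.pvalue, p_manual)
print(df_welch, res_w.pvalue, p_manual_w)
```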

So, in this tutorial, we have learned about the “Python Scipy ttest_ind” and covered the following topics.

• What is a T-test in Statistics
• Python Scipy ttest_ind
• Python Scipy ttest_ind alternative
• Python Scipy ttest_ind nan
• Python Scipy ttest_ind output
• Python Scipy ttest_ind equal_var
• Python Scipy ttest_ind axis
• Python Scipy ttest_ind statistic
• Python Scipy ttest_ind degrees of freedom