In this Python tutorial, we will learn about the “Python Scipy Confidence Interval” with certain examples related to its use. Additionally, we will cover the following topics.
- Python Scipy Confidence Interval
- Python Scipy Confidence Interval T-Test
- Python Scipy Confidence Interval Mean
- Python Scipy Confidence Interval Proportion
- Python Scipy Confidence Interval T Distribution
- Python Scipy Confidence Interval Binomial
- Python Scipy Confidence Interval Linear Regression
- Python Scipy Confidence Interval Difference
- Python Scipy Confidence Interval Sample
Python Scipy Confidence Interval
A confidence interval (CI) is a range of values that is expected to contain a population parameter, such as the population mean, with a stated degree of certainty. That certainty is commonly expressed as a percentage, called the confidence level.
Confidence intervals measure the degree of uncertainty or certainty in a sampling process. They can be built for any confidence level, with 95 percent and 99 percent being the most common. Statistical tools such as the t-test are used to calculate confidence intervals.
For instance, a researcher may randomly draw several samples from the same population and compute a confidence interval for each sample to see how well it captures the true value of the population parameter. Each resulting dataset is different, so some intervals contain the true population parameter while others do not.
But what does it mean to have a 95% or 99% confidence interval? A 95 or 99 percent confidence interval is a range of values produced by a procedure that captures the true population mean 95% or 99% of the time.
In other words, if the sampling procedure were repeated many times, approximately 95% of the resulting 95% intervals would contain the true population mean.
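The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the population (normal with mean 10 and standard deviation 2), the sample size, the number of repetitions, and the seed are all assumed values that do not come from this tutorial.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: a normal population with a known mean,
# so we can check how often an interval captures it.
rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
true_mean, true_sd = 10.0, 2.0
n_obs, n_repeats = 20, 1000

hits = 0
for _ in range(n_repeats):
    sample = rng.normal(true_mean, true_sd, size=n_obs)
    # Half-width of the 95% t-based interval for this sample's mean
    half_width = stats.t.ppf(0.975, df=n_obs - 1) * stats.sem(sample)
    low, high = sample.mean() - half_width, sample.mean() + half_width
    hits += int(low <= true_mean <= high)

coverage = hits / n_repeats
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should land close to 0.95, although the exact value depends on the seed and the number of repetitions.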
Read: Python Scipy Chi-Square Test
Python Scipy Confidence Interval T-Test
The t-test is a statistical test for comparing means, either a sample mean against a hypothesized value or the means of two groups. It is frequently used in hypothesis testing to see if a method or treatment has an impact on the population of interest or if two groups differ from one another.
Python SciPy has four different t-test methods: ttest_1samp(), ttest_ind(), ttest_ind_from_stats(), and ttest_rel().
- ttest_1samp(): Compute the t-test for the mean of ONE group of scores.
- ttest_ind(): Compute the t-test for the means of two independent samples of scores.
- ttest_ind_from_stats(): Compute the t-test for the means of two independent samples from descriptive statistics.
- ttest_rel(): Compute the t-test on two related samples of scores, a and b.
Here we will cover only the method ttest_1samp(). To learn about the other methods, please visit the official website of Python SciPy.
The syntax of the method is given below.
scipy.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate', alternative='two-sided')
Where parameters are:
- a(array_data): Observational sample.
- axis(int): The test will be computed along this axis; the default value is 0. If None, perform the computation across the entire array a.
- popmean(float or array_data): The expected value under the null hypothesis. If it is array_data, it should have exactly the shape of a, minus the axis dimension.
- alternative: Defines the alternative hypothesis. The options are ‘two-sided’ (the default), ‘less’, and ‘greater’.
- nan_policy: When the input contains nan, this property specifies how to handle it. There are several options available, the default is ‘propagate’:
- ‘propagate’: It is an option that returns nan.
- ‘raise’: It causes an error to be thrown.
- ‘omit’: It ignores nan values when performing calculations.
Let’s take an example by following the below steps:
Import the required libraries or methods using the below python code.
import numpy as np
from scipy.stats import ttest_1samp
First, we’ll make an array to hold the 12 plants’ measurements using the below code.
samp_data = [11, 11, 13, 10, 9, 14, 12, 11, 12, 10, 12, 11]
Perform the one-sample test using the method ttest_1samp() as shown in the below code.
ttest_1samp(a=samp_data, popmean=14)
The t-test statistic is about -6.739, and the two-sided p-value is about 3.2025e-05.
For this one-sample t-test, the following are the two hypotheses:
- H0(Null Hypothesis): The plant has a 14-inch mean height ( µ = 14)
- H1(Alternative Hypothesis): The mean height isn’t 14 inches tall. (µ ≠14)
Here the p-value is less than 0.05, so we reject the null hypothesis and conclude that the mean height differs from 14 inches.
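The t statistic can also be reproduced by hand, which makes clear what ttest_1samp() computes. The sketch below is a minimal check, assuming the same plant data and popmean as above; the names sample_mean, sample_sd, std_err, and manual_t are introduced here for illustration only.

```python
import numpy as np
from scipy.stats import ttest_1samp

samp_data = [11, 11, 13, 10, 9, 14, 12, 11, 12, 10, 12, 11]
popmean = 14

# t = (sample mean - hypothesized mean) / (sample sd / sqrt(n))
sample_mean = np.mean(samp_data)
sample_sd = np.std(samp_data, ddof=1)            # ddof=1 gives the sample standard deviation
std_err = sample_sd / np.sqrt(len(samp_data))
manual_t = (sample_mean - popmean) / std_err

result = ttest_1samp(a=samp_data, popmean=popmean)
print("manual t:", manual_t)
print("scipy t:", result.statistic, "p-value:", result.pvalue)
```

The negative sign of the statistic reflects that the sample mean lies below the hypothesized mean of 14.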
Read: Python Scipy FFT
Python Scipy Confidence Interval Mean
A confidence interval for a mean is a set of values that, with a particular level of confidence, is likely to include the population mean.
The formula for the confidence interval is given below.
Confidence Interval = x̅ ± t * (s / √n)
Where parameters are:
x̅: represents the sample mean.
t: The t-value that corresponds to the level of confidence.
s: Standard deviation of the sample.
n: Sample size (the number of observations in the sample).
If we have a small sample, such as fewer than 30 observations, we can construct a confidence interval for the population mean using the t.interval() function from the scipy.stats module.
Let’s understand with an example by following the below steps:
Import the required libraries using the below python code.
from scipy import stats
import numpy as np
Create sample data using the below code.
samp_data = [15, 15, 15, 15, 14, 17, 18, 21, 22, 24, 26, 29, 29, 30, 25]
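Create a 95% confidence interval using the below code. The confidence level is passed as the first positional argument, because its keyword was renamed from alpha to confidence in newer SciPy releases.
stats.t.interval(0.95, loc=np.mean(samp_data), df=len(samp_data)-1, scale=stats.sem(samp_data))
The 95% confidence interval for the true population mean is approximately (17.764, 24.236).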
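To connect this result with the formula, the interval can be computed term by term and compared with t.interval(). This is a minimal sketch using the same sample data; the variable names x_bar, s, t_crit, and margin are chosen here to mirror the symbols in the formula.

```python
import numpy as np
from scipy import stats

samp_data = [15, 15, 15, 15, 14, 17, 18, 21, 22, 24, 26, 29, 29, 30, 25]

n = len(samp_data)
x_bar = np.mean(samp_data)                 # sample mean
s = np.std(samp_data, ddof=1)              # sample standard deviation
t_crit = stats.t.ppf(0.975, df=n - 1)      # two-sided 95% critical t value
margin = t_crit * s / np.sqrt(n)

manual_ci = (x_bar - margin, x_bar + margin)
scipy_ci = stats.t.interval(0.95, df=n - 1, loc=x_bar, scale=stats.sem(samp_data))

print("manual:", manual_ci)
print("scipy: ", scipy_ci)
```

Both approaches should give the same end points, since stats.sem() is the sample standard deviation divided by the square root of n.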
Read: Python Scipy Matrix + Examples
Python Scipy Confidence Interval Proportion
Python SciPy provides a method BinomTestResult.proportion_ci(), defined in the module scipy.stats._result_classes, that computes a confidence interval for an estimated proportion. A BinomTestResult object is what scipy.stats.binomtest() returns.
The syntax is given below.
BinomTestResult.proportion_ci(confidence_level=0.95, method='exact')
Where parameters are:
- confidence_level(float): The level of confidence for the estimated proportion’s computed confidence interval. 0.95 is the default.
- method: Chooses the method for calculating the confidence interval for a proportion estimate:
- ‘exact’: Use the Clopper-Pearson exact method (the default).
- ‘wilson’: Use Wilson’s method without continuity correction.
- ‘wilsoncc’: Use Wilson’s method with continuity correction.
The method BinomTestResult.proportion_ci() returns ci, an object whose low and high attributes hold the lower and upper bounds of the confidence interval.
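A short usage sketch may help here. It assumes a SciPy version that provides scipy.stats.binomtest() (1.7 or later), and the counts of 7 successes in 50 trials are hypothetical values chosen only for illustration.

```python
from scipy.stats import binomtest

# Hypothetical data: 7 successes observed in 50 trials.
successes, trials = 7, 50

# binomtest() returns a BinomTestResult; p is the proportion under the null hypothesis.
result = binomtest(k=successes, n=trials, p=0.5)

# Wilson interval (no continuity correction) for the estimated proportion
ci = result.proportion_ci(confidence_level=0.95, method='wilson')
print("estimated proportion:", successes / trials)
print("95% CI:", ci.low, ci.high)
```

The value of p only affects the hypothesis test, not the confidence interval for the estimated proportion.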
Read: Scipy Linalg – Helpful Guide
Python Scipy Confidence Interval Binomial
The binomial distribution is a probability distribution that describes the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes and the same probability of success. In this section, we will calculate an interval using the binomial distribution.
The Python SciPy module scipy.stats contains a method binom.interval(), and we will use it to calculate the interval. Let’s see an example by following the below steps:
Import the required libraries using the below python code.
from scipy import stats
import numpy as np
Create sample data using the below code.
samp_data = [2,5,3,7,9,5,7,2,6,7]
Calculate the confidence interval using the below code.
stats.binom.interval(0.99,
                     n=len(samp_data)-1,
                     loc=np.mean(samp_data),
                     p=stats.sem(samp_data))
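This returns the end points of the range that contains 99% of the probability mass of a binomial distribution with n trials and success probability p, shifted by loc. Note that p must be a probability between 0 and 1.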
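For comparison, here is a sketch of binom.interval() called with an explicit number of trials and success probability. The values n_trials = 20 and p_success = 0.3 are hypothetical and are not derived from the sample data above.

```python
from scipy import stats

# Hypothetical binomial setup: 20 independent trials, success probability 0.3.
n_trials, p_success = 20, 0.3

# Central range containing (at least) 95% of the distribution's probability mass
low, high = stats.binom.interval(0.95, n=n_trials, p=p_success)
expected = n_trials * p_success   # mean of the binomial distribution

print("interval for the number of successes:", low, high)
print("expected number of successes:", expected)
```

The interval is expressed as a number of successes, so it lies between 0 and n_trials and brackets the expected count.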
Read: Scipy Normal Distribution
Python Scipy Confidence Interval T Distribution
When the population standard deviation is unknown and the data come from a normally distributed population, the t-distribution describes the standardized distance between the sample mean and the population mean.
- In other words, the t distribution, also known as Student’s t distribution, is a family of distributions that resemble the normal distribution curve but have a lower peak and heavier tails.
- When there are few observations, the t distribution is used rather than the normal distribution. The t distribution resembles the normal distribution more closely as the sample size increases.
- In practice, the distribution is nearly identical to the normal distribution for sample sizes of more than roughly 20 to 30.
Below is the given picture of the Normal and T Distribution shapes.
We have already done the example related to T Distribution, please refer to the sub-section “Python Scipy Confidence Interval Mean” of this tutorial.
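The convergence of the t distribution toward the normal distribution can be seen by comparing critical values. The following sketch prints the two-sided 95% critical value of the t distribution for a few degrees of freedom; the list of degrees of freedom is an arbitrary choice for illustration.

```python
from scipy import stats

# Two-sided 95% critical value of the standard normal distribution
z_crit = stats.norm.ppf(0.975)

dfs = (2, 5, 10, 20, 30, 100)
t_crits = [stats.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    # The gap shrinks as the degrees of freedom grow
    print(f"df={df:>3}  t={t_crit:.4f}  difference from normal={t_crit - z_crit:.4f}")
print(f"normal critical value: {z_crit:.4f}")
```

Because the t critical value is always larger than the normal one, t-based intervals are wider, and the difference matters most for small samples.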
Read: Scipy Ndimage Rotate
Python Scipy Confidence Interval Linear Regression
The Python SciPy module scipy.stats contains a method linregress() that performs a linear least-squares regression on two sets of measurements. Here we will fit a linear regression between two variables x and y, then find the confidence intervals on the slope and intercept of the fitted line.
The syntax is given below.
scipy.stats.linregress(x, y=None, alternative='two-sided')
Where parameters are:
- x,y(array_data): There are two measurement sets. The length of both arrays should be the same. If only x is specified (and y=None), the array must be two-dimensional, with one dimension having a length of two.
- alternative: The alternative hypothesis is defined here. There are several alternatives such as two-sided, less, and greater.
The method linregress() returns a result object with the attributes slope, intercept, rvalue, pvalue, stderr, and intercept_stderr, each of type float.
Let’s understand by an example by following the below steps:
Import the required libraries using the below python code.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import linregress, t
Create a random number generator and generate x and y data using the below code.
rand_numgen = np.random.default_rng()
x_data = rand_numgen.random(15)
y_data = 1.5*x_data + rand_numgen.random(15)
Compute the linear regression using the below code.
lin_res = linregress(x_data, y_data)
Print the slope and intercept using the below code.
print("Slope",lin_res.slope)
print("Intercept",lin_res.intercept)
Plot the data and the fitted line together on a graph using the below code.
plt.plot(x_data, y_data, 'o', label='original data')
plt.plot(x_data, lin_res.intercept + lin_res.slope*x_data, 'r', label='fitted line')
plt.legend()
plt.show()
Compute the 95% confidence interval for the slope and intercept using the below code.
t_inv = lambda prob, degree_of_freedom: abs(t.ppf(prob/2, degree_of_freedom))
t_s = t_inv(0.05, len(x_data)-2)
Print the confidence interval on the slope and intercept using the below code.
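A minimal self-contained sketch of this step is given below. It assumes a fixed seed (added here only so the run is reproducible) and relies on the stderr and intercept_stderr attributes of the linregress() result; the names t_s, slope_margin, and intercept_margin are introduced for illustration.

```python
import numpy as np
from scipy.stats import linregress, t

rng = np.random.default_rng(12345)   # seed is an assumption, for reproducibility
x_data = rng.random(15)
y_data = 1.5 * x_data + rng.random(15)

lin_res = linregress(x_data, y_data)

# Two-sided critical t value; a simple regression has n - 2 degrees of freedom
t_inv = lambda prob, degree_of_freedom: abs(t.ppf(prob / 2, degree_of_freedom))
t_s = t_inv(0.05, len(x_data) - 2)

# Margin of error = critical value * standard error of the estimate
slope_margin = t_s * lin_res.stderr
intercept_margin = t_s * lin_res.intercept_stderr

print(f"slope (95%): {lin_res.slope:.6f} +/- {slope_margin:.6f}")
print(f"intercept (95%): {lin_res.intercept:.6f} +/- {intercept_margin:.6f}")
```

Each interval is the estimate plus or minus its margin, so a narrower margin indicates a more precisely estimated coefficient.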
Read: Scipy Integrate + Examples
Python Scipy Confidence Interval Difference
Let’s say we have two sets of data from a matched-pairs experiment that are not independent of each other, and we want to build a confidence interval for the mean difference between the two samples. What is the procedure for calculating the confidence interval? Assume we’ve decided on a significance level of 0.05, which corresponds to a 95% confidence level.
Import the required libraries using the below python code.
from scipy import stats
import numpy as np
Specify the significance level alpha, which corresponds to a 95% level of confidence, using the below code.
alp = 0.05
Create two sample data using the below code.
samp_data1 = np.array([14, 17, 10, 19, 7, 20])
samp_data2 = np.array([ 3, 2, 10, 16, 10, 8])
Compute the paired differences between the two samples and the number of observations in each sample using the below code.
diffsamp = samp_data1 - samp_data2
len_no_obs = len(samp_data1)
Also, compute the mean and variance of the differences, the critical value, and the radius (half-width) of the CI using the below code.
diffmean = np.mean(diffsamp)
diffvar = np.var( diffsamp, ddof=1 )
criticalvalue = stats.t.ppf(q = 1-alp/2, df = len_no_obs - 1)
rad = criticalvalue*np.sqrt(diffvar)/np.sqrt(len_no_obs)
Now compute the confidence interval for the mean difference using the below code.
print(diffmean - rad, diffmean + rad)
This is how to find the confidence interval for the mean difference. For this data, the interval is approximately (-1.34, 14.01).
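The interval for the mean difference is closely tied to the paired t-test: a 95% interval that contains 0 corresponds to a two-sided p-value above 0.05. The sketch below illustrates this link with ttest_rel() on the same two samples; the variable contains_zero is introduced here for illustration.

```python
import numpy as np
from scipy import stats

samp_data1 = np.array([14, 17, 10, 19, 7, 20])
samp_data2 = np.array([3, 2, 10, 16, 10, 8])
diffsamp = samp_data1 - samp_data2

# 95% t-based interval for the mean of the paired differences
low, high = stats.t.interval(0.95, df=len(diffsamp) - 1,
                             loc=np.mean(diffsamp), scale=stats.sem(diffsamp))

# Paired t-test of the null hypothesis that the mean difference is 0
t_stat, p_value = stats.ttest_rel(samp_data1, samp_data2)

contains_zero = bool(low < 0 < high)
print("95% CI for the mean difference:", low, high)
print("paired t-test p-value:", p_value)
print("interval contains 0:", contains_zero)
```

Here the interval includes 0, so at the 0.05 level the paired test does not reject the hypothesis of no mean difference.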
Read: Scipy Signal – Helpful Tutorial
Python Scipy Confidence Interval Sample
Here in this section, we will create a function that will compute the confidence interval from given sample data.
Let’s follow the below steps to create a method or function.
Import the required libraries using the below python code.
from scipy import stats
import numpy as np
Create a function to compute the confidence interval from a given sample of data using the below code.
def m_conf_intval(samp_data, confid=0.95):
    data = 1.0 * np.array(samp_data)
    len_n = len(data)
    mean, std_err = np.mean(data), stats.sem(data)
    h = std_err * stats.t.ppf((1 + confid) / 2., len_n-1)
    return mean, mean-h, mean+h
Now, provide sample data to the above-created method using the below code.
data = [2,4,6,3,8,9,4]
m_conf_intval(data)
Looking at the output, the sample mean is about 5.143 and the 95% confidence interval ranges from about 2.729 to 7.556.
In the above code, we have created a function m_conf_intval() to compute the confidence interval from a given data sample.
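The confid argument controls how wide the interval is. As a brief usage sketch, the function is repeated below so the example runs on its own, and it is called with three confidence levels; the levels 0.90 and 0.99 and the widths dictionary are additions made here for illustration.

```python
import numpy as np
from scipy import stats

def m_conf_intval(samp_data, confid=0.95):
    """Return (mean, lower bound, upper bound) of a t-based confidence interval."""
    data = 1.0 * np.array(samp_data)
    len_n = len(data)
    mean, std_err = np.mean(data), stats.sem(data)
    h = std_err * stats.t.ppf((1 + confid) / 2., len_n - 1)
    return mean, mean - h, mean + h

data = [2, 4, 6, 3, 8, 9, 4]
widths = {}
for level in (0.90, 0.95, 0.99):
    mean, low, high = m_conf_intval(data, confid=level)
    widths[level] = high - low
    print(f"{level:.0%} CI: ({low:.3f}, {high:.3f}), width {widths[level]:.3f}")
```

A higher confidence level produces a wider interval, because more certainty about capturing the mean requires covering a larger range.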
Also, take a look at some more Python SciPy tutorials.
- Scipy Convolve – Complete Guide
- How to use Python Scipy Linprog
- Python Scipy Eigenvalues
- Scipy Stats – Complete Guide
- Scipy Optimize – Helpful Guide
- Python Scipy Distance Matrix
So, in this tutorial, we have learned about the “Python Scipy Confidence Interval” and covered the following topics.
- Python Scipy Confidence Interval
- Python Scipy Confidence Interval T-Test
- Python Scipy Confidence Interval Mean
- Python Scipy Confidence Interval Proportion
- Python Scipy Confidence Interval T Distribution
- Python Scipy Confidence Interval Binomial
- Python Scipy Confidence Interval Linear Regression
- Python Scipy Confidence Interval Difference
- Python Scipy Confidence Interval Sample
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I started working on Python, Machine learning, and artificial intelligence for the last 5 years. During this time I got expertise in various Python libraries also like Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, Scikit-Learn, etc… for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.