In this Python tutorial, we will learn how to use “Scipy Stats” through various examples. We will cover the following topics.
- Scipy Stats
- Scipy Stats Lognormal
- Scipy Stats Norm
- Scipy Stats T-test
- Scipy Stats Pearsonr
- Scipy Stats chi-square
- Scipy Stats IQR
- Scipy Stats Poisson
- Scipy Stats Entropy
- Scipy Stats Anova
- Scipy Stats Anderson
- Scipy Stats Average
- Scipy Stats Alpha
- Scipy Stats Boxcox
- Scipy Stats Binom
- Scipy Stats Beta
- Scipy Stats Binomial test
- Scipy Stats Binned statistics
- Scipy Stats Binom pmf
- Scipy Stats CDF
- Scipy Stats Cauchy
- Scipy Stats Describe
- Scipy Stats Exponential
- Scipy Stats Gamma
- Scipy Stats Geometric
- Scipy Stats gmean
- Scipy Stats Gennorm
- Scipy Stats Genpareto
- Scipy Stats Gumbel
- Scipy Stats Genextreme
- Scipy Stats Histogram
- Scipy Stats Half normal
- Scipy Stats Half cauchy
- Scipy Stats Inverse gamma
- Scipy Stats Inverse normal CDF
- Scipy Stats Johnson
- Scipy Stats PDF
- Scipy Stats Hypergeom
- Scipy Stats Interval
- Scipy Stats ISF
- Scipy Stats Independent T-test
- Scipy Stats Fisher Exact
Scipy Stats
The Scipy has a subpackage scipy.stats that contains a huge number of statistical functions. Statistics is a very broad area, so the module groups its functions into the following major categories.
- Summary statistics
- Frequency statistics
- Statistical tests
- Probability distributions
- Correlation functions
- Quasi-Monte Carlo
- Masked statistics functions
- Other statistical functionality
Scipy Stats Lognormal
The lognormal distribution describes a continuous random variable whose logarithm is normally distributed. In Scipy it is available as the object scipy.stats.lognorm.
The syntax is given below.
scipy.stats.lognorm.method_name(data,s,loc,size,moments,scale)
Where parameters are:
- data: It is a set of points or values, in the form of array data, at which the method is evaluated.
- s: It is the shape parameter, the standard deviation of the logarithm of the variable. It has no default and must be provided.
- loc: It is used to shift the distribution, by default it is 0.
- size: It is the number of random variates that rvs() generates.
- moments: It is used to choose which statistics stats() calculates: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to scale the distribution, by default it is 1. If the logarithm of the variable has mean mu, then scale = exp(mu).
The above parameters are shared by the methods of the object scipy.stats.lognorm, although each method accepts only the ones that apply to it. The methods are given below.
- scipy.stats.lognorm.cdf(): It is used for the cumulative distribution function.
- scipy.stats.lognorm.pdf(): It is used for the probability density function.
- scipy.stats.lognorm.rvs(): To get the random variates.
- scipy.stats.lognorm.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.lognorm.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.lognorm.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.lognorm.sf(): It is used to get the values of the survival function.
- scipy.stats.lognorm.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.lognorm.logsf(): It is used to find the log of the survival function.
- scipy.stats.lognorm.mean(): It is used to find the mean of the distribution.
- scipy.stats.lognorm.median(): It is used to find the median of the distribution.
- scipy.stats.lognorm.var(): It is used to find the variance of the distribution.
- scipy.stats.lognorm.std(): It is used to find the standard deviation of the distribution.
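As a minimal sketch of how these parameters and methods fit together, the snippet below evaluates a few of them for a lognormal distribution. The shape value s = 0.5 and the variable names are illustrative assumptions, not values taken from this tutorial.

```python
import numpy as np
from scipy.stats import lognorm

s = 0.5  # assumed shape: standard deviation of log(X)
x = np.linspace(0.1, 5, 50)

# Density and cumulative probability at each point of x
pdf_values = lognorm.pdf(x, s, loc=0, scale=1)
cdf_values = lognorm.cdf(x, s, loc=0, scale=1)

# With loc=0 and scale=1, log(X) has mean 0, so the median of X is exp(0) = 1
median = lognorm.median(s)
# The mean of a lognormal variable is scale * exp(s**2 / 2)
mean = lognorm.mean(s)
print(median, mean, np.exp(s**2 / 2))
```

Note that the shape s is passed positionally before loc and scale; leaving it out raises an error.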
Scipy Stats Norm
The scipy.stats.norm object represents a normal continuous random variable. It has different kinds of functions for the normal distribution like CDF, PDF, median, etc.
It has two important parameters, loc for the mean and scale for the standard deviation. We control the location and spread of the distribution using these parameters.
The syntax is given below.
scipy.stats.norm.method_name(data,loc,size,moments,scale)
Where parameters are:
- data: It is a set of points or values, in the form of array data, at which the method is evaluated.
- loc: It is used to specify the mean, by default it is 0.
- size: It is the number of random variates that rvs() generates.
- moments: It is used to choose which statistics stats() calculates: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to specify the standard deviation, by default it is 1.
The above parameters are shared by the methods of the object scipy.stats.norm, although each method accepts only the ones that apply to it. The methods are given below.
- scipy.stats.norm.cdf(): It is used for the cumulative distribution function.
- scipy.stats.norm.pdf(): It is used for the probability density function.
- scipy.stats.norm.rvs(): To get the random variates.
- scipy.stats.norm.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.norm.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.norm.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.norm.sf(): It is used to get the values of the survival function.
- scipy.stats.norm.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.norm.logsf(): It is used to find the log of the survival function.
- scipy.stats.norm.mean(): It is used to find the mean of the normal distribution.
- scipy.stats.norm.median(): It is used to find the median of the normal distribution.
- scipy.stats.norm.var(): It is used to find the variance of the distribution.
- scipy.stats.norm.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
Create observation data values and calculate the probability density function from these data values with mean = 0 and standard deviation = 1.
observation_x = np.linspace(-4, 4, 200)
PDF_norm = stats.norm.pdf(observation_x, loc=0, scale=1)
Plot the created distribution using the below code.
plt.plot(observation_x, PDF_norm)
plt.xlabel('x-values')
plt.ylabel('PDF_norm_values')
plt.title("Probability density function of normal distribution")
plt.show()
The output shows the graph of the probability density function of the normal distribution.
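The methods of a distribution object are related to each other. The following sketch, with an arbitrarily chosen point x = 1.5 and illustrative variable names, shows that cdf() and sf() add up to 1, that ppf() and isf() invert them, and that stats() reports the mean and variance implied by loc and scale.

```python
import numpy as np
from scipy.stats import norm

x = 1.5  # assumed evaluation point, chosen only for illustration
p = norm.cdf(x, loc=0, scale=1)     # P(X <= x)
tail = norm.sf(x, loc=0, scale=1)   # P(X > x), the survival function

# cdf and sf always add up to 1
print(p + tail)
# ppf inverts cdf, and isf inverts sf, so both recover x
print(norm.ppf(p), norm.isf(tail))

# stats() with moments='mv' returns the mean and the variance (scale squared)
mean, var = norm.stats(loc=2, scale=3, moments='mv')
print(mean, var)
```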
Scipy Stats CDF
CDF stands for cumulative distribution function. It is available as the method cdf() of a distribution object such as scipy.stats.norm. The values of the CDF range from 0 to 1.
The syntax is given below.
scipy.stats.norm.cdf(data,loc,scale)
Where parameters are:
- data: It is a set of points or values, in the form of array data, at which the CDF is evaluated.
- loc: It is used to specify the mean, by default it is 0.
- scale: It is used to specify the standard deviation, by default it is 1.
Let’s take an example and calculate using the below steps:
Import the required libraries using the below code.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
Create observation data values and calculate the cumulative distribution function from these data values with mean = 0 and standard deviation = 1.
observation_x = np.linspace(-4, 4, 200)
CDF_norm = stats.norm.cdf(observation_x, loc=0, scale=1)
Plot the created distribution using the below code.
plt.plot(observation_x, CDF_norm)
plt.xlabel('x-values')
plt.ylabel('CDF_norm_values')
plt.title("Cumulative distribution function")
plt.show()
From the above output, the CDF is increasing. For each value x, it gives the probability that a value chosen from the population is less than or equal to x.
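Because the CDF gives P(X <= x), the probability of falling inside an interval is a difference of two CDF values. A small sketch for the standard normal distribution, where the interval widths of 1, 2 and 3 standard deviations are chosen only for illustration:

```python
from scipy.stats import norm

# P(a < X <= b) = cdf(b) - cdf(a)
for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)
    print(f"P(-{k} < X <= {k}) = {prob:.4f}")

# Probability of lying within one standard deviation of the mean
within_one_sd = norm.cdf(1) - norm.cdf(-1)
```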
Scipy Stats Histogram
Older versions of Scipy had a method histogram() within the subpackage scipy.stats to create a histogram from the given values. This function separates the range into several bins and returns the number of instances in each bin.
The syntax is given below.
scipy.stats.histogram(a, numbins, defaultreallimits, weights)
Where parameters are:
- a (array): It is the array of data that is provided as input.
- numbins (int): It is used to set the number of bins for the histogram.
- defaultreallimits: It is used to specify the range, that is the lower and upper values, of the histogram.
- weights (array): It is used to specify the weight of each value within the array.
The above function exists only in older versions of Scipy, and the alias scipy.histogram has also been removed from current versions, so here we will use the equivalent function numpy.histogram(). Let’s take an example using the below steps.
Import the required libraries using the below code.
import numpy as np
import matplotlib.pyplot as plt
Generate the histogram values and bins by passing the array [1, 2, 2, 3, 2, 3, 3] and the bin edges range(4) to the function histogram().
histogram, bins = np.histogram([1, 2, 2, 3, 2, 3, 3],
                               bins=range(4))
View the number of values in each bin and the edges of the bins respectively.
print("Number of values in each bin : ", histogram)
print("Edges of the bins : ", bins)
Plot the above-created histogram using the below code.
plt.bar(bins[:-1], histogram, width=0.9)
plt.xlim(min(bins), max(bins))
plt.show()
Look at the above output, this is how a histogram is created.
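The subpackage scipy.stats still provides relfreq(), which returns a relative frequency histogram with the same numbins and defaultreallimits parameters. A minimal sketch, assuming the same sample values and hypothetical limits of (0.5, 3.5) so that each integer gets its own bin:

```python
import numpy as np
from scipy import stats

values = [1, 2, 2, 3, 2, 3, 3]

# Three bins of width 1 covering 0.5..3.5 (limits are an assumption for this sketch)
result = stats.relfreq(values, numbins=3, defaultreallimits=(0.5, 3.5))

# frequency holds the share of observations in each bin
print(result.frequency)
print(result.lowerlimit, result.binsize)
```

The frequencies are the bin counts divided by the number of observations, so they sum to 1.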
Scipy Stats Pearsonr
The Pearson correlation coefficient is used to measure the linear relationship between two variables or datasets. The method pearsonr() in the subpackage scipy.stats is used for that.
The syntax is given below.
scipy.stats.pearsonr(x, y)
Where parameters are:
- x: It is the array data.
- y: It is also the array data, with the same length as x.
The method pearsonr() returns two values, r (the Pearson correlation coefficient) and a p-value. The value of r lies between -1 and 1, where -1 means a perfect negative linear relationship and 1 means a perfect positive linear relationship. A value equal to 0 means there is no linear relationship.
Let’s take an example by following the below steps:
Import the libraries using the below code.
from scipy import stats
Now access the method pearsonr() and pass it two array values using the below code.
r, p_values = stats.pearsonr([1, 4, 3, 2, 5], [9, 10, 3.5, 7, 5])
Check the values of the Pearson correlation coefficient and p-value using the below code.
print('The Pearson correlation coefficient',r)
print('P-value ',p_values)
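To see what the coefficient measures, it can be recomputed from its definition: the sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations. The sketch below uses the same two arrays; the names dx, dy and r_manual are illustrative.

```python
import numpy as np
from scipy import stats

x = np.array([1, 4, 3, 2, 5])
y = np.array([9, 10, 3.5, 7, 5])

# Deviations of each value from its own mean
dx = x - x.mean()
dy = y - y.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

r_scipy, p_value = stats.pearsonr(x, y)
print(r_manual, r_scipy)
```

Both values agree, and the negative sign indicates that y tends to decrease as x increases in this small sample.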
Scipy Stats PDF
PDF stands for probability density function. It is available as the method pdf() of a distribution object such as scipy.stats.norm. The values of the PDF are never negative, but unlike probabilities they can be greater than 1. It is the area under the curve that equals 1.
The syntax is given below.
scipy.stats.norm.pdf(data,loc,scale)
Where parameters are:
- data: It is a set of points or values, in the form of array data, at which the PDF is evaluated.
- loc: It is used to specify the mean, by default it is 0.
- scale: It is used to specify the standard deviation, by default it is 1.
Let’s take an example and calculate using the below steps:
Import the required libraries using the below code.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
Create observation data values and calculate the probability density function from these data values with mean = 0 and standard deviation = 1.
observation_x = np.linspace(-4, 4, 200)
PDF_norm = stats.norm.pdf(observation_x, loc=0, scale=1)
Plot the created distribution using the below code.
plt.plot(observation_x, PDF_norm)
plt.xlabel('x-values')
plt.ylabel('PDF_norm_values')
plt.title("Probability density function")
plt.show()
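A density is not a probability, so its values are not limited to 1. The following sketch illustrates this with a narrow normal distribution; the scale of 0.1 is an assumption chosen for the illustration, and scipy.integrate.quad is used to check the area under the curve numerically.

```python
from scipy import stats
from scipy.integrate import quad

# A narrow normal distribution has a tall peak: 1 / (0.1 * sqrt(2 * pi)), about 3.99
peak = stats.norm.pdf(0, loc=0, scale=0.1)
print(peak)

# The area under the density is still 1 (integrating over +/- 10 standard deviations)
area, abs_err = quad(lambda t: stats.norm.pdf(t, loc=0, scale=0.1), -1, 1)
print(area)
```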
Scipy Stats chi-square
The chi-square test tests the difference between observed and expected frequencies. It is used in hypothesis testing and is applied to categorical data. In scipy, there is a method chisquare within the subpackage scipy.stats to do the testing.
- To use the chi-squared test, the total number of samples should be greater than 13.
- This test is not valid if the observed or expected frequencies in a category are very small. A typical rule is that all observed and expected frequencies should be at least 5.
The syntax is given below.
scipy.stats.chisquare(f_obs, f_exp=None, ddof=0)
where parameters are:
- f_obs(array data): It is the observed frequencies in each category.
- f_exp(array data): It is the expected frequencies in each category. By default the categories are assumed to be equally likely.
- ddof(int): It is used to define the delta degrees of freedom. The p-value is computed with k - 1 - ddof degrees of freedom, where k is the number of observed frequencies.
The method chisquare returns two float values, the first is the chi-square test statistic and the second is the p-value.
Let’s take an example by following the below steps:
Import the method chisquare from the module scipy.stats using the below code.
from scipy.stats import chisquare
Create two array type variables to store the observed and expected frequencies. Pass the two arrays to the method chisquare to perform the chi-squared test.
observed_f = [10, 25, 10, 13, 11, 11]
expected_f = [15, 15, 15, 15, 15, 5]
test_value = chisquare(f_obs=observed_f, f_exp=expected_f)
View the test result using the below code.
print('The value of chi-squared test statistic: ',test_value[0])
print('The value of p-value: ',test_value[1])
The output shows the result of the chi-squared test. This is how to perform the chi-squared test on categorical data to find the differences between observed and expected data using the value of the chi-squared test statistic and p-value.
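The test statistic is the sum of (observed - expected)^2 / expected over all categories, and the p-value is the upper tail of a chi-square distribution with k - 1 degrees of freedom. A minimal sketch that recomputes both by hand for the same frequencies, with illustrative variable names:

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed_f = np.array([10, 25, 10, 13, 11, 11])
expected_f = np.array([15, 15, 15, 15, 15, 5])

# Chi-square statistic: sum of squared differences scaled by the expected counts
statistic = np.sum((observed_f - expected_f) ** 2 / expected_f)

# Degrees of freedom: number of categories minus 1 (ddof=0)
dof = len(observed_f) - 1
p_value = chi2.sf(statistic, dof)

result = chisquare(f_obs=observed_f, f_exp=expected_f)
print(statistic, p_value)
print(result[0], result[1])
```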
Scipy Stats IQR
IQR stands for Interquartile Range, which is the difference between the 3rd quartile (75th percentile) and the 1st quartile (25th percentile). It is used to measure the dispersion of data. The Scipy has a method iqr, within the module scipy.stats, to calculate the Interquartile Range of data along the stated axis.
The syntax is given below.
scipy.stats.iqr(x, axis=None, rng=(25, 75), nan_policy='propagate', interpolation='linear')
Where parameters are:
- x(array data): Array or object that is provided to the method.
- axis(int): It is used to specify the axis along which the range is computed. By default the range is computed over the whole array.
- rng(two values in the range [0,100]): It is used to specify the percentiles over which the range is calculated.
- nan_policy: It is used to deal with the nan values and accepts three values:
  - omit: It means calculating the IQR by ignoring the nan values.
  - propagate: It means returning nan.
  - raise: It means throwing an error for the nan values.
- interpolation(string): It is used to specify the interpolation method to use like linear, lower, higher, nearest, and midpoint.
The method iqr returns an ndarray or a scalar depending upon the provided input.
Let’s take an example to calculate the IQR of given array data by following the below steps.
Import the method iqr from the module scipy.stats, together with numpy, using the below code.
from scipy.stats import iqr
import numpy as np
Create an array of data and pass the data to the method iqr for calculating the IQR.
x_data = np.array([[15, 8, 7], [4, 3, 2]])
iqr(x_data)
The above output shows the Interquartile Range of the given array data, this is how to find the IQR of the data.
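With the default linear interpolation, the result matches the difference between the 75th and 25th percentiles computed with numpy.percentile(). The sketch below checks this for the same array and also shows the axis parameter; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import iqr

x_data = np.array([[15, 8, 7], [4, 3, 2]])

# IQR over the whole (flattened) array, computed from the two percentiles
q1, q3 = np.percentile(x_data, [25, 75])
iqr_manual = q3 - q1

iqr_all = iqr(x_data)           # axis=None: one value for the whole array
iqr_cols = iqr(x_data, axis=0)  # one value per column
print(iqr_manual, iqr_all, iqr_cols)
```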
Scipy Stats Average
Older versions of Scipy exposed NumPy's method mean as scipy.mean to calculate the average of the given data. This alias has been removed from current versions of Scipy, so here we will call numpy.mean directly. The mean or average is the sum of all the values divided by the number of values.
The syntax is given below.
numpy.mean(array_data,axis)
Where parameters are:
- array_data: It is the data in the array form containing all the elements.
- axis(int): It is used to specify the axis along which the average or mean needs to be calculated. By default the mean of all the elements is calculated.
The method mean() returns the arithmetic mean of the elements in the array.
Let’s understand through an example following the below steps.
Import the required libraries using the below code.
import numpy as np
Create an array containing the elements whose arithmetic mean needs to be calculated.
array_data = [2,4,6,8,12,23]
Calculate the mean of the created array by passing it to the method mean().
np.mean(array_data)
The output shows the mean of the given array.
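As a short sketch of the definition, the mean can be checked against the sum divided by the count. The module scipy.stats also has tmean(), a trimmed mean that equals the arithmetic mean when no limits are given. The 2-D array named table is an assumption added here to illustrate the axis parameter.

```python
import numpy as np
from scipy import stats

array_data = [2, 4, 6, 8, 12, 23]

mean_manual = sum(array_data) / len(array_data)  # sum of values / number of values
mean_numpy = np.mean(array_data)
mean_scipy = stats.tmean(array_data)  # no limits given, so nothing is trimmed
print(mean_manual, mean_numpy, mean_scipy)

# axis=0 averages down the columns, axis=1 averages across the rows
table = np.array([[1, 2, 3], [4, 5, 6]])
col_means = np.mean(table, axis=0)
row_means = np.mean(table, axis=1)
print(col_means, row_means)
```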
Scipy Stats Entropy
First, we need to know what entropy is. In thermodynamics, entropy is a measure of disorder or uncertainty. The concept has been carried over into statistics, where it is applied to probabilities. In statistics, entropy is used to assess the amount of information in distributions, variables and events.
The Scipy has a method entropy() to calculate the entropy of distributions.
The syntax of the method entropy() is given below.
scipy.stats.entropy(pk, qk=None, base=None, axis=0)
Where parameters are:
- pk(array): It takes the distribution.
- qk(array data): The distribution against which the relative entropy is computed. It must be in the same form as pk.
- base(float): It is used to define which logarithmic base to use, by default the natural logarithm.
- axis(int): It is used to specify the axis along which the entropy is determined.
Follow the below steps for the demonstration of the method entropy().
Import the method entropy() from the module scipy.stats.
from scipy.stats import entropy
Pass the pk values to the method to compute the entropy. The method normalizes the values if they do not sum to 1.
entropy([8/9, 2/9], base=2)
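The values 8/9 and 2/9 do not sum to 1, so entropy() first normalizes them to 0.8 and 0.2. The sketch below reproduces the result from the formula H = -sum(p * log2(p)) and also shows the qk parameter; the uniform distribution used for qk is an assumption for illustration.

```python
import numpy as np
from scipy.stats import entropy

pk = np.array([8/9, 2/9])

# entropy() normalizes pk so that it sums to 1: [0.8, 0.2]
pk_normalized = pk / pk.sum()
h_manual = -np.sum(pk_normalized * np.log2(pk_normalized))
h_scipy = entropy(pk, base=2)
print(h_manual, h_scipy)

# With qk given, the relative entropy (Kullback-Leibler divergence) is returned
qk = np.array([0.5, 0.5])
kl = entropy(pk_normalized, qk, base=2)
print(kl)
```

Against a uniform two-outcome distribution, the relative entropy in bits equals 1 minus the entropy, which the two printed values confirm.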
Scipy Stats Anderson
The Anderson-Darling test tests the null hypothesis that a sample comes from a population that follows a specific distribution. The Scipy has a method anderson() in the module scipy.stats for that test.
The syntax of the method anderson() is given below.
scipy.stats.anderson(x, dist='norm')
Where parameters are:
- x(array_data): It is sample data.
- dist(string): It is used to define the distribution to test against. It accepts the following values.
  - ‘norm’,
  - ‘expon’,
  - ‘logistic’,
  - ‘gumbel’,
  - ‘gumbel_l’,
  - ‘gumbel_r’,
  - ‘extreme1’
The method anderson() returns statistic, critical_values, and significance_level.
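A minimal sketch of how the result can be read, assuming a synthetic sample drawn from a normal distribution with a fixed seed (the sample, seed and variable names are not from this tutorial). The null hypothesis is rejected at a given significance level when the statistic exceeds the matching critical value.

```python
import numpy as np
from scipy.stats import anderson

# Synthetic, seeded sample used only for illustration
rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=200)

result = anderson(sample, dist='norm')
print("statistic:", result.statistic)

# Each critical value is paired with a significance level given in percent
for crit, sig in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > crit else "fail to reject"
    print(f"{sig}%: critical value {crit}, {decision} the null hypothesis")
```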
Scipy Stats Anova
Anova refers to the analysis of variance. The one-way Anova tests the null hypothesis that two or more groups have the same population mean. The Scipy has a method f_oneway for this test.
The syntax is given below.
scipy.stats.f_oneway(*args, axis=0)
Where parameters are:
- *args(array_data): It is sample_1, sample_2, and so on, the measurements of every group.
- axis(int): It is used to specify the axis of the provided arrays along which the test is performed.
The method f_oneway returns two values, the statistic and the p-value, as floats or as arrays when the inputs have more than one dimension.
Let’s understand through demonstration by following the below steps.
Import the method f_oneway from the module scipy.stats and numpy using the below code.
from scipy.stats import f_oneway
import numpy as np
Creating the multidimensional array using the below code.
first_data = np.array([[7.77, 7.03, 5.71],
[5.17, 7.35, 7.00],
[7.39, 7.57, 7.57],
[7.45, 5.33, 9.35],
[5.41, 7.10, 9.33],
[7.00, 7.24, 7.44]])
second_data = np.array([[5.35, 7.30, 7.15],
[5.55, 5.57, 7.53],
[5.72, 7.73, 5.72],
[7.01, 9.19, 7.41],
[7.75, 7.77, 7.30],
[5.90, 7.97, 5.97]])
third_data = np.array([[3.31, 7.77, 1.01],
[7.25, 3.24, 3.52],
[5.32, 7.71, 5.19],
[7.47, 7.73, 7.91],
[7.59, 5.01, 5.07],
[3.07, 9.72, 7.47]])
Pass the above-created arrays to the method f_oneway for the testing using the below code.
f_statistic_value, p_value = f_oneway(first_data,second_data,third_data)
Check the computed values using the below code. Because the arrays are two-dimensional and axis=0, the test is performed separately for each of the three columns, so each result contains three values.
print('The value of F statistic test',f_statistic_value)
print('The value of p-value',p_value)
This is how to use the ANOVA test using the Scipy.
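The F statistic is the ratio of the variation between the group means to the variation within the groups. The sketch below recomputes it by hand for three small one-dimensional groups; the group values are made up for illustration and are not part of the example above.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical groups chosen so the arithmetic is easy to follow
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Sum of squares between groups and within groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1               # number of groups - 1
df_within = len(all_values) - len(groups)  # observations - number of groups

f_manual = (ss_between / df_between) / (ss_within / df_within)
f_scipy, p_value = f_oneway(*groups)
print(f_manual, f_scipy, p_value)
```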
Scipy Stats T-test
The T-test is used for testing a null hypothesis about the mean of a given sample. There are several T-test methods in the Scipy module scipy.stats, but here we will learn about a specific method, ttest_1samp, which tests whether the mean of a sample equals a given population mean.
The syntax is given below.
scipy.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate')
Where parameters are:
- a(array_data): It is the sample of independent observations.
- popmean(float or array_data): It is the mean or expected value of the population under the null hypothesis.
- axis(int): It is used to specify the axis along which the test is done.
- nan_policy: It is used to deal with the nan values and accepts three values:
  - omit: It means performing the calculation by ignoring the nan values.
  - propagate: It means returning nan.
  - raise: It means throwing an error for the nan values.
The method ttest_1samp returns two float values, the t-statistic and the pvalue.
Let’s take an example by following the below steps:
Import the required libraries stats from Scipy and numpy using the below code.
from scipy import stats
import numpy as np
Create a random number generator using the below code.
randomnub_gen = np.random.default_rng()
Create random numbers as a sample from a normal distribution with mean 6 and standard deviation 11 using the below code.
random_variate_s = stats.norm.rvs(loc=6, scale=11, size=(51, 3), random_state=randomnub_gen)
View the generated data or numbers for the sample.
random_variate_s
Now perform the T-test on this generated random sample to test whether the sample mean is equal to a population mean of 5.0 or not.
stats.ttest_1samp(random_variate_s, 5.0)
Again perform the test with a population mean equal to zero using the below code.
stats.ttest_1samp(random_variate_s, 0.0)
From the above output, we can reject or fail to reject the null hypothesis based on the statistic and p-value. Because the sample has three columns and axis=0, each result contains three values, one per column.
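The t-statistic is the difference between the sample mean and the population mean, divided by the standard error of the mean. A minimal sketch that recomputes it by hand, assuming a one-dimensional sample and a fixed seed of 42 so that the run is repeatable:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seed is an assumption for repeatability
sample = stats.norm.rvs(loc=6, scale=11, size=51, random_state=rng)
popmean = 5.0

n = sample.size
# t = (sample mean - population mean) / (sample std / sqrt(n))
t_manual = (sample.mean() - popmean) / (sample.std(ddof=1) / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample, popmean)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```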
Scipy Stats Half normal
The scipy.stats.halfnorm object represents a half-normal continuous random variable, that is the absolute value of a normal variable with mean 0. It has different kinds of functions for the half-normal distribution like CDF, PDF, median, etc.
It has two important parameters, loc for shifting the distribution and scale for stretching it. The scale equals the standard deviation of the underlying normal distribution.
The syntax is given below.
scipy.stats.halfnorm.method_name(data,loc,size,moments,scale)
Where parameters are:
- data: It is a set of points or values, in the form of array data, at which the method is evaluated.
- loc: It is used to shift the distribution, by default it is 0.
- size: It is the number of random variates that rvs() generates.
- moments: It is used to choose which statistics stats() calculates: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to scale the distribution, by default it is 1.
The above parameters are shared by the methods of the object scipy.stats.halfnorm, although each method accepts only the ones that apply to it. The methods are given below.
- scipy.stats.halfnorm.cdf(): It is used for the cumulative distribution function.
- scipy.stats.halfnorm.pdf(): It is used for the probability density function.
- scipy.stats.halfnorm.rvs(): To get the random variates.
- scipy.stats.halfnorm.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.halfnorm.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.halfnorm.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.halfnorm.sf(): It is used to get the values of the survival function.
- scipy.stats.halfnorm.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.halfnorm.logsf(): It is used to find the log of the survival function.
- scipy.stats.halfnorm.mean(): It is used to find the mean of the half-normal distribution.
- scipy.stats.halfnorm.median(): It is used to find the median of the half-normal distribution.
- scipy.stats.halfnorm.var(): It is used to find the variance of the distribution.
- scipy.stats.halfnorm.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
Create observation data values and calculate the probability density function of the half-normal distribution from these data values with loc = 0 and scale = 1.
observation_x = np.linspace(0, 4, 200)
PDF_halfnorm = stats.halfnorm.pdf(observation_x, loc=0, scale=1)
Plot the created distribution using the below code.
plt.plot(observation_x, PDF_halfnorm)
plt.xlabel('x-values')
plt.ylabel('PDF_halfnorm_values')
plt.title("Probability density function of half normal distribution")
plt.show()
Look at the above output, which shows the half-normal distribution.
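The half-normal distribution is the distribution of the absolute value of a normal variable with mean 0, so on x >= 0 its density is twice the normal density. The following sketch checks this relationship numerically with the default loc = 0 and scale = 1; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import halfnorm, norm

x = np.linspace(0, 4, 50)

# For x >= 0: halfnorm.pdf(x) = 2 * norm.pdf(x)
pdf_matches = np.allclose(halfnorm.pdf(x), 2 * norm.pdf(x))
# For x >= 0: halfnorm.cdf(x) = 2 * norm.cdf(x) - 1
cdf_matches = np.allclose(halfnorm.cdf(x), 2 * norm.cdf(x) - 1)

# The mean of the standard half-normal distribution is sqrt(2 / pi)
mean = halfnorm.mean()
print(pdf_matches, cdf_matches, mean, np.sqrt(2 / np.pi))
```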
Scipy Stats Cauchy
The Cauchy distribution is a continuous probability distribution with a bell-shaped curve like the normal distribution, but with much heavier tails. Its mean and variance are undefined.
The syntax is given below.
scipy.stats.cauchy.method_name(data,loc,scale)
Where parameters are:
- data: It is a set of points or values, in the form of array data, at which the method is evaluated.
- loc: It is used to specify the location of the peak, which is also the median, by default it is 0.
- scale: It is used to specify the width of the distribution, by default it is 1.
The above parameters are shared by the methods of the object scipy.stats.cauchy. The methods are given below.
- scipy.stats.cauchy.cdf(): It is used for the cumulative distribution function.
- scipy.stats.cauchy.pdf(): It is used for the probability density function.
- scipy.stats.cauchy.rvs(): To get the random variates.
- scipy.stats.cauchy.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.cauchy.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.cauchy.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.cauchy.sf(): It is used to get the values of the survival function.
- scipy.stats.cauchy.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.cauchy.logsf(): It is used to find the log of the survival function.
- scipy.stats.cauchy.mean(): It is used to find the mean of the distribution.
- scipy.stats.cauchy.median(): It is used to find the median of the distribution.
- scipy.stats.cauchy.var(): It is used to find the variance of the distribution.
- scipy.stats.cauchy.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by following the below steps:
Import the object cauchy, numpy and matplotlib using the below code.
from scipy.stats import cauchy
import matplotlib.pyplot as plt
import numpy as np
Create a cauchy distribution using the below code.
fig, ax = plt.subplots(1, 1)
x = np.linspace(cauchy.ppf(0.02),
                cauchy.ppf(0.98), 99)
ax.plot(x, cauchy.pdf(x),
        'r-', lw=5, alpha=0.6, label='cauchy PDF')
plt.show()
Look at the above output, this is how Cauchy has a bell-shaped curve like the normal distribution but with heavier tails.
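To compare the two curves in numbers rather than by eye, the sketch below evaluates the standard Cauchy and standard normal distributions (loc = 0, scale = 1) at the peak and in the tail. The tail point x = 4 is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import cauchy, norm

# Height of each density at its peak
peak_cauchy = cauchy.pdf(0)  # 1 / pi
peak_norm = norm.pdf(0)      # 1 / sqrt(2 * pi)
print(peak_cauchy, peak_norm)

# Probability of a value larger than 4: far greater for the Cauchy distribution
tail_cauchy = cauchy.sf(4)
tail_norm = norm.sf(4)
print(tail_cauchy, tail_norm)

# The mean of the Cauchy distribution is undefined and is reported as nan
mean_cauchy = cauchy.mean()
print(mean_cauchy)
```

With the same loc and scale, the Cauchy peak is slightly lower than the normal peak, while its tails hold far more probability.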
Scipy Stats Half cauchy
The HalfCauchy is a continuous probability distribution that relates to the Cauchy distribution in the same way the half-normal distribution relates to the normal distribution: it is the distribution of the absolute value of a Cauchy variable centered at 0. It has much heavier tails than the half-normal distribution, and its mean and variance are not finite.
The syntax is given below.
scipy.stats.halfcauchy.method_name(data,loc,scale)
Where parameters are:
- data: It is a set of points or values, in the form of array data, at which the method is evaluated.
- loc: It is used to shift the distribution, by default it is 0.
- scale: It is used to specify the width of the distribution, by default it is 1.
The above parameters are shared by the methods of the object scipy.stats.halfcauchy. The methods are given below.
- scipy.stats.halfcauchy.cdf(): It is used for the cumulative distribution function.
- scipy.stats.halfcauchy.pdf(): It is used for the probability density function.
- scipy.stats.halfcauchy.rvs(): To get the random variates.
- scipy.stats.halfcauchy.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.halfcauchy.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.halfcauchy.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.halfcauchy.sf(): It is used to get the values of the survival function.
- scipy.stats.halfcauchy.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.halfcauchy.logsf(): It is used to find the log of the survival function.
- scipy.stats.halfcauchy.mean(): It is used to find the mean of the distribution.
- scipy.stats.halfcauchy.median(): It is used to find the median of the distribution.
- scipy.stats.halfcauchy.var(): It is used to find the variance of the distribution.
- scipy.stats.halfcauchy.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by following the below steps:
Import the object halfcauchy, numpy and matplotlib using the below code.
from scipy.stats import halfcauchy
import matplotlib.pyplot as plt
import numpy as np
Create a halfcauchy distribution using the below code.
fig, ax = plt.subplots(1, 1)
x = np.linspace(halfcauchy.ppf(0.02),
                halfcauchy.ppf(0.98), 99)
ax.plot(x, halfcauchy.pdf(x),
        'r-', lw=5, alpha=0.6, label='halfcauchy PDF')
plt.show()
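The half-Cauchy density on x >= 0 is twice the Cauchy density, in the same way that the half-normal density is twice the normal density. A short sketch that checks this with the default loc = 0 and scale = 1; the grid of points is an assumption for illustration.

```python
import numpy as np
from scipy.stats import cauchy, halfcauchy

x = np.linspace(0, 10, 50)

# For x >= 0: halfcauchy.pdf(x) = 2 * cauchy.pdf(x)
pdf_matches = np.allclose(halfcauchy.pdf(x), 2 * cauchy.pdf(x))

# The median of the standard half-Cauchy distribution is tan(pi / 4) = 1
median = halfcauchy.ppf(0.5)
print(pdf_matches, median)
```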
Scipy Stats Binom
The scipy.stats.binom object represents a binomial discrete random variable. It has different kinds of functions for the binomial distribution like CDF, PMF, median, etc.
It takes the shape parameters n and p, and one more parameter loc for shifting the distribution.
The syntax is given below.
scipy.stats.binom.method_name(k,n,p,loc)
Where parameters are:
- k(int): It is used to define the number of successes.
- n(int): It is used to specify the number of trials.
- p(float): It is used to specify the assumed probability of success.
- loc: It is used to shift the distribution, by default it is 0.
The above parameters are shared by the methods of the object scipy.stats.binom. The methods are given below.
- scipy.stats.binom.cdf(): It is used for the cumulative distribution function.
- scipy.stats.binom.pmf(): It is used for the probability mass function.
- scipy.stats.binom.rvs(): To get the random variates.
- scipy.stats.binom.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.binom.logpmf(): It is used to get the log of the probability mass function.
- scipy.stats.binom.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.binom.sf(): It is used to get the values of the survival function.
- scipy.stats.binom.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.binom.logsf(): It is used to find the log of the survival function.
- scipy.stats.binom.mean(): It is used to find the mean of the binomial distribution.
- scipy.stats.binom.median(): It is used to find the median of the binomial distribution.
- scipy.stats.binom.var(): It is used to find the variance of the distribution.
- scipy.stats.binom.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import binom
import matplotlib.pyplot as plt
import numpy as np
Define the values of the parameters n and p using the below code.
p, n = 0.3, 4
Create an array of data using the method ppf() (percent point function) of the object binom.
array_data = np.arange(binom.ppf(0.02, n, p),
                       binom.ppf(0.98, n, p))
array_data
Show the probability mass function using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, binom.pmf(array_data, n, p), 'bo', ms=7, label='binom pmf')
ax.vlines(array_data, 0, binom.pmf(array_data, n, p), colors='b', lw=6, alpha=0.5)
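The probability mass function of the binomial distribution is C(n, k) * p^k * (1 - p)^(n - k). The sketch below recomputes it with math.comb for the same n = 4 and p = 0.3 and compares it with binom.pmf(); the variable names are illustrative.

```python
from math import comb

import numpy as np
from scipy.stats import binom

n, p = 4, 0.3
k = np.arange(0, n + 1)  # every possible number of successes: 0..n

# P(X = i) = C(n, i) * p**i * (1 - p)**(n - i)
pmf_manual = np.array([comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)])
pmf_scipy = binom.pmf(k, n, p)
print(pmf_manual)
print(pmf_scipy)

# The probabilities sum to 1; the mean is n*p and the variance is n*p*(1-p)
print(pmf_scipy.sum(), binom.mean(n, p), binom.var(n, p))
```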
Scipy Stats Describe
The Scipy has a method describe() in the module scipy.stats to find the descriptive statistics of the given data.
The syntax is given below.
scipy.stats.describe(a, axis=0, ddof=1, bias=True, nan_policy='propagate')
Where parameters are:
- a(array_data): It is the data of type array.
- axis(int): It is used to specify the axis along which statistics are calculated, by default it is 0. If it is None, the statistics are computed over the whole array.
- ddof(int): It is used to specify the delta degrees of freedom for the variance, by default it is 1.
- bias(Boolean): If it is False, the skewness and kurtosis calculations are corrected for statistical bias.
- nan_policy: It is used to deal with the nan values and accepts three values:
  - omit: It means performing the calculation by ignoring the nan values.
  - propagate: It means returning nan.
  - raise: It means throwing an error for the nan values.
The method describe() returns the number of observations, the minimum and maximum, the mean, the variance, the skewness, and the kurtosis. The statistics are of type ndarray or float.
Let’s take an example by following the below steps:
Import the required libraries using the below code.
from scipy import stats
import numpy as np
Create an array containing 20 observations or values using the below code.
array_data = np.arange(20)
Pass the above-created array to the method describe() to find the descriptive statistics using the below code.
result = stats.describe(array_data)
result
Let’s view each statistic of the array using the below code.
print('Number of observations in the array', result[0])
print('Minimum and maximum values in the array', result[1])
print('Mean of the array', result[2])
print('Variance of the array', result[3])
print('Skewness of the array', result[4])
print('Kurtosis of the array', result[5])
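The result of describe() is a named tuple, so each statistic can also be read by field name. The sketch below is an illustrative addition: it reads a few fields by name and shows how the axis parameter changes the result on a 2-D array. The reshaped matrix is an assumption made for the demonstration.

```python
import numpy as np
from scipy import stats

array_data = np.arange(20)
result = stats.describe(array_data)

# Fields can be read by name instead of by position.
print(result.nobs, result.minmax, result.mean, result.variance)

# For a 2-D array, axis=0 (the default) describes each column,
# while axis=None flattens the array before computing the statistics.
matrix = array_data.reshape(4, 5)
per_column = stats.describe(matrix)
flattened = stats.describe(matrix, axis=None)

print(per_column.mean)  # one mean per column
print(flattened.mean)   # a single mean for all 20 values
```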
Scipy Stats Binomial test
The binomial test checks whether the number of successes observed in a series of independent trials, each with only two possible outcomes, is consistent with an assumed probability of success. It is a test of the null hypothesis that the probability of success in a Bernoulli experiment is p.
Scipy has a method binomtest() in the module scipy.stats to perform the binomial test.
The syntax is given below.
scipy.stats.binomtest(k, n, p=0.5, alternative='two-sided')
Where parameters are:
- k(int): It is used to define the number of successes.
- n(int): It is used to specify the number of trials.
- p(float): It is used to specify the assumed probability of success under the null hypothesis.
- alternative: It is used to specify the alternative hypothesis: 'two-sided', 'greater', or 'less'.
The method binomtest() returns a result object that holds the p-value (pvalue) and the estimated proportion of successes (proportion_estimate) as floats, and that has a method proportion_ci() to compute the confidence interval of the estimate.
Let’s understand through an example by following the below steps.
Import the method binomtest() from the module scipy.stats using the below code.
from scipy.stats import binomtest
Now, a phone manufacturer claims that no more than 15% of their phones are unsafe. 20 phones are inspected for safety, and 6 are found to be unsafe. Test the manufacturer’s claim.
Test_result = binomtest(6, n=20, p=0.15, alternative='greater')
View the result using the below code.
print('The p-value is ',Test_result.pvalue)
print('The estimated proportion is 6/20 ',Test_result.proportion_estimate)
print('The confidence interval of the estimate ',Test_result.proportion_ci(confidence_level=0.95))
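For the one-sided alternative 'greater', the p-value is the probability of seeing at least k successes under the null hypothesis. The sketch below is an illustrative cross-check, not part of the original example: it assumes the 15% claim stated in the problem and compares the p-value with the binomial survival function evaluated at k - 1.

```python
from scipy.stats import binom, binomtest

# 6 unsafe phones out of 20, tested against a claimed rate of 15%.
k, n, p = 6, 20, 0.15

test_result = binomtest(k, n=n, p=p, alternative='greater')

# P(X >= k) is the survival function P(X > k - 1).
tail_probability = binom.sf(k - 1, n, p)

print(test_result.pvalue)
print(tail_probability)
```

Both printed values should agree, which shows that the one-sided test is a tail probability of the binomial distribution.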
Scipy Stats Binom pmf
In Scipy there is a method binom.pmf() in the module scipy.stats to evaluate the probability mass function of the binomial distribution.
The syntax is given below.
scipy.stats.binom.pmf(k, n, p, loc=0)
Where parameters are:
- k(int): It is used to define the number of successes.
- n(int): It is used to specify the number of trials.
- p(float): It is used to specify the assumed probability of success.
- loc: It is used to shift the distribution, by default it is 0.
To understand with an example, please refer to the above sub-section Scipy Stats Binom, where the method pmf(), which stands for probability mass function, is used in the example.
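A minimal sketch of a single pmf evaluation, with values chosen only for illustration: it compares binom.pmf() with the closed-form expression C(n, k) * p^k * (1 - p)^(n - k) and shows that loc shifts the support rather than setting a mean.

```python
from math import comb
from scipy.stats import binom

k, n, p = 2, 4, 0.3

library_value = binom.pmf(k, n, p)

# Closed form: C(n, k) * p**k * (1 - p)**(n - k)
manual_value = comb(n, k) * p**k * (1 - p) ** (n - k)

# loc shifts the support: pmf(k, n, p, loc) equals pmf(k - loc, n, p).
shifted_value = binom.pmf(k + 1, n, p, loc=1)

print(library_value, manual_value, shifted_value)
```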
Scipy Stats gmean
The method gmean() of the module scipy.stats.mstats finds the geometric mean of the given array along the specified axis.
The syntax is given below.
scipy.stats.mstats.gmean(a, axis=0, dtype=None, weights=None)
Where parameters are:
- a(array_data): It is the collection of elements within an array or array data.
- axis(int): It is used to specify the axis of the array along which we want to find the geometric mean.
- dtype: It is used to specify the data type of the returned array.
- weights(array_data): It is used to specify the weight of each value. By default it is None, which gives every value a weight of 1.0.
The method gmean() returns the geometric mean of the passed array as an ndarray or a scalar.
Let’s understand through an example by following the below steps.
Import the required libraries using the below code.
from scipy.stats.mstats import gmean
Find the geometric mean of the array [2,4,6,8] using the below code.
gmean([2,4,6,8])
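The geometric mean is the n-th root of the product of the values, or equivalently the exponential of the mean of their logarithms. The sketch below is an illustrative addition; it assumes gmean imported from scipy.stats, which takes the same arguments as shown in the syntax above, and the weighted call uses made-up weights.

```python
import numpy as np
from scipy.stats import gmean

values = np.array([2, 4, 6, 8])

library_value = gmean(values)

# Two equivalent manual computations.
root_of_product = np.prod(values) ** (1 / len(values))
exp_of_mean_log = np.exp(np.mean(np.log(values)))

# With weights, each log value is weighted before averaging.
# Here 2 counts once and 8 counts three times: (2 * 8**3) ** (1/4).
weighted_value = gmean([2, 8], weights=[1, 3])

print(library_value, root_of_product, exp_of_mean_log, weighted_value)
```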
Scipy Stats Alpha
The scipy.stats.alpha object represents the alpha continuous random variable. It provides distribution functions such as cdf, pdf, and median.
It has two important parameters, loc to shift the distribution and scale to stretch it. Together with the shape parameter a, these control the shape and location of the distribution.
The syntax is given below.
scipy.stats.alpha.method_name(q,x,a,loc,size,moments,scale)
Where parameters are:
- x: It is used to define the quantiles.
- a: It is used to define the shape parameter.
- q: It is used to define the lower or upper tail probability.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to stretch the distribution, by default it is 1.
The above parameters are the common parameters of the methods of the object scipy.stats.alpha. The methods are given below.
- scipy.stats.alpha.cdf(): It is used for the cumulative distribution function.
- scipy.stats.alpha.pdf(): It is used for the probability density function.
- scipy.stats.alpha.rvs(): To get the random variates.
- scipy.stats.alpha.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.alpha.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.alpha.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.alpha.sf(): It is used to get the values of the survival function.
- scipy.stats.alpha.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.alpha.logsf(): It is used to find the log of the survival function.
- scipy.stats.alpha.mean(): It is used to find the mean of the distribution.
- scipy.stats.alpha.median(): It is used to find the median of the distribution.
- scipy.stats.alpha.var(): It is used to find the variance of the distribution.
- scipy.stats.alpha.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import alpha
import matplotlib.pyplot as plt
import numpy as np
Create a variable for the shape parameter and assign a value.
a = 4.3
Create an array of data using the method ppf() of the object alpha using the below code.
array_data = np.linspace(alpha.ppf(0.01, a),
                         alpha.ppf(0.90, a), 90)
array_data
Now plot the probability density function by accessing the method pdf() of the object alpha of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, alpha.pdf(array_data, a),
        'r-', lw=4, alpha=0.5, label='alpha pdf')
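The relationship between pdf(), cdf(), and ppf() can be checked numerically. The sketch below is an illustrative addition that uses scipy.integrate.quad, which is not used elsewhere in this tutorial: it integrates the pdf between the 1% and 90% quantiles, so the area should be close to 0.89.

```python
from scipy.integrate import quad
from scipy.stats import alpha

a = 4.3

# Quantiles that bracket 89% of the probability mass.
lower = alpha.ppf(0.01, a)
upper = alpha.ppf(0.90, a)

# Area under the pdf between the two quantiles.
area, _ = quad(alpha.pdf, lower, upper, args=(a,))

# The same probability from the cdf.
cdf_difference = alpha.cdf(upper, a) - alpha.cdf(lower, a)

print(area, cdf_difference)
```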
Scipy Stats Beta
The scipy.stats.beta object represents the beta continuous random variable. It provides distribution functions such as cdf, pdf, and median.
It has two important parameters, loc to shift the distribution and scale to stretch it. Together with the shape parameters a and b, these control the shape and location of the distribution.
The syntax is given below.
scipy.stats.beta.method_name(q,x,a,b,loc,size,moments,scale)
Where parameters are:
- x: It is used to define the quantiles.
- a,b: They are used to define the shape parameters.
- q: It is used to define the lower or upper tail probability.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to stretch the distribution, by default it is 1.
The above parameters are the common parameters of the methods of the object scipy.stats.beta. The methods are given below.
- scipy.stats.beta.cdf(): It is used for the cumulative distribution function.
- scipy.stats.beta.pdf(): It is used for the probability density function.
- scipy.stats.beta.rvs(): To get the random variates.
- scipy.stats.beta.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.beta.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.beta.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.beta.sf(): It is used to get the values of the survival function.
- scipy.stats.beta.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.beta.logsf(): It is used to find the log of the survival function.
- scipy.stats.beta.mean(): It is used to find the mean of the distribution.
- scipy.stats.beta.median(): It is used to find the median of the distribution.
- scipy.stats.beta.var(): It is used to find the variance of the distribution.
- scipy.stats.beta.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import beta
import matplotlib.pyplot as plt
import numpy as np
Create two variables a and b for the shape parameters and assign some values.
a = 3.4
b = 0.763
Create an array of data using the method ppf() of the object beta using the below code.
array_data = np.linspace(beta.ppf(0.01, a, b),
                         beta.ppf(0.90, a, b), 90)
array_data
Now plot the probability density function by accessing the method pdf() of the object beta of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, beta.pdf(array_data, a, b),
        'r-', lw=4, alpha=0.5, label='beta pdf')
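The moments parameter can be seen in action with the method stats(). The sketch below is an illustrative addition: it requests the mean and variance with moments='mv' and compares them with the standard closed-form expressions for the beta distribution.

```python
from scipy.stats import beta

a, b = 3.4, 0.763

# 'm' requests the mean and 'v' the variance.
mean_value, var_value = beta.stats(a, b, moments='mv')

# Closed-form moments of the beta distribution.
expected_mean = a / (a + b)
expected_var = a * b / ((a + b) ** 2 * (a + b + 1))

print(float(mean_value), expected_mean)
print(float(var_value), expected_var)
```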
Scipy Stats Gamma
The scipy.stats.gamma object represents the gamma continuous random variable. It provides distribution functions such as cdf, pdf, and median.
It has two important parameters, loc to shift the distribution and scale to stretch it. Together with the shape parameter a, these control the shape and location of the distribution.
The syntax is given below.
scipy.stats.gamma.method_name(q,x,a,loc,size,moments,scale)
Where parameters are:
- x: It is used to define the quantiles.
- a: It is used to define the shape parameter.
- q: It is used to define the lower or upper tail probability.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to stretch the distribution, by default it is 1.
The above parameters are the common parameters of the methods of the object scipy.stats.gamma. The methods are given below.
- scipy.stats.gamma.cdf(): It is used for the cumulative distribution function.
- scipy.stats.gamma.pdf(): It is used for the probability density function.
- scipy.stats.gamma.rvs(): To get the random variates.
- scipy.stats.gamma.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.gamma.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.gamma.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.gamma.sf(): It is used to get the values of the survival function.
- scipy.stats.gamma.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.gamma.logsf(): It is used to find the log of the survival function.
- scipy.stats.gamma.mean(): It is used to find the mean of the distribution.
- scipy.stats.gamma.median(): It is used to find the median of the distribution.
- scipy.stats.gamma.var(): It is used to find the variance of the distribution.
- scipy.stats.gamma.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import gamma
import matplotlib.pyplot as plt
import numpy as np
Create a variable for the shape parameter and assign a value.
a = 1.95
Create an array of data using the method ppf() of the object gamma using the below code.
array_data = np.linspace(gamma.ppf(0.01, a),
                         gamma.ppf(0.90, a), 90)
array_data
Now plot the probability density function by accessing the method pdf() of the object gamma of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, gamma.pdf(array_data, a),
        'r-', lw=4, alpha=0.5, label='gamma pdf')
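To see what loc and scale do for a distribution that is not normal, the sketch below (an illustrative addition with made-up loc and scale values) compares the mean and standard deviation of the standard and the shifted, stretched gamma distribution. The standard gamma has mean a and standard deviation sqrt(a), so after the transformation they become loc + scale * a and scale * sqrt(a).

```python
from math import sqrt
from scipy.stats import gamma

a = 1.95
loc, scale = 2.0, 3.0  # illustrative values

# Standard form: loc=0, scale=1.
base_mean = gamma.mean(a)
base_std = gamma.std(a)

# Shifted and stretched form.
shifted_mean = gamma.mean(a, loc=loc, scale=scale)
shifted_std = gamma.std(a, loc=loc, scale=scale)

print(base_mean, base_std)
print(shifted_mean, shifted_std)
```

Note that loc is not the mean and scale is not the standard deviation here; they only coincide with those quantities for the normal distribution.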
Scipy Stats Inverse Normal CDF
Here, we will learn about the inverse of the normal cumulative distribution function. As we already know about the normal distribution from the above sub-section ‘Scipy Stats Norm’, here we will use the method ppf() of the object scipy.stats.norm, which represents the inverse of the cdf.
The syntax is given below.
scipy.stats.norm.ppf(q,loc,scale)
Where parameters are:
- q: It is used to specify the lower tail probability.
- loc: It is used to specify the mean, by default it is 0.
- scale: It is used to specify the standard deviation, by default it is 1.
Let’s take an example by following the below steps.
Import the module stats using the below code.
from scipy import stats
Find the inverse of the cdf at the probability 0.7 and pass the result back to cdf() to confirm that it returns 0.7 using the below code.
stats.norm.cdf(stats.norm.ppf(0.7))
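A few more inverse cdf values make the behavior concrete. The sketch below is an illustrative addition with made-up loc and scale values: the median of the standard normal is 0, the 97.5% quantile is close to 1.96, and for a normal distribution with mean loc and standard deviation scale the quantile is loc + scale * ppf(q).

```python
from scipy.stats import norm

# Median of the standard normal distribution.
median = norm.ppf(0.5)

# The familiar 97.5% quantile, approximately 1.96.
z_975 = norm.ppf(0.975)

# Same quantile for a normal distribution with mean 100 and std 15.
shifted_quantile = norm.ppf(0.975, loc=100, scale=15)

print(median, z_975, shifted_quantile)
```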
Scipy Stats Johnson
The scipy.stats module contains two objects, johnsonsb and johnsonsu, that belong to the Johnson family of distributions. They provide distribution functions such as cdf, pdf, and median.
- The object johnsonsb represents the bounded continuous probability distribution, whereas johnsonsu represents the unbounded continuous probability distribution.
It has two important parameters, loc to shift the distribution and scale to stretch it. Together with the shape parameters a and b, these control the shape and location of the distribution.
The syntax is given below.
scipy.stats.johnsonsb.method_name(q,x,a,b,loc,size,moments,scale)
Where parameters are:
- x: It is used to define the quantiles.
- a,b: They are used to define the shape parameters.
- q: It is used to define the lower or upper tail probability.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to stretch the distribution, by default it is 1.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import johnsonsb
import matplotlib.pyplot as plt
import numpy as np
Create two variables a and b for the shape parameters and assign some values.
a, b = 3.35, 2.25
Create an array of data using the method ppf() of the object johnsonsb using the below code.
array_data = np.linspace(johnsonsb.ppf(0.01, a, b),
                         johnsonsb.ppf(0.90, a, b), 90)
array_data
Now plot the probability density function by accessing the method pdf() of the object johnsonsb of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, johnsonsb.pdf(array_data, a, b),
        'r-', lw=4, alpha=0.5, label='johnsonsb pdf')
We can also plot Johnson’s unbounded continuous probability distribution by following the same process with the object johnsonsu.
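The difference between the bounded and unbounded variants shows up directly in their supports. The sketch below is an illustrative addition that reuses the shape values from the example: it queries the method support() on both objects and checks that ppf() and cdf() of johnsonsu are inverses of each other.

```python
from scipy.stats import johnsonsb, johnsonsu

a, b = 3.35, 2.25

# With loc=0 and scale=1, johnsonsb lives on the interval (0, 1),
# while johnsonsu covers the whole real line.
sb_support = johnsonsb.support(a, b)
su_support = johnsonsu.support(a, b)

# ppf() and cdf() are inverses of each other.
su_median = johnsonsu.ppf(0.5, a, b)
round_trip = johnsonsu.cdf(su_median, a, b)

print(sb_support, su_support)
print(su_median, round_trip)
```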
Scipy Stats Inverse gamma
The scipy.stats.invgamma object represents the inverted gamma continuous random variable. It provides distribution functions such as cdf, pdf, and median.
It has two important parameters, loc to shift the distribution and scale to stretch it. Together with the shape parameter a, these control the shape and location of the distribution.
The syntax is given below.
scipy.stats.invgamma.method_name(q,x,a,loc,size,moments,scale)
Where parameters are:
- x: It is used to define the quantiles.
- a: It is used to define the shape parameter.
- q: It is used to define the lower or upper tail probability.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to stretch the distribution, by default it is 1.
The above parameters are the common parameters of the methods of the object scipy.stats.invgamma. The methods are given below.
- scipy.stats.invgamma.cdf(): It is used for the cumulative distribution function.
- scipy.stats.invgamma.pdf(): It is used for the probability density function.
- scipy.stats.invgamma.rvs(): To get the random variates.
- scipy.stats.invgamma.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.invgamma.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.invgamma.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.invgamma.sf(): It is used to get the values of the survival function.
- scipy.stats.invgamma.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.invgamma.logsf(): It is used to find the log of the survival function.
- scipy.stats.invgamma.mean(): It is used to find the mean of the distribution.
- scipy.stats.invgamma.median(): It is used to find the median of the distribution.
- scipy.stats.invgamma.var(): It is used to find the variance of the distribution.
- scipy.stats.invgamma.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import invgamma
import matplotlib.pyplot as plt
import numpy as np
Create a variable for the shape parameter and assign a value.
a = 3.04
Create an array of data using the method ppf() of the object invgamma using the below code.
array_data = np.linspace(invgamma.ppf(0.01, a),
                         invgamma.ppf(0.90, a), 90)
array_data
Now plot the probability density function by accessing the method pdf() of the object invgamma of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, invgamma.pdf(array_data, a),
        'r-', lw=4, alpha=0.5, label='invgamma pdf')
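The name "inverted gamma" refers to the reciprocal of a gamma random variable: if X follows gamma(a), then 1/X follows invgamma(a). The sketch below is an illustrative check of that relationship, with an evaluation point chosen arbitrarily. It also compares the mean with the closed form 1 / (a - 1), which holds for a > 1.

```python
from scipy.stats import gamma, invgamma

a = 3.04
x = 0.4  # arbitrary evaluation point

# P(1/X <= x) is the same as P(X >= 1/x) for a gamma variable X.
inv_cdf = invgamma.cdf(x, a)
gamma_tail = gamma.sf(1 / x, a)

# Mean of the inverted gamma distribution for a > 1.
mean_value = invgamma.mean(a)
expected_mean = 1 / (a - 1)

print(inv_cdf, gamma_tail)
print(mean_value, expected_mean)
```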
Scipy Stats Gennorm
The scipy.stats.gennorm object represents the generalized normal continuous random variable. It provides distribution functions such as cdf, pdf, and median.
It has two important parameters, loc to shift the distribution and scale to stretch it. Together with the shape parameter beta, these control the shape and location of the distribution.
The syntax is given below.
scipy.stats.gennorm.method_name(x,beta,loc,size,moments,scale)
Where parameters are:
- x: It is the set of points, in the form of array data, at which the function is evaluated.
- beta: It is used to specify the shape.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to stretch the distribution, by default it is 1.
The above parameters are the common parameters of the methods of the object scipy.stats.gennorm. The methods are given below.
- scipy.stats.gennorm.cdf(): It is used for the cumulative distribution function.
- scipy.stats.gennorm.pdf(): It is used for the probability density function.
- scipy.stats.gennorm.rvs(): To get the random variates.
- scipy.stats.gennorm.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.gennorm.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.gennorm.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.gennorm.sf(): It is used to get the values of the survival function.
- scipy.stats.gennorm.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.gennorm.logsf(): It is used to find the log of the survival function.
- scipy.stats.gennorm.mean(): It is used to find the mean of the distribution.
- scipy.stats.gennorm.median(): It is used to find the median of the distribution.
- scipy.stats.gennorm.var(): It is used to find the variance of the distribution.
- scipy.stats.gennorm.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import gennorm
import matplotlib.pyplot as plt
import numpy as np
Create a variable for the shape parameter and assign a value.
beta = 1.4
Create an array of data using the method ppf() of the object gennorm using the below code.
array_data = np.linspace(gennorm.ppf(0.01, beta),
                         gennorm.ppf(0.90, beta), 90)
array_data
Now plot the probability density function by accessing the method pdf() of the object gennorm of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, gennorm.pdf(array_data, beta),
        'r-', lw=4, alpha=0.5, label='gennorm pdf')
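The shape parameter beta interpolates between well-known distributions. The sketch below is an illustrative addition: with beta = 1 the generalized normal density matches the Laplace density, and with beta = 2 it matches a normal density with standard deviation 1/sqrt(2). The evaluation grid is arbitrary.

```python
import numpy as np
from scipy.stats import gennorm, laplace, norm

x = np.linspace(-3, 3, 7)

# beta = 1 gives the Laplace distribution.
matches_laplace = np.allclose(gennorm.pdf(x, 1), laplace.pdf(x))

# beta = 2 gives a normal distribution with scale 1/sqrt(2).
matches_normal = np.allclose(gennorm.pdf(x, 2),
                             norm.pdf(x, scale=1 / np.sqrt(2)))

print(matches_laplace, matches_normal)
```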
Scipy Stats Genpareto
The scipy.stats.genpareto object represents the generalized Pareto continuous random variable. It provides distribution functions such as cdf, pdf, and median.
It has two important parameters, loc to shift the distribution and scale to stretch it. Together with the shape parameter c, these control the shape and location of the distribution.
The syntax is given below.
scipy.stats.genpareto.method_name(x,c,loc,size,moments,scale)
Where parameters are:
- x: It is the set of points, in the form of array data, at which the function is evaluated.
- c: It is used to specify the shape.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to stretch the distribution, by default it is 1.
The above parameters are the common parameters of the methods of the object scipy.stats.genpareto. The methods are given below.
- scipy.stats.genpareto.cdf(): It is used for the cumulative distribution function.
- scipy.stats.genpareto.pdf(): It is used for the probability density function.
- scipy.stats.genpareto.rvs(): To get the random variates.
- scipy.stats.genpareto.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.genpareto.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.genpareto.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.genpareto.sf(): It is used to get the values of the survival function.
- scipy.stats.genpareto.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.genpareto.logsf(): It is used to find the log of the survival function.
- scipy.stats.genpareto.mean(): It is used to find the mean of the distribution.
- scipy.stats.genpareto.median(): It is used to find the median of the distribution.
- scipy.stats.genpareto.var(): It is used to find the variance of the distribution.
- scipy.stats.genpareto.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import genpareto
import matplotlib.pyplot as plt
import numpy as np
Create a variable for the shape parameter and assign a value.
c = 0.2
Create an array of data using the method ppf() of the object genpareto using the below code.
array_data = np.linspace(genpareto.ppf(0.01, c),
                         genpareto.ppf(0.90, c), 90)
array_data
Now plot the probability density function by accessing the method pdf() of the object genpareto of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, genpareto.pdf(array_data, c),
        'r-', lw=4, alpha=0.5, label='genpareto pdf')
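The survival function of the generalized Pareto distribution has a simple closed form, (1 + c*x)^(-1/c), which makes a convenient sanity check. The sketch below is an illustrative addition with an arbitrary evaluation point. It also compares the mean with 1 / (1 - c), valid for c < 1, and checks that c = 0 reduces to the exponential distribution.

```python
import numpy as np
from scipy.stats import expon, genpareto

c = 0.2
x = 2.0  # arbitrary evaluation point

# Survival function against its closed form.
library_sf = genpareto.sf(x, c)
manual_sf = (1 + c * x) ** (-1 / c)

# Mean for c < 1.
mean_value = genpareto.mean(c)

# With c = 0 the distribution is the exponential distribution.
grid = np.linspace(0, 5, 6)
matches_expon = np.allclose(genpareto.pdf(grid, 0), expon.pdf(grid))

print(library_sf, manual_sf, mean_value, matches_expon)
```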
Scipy Stats Gumbel
The scipy.stats module contains two objects, gumbel_r and gumbel_l, that are used to model right-skewed and left-skewed distributions. They provide distribution functions such as cdf, pdf, and median.
- The object gumbel_r represents the right-skewed Gumbel continuous distribution, whereas gumbel_l represents the left-skewed Gumbel continuous distribution.
It has two important parameters, loc to shift the distribution and scale to stretch it. Neither object takes a shape parameter.
The syntax is given below.
scipy.stats.gumbel_r.method_name(q,x,loc,size,moments,scale)
Where parameters are:
- x: It is used to define the quantiles.
- q: It is used to define the lower or upper tail probability.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to stretch the distribution, by default it is 1.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import gumbel_r
import matplotlib.pyplot as plt
import numpy as np
Create an array of data using the method ppf() of the object gumbel_r using the below code.
array_data = np.linspace(gumbel_r.ppf(0.01),
                         gumbel_r.ppf(0.90), 90)
array_data
Now plot the probability density function by accessing the method pdf() of the object gumbel_r of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, gumbel_r.pdf(array_data),
        'r-', lw=4, alpha=0.5, label='gumbel pdf')
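The two Gumbel objects are mirror images of each other. The sketch below is an illustrative addition with an arbitrary grid: it checks the closed-form cdf of gumbel_r, exp(-exp(-x)), confirms that gumbel_l.pdf(x) equals gumbel_r.pdf(-x), and compares the mean of the standard right-skewed form with the Euler-Mascheroni constant.

```python
import numpy as np
from scipy.stats import gumbel_l, gumbel_r

x = np.linspace(-2, 4, 7)

# Closed-form cdf of the right-skewed Gumbel distribution.
matches_formula = np.allclose(gumbel_r.cdf(x), np.exp(-np.exp(-x)))

# The left-skewed density is the mirror image of the right-skewed one.
is_mirror = np.allclose(gumbel_l.pdf(x), gumbel_r.pdf(-x))

# Mean of the standard right-skewed form.
mean_right = gumbel_r.mean()

print(matches_formula, is_mirror, mean_right)
```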
Scipy Stats Binned statistics
The Scipy submodule scipy.stats contains a method binned_statistic() to calculate statistics like the mean, median, sum, etc. of the values within each bin.
The syntax is given below.
scipy.stats.binned_statistic(x, values, statistic='mean', bins=10, range=None)
Where parameters are:
- x(array_data): It is the sequence of values to be binned.
- values(array_data, list(N)): It is the data on which the statistic is computed. It must have the same shape as x.
- statistic(string): It is used to specify what kind of statistic we want to compute, like mean, sum, median, max, std, and count.
- bins(sequence or int): It is used to define the number of bins or the bin edges.
- range((float, float)): It defines the lower and upper range of the bins.
The method binned_statistic() returns the statistic of each bin, the bin edges, and the bin number of each value, all of array type.
Let’s understand with an example by following the below steps:
Import the required libraries using the below code.
from scipy import stats
Create a set of values and compute the binned statistics using the below code.
set_values = [2.0, 2.0, 3.0, 2.5, 4.0]
stats.binned_statistic([2, 2, 3, 6, 8], set_values, 'mean', bins=2)
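The sketch below repeats the call above and reads the three returned fields by name, as an illustrative addition. With bins=2 over the range 2 to 8, the edges are 2, 5, and 8, so the first three values fall in bin 1 and the last two in bin 2.

```python
from scipy import stats

set_values = [2.0, 2.0, 3.0, 2.5, 4.0]
result = stats.binned_statistic([2, 2, 3, 6, 8], set_values, 'mean', bins=2)

# Mean of the values in each bin.
print(result.statistic)

# Edges of the bins (one more entry than the number of bins).
print(result.bin_edges)

# Index of the bin that each x value falls into, starting at 1.
print(result.binnumber)
```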
Scipy Stats Poisson
The scipy.stats.poisson object represents the Poisson discrete random variable. It provides distribution functions such as cdf, pmf, and median.
It has one important parameter, loc, for shifting the distribution.
The syntax is given below.
scipy.stats.poisson.method_name(mu,k,loc,moments)
Where parameters are:
- mu: It is used to define the shape parameter, which is the mean of the distribution.
- k: It is the number of events at which the function is evaluated.
- loc: It is used to shift the distribution, by default it is 0.
- moments: It is used to select which statistics are computed: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
The above parameters are the common parameters of the methods of the object scipy.stats.poisson. The methods are given below.
- scipy.stats.poisson.cdf(): It is used for the cumulative distribution function.
- scipy.stats.poisson.rvs(): To get the random variates.
- scipy.stats.poisson.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.poisson.logpmf(): It is used to get the log of the probability mass function.
- scipy.stats.poisson.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.poisson.sf(): It is used to get the values of the survival function.
- scipy.stats.poisson.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.poisson.logsf(): It is used to find the log of the survival function.
- scipy.stats.poisson.mean(): It is used to find the mean of the distribution.
- scipy.stats.poisson.median(): It is used to find the median of the distribution.
- scipy.stats.poisson.var(): It is used to find the variance of the distribution.
- scipy.stats.poisson.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np
Create a variable for the shape parameter and assign a value.
mu = 0.5
Create an array of integer data using the method ppf() of the object poisson using the below code. Since the distribution is discrete, np.arange() is used so that the pmf is evaluated only at integers.
array_data = np.arange(poisson.ppf(0.01, mu),
                       poisson.ppf(0.90, mu) + 1)
array_data
Now plot the probability mass function by accessing the method pmf() of the object poisson of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, poisson.pmf(array_data, mu), 'bo', ms=8, label='poisson pmf')
ax.vlines(array_data, 0, poisson.pmf(array_data, mu), colors='b', lw=4, alpha=0.5)
This is how to use the Poisson distribution of Scipy.
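The Poisson pmf has the closed form exp(-mu) * mu^k / k!, and both the mean and the variance equal mu. The sketch below is an illustrative check of these facts, with the range of k chosen arbitrarily.

```python
from math import exp, factorial

import numpy as np
from scipy.stats import poisson

mu = 0.5
k = np.arange(0, 5)

library_pmf = poisson.pmf(k, mu)

# Closed form: exp(-mu) * mu**k / k!
manual_pmf = np.array([exp(-mu) * mu**i / factorial(i) for i in range(5)])

# Mean and variance are both equal to mu.
mean_value, var_value = poisson.stats(mu, moments='mv')

print(library_pmf)
print(manual_pmf)
print(float(mean_value), float(var_value))
```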
Scipy Stats Geometric
The scipy.stats.geom object represents the geometric discrete random variable. It provides distribution functions such as cdf, pmf, and median.
It has one important parameter, loc, for shifting the distribution.
The syntax is given below.
scipy.stats.geom.method_name(k,p,q,loc,size)
Where parameters are:
- k(float or float of array_data): It is used to specify the number of Bernoulli trials up to and including the first success.
- p(float or float of array_data): It is used to specify the success probability for each trial.
- q(float or float of array_data): It represents the probabilities.
- loc: It is used to shift the distribution, by default it is 0.
The above parameters are the common parameters of the methods of the object scipy.stats.geom. The methods are given below.
- scipy.stats.geom.cdf(): It is used for the cumulative distribution function.
- scipy.stats.geom.pmf(): It is used for the probability mass function.
- scipy.stats.geom.rvs(): To get the random variates.
- scipy.stats.geom.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.geom.logpmf(): It is used to get the log of the probability mass function.
- scipy.stats.geom.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.geom.sf(): It is used to get the values of the survival function.
- scipy.stats.geom.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.geom.logsf(): It is used to find the log of the survival function.
- scipy.stats.geom.mean(): It is used to find the mean of the distribution.
- scipy.stats.geom.median(): It is used to find the median of the distribution.
- scipy.stats.geom.var(): It is used to find the variance of the distribution.
- scipy.stats.geom.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import geom
import numpy as np
import matplotlib.pyplot as plt
Create an array containing the integers from 1 to 29 and also create a variable that contains the success probability of each trial using the below code.
array_data = np.arange(1,30,1)
p = 0.5
Now plot the probability mass function by accessing the method pmf()
of an object geom
of the module scipy.stats
using the below code.
geom_pmf_data = geom.pmf(array_data,p)
plt.plot(array_data,geom_pmf_data,'bo')
plt.show()
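The other methods of geom are called in the same way as pmf(). As a minimal sketch, assuming the same success probability p = 0.5 as in the example, the code below evaluates a few of them. The variable names are only illustrative.

```python
from scipy.stats import geom

p = 0.5  # success probability of each trial, as in the example above

# Probability that the first success happens exactly on trial 3: (1 - p)**2 * p
prob_third_trial = geom.pmf(3, p)

# Probability that the first success happens within the first 3 trials
prob_within_three = geom.cdf(3, p)

# Survival function: probability that more than 3 trials are needed
prob_more_than_three = geom.sf(3, p)

# Mean 1/p and variance (1 - p)/p**2 of the distribution
mean_trials = geom.mean(p)
var_trials = geom.var(p)

print(prob_third_trial, prob_within_three, prob_more_than_three)
print(mean_trials, var_trials)
```

The CDF and the survival function always add up to 1, which is a quick way to check the results.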
Scipy Stats Exponential
The scipy.stats.expon object represents a continuous random variable that follows the exponential distribution. It provides the usual distribution methods, such as the CDF, PDF, and median.
It has two important parameters: loc shifts the distribution along the axis, and scale stretches it. For the exponential distribution, scale is the inverse of the rate, the mean is loc + scale, and the standard deviation is scale.
The syntax is given below.
scipy.stats.expon.method_name(x,q,loc,scale,size)
Where parameters are:
- x(float or float of array_data): It is used to specify the points at which the function is evaluated.
- q(float or float of array_data): It represents a probability and is used by the quantile methods such as ppf() and isf().
- loc: It is used to specify the location (shift) of the distribution, by default it is 0.
- scale: It is used to specify the scale of the distribution, by default it is 1.
- size: It is used to specify the output shape.
The above parameters are shared by the methods of the object scipy.stats.expon, although each method takes only the ones it needs. The methods are given below.
- scipy.stats.expon.cdf(): It is used for the cumulative distribution function.
- scipy.stats.expon.pdf(): It is used for the probability density function.
- scipy.stats.expon.rvs(): To get the random variates.
- scipy.stats.expon.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.expon.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.expon.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.expon.sf(): It is used to get the values of the survival function.
- scipy.stats.expon.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.expon.logsf(): It is used to find the log of the survival function.
- scipy.stats.expon.mean(): It is used to find the mean of the distribution.
- scipy.stats.expon.median(): It is used to find the median of the distribution.
- scipy.stats.expon.var(): It is used to find the variance of the distribution.
- scipy.stats.expon.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import expon
import numpy as np
import matplotlib.pyplot as plt
Create an array of evenly spaced values from -1 up to 30 in steps of 0.1 using the below code.
array_data = np.arange(-1,30,0.1)
Now plot the probability density function by accessing the method pdf() of an object expon of the module scipy.stats using the below code. The second and third arguments set loc to 0 and scale to 2.
expon_PDF_data = expon.pdf(array_data,0,2)
plt.plot(array_data,expon_PDF_data,'bo')
plt.show()
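To see what loc and scale mean for this distribution, the following sketch freezes an exponential distribution with loc = 0 and scale = 2, the same values passed to the density in the example. It is only an illustration, and the variable names are assumptions.

```python
import numpy as np
from scipy.stats import expon

# Freeze the distribution with loc=0 and scale=2 (scale = 1 / rate)
rv = expon(loc=0, scale=2)

mean_value = rv.mean()   # equals loc + scale
std_value = rv.std()     # equals scale

# The CDF has the closed form 1 - exp(-(x - loc) / scale) for x >= loc
x = 3.0
cdf_value = rv.cdf(x)
closed_form = 1 - np.exp(-x / 2)

# The density is zero to the left of loc
pdf_left_of_loc = rv.pdf(-1.0)

print(mean_value, std_value)
print(cdf_value, closed_form)
print(pdf_left_of_loc)
```

This also explains why the plotted density is flat at zero for the negative values in array_data.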
Scipy Stats Boxcox
The module scipy.stats has a method boxcox() that transforms a non-normal dataset into an approximately normal one.
The syntax is given below.
scipy.stats.boxcox(x, lmbda=None, alpha=None, optimizer=None)
Where parameters are:
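- x(array_data): It is the input array data that should be positive and one-dimensional.
- lmbda(scalar): If it is given, the transformation is performed with that value. If it is None, the value that maximizes the log-likelihood is found and returned as the second output.
- alpha(float): If it is given, the 100 * (1 - alpha)% confidence interval for lmbda is returned as a third output.
- optimizer: A callable that is used to find the value of lmbda when lmbda is not set.
When lmbda is not set, the method boxcox() returns two values: boxcox of type ndarray, which is the transformed data, and maxlog of type float, which is the fitted lambda.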
Let’s understand with an example by following the below steps:
Import the required modules using the below code.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
Create or generate non-normal values using the below code.
non_normal_data = np.random.exponential(size = 500)
Transform the non-normal data or generated data into normal using the method boxcox()
and also save the lambda value.
transformed_data, lambda_value = stats.boxcox(non_normal_data)
Plot both the non-normal and the transformed data using the below code.
fig, ax = plt.subplots(1, 2)
# kdeplot() replaces the deprecated distplot(), and fill=True replaces shade=True
sns.kdeplot(non_normal_data, fill=True, linewidth=2,
            label="Non-Normal", color="green", ax=ax[0])
sns.kdeplot(transformed_data, fill=True, linewidth=2,
            label="Normal", color="green", ax=ax[1])
# Each subplot needs its own legend
ax[0].legend(loc="upper right")
ax[1].legend(loc="upper right")
fig.set_figheight(5)
fig.set_figwidth(10)
plt.show()
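Besides the plot, the effect of the transformation can be checked numerically. The sketch below is one possible check and not part of the original example: it compares the skewness before and after boxcox() and undoes the transformation with scipy.special.inv_boxcox(). The seed and the variable names are assumptions chosen for repeatability.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

rng = np.random.default_rng(0)      # fixed seed (an assumption) for repeatable output
data = rng.exponential(size=500)    # positive, right-skewed data

transformed, fitted_lambda = stats.boxcox(data)

# Skewness close to 0 indicates a more symmetric, more normal-looking sample
skew_before = stats.skew(data)
skew_after = stats.skew(transformed)

# inv_boxcox() undoes the transformation when given the same lambda
recovered = inv_boxcox(transformed, fitted_lambda)

print(fitted_lambda, skew_before, skew_after)
print(np.allclose(recovered, data))
```

Keeping the fitted lambda is what makes it possible to map results back to the original scale later.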
Scipy Stats Genextreme
The scipy.stats.genextreme object represents a continuous random variable that follows the generalized extreme value distribution. It provides the usual distribution methods, such as the CDF, PDF, and median.
Besides the shape parameter c, it has two important parameters: loc shifts the distribution along the axis, and scale stretches it.
The syntax is given below.
scipy.stats.genextreme.method_name(q,x,c,loc,size,moments,scale)
Where parameters are:
- x: It is used to define the quantiles.
- c: It is used to define the shape parameter.
- q: It is used to specify a lower or upper tail probability for the quantile methods such as ppf() and isf().
- loc: It is used to specify the location (shift) of the distribution, by default it is 0.
- size: It is used to specify the output shape of the random variates.
- moments: It is used to select which statistics stats() returns: mean ('m'), variance ('v'), skew ('s'), and kurtosis ('k').
- scale: It is used to specify the scale of the distribution, by default it is 1.
The above parameters are shared by the methods of the object scipy.stats.genextreme, although each method takes only the ones it needs. The methods are given below.
- scipy.stats.genextreme.cdf(): It is used for the cumulative distribution function.
- scipy.stats.genextreme.pdf(): It is used for the probability density function.
- scipy.stats.genextreme.rvs(): To get the random variates.
- scipy.stats.genextreme.stats(): It is used to get the mean, variance, skew, and kurtosis.
- scipy.stats.genextreme.logpdf(): It is used to get the log of the probability density function.
- scipy.stats.genextreme.logcdf(): It is used to find the log of the cumulative distribution function.
- scipy.stats.genextreme.sf(): It is used to get the values of the survival function.
- scipy.stats.genextreme.isf(): It is used to get the values of the inverse survival function.
- scipy.stats.genextreme.logsf(): It is used to find the log of the survival function.
- scipy.stats.genextreme.mean(): It is used to find the mean of the distribution.
- scipy.stats.genextreme.median(): It is used to find the median of the distribution.
- scipy.stats.genextreme.var(): It is used to find the variance of the distribution.
- scipy.stats.genextreme.std(): It is used to find the standard deviation of the distribution.
Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.
Import the required libraries using the below code.
from scipy.stats import genextreme
import matplotlib.pyplot as plt
import numpy as np
Create a variable for the shape parameter and assign it a value using the below code.
c = 1.95
Create an array of data using the method ppf()
of an object genextreme
using the below code.
array_data = np.linspace(genextreme.ppf(0.01, c),
genextreme.ppf(0.90,c), 90)
array_data
Now plot the probability density function by accessing the method pdf() of an object genextreme of the module scipy.stats using the below code.
fig, ax = plt.subplots(1, 1)
ax.plot(array_data, genextreme.pdf(array_data,c),
        'r-', lw=4, alpha=0.5, label='genextreme PDF')
ax.legend()
plt.show()
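The method ppf() used to build array_data is the inverse of the cumulative distribution function. As a small sketch, assuming the same shape parameter c = 1.95, the code below maps a few probabilities to quantiles and back, and also shows that for a positive c the distribution has an upper bound at 1 / c. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import genextreme

c = 1.95  # shape parameter, same value as in the example above

# ppf() is the inverse of cdf(): map probabilities to quantiles and back
probabilities = np.array([0.01, 0.5, 0.9])
quantiles = genextreme.ppf(probabilities, c)
round_trip = genextreme.cdf(quantiles, c)

# For c > 0 (SciPy's sign convention) the support is bounded above by 1 / c,
# so the density is zero beyond that point
upper_bound = 1 / c
pdf_beyond_bound = genextreme.pdf(upper_bound + 0.1, c)

print(quantiles)
print(round_trip)
print(upper_bound, pdf_beyond_bound)
```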
Scipy Stats Dirichlet
The Scipy has an object dirichlet that represents the Dirichlet distribution, which is a continuous multivariate probability distribution. It has some methods or functions that are given below.
- scipy.stats.dirichlet.pdf(): It is used for the probability density function.
- scipy.stats.dirichlet.var(): It is used to find the variance of the Dirichlet distribution.
- scipy.stats.dirichlet.mean(): It is used to find the mean of the Dirichlet distribution.
- scipy.stats.dirichlet.rvs(): To get the random variates.
- scipy.stats.dirichlet.logpdf(): It is used to get the log of the probability density function.
The syntax is given below.
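scipy.stats.dirichlet.method_name(x,alpha)
Where parameters are:
- x(array_data): It is used to specify the quantiles, whose components must lie between 0 and 1 and sum to 1. It is needed only by pdf() and logpdf().
- alpha(array_data): It is used to define the concentration parameters, which must be positive.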
Let’s take an example by following the below steps:
Import the required libraries using the below code.
from scipy.stats import dirichlet
import numpy as np
Define the quantiles and alpha values within an array using the below code.
quant = np.array([0.3, 0.3, 0.4])
alp = np.array([0.5, 6, 16])
Now evaluate the Dirichlet probability density at these quantiles using the below code.
dirichlet.pdf(quant,alp)
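The remaining methods take only the concentration parameters. The following sketch, which reuses the alpha values from the example, computes the mean and variance and draws a few random vectors. The seed value 0 is an arbitrary assumption that only makes the output repeatable.

```python
import numpy as np
from scipy.stats import dirichlet

alp = np.array([0.5, 6, 16])  # concentration parameters from the example above

# The mean of each component is alpha_i / sum(alpha)
mean_vector = dirichlet.mean(alp)
var_vector = dirichlet.var(alp)

# Draw 5 random vectors; random_state=0 is an arbitrary seed for repeatability
samples = dirichlet.rvs(alp, size=5, random_state=0)

print(mean_vector)
print(var_vector)
print(samples.sum(axis=1))  # every sampled vector sums to 1
```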
Scipy Stats Hypergeom
The Scipy has an object hypergeom in a module scipy.stats that represents the hypergeometric distribution, which describes drawing objects from a bin without replacement.
The syntax is given below.
scipy.stats.hypergeom(M,n,N)
Where parameters are:
- M: It is used to define the total number of objects.
- n: It is used to define the number of Type I objects in M.
- N: It is used to define the number of objects drawn without replacement from the whole population. The random variate represents the number of Type I objects among these N drawn objects.
Let’s take an example by following the below steps:
Import the required libraries using the below code.
from scipy.stats import hypergeom
import numpy as np
import matplotlib.pyplot as plt
Now, suppose that we have a total of 30 phones, of which 10 are Apple phones. We want to know the probability of getting each possible number of Apple phones if we choose 15 of the 30 phones at random. Let’s use the below code to find the solution to this problem.
[M, n, N] = [30, 10, 15]
rv = hypergeom(M, n, N)
x = np.arange(0, n+1)
pmf_applephones = rv.pmf(x)
Plot the above result using the below code.
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, pmf_applephones, 'bo')
ax.vlines(x, 0, pmf_applephones, lw=2)
ax.set_xlabel('# of apple phones in our group of chosen phones')
ax.set_ylabel('hypergeom PMF')
plt.show()
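The same frozen distribution can also answer questions about a range of outcomes. As a sketch under the same assumptions (30 phones, 10 Apple phones, 15 drawn), the code below computes the expected number of Apple phones and the probability of drawing at least 5 of them. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import hypergeom

M, n, N = 30, 10, 15   # total phones, Apple phones, phones drawn
rv = hypergeom(M, n, N)

# Expected number of Apple phones in the draw: N * n / M
expected_apple = rv.mean()

# P(X >= 5) via the survival function, since sf(k) = P(X > k)
prob_at_least_five = rv.sf(4)

# The same probability obtained by summing the PMF over k = 5, ..., 10
k = np.arange(5, n + 1)
prob_from_pmf = rv.pmf(k).sum()

print(expected_apple, prob_at_least_five, prob_from_pmf)
```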
Scipy Stats Interval
In Scipy, interval refers to a confidence interval, which is a range of values that contains a given fraction of the probability of a distribution. The Scipy has a method interval() in the class scipy.stats.rv_continuous that returns the end points of the range with equal areas around the median.
The syntax is given below.
rv_continuous.interval(alpha, *args, loc, scale)
Where parameters are:
- alpha(array_data or float): It defines the probability that a random variate is drawn from the returned range. The value should be from 0 to 1. In recent SciPy releases this parameter is named confidence.
- *args(array_data): It is used for defining the shape of the distribution.
- loc(array_data): It is used for defining the location parameter, by default it is 0.
- scale(array_data): It is used for defining the scale parameter, by default it is 1.
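As a minimal sketch of the call, assuming the standard normal distribution as the example distribution, the code below computes the central 95% interval and compares it with the end points obtained from ppf().

```python
from scipy.stats import norm

# Central 95% interval of the standard normal distribution.
# The probability is passed positionally because its keyword name
# differs between SciPy versions.
lower, upper = norm.interval(0.95, loc=0, scale=1)

# The same end points follow from the percent point function (inverse CDF)
lower_from_ppf = norm.ppf(0.025)
upper_from_ppf = norm.ppf(0.975)

print(lower, upper)
print(lower_from_ppf, upper_from_ppf)
```

Each tail outside the interval holds (1 - 0.95) / 2 = 0.025 of the probability, which is why the end points match ppf(0.025) and ppf(0.975).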
Scipy Stats ISF
The ISF stands for inverse survival function. The method isf() returns the value at which the survival function, which is 1 minus the CDF, equals the given probability q.
The syntax is given below.
rv_continuous.isf(q, *args, loc, scale)
Where parameters are:
- q(array_data): It defines the upper tail probability.
- *args(array_data): It is used for defining the shape of the distribution.
- loc(array_data): It is used for defining the location parameter, by default it is 0.
- scale(array_data): It is used for defining the scale parameter, by default it is 1.
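A short sketch may make the relationship between isf(), sf(), and ppf() clearer. It assumes the standard normal distribution and an upper tail probability of 0.05, both chosen only for illustration.

```python
from scipy.stats import norm

q = 0.05  # upper tail probability

# Value that a standard normal variable exceeds with probability q
x = norm.isf(q)

# isf(q) agrees with ppf(1 - q), and sf() maps x back to q
x_from_ppf = norm.ppf(1 - q)
q_recovered = norm.sf(x)

print(x, x_from_ppf, q_recovered)
```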
Scipy Stats Independent T-test
The independent T-test tests the null hypothesis that two independent samples have the same mean value. By default, it assumes that the two populations have equal variances.
The syntax is given below.
scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate', alternative='two-sided', trim=0)
Where parameters are:
- a,b(array_data): These are the two samples of independent observations in the form of arrays.
- axis(int): It is used to specify the axis along which the test is done.
- equal_var(boolean): If it is true, the test assumes that the variance of the two independent samples is equal. If it is false, it uses Welch’s t-test, which does not assume equal variance.
- alternative: It is used to specify the alternative hypothesis.
- nan_policy: It is used to deal with the nan values and accepts three values:
- omit: It means performing the test while ignoring the nan values.
- propagate: It means returning nan.
- raise: It means throwing an error for the nan values.
The method ttest_ind() returns two float values, the t-statistic and the p-value.
Let’s take an example by following the below steps:
Import the required libraries stats
from Scipy using the below code.
from scipy import stats
import numpy as np
Create a random number generator using the below code.
randomnum_gen = np.random.default_rng()
Create two samples with identical means using the below code.
sample1 = stats.norm.rvs(loc=6, scale=15, size=1000, random_state=randomnum_gen)
sample2 = stats.norm.rvs(loc=6, scale=15, size=1000, random_state=randomnum_gen)
Calculate the T-test
of independent samples that we have created above.
stats.ttest_ind(sample1, sample2)
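From the above output, we compare the p-value with a chosen significance level such as 0.05. If the p-value is smaller, we reject the null hypothesis of equal means; otherwise we fail to reject it. Because both samples were drawn from the same distribution, the p-value is usually well above 0.05 here.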
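For contrast, the following sketch uses two hypothetical samples whose means and variances differ, and sets equal_var=False to run Welch's t-test. The seed, the sample parameters, and the 0.05 significance level are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed for repeatability

# Hypothetical samples with different means and different spreads
sample_a = stats.norm.rvs(loc=0, scale=1, size=200, random_state=rng)
sample_b = stats.norm.rvs(loc=5, scale=3, size=200, random_state=rng)

# equal_var=False selects Welch's t-test
t_statistic, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=False)

significance_level = 0.05
reject_null = p_value < significance_level

print(t_statistic, p_value, reject_null)
```

With a gap this large between the two means, the p-value falls far below the significance level, so the null hypothesis of equal means is rejected.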
Scipy Stats Fisher Exact
Fisher's exact test is a statistical test for a nonrandom association between two categorical variables, carried out on a 2×2 contingency table. The Scipy has a method fisher_exact() for that kind of test.
The syntax is given below.
scipy.stats.fisher_exact(table, alternative='two-sided')
Where parameters are:
- table(array_data of type ints): It is the 2×2 table on which we want to perform the test.
- alternative: It is used to specify the alternative hypothesis. The alternative options are given below:
- ‘two-sided’
- ‘less’: one-sided
- ‘greater’: one-sided
The method returns two values of type float, the odds ratio and the p-value.
Let’s take an example by following the below steps:
Suppose we have a survey of the students in college about using the iPhone and Android phones based on gender, then we found the below data.
Gender | iPhone | Android |
Male | 10 | 5 |
Female | 5 | 11 |
To find out whether there is a statistically significant association between gender and phone preference, use the below code.
Import the libraries using the below code.
from scipy import stats
Create the array of data for holding the survey information.
survey_data = [[10,5],[5,11]]
Perform the fisher_exact()
function on this data to know the significance.
stats.fisher_exact(survey_data)
From the output, the p-value (about 0.076) is greater than 0.05, so there is not enough evidence to say there is an association between gender and phone preference.
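The alternative parameter changes the question that the test answers. As a sketch on the same survey table, the code below runs the two-sided test and a one-sided test with alternative='greater'. The variable names are illustrative.

```python
from scipy import stats

survey_data = [[10, 5], [5, 11]]  # same table as in the example above

# Two-sided test: is there any association at all?
odds_ratio, p_two_sided = stats.fisher_exact(survey_data, alternative='two-sided')

# One-sided test: are the odds of owning an iPhone greater for males?
_, p_greater = stats.fisher_exact(survey_data, alternative='greater')

print(odds_ratio)    # sample odds ratio: (10 * 11) / (5 * 5) = 4.4
print(p_two_sided)
print(p_greater)
```

The one-sided p-value is smaller than the two-sided one because it counts only the tables that are at least as extreme in one direction.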
In this Scipy tutorial, we learned what Scipy Stats offers and how to use it. We covered the following topics.
- Scipy Stats
- Scipy Stats Lognormal
- Scipy Stats Norm
- Scipy Stats T-test
- Scipy Stats Pearsonr
- Scipy Stats chi-square
- Scipy Stats IQR
- Scipy Stats Poisson
- Scipy Stats Entropy
- Scipy Stats Anova
- Scipy Stats Anderson
- Scipy Stats Average
- Scipy Stats Alpha
- Scipy Stats Boxcox
- Scipy Stats Binom
- Scipy Stats Beta
- Scipy Stats Binomial test
- Scipy Stats Binned statistics
- Scipy Stats Binom pmf
- Scipy Stats CDF
- Scipy Stats Cauchy
- Scipy Stats Describe
- Scipy Stats Exponential
- Scipy Stats Gamma
- Scipy Stats Geometric
- Scipy Stats gmean
- Scipy Stats Gennorm
- Scipy Stats Genpareto
- Scipy Stats Gumbel
- Scipy Stats Genextreme
- Scipy Stats Histogram
- Scipy Stats Half normal
- Scipy Stats Half cauchy
- Scipy Stats Inverse gamma
- Scipy Stats Inverse normal CDF
- Scipy Stats Johnson
- Scipy Stats PDF
- Scipy Stats Hypergeom
- Scipy Stats Interval
- Scipy Stats ISF
- Scipy Stats Independent T-test
- Scipy Stats Fisher Exact
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn for various clients in the United States, Canada, the United Kingdom, Australia, and New Zealand.