# Scipy Stats – Complete Guide

In this Python tutorial, we will understand the use of “Scipy Stats” using various examples in Python. Additionally, we will cover the following topics.

• Scipy Stats
• Scipy Stats Lognormal
• Scipy Stats Norm
• Scipy Stats T-test
• Scipy Stats Pearsonr
• Scipy Stats chi-square
• Scipy Stats IQR
• Scipy Stats Poisson
• Scipy Stats Entropy
• Scipy Stats Anova
• Scipy Stats Anderson
• Scipy Stats Average
• Scipy Stats Alpha
• Scipy Stats Boxcox
• Scipy Stats Binom
• Scipy Stats Beta
• Scipy Stats Binomial test
• Scipy Stats Binned statistics
• Scipy Stats Binom pmf
• Scipy Stats CDF
• Scipy Stats Cauchy
• Scipy Stats Describe
• Scipy Stats Exponential
• Scipy Stats Gamma
• Scipy Stats Geometric
• Scipy Stats gmean
• Scipy Stats Gennorm
• Scipy Stats Genpareto
• Scipy Stats Gumbel
• Scipy Stats Genextreme
• Scipy Stats Histogram
• Scipy Stats Half normal
• Scipy Stats Half cauchy
• Scipy Stats Inverse gamma
• Scipy Stats Inverse normal CDF
• Scipy Stats Johnson
• Scipy Stats PDF
• Scipy Stats Hypergeom
• Scipy Stats Interval
• Scipy Stats ISF
• Scipy Stats Independent T-test
• Scipy Stats Fisher Exact

## Scipy Stats

SciPy provides the module `scipy.stats`, which contains a large number of statistical functions. Statistics is a very broad area, so the module groups its functions into the following major categories.

• Summary Statistics
• Frequency Statistics
• Statistical tests
• Probability distributions
• Correlation functions
• Quasi-Monte Carlo
• Other statistical functionality

## Scipy Stats Lognormal

The lognormal distribution describes a continuous random variable whose logarithm is normally distributed. In SciPy it is available as the object `scipy.stats.lognorm`.

The syntax is given below.

``scipy.stats.lognorm.method_name(data, s, loc=0, scale=1)``

Where parameters are:

• data: It is the set of points or values at which the method is evaluated, given as a number or an array.
• s: It is the shape parameter, equal to the standard deviation of the underlying normal distribution. It is required.
• loc: It is used to shift the distribution, by default it is 0. It is not the mean of the lognormal distribution.
• scale: It is used to scale the distribution, by default it is 1. If the underlying normal distribution has mean `mu`, then `scale = exp(mu)`.
• size: It is used only by `rvs()` to specify the number of random variates to generate.
• moments: It is used only by `stats()` to select which statistics are returned: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').

The parameters `s`, `loc` and `scale` are common to all the methods of the object `scipy.stats.lognorm`. The methods are given below.

• scipy.stats.lognorm.cdf(): It is used for the cumulative distribution function.
• scipy.stats.lognorm.pdf(): It is used for the probability density function.
• scipy.stats.lognorm.rvs(): To get the random variates.
• scipy.stats.lognorm.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.lognorm.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.lognorm.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.lognorm.sf(): It is used to get the values of the survival function, which is `1 - cdf`.
• scipy.stats.lognorm.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.lognorm.logsf(): It is used to find the log of the survival function.
• scipy.stats.lognorm.mean(): It is used to find the mean of the distribution.
• scipy.stats.lognorm.median(): It is used to find the median of the distribution.
• scipy.stats.lognorm.var(): It is used to find the variance of the distribution.
• scipy.stats.lognorm.std(): It is used to find the standard deviation of the distribution.
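As a minimal sketch of how these methods are called, the example below builds a lognormal distribution from an underlying normal distribution. The values of `mu` and `sigma` are illustrative assumptions and do not come from the text above.

```python
import numpy as np
from scipy import stats

# Assumed illustrative values: the underlying normal distribution
# has mean mu and standard deviation sigma.
mu, sigma = 0.5, 0.25

# SciPy's parametrisation: shape s = sigma, scale = exp(mu), loc = 0.
dist = stats.lognorm(s=sigma, scale=np.exp(mu))

x = np.array([1.0, 2.0, 3.0])
pdf_values = dist.pdf(x)   # density at each point
cdf_values = dist.cdf(x)   # P(X <= x) at each point

lognormal_median = dist.median()   # equals exp(mu)
lognormal_mean = dist.mean()       # equals exp(mu + sigma**2 / 2)

print(pdf_values, cdf_values, lognormal_median, lognormal_mean)
```

Freezing the distribution with `stats.lognorm(...)` avoids repeating `s` and `scale` in every call.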

## Scipy Stats Norm

The `scipy.stats.norm` represents a normal continuous random variable. It provides different methods for the normal distribution like `cdf`, `pdf`, `median`, etc.

It has two important parameters, `loc` for the mean and `scale` for the standard deviation. These parameters control the location and the spread of the distribution.

The syntax is given below.

``scipy.stats.norm.method_name(data, loc=0, scale=1)``

Where parameters are:

• data: It is the set of points or values at which the method is evaluated, given as a number or an array.
• loc: It is used to specify the mean, by default it is 0.
• scale: It is used to specify the standard deviation, by default it is 1.
• size: It is used only by `rvs()` to specify the number of random variates to generate.
• moments: It is used only by `stats()` to select which statistics are returned: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').

The parameters `loc` and `scale` are common to all the methods of the object `scipy.stats.norm`. The methods are given below.

• scipy.stats.norm.cdf(): It is used for the cumulative distribution function.
• scipy.stats.norm.pdf(): It is used for the probability density function.
• scipy.stats.norm.rvs(): To get the random variates.
• scipy.stats.norm.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.norm.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.norm.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.norm.sf(): It is used to get the values of the survival function, which is `1 - cdf`.
• scipy.stats.norm.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.norm.logsf(): It is used to find the log of the survival function.
• scipy.stats.norm.mean(): It is used to find the mean of the normal distribution.
• scipy.stats.norm.median(): It is used to find the median of the normal distribution.
• scipy.stats.norm.var(): It is used to find the variance of the distribution.
• scipy.stats.norm.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````import numpy as np
import matplotlib.pyplot as plt
from scipy import stats``````

Create observation data values and calculate the `probability density function` from these data values with `mean = 0` and `standard deviation = 1`.

``````observatin_x = np.linspace(-4,4,200)
PDF_norm = stats.norm.pdf(observatin_x,loc=0,scale=1)``````

Plot the created distribution using the below code.

``````plt.plot(observatin_x,PDF_norm)
plt.xlabel('x-values')
plt.ylabel('PDF_norm_values')
plt.title("Probability density function of normal distribution")
plt.show()``````

Look at the output, which shows the probability density function graph of normal distribution.

## Scipy Stats CDF

Scipy stats `CDF` stands for `Cumulative distribution function`, which is a method of the object `scipy.stats.norm`. The values of the CDF range from 0 to 1.

The syntax is given below.

``scipy.stats.norm.cdf(data, loc=0, scale=1)``

Where parameters are:

• data: It is the set of points or values at which the CDF is evaluated, given as a number or an array.
• loc: It is used to specify the mean, by default it is 0.
• scale: It is used to specify the standard deviation, by default it is 1.

Let’s take an example and calculate using the below steps:

Import the required libraries using the below code.

``````import numpy as np
import matplotlib.pyplot as plt
from scipy import stats``````

Create observation data values and calculate the `cumulative distribution function` from these data values with `mean = 0` and `standard deviation = 1`.

``````observatin_x = np.linspace(-4,4,200)
CDF_norm = stats.norm.cdf(observatin_x,loc=0,scale=1)``````

Plot the created distribution using the below code.

``````plt.plot(observatin_x,CDF_norm)
plt.xlabel('x-values')
plt.ylabel('CDF_norm_values')
plt.title("Cumulative distribution function")
plt.show()``````

From the above output, the CDF increases steadily from 0 to 1. Its value at a point x is the probability that a value drawn from the population is less than or equal to x.
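To make the meaning of the CDF concrete, here is a small sketch that reads probabilities off the standard normal CDF. The cut-off values 1.96 and ±1 are chosen only for illustration.

```python
from scipy import stats

# P(X <= 1.96) for a standard normal variable, roughly 0.975.
p_below = stats.norm.cdf(1.96, loc=0, scale=1)

# P(-1 < X <= 1) as the difference of two CDF values, roughly 0.683.
p_within_one_sd = stats.norm.cdf(1) - stats.norm.cdf(-1)

# ppf (percent point function) is the inverse of cdf:
# it maps a probability back to the corresponding x.
x_back = stats.norm.ppf(p_below)

print(p_below, p_within_one_sd, x_back)
```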

## Scipy Stats Histogram

Older versions of SciPy had a function `histogram()` in the subpackage `scipy.stats` to create a histogram from the given values. This function divides the range into several bins and returns the number of values in each bin.

The syntax is given below.

``scipy.stats.histogram(a, numbins, defaultreallimits, weights)``

Where parameters are:

• a (array): It is the array of data that is provided as input.
• numbins (int): It is used to set the number of bins for the histogram.
• defaultreallimits: It is used to specify the range like lower and upper values of the histogram.
• weights (array): It is used to specify the weight of each value within the array.

The function above exists only in older versions of SciPy and has since been removed, and the top-level alias `scipy.histogram` is deprecated in favour of `numpy.histogram`. So here we will use `numpy.histogram`, which does the same job. Let’s take an example using the below steps.

Import the required libraries using the below code.

``````import numpy as np
import matplotlib.pyplot as plt``````

Generate the histogram counts and bin edges by passing the array `[1, 2, 2, 3, 2, 3, 3]` and the bin edges `range(4)` to the function `histogram()`.

``````histogram, bins = np.histogram([1, 2, 2, 3, 2, 3, 3],
                               bins=range(4))``````

View the number of values in each bin and the bin edges.

``````print ("Number of values in each bin : ", histogram)
print ("Edges of the bins            : ", bins)``````

Plot the above-created histogram using the below code.

``````plt.bar(bins[:-1], histogram, width = 0.9)
plt.xlim(min(bins), max(bins))
plt.show()``````

This is how to compute and plot a histogram.
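For histogram-related functionality that still lives in `scipy.stats`, one option is `rv_histogram`, which turns a pair of counts and bin edges into a distribution object. The sketch below is an illustration rather than part of the original example, and it uses `range(5)` as bin edges (an assumption) so that each integer gets its own bin.

```python
import numpy as np
from scipy import stats

data = [1, 2, 2, 3, 2, 3, 3]

# Bin edges 0, 1, 2, 3, 4; the last bin [3, 4] includes its right edge.
counts, edges = np.histogram(data, bins=range(5))

# Build a piecewise-constant distribution from the histogram.
hist_dist = stats.rv_histogram((counts, edges))

print("Counts per bin:", counts)
print("P(X <= 2):", hist_dist.cdf(2))
print("P(X <= 4):", hist_dist.cdf(4))
```

The resulting object offers the usual distribution methods such as `pdf`, `cdf` and `rvs`.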

## Scipy Stats Pearsonr

The Pearson correlation coefficient measures the strength of the linear relationship between two variables or datasets. The method `pearsonr()` in the subpackage `scipy.stats` computes it.

The syntax is given below.

``scipy.stats.pearsonr(x, y)``

Where parameters are:

• x: It is the array data.
• y: It is also the array data.

The method `pearsonr()` returns two values, `r` (the Pearson correlation coefficient) and a `p-value`. The value of `r` lies between `-1` and `1`, where `-1` means a perfect negative linear relationship, `1` means a perfect positive linear relationship, and `0` means there is no linear relationship.

Let’s take an example by following the below steps:

Import the libraries using the below code.

``from scipy import stats``

Now access the method `pearsonr()` and pass it two array values using the below code.

``r, p_values = stats.pearsonr([1, 4, 3, 2, 5], [9, 10, 3.5, 7, 5])``

Check the values of the Pearson correlation coefficient and p-value using the below code.

``````print('The Pearson correlation coefficient',r)
print('P-value                            ',p_values)``````
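As a cross-check, the coefficient can be computed by hand from its definition and compared with the result of `pearsonr()`. This is a minimal sketch using the same two arrays as above; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

x = np.array([1, 4, 3, 2, 5], dtype=float)
y = np.array([9, 10, 3.5, 7, 5], dtype=float)

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means.
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

r_scipy, p_value = stats.pearsonr(x, y)
print(r_manual, r_scipy, p_value)
```

For this data the coefficient is about -0.29, a weak negative linear relationship.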

## Scipy Stats PDF

Scipy stats `PDF` stands for `Probability density function`, which is a method of the object `scipy.stats.norm`. The PDF is never negative and the total area under it is 1, but its value at a single point can be greater than 1.

The syntax is given below.

``scipy.stats.norm.pdf(data, loc=0, scale=1)``

Where parameters are:

• data: It is the set of points or values at which the PDF is evaluated, given as a number or an array.
• loc: It is used to specify the mean, by default it is 0.
• scale: It is used to specify the standard deviation, by default it is 1.

Let’s take an example and calculate using the below steps:

Import the required libraries using the below code.

``````import numpy as np
import matplotlib.pyplot as plt
from scipy import stats``````

Create observation data values and calculate the `probability density function` from these data values with `mean = 0` and `standard deviation = 1`.

``````observatin_x = np.linspace(-4,4,200)
PDF_norm = stats.norm.pdf(observatin_x,loc=0,scale=1)``````

Plot the created distribution using the below code.

``````plt.plot(observatin_x,PDF_norm)
plt.xlabel('x-values')
plt.ylabel('PDF_norm_values')
plt.title("Probability density function")
plt.show()``````
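A density is not a probability, so it is not capped at 1. The sketch below illustrates this with a narrow normal distribution; the value `scale=0.1` and the grid are assumptions chosen for the demonstration.

```python
import numpy as np
from scipy import stats

# Fine grid covering 10 standard deviations on each side of the mean.
x = np.linspace(-1, 1, 20001)
dx = x[1] - x[0]

narrow_pdf = stats.norm.pdf(x, loc=0, scale=0.1)

peak = narrow_pdf.max()           # about 3.99, larger than 1
area = np.sum(narrow_pdf) * dx    # Riemann-sum approximation of the integral

print("Peak density:", peak)
print("Area under the curve:", area)
```

The peak is well above 1, while the area under the curve stays close to 1.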

## Scipy Stats chi-square

The chi-square test checks whether the observed frequencies of categorical data differ from the expected frequencies. It is used in hypothesis testing. In SciPy, the method `chisquare` within the subpackage `scipy.stats` performs the test.

• To use the chi-squared test, the total sample size should be greater than 13.
• The test is unreliable if the expected or observed frequencies in a category are very small. A typical rule is that every observed and expected frequency should be at least 5.

The syntax is given below.

``scipy.stats.chisquare(f_obs, f_exp=None, ddof=0)``

where parameters are:

• f_obs(array data): It is the observed frequencies in categorical variables.
• f_exp(array data): It is the expected frequencies in categorical variables.
• ddof(int): It is used to define the `Delta degrees of freedom`.

The method `chisquare` returns two float values, the first is the chi-square test statistic and the second is the p-value.

Let’s take an example by following the below steps:

Import the method `chisquare` from the module `scipy.stats` using the below code.

``from scipy.stats import chisquare``

Create two array variables to store the observed and expected frequencies. Pass them to the method `chisquare` to perform the chi-squared test.

``````observed_f = [10, 25, 10, 13, 11, 11]
expected_f = [15, 15, 15, 15, 15, 5]
test_value = chisquare(f_obs=observed_f, f_exp=expected_f)``````

View the test result using the below code.

``````print('The value of chi-squared test statistic: ',test_value.statistic)
print('The value of p-value: ',test_value.pvalue)``````

The output shows the result of the chi-squared test. This is how to perform the chi-squared test on categorical data to find the differences between observed and expected frequencies using the value of the chi-squared test statistic and p-value.
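The test statistic is the sum of `(observed - expected)² / expected` over all categories. The following sketch recomputes it by hand for the same frequencies and compares it with `chisquare`. The 0.05 significance level is a common convention, used here as an assumption.

```python
import numpy as np
from scipy.stats import chisquare

observed_f = np.array([10, 25, 10, 13, 11, 11])
expected_f = np.array([15, 15, 15, 15, 15, 5])

# Chi-squared statistic computed directly from its definition.
manual_statistic = np.sum((observed_f - expected_f) ** 2 / expected_f)

result = chisquare(f_obs=observed_f, f_exp=expected_f)

# Assumed significance level of 0.05.
reject_null = result.pvalue < 0.05

print(manual_statistic, result.statistic, result.pvalue, reject_null)
```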

## Scipy Stats IQR

The `IQR` stands for `Interquartile Range`, which is the difference between the 3rd quartile (75th percentile) and the 1st quartile (25th percentile). It is used to measure the dispersion of data. SciPy has a method `iqr` within the module `scipy.stats` to calculate the `Interquartile Range` of data along the stated axis.

The syntax is given below.

``scipy.stats.iqr(x, axis=None, rng=(25, 75), nan_policy='propagate', interpolation='linear')``

Where parameters are:

• x(array data): Array or object is provided to a method.
• axis(int): It is used to specify the axis for computing the range.
• rng (two values in the range [0,100]): It is used to specify the percentiles between which the range is calculated.
• nan_policy: It is used to deal with the nan values and accept three values:
1. omit: It means calculating the `IQR` by ignoring the nan values.
2. propagate: It means returns nan values.
3. raise: It means to throw an error for the nan values.
• interpolation(string): It is used to specify the interpolation method to use like linear, lower, higher, nearest, and midpoint.

The method `iqr` returns the value in ndarray or scalar depending upon the provided input.

Let’s take an example to calculate the `IQR` given array data by following the below steps.

Import the method `iqr` from the module `scipy.stats`, together with NumPy, using the below code.

``````import numpy as np
from scipy.stats import iqr``````

Create an array of data and pass it to the method `iqr` for calculating the `IQR`.

``````x_data = np.array([[15, 8, 7], [4, 3, 2]])

iqr(x_data)``````

The output is the `Interquartile Range` of the given array data, computed over all of its values because `axis` is `None` by default. This is how to find the `IQR` of the data.
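To show what `iqr` computes, the sketch below reproduces the result with `numpy.percentile` and also demonstrates the `axis` parameter on the same array. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import iqr

x_data = np.array([[15, 8, 7], [4, 3, 2]])

# IQR by hand: 75th percentile minus 25th percentile of all values.
q25, q75 = np.percentile(x_data, [25, 75])
iqr_manual = q75 - q25

iqr_all = iqr(x_data)              # over the flattened array
iqr_per_row = iqr(x_data, axis=1)  # one value per row

print(iqr_manual, iqr_all, iqr_per_row)
```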

## Scipy Stats Average

The mean or average is the sum of all the values divided by the number of values. Older versions of SciPy exposed `scipy.mean`, but it was only an alias of `numpy.mean` and is deprecated, so the average is calculated with NumPy directly.

The syntax is given below.

``numpy.mean(array_data, axis=None)``

Where parameters are:

• array_data: It is the data in the array form containing all the elements.
• axis(int): It is used to specify the axis along which the average or mean needs to be calculated. By default, the mean of all the elements is calculated.

The method `mean()` returns the arithmetic mean of the elements in the array.

Let’s understand through an example following the below steps.

Import the required libraries using the below code.

``import numpy as np``

Create an array containing the elements whose arithmetic mean needs to be calculated.

``array_data = [2,4,6,8,12,23]``

Calculate the mean of the created array by passing it to the method `mean()`.

``np.mean(array_data)``

The output shows the mean of the given array.

## Scipy Stats Entropy

First, we need to know what entropy is. In thermodynamics, entropy is a measure of disorder or uncertainty. Statistics and information theory borrowed the concept and apply it to probabilities: entropy measures the amount of uncertainty, or information, in a probability distribution.

The Scipy has a method `entropy()` to calculate the entropy of distributions.

The syntax of the method `entropy()` is given below.

``scipy.stats.entropy(pk, qk=None, base=None, axis=0)``

Where parameters are:

• pk(array): It takes the distribution.
• qk(array data): It is an optional second distribution against which the relative entropy (Kullback-Leibler divergence) is computed. It must have the same shape as pk.
• base(float): It is used to define which logarithmic base to be used, by default natural logarithmic base.
• axis(int): It is used to specify the axis on which entropy is determined.

Follow the below steps for the demonstration of the method `entropy()`.

Import the method `entropy()` from module `scipy.stats`.

``from scipy.stats import entropy``

Pass the `pk` values to the method to compute the entropy. If the values do not sum to 1, as is the case here, `entropy()` normalizes them first.

``entropy([8/9, 2/9], base=2)``
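The entropy in base 2 is `-sum(p * log2(p))` over the normalized probabilities. As a minimal sketch, the code below computes this by hand for the same input and compares it with `entropy()`.

```python
import numpy as np
from scipy.stats import entropy

pk = np.array([8 / 9, 2 / 9])

# The input sums to 10/9, so normalize it to [0.8, 0.2] first,
# which is what entropy() does internally.
pk_normalised = pk / pk.sum()

manual_entropy = -np.sum(pk_normalised * np.log2(pk_normalised))
scipy_entropy = entropy(pk, base=2)

print(manual_entropy, scipy_entropy)
```

Both values are about 0.72 bits.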

## Scipy Stats Anderson

The Anderson-Darling test tests the null hypothesis that a sample comes from a population that follows a specific distribution. SciPy has a method `anderson()` in the module `scipy.stats` for this test.

The syntax of the method `anderson()` is given below.

``scipy.stats.anderson(x, dist='norm')``

Where parameters are:

• x(array_data): It is sample data.
• dist(str): It is used to define the distribution to test against. It accepts the following values.
1. ‘norm’,
2. ‘expon’,
3. ‘logistic’,
4. ‘gumbel’,
5. ‘gumbel_l’,
6. ‘gumbel_r’,
7. ‘extreme1’

The method `anderson()` returns statistics, critical_values, and significance_level.
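The following sketch shows one way to read the returned values. The sample is randomly generated with an arbitrary seed, mean and spread, all of which are assumptions made only for the demonstration.

```python
import numpy as np
from scipy import stats

# Assumed demo data: 200 draws from a normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=200)

result = stats.anderson(sample, dist='norm')
print("Test statistic:", result.statistic)

# Normality is rejected at a given level when the statistic
# exceeds the critical value for that level.
for crit, sig in zip(result.critical_values, result.significance_level):
    decision = "reject" if result.statistic > crit else "do not reject"
    print(f"{sig:4.1f}% level: critical value {crit:.3f} -> {decision} normality")
```

Each critical value is paired with the significance level (in percent) at the same position.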

## Scipy Stats Anova

ANOVA refers to the analysis of variance, which tests the null hypothesis that two or more groups have the same population mean. SciPy has a method `f_oneway` to perform this one-way ANOVA test.

The syntax is given below.

``scipy.stats.f_oneway(*args, axis=0)``

Where parameters are:

• *args(array_data): These are sample_1, sample_2 and so on, the measurements of every group.
• axis(int): It is used to specify the axis of the provided arrays as input on which the test is performed.

The method `f_oneway` returns two values, the F statistic and the p-value. For multidimensional input, one pair is returned for each slice along the given axis.

Let’s understand through demonstration by following the below steps.

Import the method `f_oneway` from the module `scipy.stats` using the below code.

``````from scipy.stats import f_oneway
import numpy as np``````

Creating the multidimensional array using the below code.

``````first_data = np.array([[7.77, 7.03, 5.71],
[5.17, 7.35, 7.00],
[7.39, 7.57, 7.57],
[7.45, 5.33, 9.35],
[5.41, 7.10, 9.33],
[7.00, 7.24, 7.44]])
second_data = np.array([[5.35, 7.30, 7.15],
[5.55, 5.57, 7.53],
[5.72, 7.73, 5.72],
[7.01, 9.19, 7.41],
[7.75, 7.77, 7.30],
[5.90, 7.97, 5.97]])
third_data = np.array([[3.31, 7.77, 1.01],
[7.25, 3.24, 3.52],
[5.32, 7.71, 5.19],
[7.47, 7.73, 7.91],
[7.59, 5.01, 5.07],
[3.07, 9.72, 7.47]])``````

Pass the above-created arrays to a method `f_oneway` for the testing using the below code.

``f_statistic_value, p_value = f_oneway(first_data,second_data,third_data)``

Check the computed values using the below code.

``````print('The value of F statistic test',f_statistic_value)
print('The value of p-value',p_value)``````

This is how to use the ANOVA test with SciPy.
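To see what the F statistic measures, the sketch below computes it by hand for three one-dimensional groups and compares it with `f_oneway`. The group values are illustrative, taken from the first entries of the arrays above.

```python
import numpy as np
from scipy.stats import f_oneway

# Illustrative one-dimensional groups.
group_a = np.array([7.77, 7.03, 5.71, 5.17, 7.35])
group_b = np.array([5.35, 7.30, 7.15, 5.55, 5.57])
group_c = np.array([3.31, 7.77, 1.01, 7.25, 3.24])
groups = [group_a, group_b, group_c]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), all_values.size

# Variation between the group means and variation within the groups.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# F = mean square between / mean square within.
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))

f_scipy, p_value = f_oneway(group_a, group_b, group_c)
print(f_manual, f_scipy, p_value)
```

With one-dimensional groups, `f_oneway` returns a single F statistic and a single p-value.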

## Scipy Stats T-test

The `T-test` is used for testing a null hypothesis about the mean of a sample. There are several `T-test` methods in the SciPy module `scipy.stats`, but here we will learn about one specific method, `ttest_1samp`, which tests whether the mean of a sample equals a given population mean.

The syntax is given below.

``scipy.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate')``

Where parameters are:

• a(array_data): It is the sample of independent observations.
• popmean(float or array_data): It is the mean or expected value of the population.
• axis(int): It is used to specify the axis on which the test is done.
• nan_policy: It is used to deal with the nan values and accept three values:
1. omit: It means performing the test by ignoring the nan values.
2. propagate: It means returns nan values.
3. raise: It means to throw an error for the nan values.

The method `ttest_1samp` returns two float values, the `t-statistic` and `pvalue`.

Let’s take an example by following the below steps:

Import the required libraries `stats` from Scipy using the below code.

``````from scipy import stats
import numpy as np``````

Create a random number generator using the below code.

``randomnub_gen = np.random.default_rng()``

Draw a random sample from a normal distribution using the below code.

``random_variate_s = stats.norm.rvs(loc=6, scale=11, size=(51, 3), random_state=randomnub_gen)``

View the generated data or numbers for the sample.

Now perform the `T-test` on this generated random sample to check whether the mean of each column differs from a population mean of 5.0.

``stats.ttest_1samp(random_variate_s, 5.0)``

Again perform the test with a population mean equal to zero using the below code.

``stats.ttest_1samp(random_variate_s, 0.0)``

From the above results, we reject the null hypothesis when the p-value is below the chosen significance level (commonly 0.05), otherwise we fail to reject it.
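The t-statistic is the difference between the sample mean and the population mean divided by the standard error of the mean. Below is a minimal sketch that recomputes it by hand for a one-dimensional sample; the seed and sample size are assumptions chosen for repeatability.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed, an arbitrary choice
sample = stats.norm.rvs(loc=6, scale=11, size=51, random_state=rng)
popmean = 5.0
n = sample.size

# t = (sample mean - population mean) / (sample std / sqrt(n)),
# using the unbiased standard deviation (ddof=1).
t_manual = (sample.mean() - popmean) / (sample.std(ddof=1) / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

result = stats.ttest_1samp(sample, popmean)
print(t_manual, result.statistic, p_manual, result.pvalue)
```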

## Scipy Stats Half normal

The `scipy.stats.halfnorm` represents a half-normal continuous random variable, that is, the absolute value of a normal variable with mean 0. It provides different methods for the half-normal distribution like `cdf`, `pdf`, `median`, etc.

It has two important parameters, `loc` for the location and `scale` for the scale. These parameters control where the distribution starts and how widely it is spread.

The syntax is given below.

``scipy.stats.halfnorm.method_name(data, loc=0, scale=1)``

Where parameters are:

• data: It is the set of points or values at which the method is evaluated, given as a number or an array.
• loc: It is used to specify the lower bound where the distribution starts, by default it is 0. It is not the mean of the half-normal distribution.
• scale: It is used to specify the standard deviation of the underlying normal distribution, by default it is 1.
• size: It is used only by `rvs()` to specify the number of random variates to generate.
• moments: It is used only by `stats()` to select which statistics are returned: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').

The parameters `loc` and `scale` are common to all the methods of the object `scipy.stats.halfnorm`. The methods are given below.

• scipy.stats.halfnorm.cdf(): It is used for the cumulative distribution function.
• scipy.stats.halfnorm.pdf(): It is used for the probability density function.
• scipy.stats.halfnorm.rvs(): To get the random variates.
• scipy.stats.halfnorm.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.halfnorm.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.halfnorm.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.halfnorm.sf(): It is used to get the values of the survival function, which is `1 - cdf`.
• scipy.stats.halfnorm.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.halfnorm.logsf(): It is used to find the log of the survival function.
• scipy.stats.halfnorm.mean(): It is used to find the mean of the half-normal distribution.
• scipy.stats.halfnorm.median(): It is used to find the median of the half-normal distribution.
• scipy.stats.halfnorm.var(): It is used to find the variance of the distribution.
• scipy.stats.halfnorm.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````import numpy as np
import matplotlib.pyplot as plt
from scipy import stats``````

Create observation data values and calculate the `probability density function` of the half-normal distribution from these data values with `loc = 0` and `scale = 1`.

``````observatin_x = np.linspace(-4,4,200)
PDF_norm = stats.halfnorm.pdf(observatin_x,loc=0,scale=1)``````

Plot the created distribution using the below code.

``````plt.plot(observatin_x,PDF_norm)
plt.xlabel('x-values')
plt.ylabel('PDF_norm_values')
plt.title("Probability density function of half normal distribution")
plt.show()``````

Look at the above output: the density is zero for negative x-values and has the shape of the right half of a normal curve for positive x-values.
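The relationship to the normal distribution can be checked numerically. In this minimal sketch with the default `loc=0` and `scale=1`, the half-normal density equals twice the normal density for non-negative values and zero otherwise.

```python
import numpy as np
from scipy import stats

x = np.linspace(0, 4, 50)

half_pdf = stats.halfnorm.pdf(x)
folded_pdf = 2 * stats.norm.pdf(x)   # normal density folded onto x >= 0

density_below_zero = stats.halfnorm.pdf(-1.0)  # outside the support
mean_value = stats.halfnorm.mean()             # sqrt(2 / pi), about 0.798

print(np.allclose(half_pdf, folded_pdf), density_below_zero, mean_value)
```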

## Scipy Stats Cauchy

The `Cauchy` distribution is a continuous probability distribution that is bell-shaped like the normal distribution but has much heavier tails, so extreme values are far more likely. With the default `loc=0` and `scale=1`, its peak (1/π ≈ 0.318) is slightly lower than that of the standard normal distribution (≈ 0.399). Its mean and variance are undefined.

The syntax is given below.

``scipy.stats.cauchy.method_name(data,loc,scale)``

Where parameters are:

• data: It is the set of points or values at which the method is evaluated, given as a number or an array.
• loc: It is used to specify the location of the peak, which is also the median, by default it is 0.
• scale: It is used to specify the width of the distribution (the half-width at half-maximum), by default it is 1.

The above parameters are common to all the methods of the object `scipy.stats.cauchy`. The methods are given below.

• scipy.stats.cauchy.cdf(): It is used for the cumulative distribution function.
• scipy.stats.cauchy.pdf(): It is used for the probability density function.
• scipy.stats.cauchy.rvs(): To get the random variates.
• scipy.stats.cauchy.stats(): It is used to get the mean, variance, skew and kurtosis, all of which are undefined for the Cauchy distribution.
• scipy.stats.cauchy.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.cauchy.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.cauchy.sf(): It is used to get the values of the survival function, which is `1 - cdf`.
• scipy.stats.cauchy.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.cauchy.logsf(): It is used to find the log of the survival function.
• scipy.stats.cauchy.mean(): It is used to find the mean of the distribution, which is undefined for the Cauchy distribution, so `nan` is returned.
• scipy.stats.cauchy.median(): It is used to find the median of the distribution.
• scipy.stats.cauchy.var(): It is used to find the variance of the distribution, which is also undefined.
• scipy.stats.cauchy.std(): It is used to find the standard deviation of the distribution, which is also undefined.

Let’s take an example by following the below steps:

Import the object `cauchy`, `numpy` and `matplotlib` using the below code.

``````from scipy.stats import cauchy
import matplotlib.pyplot as plt
import numpy as np``````

Create a `cauchy` distribution using the below code.

``````fig, ax = plt.subplots(1, 1)
x = np.linspace(cauchy.ppf(0.02),
                cauchy.ppf(0.98), 99)
ax.plot(x, cauchy.pdf(x),
        'r-', lw=5, alpha=0.6, label='cauchy pdf')
ax.legend()
plt.show()``````

Look at the above output: the Cauchy curve is bell-shaped like the normal distribution, but its tails fall off much more slowly.
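A short numerical comparison of the standard Cauchy and standard normal distributions, both with `loc=0` and `scale=1`, is sketched below. The cut-off of 3 for the tail probability is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import cauchy, norm

# Peak heights at x = 0.
peak_cauchy = cauchy.pdf(0)   # 1 / pi, about 0.318
peak_norm = norm.pdf(0)       # 1 / sqrt(2 * pi), about 0.399

# Tail probabilities P(X > 3) via the survival function.
tail_cauchy = cauchy.sf(3)    # about 0.10
tail_norm = norm.sf(3)        # about 0.0013

cauchy_mean = cauchy.mean()       # nan, because the mean is undefined
cauchy_median = cauchy.median()   # equals loc, here 0

print(peak_cauchy, peak_norm, tail_cauchy, tail_norm, cauchy_mean, cauchy_median)
```

Because the mean is undefined, the median and `loc` are the natural way to describe where a Cauchy distribution is centred.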

## Scipy Stats Half cauchy

The `HalfCauchy` distribution is a continuous probability distribution obtained by keeping only the non-negative half of a Cauchy distribution centred at zero. It resembles the half-normal distribution but has a much heavier tail, and it has no finite mean or variance.

The syntax is given below.

``scipy.stats.halfcauchy.method_name(data,loc,scale)``

Where parameters are:

• data: It is the set of points or values at which the method is evaluated, given as a number or an array.
• loc: It is used to specify the lower bound where the distribution starts, by default it is 0.
• scale: It is used to specify the width of the distribution, by default it is 1.

The above parameters are common to all the methods of the object `scipy.stats.halfcauchy`. The methods are given below.

• scipy.stats.halfcauchy.cdf(): It is used for the cumulative distribution function.
• scipy.stats.halfcauchy.pdf(): It is used for the probability density function.
• scipy.stats.halfcauchy.rvs(): To get the random variates.
• scipy.stats.halfcauchy.stats(): It is used to get the mean, variance, skew, and kurtosis, none of which is finite for the half-Cauchy distribution.
• scipy.stats.halfcauchy.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.halfcauchy.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.halfcauchy.sf(): It is used to get the values of the survival function, which is `1 - cdf`.
• scipy.stats.halfcauchy.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.halfcauchy.logsf(): It is used to find the log of the survival function.
• scipy.stats.halfcauchy.mean(): It is used to find the mean of the distribution, which is not finite for the half-Cauchy distribution.
• scipy.stats.halfcauchy.median(): It is used to find the median of the distribution.
• scipy.stats.halfcauchy.var(): It is used to find the variance of the distribution, which is not finite either.
• scipy.stats.halfcauchy.std(): It is used to find the standard deviation of the distribution, which is not finite either.

Let’s take an example by following the below steps:

Import the object `halfcauchy`, `numpy` and `matplotlib` using the below code.

``````from scipy.stats import halfcauchy
import matplotlib.pyplot as plt
import numpy as np``````

Create a `halfcauchy` distribution using the below code.

``````fig, ax = plt.subplots(1, 1)
x = np.linspace(halfcauchy.ppf(0.02),
                halfcauchy.ppf(0.98), 99)
ax.plot(x, halfcauchy.pdf(x),
        'r-', lw=5, alpha=0.6, label='halfcauchy pdf')
ax.legend()
plt.show()``````

## Scipy Stats Binom

The `scipy.stats.binom` represents a binomial discrete random variable. It provides different methods for the binomial distribution like `pmf`, `cdf`, `median`, etc. Because the distribution is discrete, it has a probability mass function (`pmf`) instead of a probability density function.

It has one optional parameter `loc` for shifting the distribution.

The syntax is given below.

``scipy.stats.binom.method_name(k,n,p,loc)``

Where parameters are:

• k(int): It is used to define the number of successes.
• n(int): It is used to specify the number of trials.
• p(float): It is used to specify the assumed probability of success.
• loc: It is used to shift the distribution, by default it is 0.

The parameters `n`, `p` and `loc` are common to all the methods of the object `scipy.stats.binom`. The methods are given below.

• scipy.stats.binom.pmf(): It is used for the probability mass function.
• scipy.stats.binom.cdf(): It is used for the cumulative distribution function.
• scipy.stats.binom.rvs(): To get the random variates.
• scipy.stats.binom.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.binom.logpmf(): It is used to get the log of the probability mass function.
• scipy.stats.binom.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.binom.sf(): It is used to get the values of the survival function, which is `1 - cdf`.
• scipy.stats.binom.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.binom.logsf(): It is used to find the log of the survival function.
• scipy.stats.binom.mean(): It is used to find the mean of the binomial distribution.
• scipy.stats.binom.median(): It is used to find the median of the binomial distribution.
• scipy.stats.binom.var(): It is used to find the variance of the distribution.
• scipy.stats.binom.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import binom
import matplotlib.pyplot as plt
import numpy as np``````

Define the values of the parameters `n` and `p` using the below code.

``p, n = 0.3, 4``

Create an array of data using the method `ppf()` (percent point function) of object `binom` .

``````array_data = np.arange(binom.ppf(0.02, n, p),
binom.ppf(0.98, n, p))
array_data``````

Show the probability mass function using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, binom.pmf(array_data, n, p), 'bo', ms=7, label='binom pmf')
ax.vlines(array_data, 0, binom.pmf(array_data, n, p), colors='b', lw=6, alpha=0.5)
ax.legend()
plt.show()``````
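The probability mass function can also be written out directly as `C(n, k) * p**k * (1 - p)**(n - k)`. As a minimal sketch with the same `n` and `p`, the code below checks this formula against `binom.pmf` for every possible number of successes.

```python
from math import comb

import numpy as np
from scipy.stats import binom

n, p = 4, 0.3
k = np.arange(0, n + 1)   # every possible number of successes: 0..4

pmf_scipy = binom.pmf(k, n, p)

# Binomial formula evaluated by hand for each k.
pmf_manual = np.array(
    [comb(n, int(i)) * p**int(i) * (1 - p) ** (n - int(i)) for i in k]
)

total_probability = pmf_scipy.sum()   # the probabilities sum to 1
mean_value = binom.mean(n, p)         # equals n * p

print(pmf_scipy, total_probability, mean_value)
```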

## Scipy Stats Describe

The Scipy has a method `describe()` in a module `scipy.stats` to find the descriptive statistics of the given data.

The syntax is given below.

``scipy.stats.describe(a, axis=0, ddof=1, bias=True, nan_policy='propagate')``

Where parameters are:

• a(array_data): It is the data of type array.
• axis(int): It is used to specify the axis along which the statistics are calculated, by default it is 0. If it is None, the statistics are calculated over the whole array.
• ddof(int): It is used to specify the delta degrees of freedom, which is used only for the variance.
• bias(Boolean): If it is False, the skewness and kurtosis calculations are corrected for statistical bias.
• nan_policy: It is used to deal with the nan values and accept three values:
1. omit: It means calculating the statistics by ignoring the nan values.
2. propagate: It means returns nan values.
3. raise: It means to throw an error for the nan values.

The method `describe()` returns `nobs` (the number of observations), `minmax` (a tuple of the minimum and maximum values), `mean`, `variance`, `skewness` and `kurtosis`. The statistics are of type ndarray or float.

Let’s take an example by following the below steps:

Import the required libraries using the below code.

``````from scipy import stats
import numpy as np``````

Create an array containing 20 observations or values using the below code.

``array_data = np.arange(20)``

Pass the above-created array to a method `describe()` for finding the descriptive statistics using the below code.

``````result = stats.describe(array_data)
result``````

Let’s view each statistic of the array using the below code.

``````print('Number of observations in the array', result.nobs)
print('Minimum and maximum values in the array', result.minmax)
print('Mean of the array', result.mean)
print('Variance of the array', result.variance)
print('Skewness of the array', result.skewness)
print('Kurtosis of the array', result.kurtosis)``````
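As a small sketch of how the `axis` parameter changes the result, the code below describes a two-dimensional array once per column (the default `axis=0`) and once over all values (`axis=None`). The array shape and the variable names are illustrative assumptions and are not part of the example above.

```python
import numpy as np
from scipy import stats

# Illustrative data: 20 values arranged as 4 rows and 5 columns.
data = np.arange(20).reshape(4, 5)

# Default axis=0: one set of statistics per column.
per_column = stats.describe(data)

# axis=None: statistics over all 20 values at once.
overall = stats.describe(data, axis=None)

print('Mean of each column', per_column.mean)
print('Overall observations, mean and variance',
      overall.nobs, overall.mean, overall.variance)
```

With `axis=0` every field holds one value per column, whereas with `axis=None` every field is a single number.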

## Scipy Stats Binomial test

The Binomial test finds the probability of the specific outcome by performing the many trials where only two possible outcomes exist. It is used for the null hypothesis test to assess the probability of the outcomes in the Bernoulli experiment.

The Scipy has a method `binomtest()` to perform the Binomial test that exists within the module `scipy.stats`.

The syntax is given below.

``scipy.stats.binomtest(k, n, p=0.5, alternative='two-sided')``

Where parameters are:

• k(int): It is used to define the number of successes.
• n(int): It is used to specify the number of trials.
• p(float): It is used to specify the assumed probability of success.
• alternative: It is used to specify the alternative hypothesis and accepts `'two-sided'`, `'greater'` or `'less'`.

The method `binomtest()` returns a result object that contains the `pvalue` and the `proportion_estimate` as floats, along with a method `proportion_ci()` to get the confidence interval of the estimate.

Let’s understand through an example by following the below steps.

Import the method `binomtest()` from the module `scipy.stats` using the below code.

``from scipy.stats import binomtest``

Now, a phone manufacturer claims that no more than 10% of their phones are unsafe. 20 phones are inspected for safety, and 6 are found to be unsafe. Test the manufacturer’s claim.

``Test_result = binomtest(6, n=20, p=0.1, alternative='greater')``

View the result using the below code.

``````print('The p-value is ',Test_result.pvalue)
print('The estimated proportion is 6/20 ',Test_result.proportion_estimate)
print('The confidence interval of the estimate ',Test_result.proportion_ci(confidence_level=0.95))``````
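For the one-sided alternative `'greater'`, the p-value is the probability of observing at least `k` successes when the assumed `p` is true. The sketch below cross-checks this against the survival function of `scipy.stats.binom`. The variable names are illustrative assumptions.

```python
from scipy.stats import binom, binomtest

k, n, p = 6, 20, 0.1

result = binomtest(k, n=n, p=p, alternative='greater')

# P(X >= k) is the survival function evaluated at k - 1.
tail_probability = binom.sf(k - 1, n, p)

print('p-value from binomtest', result.pvalue)
print('Tail probability from binom.sf', tail_probability)
```

Both numbers agree, and a p-value below the chosen significance level (for example 0.05) is evidence against the claim.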

## Scipy Stats Binom pmf

In Scipy there is a method `binom.pmf()` that exists in the module `scipy.stats` to compute the probability mass function of the binomial distribution.

The syntax is given below.

``scipy.stats.binom.pmf(k,n, p,loc=0)``

Where parameters are:

• k(int): It is used to define the number of successes.
• n(int): It is used to specify the number of trials.
• p(float): It is used to specify the assumed probability of success.
• loc: It is used to shift the distribution, by default it is 0.

To understand with an example, please refer to the above sub-section `Scipy Stats Binom`, where the method `pmf()`, which stands for probability mass function, is used in the example.
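As a minimal sketch, the code below compares `binom.pmf()` with the textbook formula C(n, k) p^k (1 - p)^(n - k). The values of `n` and `p` are the ones used in the `Scipy Stats Binom` sub-section, and the remaining variable names are illustrative assumptions.

```python
from math import comb

import numpy as np
from scipy.stats import binom

n, p = 4, 0.3
k = np.arange(n + 1)  # every possible number of successes: 0..n

# Probability mass function from Scipy.
library_pmf = binom.pmf(k, n, p)

# The same probabilities from the closed-form formula.
manual_pmf = np.array([comb(n, i) * p**i * (1 - p)**(n - i) for i in k])

print(library_pmf)
print(manual_pmf)
```

The two arrays match, and the probabilities over all values of `k` sum to 1.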

## Scipy Stats gmean

The method `gmean()` of module `scipy.stats.mstats` of Scipy finds the geometric average of the given array on basis of the specified axis.

The syntax is given below.

``scipy.stats.mstats.gmean(a, axis=0, dtype=None, weights=None)``

Where parameters are:

• a(array_data): It is the collection of elements within an array or array data.
• axis(int): It is used to specify the axis of the array on which we want to find the geometric mean.
• dtype: It is used to specify the data type of the returned array.
• weights(array_data): It is used to specify the weight of the values, by default the weight of values is 1.0 in the array.

The method `gmean()` returns the `gmean` which is the geometric mean of a passed array of type `ndarray`.

Let’s understand through an example by following the below steps.

Import the required libraries using the below code.

``from scipy.stats.mstats import gmean``

Find the geometric mean of the array `[2,4,6,8]` using the below code.

``gmean([2,4,6,8])``
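The geometric mean is the n-th root of the product of n values, which is the same as the exponential of the mean of the logarithms. The sketch below checks `gmean()` against that definition; it imports `gmean` from `scipy.stats`, which also provides this function, and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gmean

values = np.array([2, 4, 6, 8])

# Geometric mean from Scipy.
direct = gmean(values)

# Same quantity from the definition exp(mean(log(x))).
via_logs = np.exp(np.mean(np.log(values)))

print(direct, via_logs)
```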

## Scipy Stats Alpha

The `scipy.stats.alpha` represents the random variable that is continuous in nature. It has different kinds of functions of distribution like CDF, PDF, median, etc.

It has two important parameters `loc` and `scale`: `loc` shifts the distribution and `scale` stretches it, whereas the shape of the distribution is controlled by the shape parameter `a`.

The syntax is given below.

``scipy.stats.alpha.method_name(q,x,a,loc,size,moments,scale)``

Where parameters are:

• x: It is used to define the quantiles.
• a: It is used to define the shape parameter.
• q: It is used to specify the lower or upper tail probability.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').
• scale: It is used to scale the distribution, by default it is 1.

The above parameters are common to all the methods of the object `scipy.stats.alpha`. The methods are given below.

• scipy.stats.alpha.cdf(): It is used for the cumulative distribution function.
• scipy.stats.alpha.pdf(): It is used for the probability density function.
• scipy.stats.alpha.rvs(): It is used to get the random variates.
• scipy.stats.alpha.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.alpha.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.alpha.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.alpha.sf(): It is used to get the values of the survival function.
• scipy.stats.alpha.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.alpha.logsf(): It is used to find the log of the survival function.
• scipy.stats.alpha.mean(): It is used to find the mean of the distribution.
• scipy.stats.alpha.median(): It is used to find the median of the distribution.
• scipy.stats.alpha.var(): It is used to find the variance of the distribution.
• scipy.stats.alpha.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import alpha
import matplotlib.pyplot as plt
import numpy as np``````

Create a variable for the shape parameter and assign a value.

``a = 4.3``

Create an array of data using the method `ppf()` of an object `alpha` using the below code.

``````array_data = np.linspace(alpha.ppf(0.01, a),
alpha.ppf(0.90, a), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of object `alpha` of module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, alpha.pdf(array_data, a),
'r-', lw=4, alpha=0.5, label='alpha pdf')``````
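To see what `loc` and `scale` do, the sketch below checks the general rule for continuous distributions in `scipy.stats`: the density with `loc` and `scale` equals the standardized density evaluated at `(x - loc) / scale`, divided by `scale`. The values of `loc`, `scale` and `x` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import alpha

a = 4.3
loc, scale = 2.0, 3.0            # illustrative shift and stretch
x = np.linspace(2.5, 6.0, 5)     # points above loc, inside the support

# Density with explicit loc and scale.
shifted_scaled = alpha.pdf(x, a, loc=loc, scale=scale)

# Same density built from the standardized form.
standardized = alpha.pdf((x - loc) / scale, a) / scale

print(np.allclose(shifted_scaled, standardized))
```

The same relation holds for the other continuous distributions in this tutorial, which is why `loc` and `scale` are in general not the mean and the standard deviation.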

## Scipy Stats Beta

The `scipy.stats.beta` represents the random variable that is continuous in nature. It has different kinds of functions of distribution like CDF, PDF, median, etc.

It has two important parameters `loc` and `scale`: `loc` shifts the distribution and `scale` stretches it, whereas the shape of the distribution is controlled by the shape parameters `a` and `b`.

The syntax is given below.

``scipy.stats.beta.method_name(q,x,a,b,loc,size,moments,scale)``

Where parameters are:

• x: It is used to define the quantiles.
• a,b: They are used to define the shape parameters.
• q: It is used to specify the lower or upper tail probability.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').
• scale: It is used to scale the distribution, by default it is 1.

The above parameters are common to all the methods of the object `scipy.stats.beta`. The methods are given below.

• scipy.stats.beta.cdf(): It is used for the cumulative distribution function.
• scipy.stats.beta.pdf(): It is used for the probability density function.
• scipy.stats.beta.rvs(): It is used to get the random variates.
• scipy.stats.beta.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.beta.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.beta.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.beta.sf(): It is used to get the values of the survival function.
• scipy.stats.beta.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.beta.logsf(): It is used to find the log of the survival function.
• scipy.stats.beta.mean(): It is used to find the mean of the distribution.
• scipy.stats.beta.median(): It is used to find the median of the distribution.
• scipy.stats.beta.var(): It is used to find the variance of the distribution.
• scipy.stats.beta.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import beta
import matplotlib.pyplot as plt
import numpy as np``````

Create two variables a and b for the shape parameters and assign some values.

``````a = 3.4
b = 0.763``````

Create an array of data using the method `ppf()` of an object `beta` using the below code.

``````array_data = np.linspace(beta.ppf(0.01, a,b),
beta.ppf(0.90, a,b), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of an object `beta` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, beta.pdf(array_data, a,b),
'r-', lw=4, alpha=0.5, label='beta pdf')``````
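As a short sketch of the method `stats()`, the code below asks for the mean and variance of the beta distribution and compares them with the standard closed-form expressions a / (a + b) and ab / ((a + b)^2 (a + b + 1)). The variable names are illustrative assumptions.

```python
from scipy.stats import beta

a, b = 3.4, 0.763

# moments='mv' requests the mean and the variance.
mean, var = beta.stats(a, b, moments='mv')

# Closed-form mean and variance of the beta distribution.
expected_mean = a / (a + b)
expected_var = a * b / ((a + b) ** 2 * (a + b + 1))

print(mean, expected_mean)
print(var, expected_var)
```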

## Scipy Stats Gamma

The `scipy.stats.gamma` represents the random variable that is continuous in nature. It has different kinds of functions of distribution like CDF, PDF, median, etc.

It has two important parameters `loc` and `scale`: `loc` shifts the distribution and `scale` stretches it, whereas the shape of the distribution is controlled by the shape parameter `a`.

The syntax is given below.

``scipy.stats.gamma.method_name(q,x,a,loc,size,moments,scale)``

Where parameters are:

• x: It is used to define the quantiles.
• a: It is used to define the shape parameter.
• q: It is used to specify the lower or upper tail probability.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').
• scale: It is used to scale the distribution, by default it is 1.

The above parameters are common to all the methods of the object `scipy.stats.gamma`. The methods are given below.

• scipy.stats.gamma.cdf(): It is used for the cumulative distribution function.
• scipy.stats.gamma.pdf(): It is used for the probability density function.
• scipy.stats.gamma.rvs(): It is used to get the random variates.
• scipy.stats.gamma.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.gamma.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.gamma.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.gamma.sf(): It is used to get the values of the survival function.
• scipy.stats.gamma.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.gamma.logsf(): It is used to find the log of the survival function.
• scipy.stats.gamma.mean(): It is used to find the mean of the distribution.
• scipy.stats.gamma.median(): It is used to find the median of the distribution.
• scipy.stats.gamma.var(): It is used to find the variance of the distribution.
• scipy.stats.gamma.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import gamma
import matplotlib.pyplot as plt
import numpy as np``````

Create a variable for the shape parameter and assign a value.

``a = 1.95``

Create an array of data using the method `ppf()` of an object `gamma` using the below code.

``````array_data = np.linspace(gamma.ppf(0.01, a),
gamma.ppf(0.90, a), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of an object `gamma` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, gamma.pdf(array_data, a),
'r-', lw=4, alpha=0.5, label='gamma pdf')``````
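The shape and scale parameters can also be fixed once by calling the distribution object, which gives a so-called frozen distribution. The sketch below freezes a gamma distribution and shows that its mean is `a * scale` and its variance is `a * scale**2`, so `scale` is not the standard deviation here. The value of `scale` and the variable names are illustrative assumptions.

```python
from scipy.stats import gamma

a, scale = 1.95, 2.0

# Freeze the distribution so the parameters need not be repeated.
frozen = gamma(a, scale=scale)

mean = frozen.mean()
var = frozen.var()
std = frozen.std()

print('Mean', mean, 'expected', a * scale)
print('Variance', var, 'expected', a * scale ** 2)
print('Standard deviation', std)
```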

## Scipy Stats Inverse Normal CDF

Here, we will learn about the inverse of the normal `Cumulative distribution function`. We already know about the normal distribution from the above sub-section ‘Scipy Stats Norm’, so here we will use the method `ppf()` of the object `scipy.stats.norm` of Scipy, which represents the inverse of the `CDF`.

``scipy.stats.norm.ppf(q,loc,scale)``

Where parameters are:

• q: It is used to specify the lower tail probability.
• loc: It is used to specify the mean, by default it is 0.
• scale: It is used to specify the standard deviation, by default it is 1.

Let’s take an example by following the below steps.

Import the library `stats` using the below code.

``from scipy import stats``

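Find the inverse of the `CDF` at the probability 0.7 and pass it back to the method `cdf()`, which returns the original probability 0.7, using the below code.

``stats.norm.cdf(stats.norm.ppf(0.7))``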
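As a further sketch, the code below uses `ppf()` to find the value below which 97.5% of a normal distribution lies, first for the standard normal and then for a normal distribution with a custom mean and standard deviation. The probability 0.975 and the values of `loc` and `scale` are illustrative assumptions.

```python
from scipy.stats import norm

# Value below which 97.5% of the standard normal distribution lies.
z = norm.ppf(0.975)

# Same quantile for a normal distribution with mean 100 and std 15.
x = norm.ppf(0.975, loc=100, scale=15)

# Passing the quantile back to cdf() recovers the probability.
round_trip = norm.cdf(z)

print(z, x, round_trip)
```

Here `z` is approximately 1.96, and `x` equals `100 + 15 * z`.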

## Scipy Stats Johnson

The `scipy.stats` contains two objects `johnsonsb` and `johnsonsu` that belong to the family of Johnson distributions. They have different kinds of functions of distribution like CDF, PDF, median, etc.

• The object `johnsonsb` represents the bounded continuous probability distribution whereas `johnsonsu` is the unbounded continuous probability distribution.

They have two important parameters `loc` and `scale`: `loc` shifts the distribution and `scale` stretches it, whereas the shape of the distribution is controlled by the shape parameters `a` and `b`.

The syntax is given below.

``scipy.stats.johnsonsb.method_name(q,x,a,b,loc,size,moments,scale)``

Where parameters are:

• x: It is used to define the quantiles.
• a,b: They are used to define the shape parameters.
• q: It is used to specify the lower or upper tail probability.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').
• scale: It is used to scale the distribution, by default it is 1.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import johnsonsb
import matplotlib.pyplot as plt
import numpy as np``````

Create two variables a and b for the shape parameters and assign some values.

``a,b = 3.35,2.25``

Create an array of data using the method `ppf()` of an object `johnsonsb` using the below code.

``````array_data = np.linspace(johnsonsb.ppf(0.01, a,b),
johnsonsb.ppf(0.90, a,b), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of an object `johnsonsb` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, johnsonsb.pdf(array_data, a,b),
'r-', lw=4, alpha=0.5, label='johnsonsb pdf')``````

We can also find the distribution of Johnson’s unbounded continuous probability distribution using the same process as we have used for Johnson’s bounded continuous probability distribution.
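As a minimal sketch of that process, the code below repeats the `ppf()` and `pdf()` steps with the object `johnsonsu` and compares the resulting range with the bounded case. It reuses the shape values from the example above, and the other variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import johnsonsb, johnsonsu

a, b = 3.35, 2.25
q = [0.01, 0.90]

# Quantiles of the bounded distribution stay inside (0, 1).
sb_range = johnsonsb.ppf(q, a, b)

# Quantiles of the unbounded distribution can fall outside (0, 1).
su_range = johnsonsu.ppf(q, a, b)

# Evaluate the unbounded density on its own quantile range.
array_data = np.linspace(su_range[0], su_range[1], 90)
density = johnsonsu.pdf(array_data, a, b)

print('Bounded range', sb_range)
print('Unbounded range', su_range)
```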

## Scipy Stats Inverse gamma

The `scipy.stats.invgamma` represents the inverted gamma random variable that is continuous in nature. It has different kinds of functions of distribution like CDF, PDF, median, etc.

It has two important parameters `loc` and `scale`: `loc` shifts the distribution and `scale` stretches it, whereas the shape of the distribution is controlled by the shape parameter `a`.

The syntax is given below.

``scipy.stats.invgamma.method_name(q,x,a,loc,size,moments,scale)``

Where parameters are:

• x: It is used to define the quantiles.
• a: It is used to define the shape parameter.
• q: It is used to specify the lower or upper tail probability.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').
• scale: It is used to scale the distribution, by default it is 1.

The above parameters are common to all the methods of the object `scipy.stats.invgamma`. The methods are given below.

• scipy.stats.invgamma.cdf(): It is used for the cumulative distribution function.
• scipy.stats.invgamma.pdf(): It is used for the probability density function.
• scipy.stats.invgamma.rvs(): It is used to get the random variates.
• scipy.stats.invgamma.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.invgamma.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.invgamma.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.invgamma.sf(): It is used to get the values of the survival function.
• scipy.stats.invgamma.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.invgamma.logsf(): It is used to find the log of the survival function.
• scipy.stats.invgamma.mean(): It is used to find the mean of the distribution.
• scipy.stats.invgamma.median(): It is used to find the median of the distribution.
• scipy.stats.invgamma.var(): It is used to find the variance of the distribution.
• scipy.stats.invgamma.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import invgamma
import matplotlib.pyplot as plt
import numpy as np``````

Create a variable for the shape parameter and assign a value.

``a = 3.04``

Create an array of data using the method `ppf()` of an object `invgamma` using the below code.

``````array_data = np.linspace(invgamma.ppf(0.01, a),
invgamma.ppf(0.90, a), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of an object `invgamma` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, invgamma.pdf(array_data, a),
'r-', lw=4, alpha=0.5, label='invgamma pdf')``````
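The name "inverted gamma" comes from the fact that if X follows a gamma distribution, then 1/X follows an inverted gamma distribution with the same shape parameter. The sketch below checks this relation through the cumulative distribution function and also checks the mean, which is 1 / (a - 1) for a > 1 with the default `scale`. The grid of points and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gamma, invgamma

a = 3.04
x = np.linspace(0.1, 3.0, 30)

# P(1/X <= x) for X ~ gamma(a) equals P(X >= 1/x).
invgamma_cdf = invgamma.cdf(x, a)
gamma_tail = gamma.sf(1.0 / x, a)

mean = invgamma.mean(a)

print(np.allclose(invgamma_cdf, gamma_tail))
print(mean, 1 / (a - 1))
```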

## Scipy Stats Gennorm

The `scipy.stats.gennorm` represents the generalized normal random variable that is continuous in nature. It has different kinds of functions of distribution like CDF, PDF, median, etc.

It has two important parameters `loc` and `scale`: `loc` shifts the distribution and `scale` stretches it, whereas the shape of the distribution is controlled by the shape parameter `beta`.

The syntax is given below.

``scipy.stats.gennorm.method_name(x,beta,loc,size,moments,scale)``

Where parameters are:

• x: It is a set of points or values that represent evenly sampled data in the form of array data.
• beta: It is used to specify the shape.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').
• scale: It is used to scale the distribution, by default it is 1.

The above parameters are common to all the methods of the object `scipy.stats.gennorm`. The methods are given below.

• scipy.stats.gennorm.cdf(): It is used for the cumulative distribution function.
• scipy.stats.gennorm.pdf(): It is used for the probability density function.
• scipy.stats.gennorm.rvs(): It is used to get the random variates.
• scipy.stats.gennorm.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.gennorm.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.gennorm.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.gennorm.sf(): It is used to get the values of the survival function.
• scipy.stats.gennorm.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.gennorm.logsf(): It is used to find the log of the survival function.
• scipy.stats.gennorm.mean(): It is used to find the mean of the distribution.
• scipy.stats.gennorm.median(): It is used to find the median of the distribution.
• scipy.stats.gennorm.var(): It is used to find the variance of the distribution.
• scipy.stats.gennorm.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import gennorm
import matplotlib.pyplot as plt
import numpy as np``````

Create a variable for the shape parameter and assign a value.

``beta = 1.4``

Create an array of data using the method `ppf()` of an object `gennorm` using the below code.

``````array_data = np.linspace(gennorm.ppf(0.01, beta),
gennorm.ppf(0.90, beta), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of an object `gennorm` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, gennorm.pdf(array_data, beta),
'r-', lw=4, alpha=0.5, label='gennorm pdf')``````
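To get a feeling for the shape parameter `beta`, the sketch below checks two special cases: with `beta = 1` the generalized normal density equals the Laplace density, and with `beta = 2` it equals a normal density whose standard deviation is 1 / sqrt(2). The grid of points and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gennorm, laplace, norm

x = np.linspace(-3, 3, 61)

# beta = 1 gives the Laplace distribution.
laplace_match = np.allclose(gennorm.pdf(x, 1), laplace.pdf(x))

# beta = 2 gives a normal distribution with scale 1 / sqrt(2).
normal_match = np.allclose(gennorm.pdf(x, 2),
                           norm.pdf(x, scale=1 / np.sqrt(2)))

print(laplace_match, normal_match)
```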

## Scipy Stats Genpareto

The `scipy.stats.genpareto` represents the generalized Pareto random variable that is continuous in nature. It has different kinds of functions of distribution like CDF, PDF, median, etc.

It has two important parameters `loc` and `scale`: `loc` shifts the distribution and `scale` stretches it, whereas the shape of the distribution is controlled by the shape parameter `c`.

The syntax is given below.

``scipy.stats.genpareto.method_name(x,c,loc,size,moments,scale)``

Where parameters are:

• x: It is a set of points or values that represent evenly sampled data in the form of array data.
• c: It is used to specify the shape.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').
• scale: It is used to scale the distribution, by default it is 1.

The above parameters are common to all the methods of the object `scipy.stats.genpareto`. The methods are given below.

• scipy.stats.genpareto.cdf(): It is used for the cumulative distribution function.
• scipy.stats.genpareto.pdf(): It is used for the probability density function.
• scipy.stats.genpareto.rvs(): It is used to get the random variates.
• scipy.stats.genpareto.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.genpareto.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.genpareto.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.genpareto.sf(): It is used to get the values of the survival function.
• scipy.stats.genpareto.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.genpareto.logsf(): It is used to find the log of the survival function.
• scipy.stats.genpareto.mean(): It is used to find the mean of the distribution.
• scipy.stats.genpareto.median(): It is used to find the median of the distribution.
• scipy.stats.genpareto.var(): It is used to find the variance of the distribution.
• scipy.stats.genpareto.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import genpareto
import matplotlib.pyplot as plt
import numpy as np``````

Create a variable for the shape parameter and assign a value.

``c = 0.2``

Create an array of data using the method `ppf()` of an object `genpareto` using the below code.

``````array_data = np.linspace(genpareto.ppf(0.01, c),
genpareto.ppf(0.90, c), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of an object `genpareto` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, genpareto.pdf(array_data, c),
'r-', lw=4, alpha=0.5, label='genpareto pdf')``````
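As a sketch of what the shape parameter `c` does, the code below checks that the generalized Pareto density with `c = 0` equals the exponential density, and that the mean for `c = 0.2` is 1 / (1 - c), which holds for c < 1 with the default `scale`. The grid of points and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import expon, genpareto

x = np.linspace(0.0, 5.0, 51)

# c = 0 reduces the generalized Pareto to the exponential distribution.
exponential_match = np.allclose(genpareto.pdf(x, 0), expon.pdf(x))

# For c < 1 the mean is 1 / (1 - c).
c = 0.2
mean = genpareto.mean(c)

print(exponential_match)
print(mean, 1 / (1 - c))
```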

## Scipy Stats Gumbel

The `scipy.stats` contains two objects `gumbel_r` and `gumbel_l` that are used to model right-skewed or left-skewed distributions. They have different kinds of functions of distribution like CDF, PDF, median, etc.

• The object `gumbel_r` represents the right-skewed Gumbel continuous distribution whereas `gumbel_l` is the left-skewed Gumbel continuous distribution.

They have two important parameters `loc` and `scale`: `loc` shifts the distribution and `scale` stretches it. The Gumbel distributions have no shape parameters.

The syntax is given below.

``scipy.stats.gumbel_r.method_name(q,x,loc,size,moments,scale)``

Where parameters are:

• x: It is used to define the quantiles.
• q: It is used to specify the lower or upper tail probability.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').
• scale: It is used to scale the distribution, by default it is 1.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import gumbel_r
import matplotlib.pyplot as plt
import numpy as np``````

Create an array of data using the method `ppf()` of an object `gumbel_r` using the below code.

``````array_data = np.linspace(gumbel_r.ppf(0.01),
gumbel_r.ppf(0.90), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of an object `gumbel_r` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, gumbel_r.pdf(array_data),
'r-', lw=4, alpha=0.5, label='gumbel pdf')``````
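The two Gumbel objects are mirror images of each other. The sketch below checks that the density of `gumbel_l` at `x` equals the density of `gumbel_r` at `-x`, and that the mean of the standard `gumbel_r` is the Euler-Mascheroni constant (about 0.577) rather than the value of `loc`. The grid of points and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gumbel_l, gumbel_r

x = np.linspace(-3, 3, 61)

# Left-skewed density at x equals right-skewed density at -x.
mirrored = np.allclose(gumbel_l.pdf(x), gumbel_r.pdf(-x))

# Mean of the standard right-skewed Gumbel distribution (loc=0, scale=1).
mean_r = gumbel_r.mean()

print(mirrored)
print(mean_r, np.euler_gamma)
```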

## Scipy Stats Binned statistics

The Scipy submodule `scipy.stats` contains a method `binned_statistic` to calculate statistics like the mean, median, sum, etc of the values with each bin.

The syntax is given below.

``scipy.stats.binned_statistic(x, values, statistic='mean', bins=10, range=None)``

Where parameters are:

• x(array_data): It is a sequence of values that is binned.
• values(array_data, list(N)): It is the data on which the statistic is calculated. It must have the same shape as x.
• statistic(string): It is used to specify what kind of statistics we want to compute like mean, sum, median, max, std, and count.
• bins(sequence or int): It is used to define the number of bins, or the bin edges if a sequence is given.
• range((float, float)): It defines the lower and upper range of the bins.

The method `binned_statistic` returns the statistic of each bin, the bin edges, and the bin number of each value of x, all of array type.

Let’s understand with an example by following the below steps:

Import the required libraries using the below code.

``from scipy import stats``

Create a set of values and compute the binned statistics using the below code.

``````set_values = [2.0, 2.0, 3.0, 2.5, 4.0]
stats.binned_statistic([2, 2, 3, 6, 8], set_values, 'mean', bins=2)``````
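To make the result easier to read, the sketch below runs the same call and prints the three fields of the returned object separately. The variable names `x` and `result` are illustrative assumptions.

```python
from scipy import stats

x = [2, 2, 3, 6, 8]
set_values = [2.0, 2.0, 3.0, 2.5, 4.0]

result = stats.binned_statistic(x, set_values, statistic='mean', bins=2)

# Mean of set_values inside each of the two bins.
print('Statistic', result.statistic)

# Two equal-width bins between min(x) and max(x).
print('Bin edges', result.bin_edges)

# Bin index (starting at 1) assigned to each element of x.
print('Bin numbers', result.binnumber)
```

The bins are [2, 5) and [5, 8], so the first three values fall into the first bin (mean 2.33...) and the last two values fall into the second bin (mean 3.25).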

## Scipy Stats Poisson

The `scipy.stats.poisson` represents the random variable that is discrete in nature. It has different kinds of functions of distribution like CDF, median, etc.

It has one important parameter `loc`, which shifts the distribution, whereas the shape of the distribution is controlled by the shape parameter `mu`.

The syntax is given below.

``scipy.stats.poisson.method_name(mu,k,loc,moments)``

Where parameters are:

• mu: It is used to define the shape parameter.
• k: It is the data.
• loc: It is used to shift the distribution, by default it is 0.
• moments: It is used to specify which statistics to calculate: mean ('m'), variance ('v'), skew ('s') and kurtosis ('k').

The above parameters are common to all the methods of the object `scipy.stats.poisson`. The methods are given below.

• scipy.stats.poisson.pmf(): It is used for the probability mass function.
• scipy.stats.poisson.cdf(): It is used for the cumulative distribution function.
• scipy.stats.poisson.rvs(): It is used to get the random variates.
• scipy.stats.poisson.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.poisson.logpmf(): It is used to get the log of the probability mass function.
• scipy.stats.poisson.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.poisson.sf(): It is used to get the values of the survival function.
• scipy.stats.poisson.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.poisson.logsf(): It is used to find the log of the survival function.
• scipy.stats.poisson.mean(): It is used to find the mean of the distribution.
• scipy.stats.poisson.median(): It is used to find the median of the distribution.
• scipy.stats.poisson.var(): It is used to find the variance of the distribution.
• scipy.stats.poisson.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import poisson
import matplotlib.pyplot as plt
import numpy as np``````

Create a variable for the shape parameter and assign a value.

``mu = 0.5``

Create an array of integer data using the method `ppf()` of an object `poisson` using the below code. The probability mass function is nonzero only at integers, so `np.arange()` is used instead of `np.linspace()`.

``````array_data = np.arange(poisson.ppf(0.01, mu),
poisson.ppf(0.90, mu) + 1)
array_data``````

Now plot the probability mass function by accessing the method `pmf()` of an object `poisson` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, poisson.pmf(array_data, mu), 'bo', ms=8, label='poisson pmf')
ax.vlines(array_data, 0, poisson.pmf(array_data, mu), colors='b', lw=4, alpha=0.5)``````

This is how to use the Poisson distribution of Scipy.
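As a short numerical sketch, the code below checks three properties of the Poisson distribution: the mean and the variance both equal `mu`, the probability mass over the non-negative integers sums to 1, and the probability mass at a non-integer point is 0. The upper limit of 20 and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import poisson

mu = 0.5

# moments='mv' requests the mean and the variance.
mean, var = poisson.stats(mu, moments='mv')

# Sum of the probability mass over the integers 0..20.
k = np.arange(0, 21)
total = poisson.pmf(k, mu).sum()

# The probability mass at a non-integer point.
off_grid = poisson.pmf(0.5, mu)

print(mean, var)
print(total, off_grid)
```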

## Scipy Stats Geometric

The `scipy.stats.geom` represents the random variable that is discrete in nature. It has different kinds of functions of geometric distribution like CDF, PMF, median, etc.

It has one important parameter `loc`, which shifts the distribution, whereas the shape of the distribution is controlled by the success probability `p`.

The syntax is given below.

``scipy.stats.geom.method_name(k,p,q,loc,size)``

Where parameters are:

• k(int or array_data of ints): It is used to specify the number of Bernoulli trials needed to get the first success.
• p(float or float of array_data): It is used to specify the success probability for each trial.
• q(float or float of array_data): It represents the probabilities.
• loc: It is used to shift the distribution, by default it is 0.

The above parameters are the common parameters of the methods in the object `scipy.stats.geom`. The methods are given below.

• scipy.stats.geom.cdf(): It is used for the cumulative distribution function.
• scipy.stats.geom.pmf(): It is used for the probability mass function.
• scipy.stats.geom.rvs(): To get the random variates.
• scipy.stats.geom.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.geom.logpmf(): It is used to get the log of the probability mass function.
• scipy.stats.geom.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.geom.sf(): It is used to get the values of the survival function.
• scipy.stats.geom.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.geom.logsf(): It is used to find the log of the survival function.
• scipy.stats.geom.mean(): It is used to find the mean of the distribution.
• scipy.stats.geom.median(): It is used to find the median of the distribution.
• scipy.stats.geom.var(): It is used to find the variance of the distribution.
• scipy.stats.geom.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import geom
import numpy as np
import matplotlib.pyplot as plt
``````

Create an array containing the integers from 1 to 29 and also create a variable that contains the success probability of each trial using the below code.

``````array_data = np.arange(1,30,1)
p = 0.5``````

Now plot the probability mass function by accessing the method `pmf()` of an object `geom` of the module `scipy.stats` using the below code.

``````geom_pmf_data = geom.pmf(array_data,p)
plt.plot(array_data,geom_pmf_data,'bo')
plt.show()``````
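As a quick check on what `geom.pmf()` computes, the sketch below compares it with the closed-form geometric probability (1 - p)^(k - 1) · p. The range of `k` is an illustrative assumption.

```python
import numpy as np
from scipy.stats import geom

p = 0.5
k = np.arange(1, 11)     # trial numbers 1..10 (illustrative choice)

pmf_scipy = geom.pmf(k, p)
# Closed form: k - 1 failures followed by one success
pmf_manual = (1 - p) ** (k - 1) * p

print(np.allclose(pmf_scipy, pmf_manual))
# Mean is 1/p and variance is (1 - p) / p**2
print(geom.mean(p), geom.var(p))
```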

## Scipy Stats Exponential

The `scipy.stats.expon` object represents a continuous random variable that follows the exponential distribution. It provides the usual distribution methods such as `cdf()`, `pdf()`, `median()`, etc.

It has two important parameters: `loc`, which shifts the distribution, and `scale`, which stretches it. For the exponential distribution, `scale` is the inverse of the rate, so the mean is `loc + scale` and the standard deviation is `scale`.

The syntax is given below.

``scipy.stats.expon.method_name(x,q,loc,scale,size)``

Where parameters are:

• x(float or float of array_data): It is used to specify the random variable.
• q(float or float of array_data): It represents the probabilities.
• loc: It is used to specify the location (shift) of the distribution, by default it is 0.
• scale: It is used to specify the scale (the inverse of the rate), by default it is 1.
• size: It is used to specify the output shape.

The above parameters are the common parameters of the methods in the object `scipy.stats.expon`. The methods are given below.

• scipy.stats.expon.cdf(): It is used for the cumulative distribution function.
• scipy.stats.expon.pdf(): It is used for the probability density function.
• scipy.stats.expon.rvs(): To get the random variates.
• scipy.stats.expon.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.expon.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.expon.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.expon.sf(): It is used to get the values of the survival function.
• scipy.stats.expon.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.expon.logsf(): It is used to find the log of the survival function.
• scipy.stats.expon.mean(): It is used to find the mean of the distribution.
• scipy.stats.expon.median(): It is used to find the median of the distribution.
• scipy.stats.expon.var(): It is used to find the variance of the distribution.
• scipy.stats.expon.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import expon
import numpy as np
import matplotlib.pyplot as plt
``````

Create an array of values from -1 to 30 in steps of 0.1 using the below code.

``array_data = np.arange(-1,30,0.1)``

Now plot the probability density function by accessing the method `pdf()` of an object `expon` of the module `scipy.stats` using the below code. Here `loc` is 0 and `scale` is 2.

``````expon_pdf_data = expon.pdf(array_data,0,2)
plt.plot(array_data,expon_pdf_data,'bo')
plt.show()``````
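The role of `loc` and `scale` can be checked numerically. The sketch below, which uses a few hand-picked points `x` as an assumption, compares `expon.pdf()` with the formula exp(-(x - loc) / scale) / scale.

```python
import numpy as np
from scipy.stats import expon

loc, scale = 0, 2
x = np.array([0.0, 1.0, 2.0, 4.0])   # illustrative evaluation points

pdf_scipy = expon.pdf(x, loc=loc, scale=scale)
# Density of the exponential distribution for x >= loc
pdf_manual = np.exp(-(x - loc) / scale) / scale

expon_mean = expon.mean(loc=loc, scale=scale)   # loc + scale
expon_std = expon.std(loc=loc, scale=scale)     # scale
below_support = expon.pdf(-1.0, loc=loc, scale=scale)  # density is 0 below loc

print(np.allclose(pdf_scipy, pdf_manual))
print(expon_mean, expon_std, below_support)
```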

## Scipy Stats Boxcox

The `scipy.stats` module has a method `boxcox()` that transforms a non-normal, positive dataset into an approximately normal one.

The syntax is given below,

``scipy.stats.boxcox(x, lmbda=None, alpha=None, optimizer=None)``

Where parameters are:

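• x(array_data): It is the input array data that should be positive and one-dimensional.
• lmbda(scalar): If it is given, the transformation is performed with this value of lambda.
• alpha(float): If it is given, the method also returns the confidence interval for lambda. Its value should be between 0 and 1.
• optimizer: If `lmbda` is not set, then the optimizer finds the value of lambda that maximizes the log-likelihood.

When `lmbda` is not given, the method `boxcox()` returns two values: `boxcox`, the transformed data of type ndarray, and `maxlog`, the fitted lambda of type float.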

Let’s understand with an example by following the below steps:

Import the required modules using the below code.

``````import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
``````

Create or generate non-normal values using the below code.

``non_normal_data = np.random.exponential(size = 500)``

Transform the non-normal data or generated data into normal using the method `boxcox()` and also save the lambda value.

``transformed_data, lambda_value = stats.boxcox(non_normal_data)``

Plot both the non-normal and the transformed data using the below code. Each dataset is drawn on its own axes, and because `sns.distplot()` is deprecated in recent seaborn releases, the code uses `sns.kdeplot()`.

``````fig, ax = plt.subplots(1, 2)

# Left panel: original, right-skewed data
sns.kdeplot(non_normal_data, fill = True, linewidth = 2,
label = "Non-Normal", color ="green", ax = ax[0])

# Right panel: data after the Box-Cox transformation
sns.kdeplot(transformed_data, fill = True, linewidth = 2,
label = "Normal", color ="green", ax = ax[1])

ax[0].legend(loc = "upper right")
ax[1].legend(loc = "upper right")

fig.set_figheight(5)
fig.set_figwidth(10)
plt.show()``````
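To make the transformation itself concrete, the following sketch reproduces the output of `boxcox()` with the formula (x^λ - 1) / λ and undoes it with `scipy.special.inv_boxcox`. The seed and the variable names are assumptions made for reproducibility; the formula as written applies when the fitted lambda is not zero.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

rng = np.random.default_rng(0)        # fixed seed, an illustrative assumption
data = rng.exponential(size=500)      # positive, right-skewed data

transformed, lmbda = stats.boxcox(data)

# Box-Cox formula for lambda != 0
manual = (data ** lmbda - 1) / lmbda

# Inverse transformation recovers the original values
recovered = inv_boxcox(transformed, lmbda)

print(lmbda)
print(stats.skew(data), stats.skew(transformed))
```

The skewness printed for the transformed data should be much closer to 0 than that of the original data, which is one simple way to judge whether the transformation helped.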

## Scipy Stats Genextreme

The `scipy.stats.genextreme` object represents a continuous random variable that follows the generalized extreme value distribution. It provides the usual distribution methods such as `cdf()`, `pdf()`, `median()`, etc.

It has a shape parameter `c` and two further parameters, `loc` for the location and `scale` for the scale, which shift and stretch the distribution.

The syntax is given below.

``scipy.stats.genextreme.method_name(q,x,c,loc,size,moments,scale)``

Where parameters are:

• x: It is used to define the quantiles.
• c: It is used to define the shape parameter.
• q: It is used to specify the lower or upper tail probability.
• loc: It is used to specify the location (shift), by default it is 0.
• moments: It is used to choose which statistics to calculate: mean, variance, skew, and kurtosis.
• scale: It is used to specify the scale, by default it is 1.

The above parameters are the common parameters of the methods in the object `scipy.stats.genextreme`. The methods are given below.

• scipy.stats.genextreme.cdf(): It is used for the cumulative distribution function.
• scipy.stats.genextreme.pdf(): It is used for the probability density function.
• scipy.stats.genextreme.rvs(): To get the random variates.
• scipy.stats.genextreme.stats(): It is used to get the mean, variance, skew, and kurtosis.
• scipy.stats.genextreme.logpdf(): It is used to get the log of the probability density function.
• scipy.stats.genextreme.logcdf(): It is used to find the log of the cumulative distribution function.
• scipy.stats.genextreme.sf(): It is used to get the values of the survival function.
• scipy.stats.genextreme.isf(): It is used to get the values of the inverse survival function.
• scipy.stats.genextreme.logsf(): It is used to find the log of the survival function.
• scipy.stats.genextreme.mean(): It is used to find the mean of the distribution.
• scipy.stats.genextreme.median(): It is used to find the median of the distribution.
• scipy.stats.genextreme.var(): It is used to find the variance of the distribution.
• scipy.stats.genextreme.std(): It is used to find the standard deviation of the distribution.

Let’s take an example by using one of the methods mentioned above to know how to use the methods with parameters.

Import the required libraries using the below code.

``````from scipy.stats import genextreme
import matplotlib.pyplot as plt
import numpy as np``````

Create a variable for the shape parameter `c` and assign it a value.

``c = 1.95``

Create an array of data using the method `ppf()` of an object `genextreme` using the below code.

``````array_data = np.linspace(genextreme.ppf(0.01, c),
genextreme.ppf(0.90,c), 90)
array_data``````

Now plot the probability density function by accessing the method `pdf()` of an object `genextreme` of the module `scipy.stats` using the below code.

``````fig, ax = plt.subplots(1, 1)
ax.plot(array_data, genextreme.pdf(array_data,c),
'r-', lw=4, alpha=0.5, label='genextreme pdf')
ax.legend()
plt.show()``````
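The `ppf()` call used to build the x-axis is the inverse of `cdf()`. A minimal sketch of that round trip is shown here; the probabilities in `q` are an illustrative assumption. It also prints the support of the distribution, which for a positive `c` is bounded above by 1/c.

```python
import numpy as np
from scipy.stats import genextreme

c = 1.95
q = np.array([0.01, 0.5, 0.9])       # illustrative probabilities

x = genextreme.ppf(q, c)             # quantiles for the given probabilities
round_trip = genextreme.cdf(x, c)    # cdf maps the quantiles back to q

lower, upper = genextreme.support(c) # for c > 0 the upper bound is 1/c

print(x)
print(round_trip)
print(lower, upper)
```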

## Scipy Stats Dirichlet

Scipy has an object `dirichlet` that represents the Dirichlet distribution, a continuous multivariate probability distribution. It has some methods or functions that are given below.

• scipy.stats.dirichlet.pdf(): It is used for the probability density function.
• scipy.stats.dirichlet.var(): It is used to find the variance of the Dirichlet distribution.
• scipy.stats.dirichlet.mean(): It is used to find the mean of the Dirichlet distribution.
• scipy.stats.dirichlet.rvs(): To get the random variates.
• scipy.stats.dirichlet.logpdf(): It is used to get the log of the probability density function.

The syntax is given below.

``scipy.stats.dirichlet.method_name(x,alpha)``

Where parameters are:

• x(array_data): It is used to specify the quantiles.
• alpha(array_data): It is used to define the concentration parameters.

Let’s take an example by following the below steps:

Import the required libraries using the below code.

``````from scipy.stats import dirichlet
import numpy as np``````

Define the quantiles and alpha values within an array using the below code.

``````quant = np.array([0.3, 0.3, 0.4])
alp = np.array([0.5, 6, 16]) ``````

Now evaluate the Dirichlet probability density at these quantiles using the below code.

``dirichlet.pdf(quant,alp)``
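The other Dirichlet methods can be tried in the same way. The sketch below reuses the same `quant` and `alp` values; the sample size and the `random_state` seed are assumptions added for reproducibility.

```python
import numpy as np
from scipy.stats import dirichlet

quant = np.array([0.3, 0.3, 0.4])    # must sum to 1
alp = np.array([0.5, 6, 16])

density = dirichlet.pdf(quant, alp)

# The mean of a Dirichlet distribution is alpha / sum(alpha)
dir_mean = dirichlet.mean(alp)

# Each random variate is a vector of proportions that sums to 1
samples = dirichlet.rvs(alp, size=5, random_state=0)

print(density)
print(dir_mean)
print(samples.sum(axis=1))
```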

## Scipy Stats Hypergeom

Scipy has an object `hypergeom` in the module `scipy.stats` that represents the hypergeometric distribution, which models drawing objects from a bin without replacement.

The syntax is given below.

``scipy.stats.hypergeom(M,n,N)``

Where parameters are:

• M: It is used to define the total number of objects.
• n: It is used to define the number of Type I objects in M.
• N: It is used to define the number of objects drawn without replacement from the whole population. The random variate represents the number of Type I objects among the N drawn.

Let’s take an example by following the below steps:

Import the required libraries using the below code.

``````from scipy.stats import hypergeom
import numpy as np
import matplotlib.pyplot as plt``````

Now, suppose that we have a total of 30 phones, of which 10 are Apple phones. We want to know the probability of getting a given number of Apple phones if we choose 15 of the 30 phones at random. Let’s use the below code to find the solution to this problem.

``````[M, n, N] = [30, 10, 15]
rv = hypergeom(M, n, N)
x = np.arange(0, n+1)
pmf_applephones = rv.pmf(x)``````

Plot the above result using the below code.

``````fig, ax = plt.subplots()
ax.plot(x, pmf_applephones, 'bo')
ax.vlines(x, 0, pmf_applephones, lw=2)
ax.set_xlabel('# of apple phones in our group of chosen phones')
ax.set_ylabel('hypergeom PMF')
plt.show()``````
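The same frozen distribution can answer numeric questions without a plot. This is a small sketch under the same phone example; the threshold of 5 phones is an illustrative assumption.

```python
import numpy as np
from scipy.stats import hypergeom

M, n, N = 30, 10, 15                 # population, Apple phones, phones drawn
rv = hypergeom(M, n, N)

x = np.arange(0, n + 1)
pmf_applephones = rv.pmf(x)

total_probability = pmf_applephones.sum()   # the pmf sums to 1 over its support
expected_apple = rv.mean()                  # N * n / M
prob_at_least_5 = rv.sf(4)                  # P(X > 4) = P(X >= 5)

print(total_probability, expected_apple, prob_at_least_5)
```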

## Scipy Stats Interval

In Scipy, interval refers to the `confidence interval`, the range of values that contains a given fraction of the distribution. Scipy has a method `interval()` in the class `scipy.stats.rv_continuous` that finds the confidence interval with equal areas around the median.

The syntax is given below.

``rv_continuous.interval(alpha, *args, loc, scale)``

Where parameters are:

• alpha(float or array_data of floats): It defines the probability that a random variate is drawn from the returned range. Its value should be between 0 and 1. Newer SciPy releases name this parameter `confidence`.
• *args(array_data): It is used for defining the shape of the distribution.
• loc(array_data): It is used for defining the location parameter, by default it is 0.
• scale(array_data): It is used for defining the scale parameter, by default it is 1.
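As a minimal sketch of `interval()`, the example below uses the standard normal distribution `scipy.stats.norm` (an illustrative choice) and passes the probability positionally, so it does not depend on the parameter name. The interval bounds match the 2.5% and 97.5% quantiles returned by `ppf()`.

```python
from scipy.stats import norm

# Range that contains 95% of a standard normal distribution
low, high = norm.interval(0.95, loc=0, scale=1)

# Equal tail areas: 2.5% below `low` and 2.5% above `high`
low_from_ppf = norm.ppf(0.025)
high_from_ppf = norm.ppf(0.975)

print(low, high)
print(low_from_ppf, high_from_ppf)
```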

## Scipy Stats ISF

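The `ISF` stands for `Inverse survival function`. It is the inverse of the survival function `sf`: for an upper tail probability q, it returns the value x such that the probability of the random variable exceeding x is q.

The syntax is given below.

``rv_continuous.isf(q, *args, loc, scale)``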

Where parameters are:

• q(array_data): It defines the upper tail probability.
• *args(array_data): It is used for defining the shape of the distribution.
• loc(array_data): It is used for defining the location parameter, by default it is 0.
• scale(array_data): It is used for defining the scale parameter, by default it is 1.
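The relationship between `isf()`, `sf()`, and `ppf()` can be sketched as follows. The standard normal distribution and the probabilities in `q` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

q = np.array([0.05, 0.5, 0.95])      # upper tail probabilities

x = norm.isf(q)                      # values whose upper tail area is q

# isf(q) is the same as ppf(1 - q), and sf undoes isf
same_as_ppf = norm.ppf(1 - q)
back_to_q = norm.sf(x)

print(x)
print(same_as_ppf)
print(back_to_q)
```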

## Scipy Stats Independent T-test

The independent `T-test` tests the null hypothesis that two independent samples have identical mean values. In simple terms, it tests whether the two independent samples have the same average value.

The syntax is given below.

``scipy.stats.ttest_ind(a, b, axis=0, equal_var=True, nan_policy='propagate', alternative='two-sided', trim=0)``

Where parameters are:

• a,b(array_data): It is the sample of independent observations in the form of an array.
• axis(int): It is used to specify the axis on which the test is done.
• equal_var(boolean): If it is true, then it considers that the variance of two independent samples is equal, otherwise in the case of false, it uses `Welch’s t-test` for two independent samples whose variance is not equal.
• alternative: It is used to specify the alternative hypothesis.
• nan_policy: It is used to deal with the nan values and accept three values:
1. omit: It means performing the test by ignoring the nan values.
2. propagate: It means returns nan values.
3. raise: It means to throw an error for the nan values.

The method `ttest_ind` returns two float values, the `t-statistic` and `pvalue`.

Let’s take an example by following the below steps:

Import the required libraries `stats` from Scipy using the below code.

``````from scipy import stats
import numpy as np``````

Create a random number generator using the below code.

``randomnum_gen = np.random.default_rng()``

Create two samples with identical means using the below code.

``````sample1 = stats.norm.rvs(loc=6, scale=15, size=1000, random_state=randomnum_gen)
sample2 = stats.norm.rvs(loc=6, scale=15, size=1000, random_state=randomnum_gen)``````

Calculate the `T-test` of independent samples that we have created above.

``stats.ttest_ind(sample1, sample2)``

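From the output, we can reject or fail to reject the null hypothesis based on the statistic and p-value: if the p-value is below the chosen significance level (for example 0.05), we reject the null hypothesis that the two means are equal.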
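When the two samples cannot be assumed to have equal variance, `equal_var=False` switches to Welch’s t-test. The sketch below is one way to try it; the seed, the sample parameters, and the names `sample_a` and `sample_b` are assumptions chosen for illustration. It also recomputes the Welch statistic by hand.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)      # fixed seed, an illustrative assumption

# Two samples with different means and different variances
sample_a = stats.norm.rvs(loc=6, scale=15, size=1000, random_state=rng)
sample_b = stats.norm.rvs(loc=12, scale=30, size=1000, random_state=rng)

welch = stats.ttest_ind(sample_a, sample_b, equal_var=False)

# Welch statistic: difference of means over the combined standard error
n_a, n_b = len(sample_a), len(sample_b)
std_err = np.sqrt(sample_a.var(ddof=1) / n_a + sample_b.var(ddof=1) / n_b)
manual_t = (sample_a.mean() - sample_b.mean()) / std_err

significance = 0.05
reject_null = welch.pvalue < significance

print(welch.statistic, manual_t)
print(welch.pvalue, reject_null)
```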

## Scipy Stats Fisher Exact

The `Fisher exact` test is a statistical test of the nonrandom association between two categorical variables. Scipy has a method `fisher_exact()` for this kind of test.

The syntax is given below.

``scipy.stats.fisher_exact(table, alternative='two-sided')``

Where parameters are:

• table(array_data of type ints): It is the 2×2 contingency table on which we want to perform the test.
• alternative: It is used to specify the alternative hypothesis. The alternative options are given below:
1. ‘two-sided’
2. ‘less’: one-sided
3. ‘greater’: one-sided

The method returns two values of type float: the odds ratio and the p-value.

Let’s take an example by following the below steps:

Suppose we survey the students in a college about using iPhone and Android phones by gender, and record the counts in a 2×2 table.

To find out whether there is a statistically significant association between gender and phone preference, use the below code.

Import the libraries using the below code.

``from scipy import stats``

Create the array of data for holding the survey information.

``survey_data = [[10,5],[5,11]]``

Perform the `fisher_exact()` function on this data to know the significance.

``stats.fisher_exact(survey_data)``

From the output, the p-value is greater than 0.05, so there is not enough evidence to say that there is an association between gender and phone preference.
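The two returned values can be unpacked and used directly. The following sketch uses the same `survey_data`; the 0.05 significance level and the extra one-sided test are assumptions added for illustration. For a 2×2 table, the odds ratio returned by default is the ratio of the products of the diagonal counts.

```python
from scipy import stats

survey_data = [[10, 5], [5, 11]]

odds_ratio, p_value = stats.fisher_exact(survey_data)

# Sample odds ratio of a 2x2 table: (a * d) / (b * c)
manual_odds = (10 * 11) / (5 * 5)

# One-sided alternative: odds ratio greater than 1
_, p_greater = stats.fisher_exact(survey_data, alternative='greater')

significance = 0.05
print(odds_ratio, manual_odds)
print(p_value, p_value > significance)
print(p_greater)
```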

So, in this Scipy tutorial, we understood the requirement and use of Scipy Stats. And we have also covered the following topics.

• Scipy Stats
• Scipy Stats Lognormal
• Scipy Stats Norm
• Scipy Stats T-test
• Scipy Stats Pearsonr
• Scipy Stats chi-square
• Scipy Stats IQR
• Scipy Stats Poisson
• Scipy Stats Entropy
• Scipy Stats Anova
• Scipy Stats Anderson
• Scipy Stats Average
• Scipy Stats Alpha
• Scipy Stats Boxcox
• Scipy Stats Binom
• Scipy Stats Beta
• Scipy Stats Binomial test
• Scipy Stats Binned statistics
• Scipy Stats Binom pmf
• Scipy Stats CDF
• Scipy Stats Cauchy
• Scipy Stats Describe
• Scipy Stats Exponential
• Scipy Stats Gamma
• Scipy Stats Geometric
• Scipy Stats gmean
• Scipy Stats Gennorm
• Scipy Stats Genpareto
• Scipy Stats Gumbel
• Scipy Stats Genextreme
• Scipy Stats Histogram
• Scipy Stats Half normal
• Scipy Stats Half cauchy
• Scipy Stats Inverse gamma
• Scipy Stats Inverse normal CDF
• Scipy Stats Johnson
• Scipy Stats PDF
• Scipy Stats Hypergeom
• Scipy Stats Interval
• Scipy Stats ISF
• Scipy Stats Independent T-test
• Scipy Stats Fisher Exact