How to use Python Scipy Gaussian_Kde

In this tutorial, we will learn about the “Python Scipy Gaussian_Kde” so that you can plot, integrate, resample, and do other things with the Gaussian KDE. We will cover the following topics.

  • What is KDE?
  • Python Scipy Gaussian_Kde
  • Python Scipy Gaussian_Kde Bandwidth
  • Python Scipy Gaussian_Kde Singular Matrix
  • Python Scipy Gaussian_Kde Integrate
  • Python Scipy Gaussian_Kde Logpdf
  • Python Scipy Gaussian_Kde Plot
  • Python Scipy Gaussian_Kde PDF
  • Python Scipy Gaussian_Kde Resample

What is KDE?

Kernel density estimation (KDE) is a technique that, in some ways, takes the idea of a mixture of Gaussians to its logical conclusion. KDE employs a mixture with one Gaussian component per point, producing a density estimator that is fundamentally non-parametric.

The kernel, which determines the form of the distribution placed at each location, and the kernel bandwidth, which regulates the size of the kernel at each point, are the free parameters of kernel density estimation. There are numerous kernels available in practice that we may employ to estimate the kernel density.
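As a rough illustration of the idea above, the sketch below builds a one-dimensional KDE by hand with NumPy: one Gaussian bump is centred on each sample and the bumps are averaged. The function name manual_gaussian_kde, the sample values, and the bandwidth of 0.8 are made up for this example; this is not how SciPy implements it internally.

```python
import numpy as np

def manual_gaussian_kde(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated on the grid (illustrative helper)."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every sample,
    # shape (len(grid), len(samples)).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return bumps.mean(axis=1)

samples = np.array([-2.0, -1.5, 0.0, 0.5, 3.0])  # made-up data
grid = np.linspace(-10.0, 10.0, 2001)
density = manual_gaussian_kde(samples, grid, bandwidth=0.8)

# A density should integrate to roughly 1 over a wide enough grid.
area = density.sum() * (grid[1] - grid[0])
print(area)
```

A smaller bandwidth makes each bump narrower and the estimate spikier, while a larger one blurs the bumps together.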

Read: Python Scipy Lognormal

Python Scipy Gaussian_Kde

The gaussian_kde represents a kernel-density estimate using Gaussian kernels. Kernel density estimation is a non-parametric way to estimate the probability density function (PDF) of a random variable. The Gaussian KDE works with both univariate and multivariate data.

Automatic bandwidth calculation is part of it. The estimation works best for a unimodal distribution; bimodal or multimodal distributions tend to be oversmoothed.

Python Scipy contains a class gaussian_kde() in the module scipy.stats to represent a kernel-density estimate via Gaussian kernels.

The syntax is given below.

scipy.stats.gaussian_kde(dataset, bw_method=None, weights=None)

Where parameters are:

dataset(array_data): The data points to estimate from. This is a 1-D array when dealing with univariate data, otherwise a 2-D array with shape (number of dimensions, number of data points).

bw_method(str, scalar, or callable): The method used to determine the estimator bandwidth. This can be “scott”, “silverman”, a scalar constant, or a callable. If it is a scalar, it is used directly as kde.factor. If it is a callable, it should take a gaussian_kde instance as its only parameter and return a scalar. If None (the default), “scott” is used.

weights(array_data): The weights of the data points, one weight per data point. If None (the default), the samples are assumed to be equally weighted.
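The parameters described above can be tried out in a short sketch. The random data, the weights, and the bandwidth value 0.3 are arbitrary choices for illustration only; d, n, factor, and weights are attributes of the fitted gaussian_kde object.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Univariate data: a 1-D array of samples.
x = rng.normal(size=200)
kde_1d = gaussian_kde(x)
print(kde_1d.d, kde_1d.n)  # number of dimensions, number of data points

# Bivariate data: rows are dimensions, columns are data points.
xy = np.vstack([rng.normal(size=200), rng.normal(size=200)])
kde_2d = gaussian_kde(xy, bw_method="silverman")
print(kde_2d.d, kde_2d.n)

# Weights: one weight per data point (normalized internally to sum to 1).
w = rng.uniform(0.5, 1.5, size=200)
kde_w = gaussian_kde(x, weights=w)

# A scalar bw_method is used directly as the bandwidth factor.
kde_s = gaussian_kde(x, bw_method=0.3)
print(kde_s.factor)
```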

Let’s take an example by following the below steps:

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import gaussian_kde

Create some random data in two dimensions using the helper function measure_mdl.

def measure_mdl(s):

    m1_ = np.random.normal(size=s)
    m2_ = np.random.normal(scale=0.5, size=s)
    return m1_+m2_, m1_-m2_

m1_, m2_ = measure_mdl(2000)
x_min = m1_.min()
x_max = m1_.max()
y_min = m2_.min()
y_max = m2_.max()

Using the data, estimate the kernel density using the below code.

X_, Y_ = np.mgrid[x_min:x_max:100j, y_min:y_max:100j]
positions_ = np.vstack([X_.ravel(), Y_.ravel()])
values_ = np.vstack([m1_, m2_])
kernel_ = gaussian_kde(values_)
Z_ = np.reshape(kernel_(positions_).T, X_.shape)

Graph the above data using the below code.

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.imshow(np.rot90(Z_), cmap=plt.cm.gist_earth_r,
          extent=[x_min, x_max, y_min, y_max])
ax.plot(m1_, m2_, 'k.', markersize=2)
ax.set_xlim([x_min, x_max])
ax.set_ylim([y_min, y_max])
plt.show()

This is how to represent a kernel-density estimate with Gaussian kernels using the class gaussian_kde() of Python Scipy.


Python Scipy Gaussian_Kde Bandwidth

When visualizing a distribution, the choice of bandwidth is essential. Unfortunately, many people simply call a standard function to create a density plot without considering the bandwidth.

As a result, the plot can misrepresent the data, which can lead to false inferences. Let’s go over bandwidth selection in more detail and see how to make your density plots more accurate.

How bandwidth choice influences the smoothness of the plot:

  • Undersmoothing results from a narrow bandwidth: The density plot will resemble a collection of distinct peaks.
  • Oversmoothing results from a large bandwidth: Any non-unimodal characteristics of the distribution will be hidden, and the density plot will appear to represent a unimodal distribution.

Undersmoothing or oversmoothing results from improper bandwidth selection. In both situations, the real shape of the data is hidden from us. This problem prompts us to ask, “How can I determine a good bandwidth value in advance?”.

We need an algorithm that selects a suitable bandwidth value while avoiding both oversmoothing and undersmoothing. Such an algorithm is called a bandwidth selector. The Python Scipy class gaussian_kde accepts a parameter bw_method for this purpose, with values like “scott” and “silverman”.

Either of these rules can be used, depending on the data. Scott’s rule and Silverman’s rule are two of the most widely used bandwidth selectors.
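The effect of the bandwidth can be checked numerically with a small sketch. The bimodal data and the scalar factors 0.05 and 2.0 below are made up for illustration. The two printed formulas are the rules of thumb SciPy documents for “scott” and “silverman”, with n data points in d dimensions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Made-up bimodal data: two well-separated normal clusters.
data = np.concatenate([rng.normal(-3.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

kde_scott = gaussian_kde(data, bw_method="scott")
kde_silverman = gaussian_kde(data, bw_method="silverman")
kde_narrow = gaussian_kde(data, bw_method=0.05)  # likely undersmoothed
kde_wide = gaussian_kde(data, bw_method=2.0)     # likely oversmoothed

n, d = kde_scott.n, kde_scott.d
print(kde_scott.factor, n ** (-1.0 / (d + 4)))                        # Scott's rule
print(kde_silverman.factor, (n * (d + 2) / 4.0) ** (-1.0 / (d + 4)))  # Silverman's rule

# Compare the estimate in the valley (x = 0) with the estimate at a mode (x = 3).
for name, kde in [("scott", kde_scott), ("silverman", kde_silverman),
                  ("narrow", kde_narrow), ("wide", kde_wide)]:
    valley, mode = kde([0.0, 3.0])
    print(name, valley, mode)
```

With the wide bandwidth, the valley between the two clusters is filled in, so the estimate looks unimodal even though the data is not.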


Python Scipy Gaussian_Kde Singular Matrix

A matrix with a zero determinant is referred to as singular, and such a matrix has no inverse. In this section, we will try to compute the Gaussian KDE from a singular matrix.

Import the required libraries or methods using the below python code.

import numpy as np
from scipy.stats import gaussian_kde

Create a singular matrix using the below code.

sing_mat = np.array([[3.,6.],[2.,4.]])

Now compute the gaussian KDE using the below code.

gaussian_kde(sing_mat)

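The output shows the error LinAlgError: singular matrix (the exact message may differ between SciPy versions), so we cannot compute the Gaussian KDE from this data. The two data points, which are the columns of sing_mat, lie on a straight line. This makes the covariance matrix of the data singular, and gaussian_kde needs to invert that matrix.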
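One way to anticipate this problem is to check the rank of the data covariance matrix before building the KDE. The sketch below is only a suggestion and not part of the gaussian_kde API. The variable outcome and the three-point dataset at the end are made up for illustration, and the construction is wrapped in try/except because the behaviour on degenerate data can vary between SciPy versions.

```python
import numpy as np
from scipy.stats import gaussian_kde

sing_mat = np.array([[3., 6.], [2., 4.]])  # 2 dimensions (rows), 2 data points (columns)

# The covariance of the data is what needs to be invertible.
cov = np.cov(sing_mat)
rank = np.linalg.matrix_rank(cov)
print(cov)
print(rank, "of", cov.shape[0])  # a rank below the number of dimensions means singular

try:
    gaussian_kde(sing_mat)
    outcome = "constructed"
except np.linalg.LinAlgError:
    outcome = "LinAlgError"
print(outcome)

# Adding a data point that is not on the same line gives a full-rank covariance.
kde_ok = gaussian_kde(np.array([[3., 6., 1.], [2., 4., 5.]]))
print(kde_ok.n)
```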


Python Scipy Gaussian_Kde Integrate

The gaussian_kde object has a method integrate_kde() that computes the integral of the product of this kernel density estimate with another one.

The syntax is given below.

gaussian_kde.integrate_kde(other)

Where the parameter other is another gaussian_kde instance, and the method returns a scalar value.

Let’s take an example by following the below steps:

Import the required libraries or methods using the below python code.

from scipy import stats
import numpy as np

Generate sample data using the below code.

sample_data = [-19.41275116, -17.4594738, -17.4553103, -13.28406452, -10.77305,
        -10.48179997, -10.4761126, -9.7904519, -9.78305023, -9.44148,
         -7.85222277,  -7.8498553, -5.10130727,   1.55761078,   1.87479,
          1.88314794,   2.7612791]

Create an instance of Gaussian KDE using the below code.

bw_ = 1. / np.std(sample_data)
gkde = stats.gaussian_kde(dataset=sample_data, bw_method=bw_)

Now access the method integrate_kde() and pass the above KDE instance gkde as the other instance to compute the integral.

gkde.integrate_kde(gkde)

From the output, the integral of the product of the kernel density estimate with itself is 0.0659. This is how to compute the integral of a KDE using the method integrate_kde() of the Python Scipy object gaussian_kde.
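To see what integrate_kde() returns, the result can be compared with a simple numerical integral of the squared density. This is a sketch on made-up random data, and the grid limits and resolution are arbitrary. The method integrate_box_1d() is also part of gaussian_kde and is shown here only as a sanity check that the whole density integrates to 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.normal(loc=-5.0, scale=3.0, size=100)  # made-up sample
gkde = stats.gaussian_kde(data)

# Integral of pdf(x) * pdf(x), computed by SciPy.
self_overlap = gkde.integrate_kde(gkde)

# The same quantity approximated by a Riemann sum on a fine grid.
grid = np.linspace(data.min() - 20.0, data.max() + 20.0, 20001)
dx = grid[1] - grid[0]
numeric = np.sum(gkde(grid) ** 2) * dx
print(self_overlap, numeric)

# The density itself integrates to 1 over the whole real line.
total_mass = gkde.integrate_box_1d(-np.inf, np.inf)
print(total_mass)
```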


Python Scipy Gaussian_Kde Logpdf

The object gaussian_kde has a method logpdf() to evaluate the log of the estimated pdf at the provided data points.

Let’s take an example using the same code that we have used in the subsection “Python Scipy Gaussian_Kde”.

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import gaussian_kde

Create some random data in two dimensions using the helper function measure_mdl.

def measure_mdl(s):

    m1_ = np.random.normal(size=s)
    m2_ = np.random.normal(scale=0.5, size=s)
    return m1_+m2_, m1_-m2_

m1_, m2_ = measure_mdl(2000)
x_min = m1_.min()
x_max = m1_.max()
y_min = m2_.min()
y_max = m2_.max()

Using the data, estimate the kernel density using the below code.

X_, Y_ = np.mgrid[x_min:x_max:100j, y_min:y_max:100j]
positions_ = np.vstack([X_.ravel(), Y_.ravel()])
values_ = np.vstack([m1_, m2_])
kernel_ = gaussian_kde(values_)

Now compute the log pdf of the kernel_ by providing the data as values_ to method logpdf() using the below code.

print(kernel_.logpdf(values_))

This is how to compute the log pdf of the gaussian KDE using the method logpdf() of Python Scipy.
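As a quick check of what logpdf() computes, the sketch below compares it with the logarithm of pdf() on a few points. The random data and the far-away point (60, 60) are made up for illustration. The last lines hint at why a dedicated log method is useful: far in the tail the plain density can underflow to zero.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
values = rng.normal(size=(2, 500))  # made-up 2-D data, one point per column
kernel = gaussian_kde(values)

points = values[:, :5]  # evaluate at the first five data points
log_density = kernel.logpdf(points)
density = kernel.pdf(points)
print(np.allclose(log_density, np.log(density)))

# Far away from the data, compare the plain density with its logarithm.
far = np.array([[60.0], [60.0]])
print(kernel.pdf(far), kernel.logpdf(far))
```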


Python Scipy Gaussian_Kde Plot

We have already learned how to compute the Gaussian KDE and what its parameters are. In this section, we will compute and plot the Gaussian KDE using sample data.

Import the required libraries or methods using the below python code.

# In a Jupyter notebook, also run: %matplotlib inline
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

Generate sample data using the below code.

sample_data = [-19.41275116, -17.4594738, -17.4553103, -13.28406452, -10.77305,
        -10.48179997, -10.4761126, -9.7904519, -9.78305023, -9.44148,
         -7.85222277,  -7.8498553, -5.10130727,   1.55761078,   1.87479,
          1.88314794,   2.7612791]

Create an instance of Gaussian KDE using the below code.

bw_ = 1. / np.std(sample_data)
gkde = stats.gaussian_kde(dataset=sample_data, bw_method=bw_)

Now evaluate the Gaussian KDE on a grid of points using the below code.

grid_size = 250
g_x_ = np.linspace(-25, 6, grid_size)
gkde_val = gkde(g_x_)

Plot the Gaussian KDE using the below code.

plt.plot(g_x_, gkde_val, label="Gaussian KDE Plot")
plt.legend()
plt.show()

This is how to plot or graph the gaussian KDE using matplotlib library of Python with given or generated data.


Python Scipy Gaussian_Kde PDF

The object gaussian_kde has a method pdf() to evaluate the estimated pdf at the provided data points.

Let’s take an example using the same code that we have used in the subsection “Python Scipy Gaussian_Kde”.

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import gaussian_kde

Create some random data in two dimensions using the helper function measure_mdl.

def measure_mdl(s):

    m1_ = np.random.normal(size=s)
    m2_ = np.random.normal(scale=0.5, size=s)
    return m1_+m2_, m1_-m2_

m1_, m2_ = measure_mdl(2000)
x_min = m1_.min()
x_max = m1_.max()
y_min = m2_.min()
y_max = m2_.max()

Using the data, estimate the kernel density using the below code.

X_, Y_ = np.mgrid[x_min:x_max:100j, y_min:y_max:100j]
positions_ = np.vstack([X_.ravel(), Y_.ravel()])
values_ = np.vstack([m1_, m2_])
kernel_ = gaussian_kde(values_)

Now compute the pdf of the kernel_ by providing the data as values_ to method pdf() using the below code.

print(kernel_.pdf(values_))

This is how to compute the probability density function of the gaussian KDE using the method pdf() of Python Scipy.
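The method pdf() gives the same result as calling the KDE object directly or using its evaluate() method, which the short sketch below illustrates. The data and the three query points are made up. For multivariate data, the points are passed with one point per column.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
values = rng.normal(size=(2, 300))  # made-up 2-D data
kernel = gaussian_kde(values)

# Three 2-D query points, one per column.
points = np.array([[0.0, 1.0, -1.0],
                   [0.0, 0.5, 2.0]])

via_pdf = kernel.pdf(points)
via_evaluate = kernel.evaluate(points)
via_call = kernel(points)

print(via_pdf)
print(np.allclose(via_pdf, via_evaluate), np.allclose(via_pdf, via_call))
```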


Python Scipy Gaussian_Kde Resample

The object gaussian_kde has a method resample to draw a dataset at random from the calculated pdf.

The syntax is given below.

gaussian_kde.resample(size=None, seed=None)

Where parameters are:

  • size(int): The number of samples to draw. If the size is not specified, it is equal to the number of samples in the underlying dataset.
  • seed(None, int, numpy.random.Generator, or numpy.random.RandomState): If the seed is None (or np.random), the numpy.random.RandomState singleton is used. If the seed is an int, a new RandomState instance is created and seeded with it. If the seed is already a Generator or RandomState instance, that instance is used.

The method resample() returns the sample dataset of type ndarray.

Here we will use the same example that we used in the above subsection “Python Scipy Gaussian_Kde”.

Import the required libraries using the below python code.

import numpy as np
from scipy.stats import gaussian_kde

Create some random data in two dimensions using the helper function measure_mdl.

def measure_mdl(s):

    m1_ = np.random.normal(size=s)
    m2_ = np.random.normal(scale=0.5, size=s)
    return m1_+m2_, m1_-m2_

m1_, m2_ = measure_mdl(2000)
x_min = m1_.min()
x_max = m1_.max()
y_min = m2_.min()
y_max = m2_.max()

Using the data, estimate the kernel density and resample the data using the below code.

X_, Y_ = np.mgrid[x_min:x_max:100j, y_min:y_max:100j]
positions_ = np.vstack([X_.ravel(), Y_.ravel()])
values_ = np.vstack([m1_, m2_])
kernel_ = gaussian_kde(values_)

print(kernel_.resample())

This is how to resample to draw a dataset at random from the calculated pdf using the method resample() of Python Scipy object gaussian_kde().
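The size and seed parameters can be explored with a small sketch. The data and the seed value 123 are arbitrary. The returned array has one row per dimension and one column per drawn sample.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(5)
values = rng.normal(size=(2, 512))  # made-up 2-D data with 512 points
kernel = gaussian_kde(values)

# Same integer seed, same draw.
draw_a = kernel.resample(size=10, seed=123)
draw_b = kernel.resample(size=10, seed=123)
print(draw_a.shape)
print(np.array_equal(draw_a, draw_b))

# Without size, as many samples as there are points in the dataset are drawn.
default_draw = kernel.resample(seed=123)
print(default_draw.shape)
```

Passing a seed is helpful when the resampled data feeds into results that need to be reproducible.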


Thus, in this tutorial, we learned about gaussian KDE, computed the log pdf, integrated the KDE, drew or resampled the KDE data values, and also plotted the graph of gaussian KDE. We also covered the topics listed below.

  • What is KDE?
  • Python Scipy Gaussian_Kde
  • Python Scipy Gaussian_Kde Bandwidth
  • Python Scipy Gaussian_Kde Singular Matrix
  • Python Scipy Gaussian_Kde Integrate
  • Python Scipy Gaussian_Kde Logpdf
  • Python Scipy Gaussian_Kde Plot
  • Python Scipy Gaussian_Kde PDF
  • Python Scipy Gaussian_Kde Resample