Python Scipy Smoothing

Have you ever wondered what Python Scipy smoothing is? Here we will learn about “Python Scipy Smoothing”: how to smooth a curve using different filters or methods, and how to remove noise from noisy data, by covering the following topics.

  • What is Data Smoothing?
  • Python Scipy Smoothing Spline
  • How to use the filter for smoothing
  • How to smooth the 1d data
  • How to remove noise from the data and make it smooth
  • How to control the smoothness using the method of smoothing factor
  • Python Scipy Smoothing 2d Data

What is Data Smoothing?

Data smoothing is the process of removing noise from a data set using an algorithm so that important patterns stand out more clearly. It is used in economic analysis and to help predict trends, such as those seen in securities prices. The purpose of data smoothing is to eliminate isolated outliers and account for seasonality.

When data is compiled, volatility and other kinds of noise can be eliminated or reduced. This is what data smoothing refers to.

Data smoothing rests on the idea that identifying simpler changes helps in predicting trends and patterns. It helps statisticians and traders, who must examine large amounts of data that are often hard to interpret, to spot trends they might otherwise miss.

Data smoothing can be carried out in a variety of ways. A few options are the randomization approach, exponential smoothing, a moving average, or a random walk.
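As a rough illustration of two of the options listed above, the sketch below implements a simple moving average and simple exponential smoothing with NumPy only. The helper names moving_average and exponential_smoothing, the window size, and the value of alpha are assumptions chosen for this example; they are not part of SciPy.

```python
import numpy as np


def moving_average(values, window):
    """Simple moving average; returns len(values) - window + 1 points."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fully overlaps the data
    return np.convolve(values, kernel, mode="valid")


def exponential_smoothing(values, alpha):
    """s[0] = v[0]; s[t] = alpha * v[t] + (1 - alpha) * s[t - 1]."""
    smoothed = np.empty(len(values), dtype=float)
    smoothed[0] = values[0]
    for t in range(1, len(values)):
        smoothed[t] = alpha * values[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed


data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ma = moving_average(data, 3)          # averages of (1,2,3), (2,3,4), (3,4,5)
es = exponential_smoothing(data, 0.5) # values: 1.0, 1.5, 2.25, 3.125, 4.0625
print(ma)
print(es)
```

A larger window, or a smaller alpha, gives a smoother result that reacts more slowly to changes in the data.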

Python Scipy Smoothing Spline

Splines are piecewise polynomial functions whose pieces are joined at particular locations known as knots.

Because the pieces join smoothly, a spline avoids sudden changes in slope. Splines are used to interpolate a set of data points with a function that is continuous over the whole range of interest.

Python Scipy has a class scipy.interpolate.UnivariateSpline() that fits a 1-D smoothing spline to an existing set of data points.

The syntax is given below.

class scipy.interpolate.UnivariateSpline(x, y, w=None, bbox=[None, None], k=3, s=None, ext=0, check_finite=False)

Where parameters are:

  • x(array_data, N): 1-D array of independent input data. Must be increasing, and strictly increasing if s is 0.
  • y(array_data, N): 1-D array of dependent input data with the same length as x.
  • w(array_data, N): Weights for the spline fit. Must be positive. If w is None (the default), all weights are equal.
  • bbox(array_data, 2): 2-sequence specifying the boundary of the approximation interval. If bbox is None (the default), bbox is equal to [x[0], x[-1]].
  • k(int): Degree of the smoothing spline. Must be 1 <= k <= 5. The default is k = 3, a cubic spline.
  • s(float): Positive smoothing factor used to choose the number of knots. The number of knots is increased until the smoothing condition sum((w[i] * (y[i] - spl(x[i])))**2) <= s is satisfied. If s is 0, the spline interpolates through all data points.
  • ext(string, int): Controls the extrapolation mode for elements outside the interval defined by the knot sequence.
    • if ext=0 or ‘extrapolate’, return the extrapolated value.
    • if ext=1 or ‘zeros’, return 0
    • if ext=2 or ‘raise’, raise a ValueError
    • if ext=3 or ‘const’, return the boundary value.
  • check_finite(boolean): Whether to check that the input arrays contain only finite numbers. Disabling the check may improve performance, but if the inputs do contain infinities or NaNs, it may cause problems (crashes, non-termination, or nonsensical results). False is the default.
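To make the role of the smoothing factor s more concrete, here is a small sketch that fits the same noisy data twice, once with s=0 and once with a larger s, and compares the number of knots. The fixed seed and the value s=0.5 are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)  # fixed seed, assumed here for repeatability
x = np.linspace(-4, 4, 50)
y = np.exp(-x**2) + 0.1 * rng.standard_normal(50)

# s=0 forces the spline through every data point (pure interpolation).
interp_spline = UnivariateSpline(x, y, s=0)

# A larger s tolerates a larger residual, so fewer knots are needed.
smooth_spline = UnivariateSpline(x, y, s=0.5)

n_knots_interp = len(interp_spline.get_knots())
n_knots_smooth = len(smooth_spline.get_knots())

print("knots with s=0:  ", n_knots_interp)
print("knots with s=0.5:", n_knots_smooth)
print("residual with s=0.5:", smooth_spline.get_residual())
```

The interpolating spline reproduces the noise exactly, while the smoothing spline trades some fidelity for a simpler curve.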

Let’s take an example and smooth noisy data by following the below steps:

Import the required libraries or methods using the below python code.

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

Generate x and y, and plot them using the below code.

rng_ = np.random.default_rng()
x_ = np.linspace(-4, 4, 50)
y_ = np.exp(-x_**2) + 0.1 * rng_.standard_normal(50)
plt.plot(x_, y_, 'ro', ms=5)

Figure: Python Scipy Smoothing Spline Example

Smooth the data using UnivariateSpline() with the default parameter values using the below code.

spline = interpolate.UnivariateSpline(x_, y_)
xs_ = np.linspace(-4, 4, 1000)
plt.plot(xs_, spline(xs_), 'g', lw=2)

Now manually adjust the amount of smoothing using the below code.

spline.set_smoothing_factor(0.5)
plt.plot(xs_, spline(xs_), 'b', lw=3)
plt.show()

The method set_smoothing_factor() continues the spline computation with the specified smoothing factor s and with the knots found at the previous call.

Figure: Python Scipy Smoothing Spline

This is how to smooth the data using the method UnivariateSpline() of Python Scipy.

Python Scipy Smoothing Filter

The Savitzky-Golay filter is a digital filter that smooths a signal using its neighboring data points. A small window of data is selected, a polynomial is fitted to the data in that window by the least-squares method, and the value of that polynomial at the window’s center replaces the original center point.

The window is then shifted by one data point and the process repeats, so that every point is adjusted relative to its neighbors.
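The sliding-window procedure described above can be sketched by hand with np.polyfit and compared against scipy.signal.savgol_filter. The helper savgol_by_hand is a hypothetical name written for this illustration; it leaves the edge samples untouched, so only interior points are compared.

```python
import numpy as np
from scipy.signal import savgol_filter


def savgol_by_hand(y, window_length, polyorder):
    """Fit a polynomial in each window and keep its value at the center."""
    half = window_length // 2
    offsets = np.arange(-half, half + 1)   # local coordinates, center at 0
    out = np.array(y, dtype=float)         # edge samples are left unchanged
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        coeffs = np.polyfit(offsets, window, polyorder)  # least-squares fit
        out[i] = np.polyval(coeffs, 0.0)   # polynomial value at window center
    return out


rng = np.random.default_rng(1)  # fixed seed, assumed for repeatability
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + 0.1 * rng.standard_normal(100)

manual = savgol_by_hand(y, 11, 3)
reference = savgol_filter(y, 11, 3)

interior = slice(5, -5)  # skip the half-window at each edge
max_diff = np.max(np.abs(manual[interior] - reference[interior]))
print("largest interior difference:", max_diff)
```

The explicit loop is much slower than savgol_filter, which applies precomputed filter coefficients, but it shows what each output sample represents.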

Python Scipy has a function savgol_filter() in the module scipy.signal that applies a Savitzky-Golay filter to an array.

The syntax is given below.

scipy.signal.savgol_filter(x, window_length, polyorder, deriv=0, delta=1.0, axis=-1, mode='interp', cval=0.0)

Where parameters are:

  • x(array_data): The data to be filtered. If x is not a single-precision or double-precision floating-point array, it is converted to type numpy.float64 before filtering.
  • window_length(int): The length of the filter window. If mode is “interp”, window_length must be less than or equal to the size of x.
  • polyorder(int): The order of the polynomial used to fit the samples. polyorder must be less than window_length.
  • deriv(int): The order of the derivative to compute. It must be a non-negative integer. The default is 0, which means the data is filtered without differentiating.
  • delta(float): The spacing of the samples to which the filter is applied. Only used when deriv > 0. 1.0 is the default.
  • axis(int): The axis of the array x along which the filter is applied. -1 is the default.
  • mode(string): Must be “interp”, “wrap”, “nearest”, “constant”, or “mirror”. This determines the type of extension used for the padded signal to which the filter is applied. When mode is “constant”, the padding value is given by cval. For details on “mirror”, “constant”, “wrap”, and “nearest”, refer to the Notes in the SciPy documentation. When the default “interp” mode is selected, no extension is used. Instead, a polynomial of degree polyorder is fitted to the last window_length values at each edge, and this polynomial is used to evaluate the last window_length // 2 output values.
  • cval(scalar): The value used to fill past the edges of the input if mode is “constant”. The default is 0.

The method savgol_filter() returns filtered data.
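The deriv and delta parameters are easier to understand with an example. The sketch below differentiates a noise-free sine wave and compares the result with the cosine. The window length of 11 and polynomial order of 3 are assumptions chosen for this illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0, 2 * np.pi, 200)
dx = x[1] - x[0]          # sample spacing, passed to the filter as delta
y = np.sin(x)

# deriv=1 returns the first derivative of the fitted polynomial;
# delta rescales it from "per sample" to "per unit of x".
dy = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=dx)

max_err = np.max(np.abs(dy - np.cos(x)))
print("largest deviation from cos(x):", max_err)
```

Leaving delta at its default of 1.0 would give the derivative per sample rather than per unit of x, which is too small by a factor of dx here.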

Let’s take an example by following the below steps:

Import the required libraries or methods using the below python code.

import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

Generate noisy data and plot the data using the below code.

x_ = np.linspace(0,2*np.pi,200)
y_ = np.sin(x_) + np.random.random(200) * 0.2
plt.plot(x_, y_)

Now apply the Savitzky-Golay filter to the noisy data to smooth it.

yhat_ = signal.savgol_filter(y_, 49, 3)

plt.plot(x_, y_)
plt.plot(x_,yhat_, color='green')
plt.show()

Figure: Python Scipy Smoothing Filter

This is how to apply the Savitzky-Golay filter to the noisy data to smooth the data using the method savgol_filter() of Python Scipy.

Python Scipy Smoothing 1d

Python Scipy has a class interp1d() in the module scipy.interpolate that is used for 1-D function interpolation. Arrays of values x and y are used to approximate a function f: y = f(x).

This class returns a function whose call method uses interpolation to find the value at new points.

The syntax is given below.

class scipy.interpolate.interp1d(x, y, kind='linear', axis=-1, copy=True, bounds_error=None, fill_value=nan, assume_sorted=False)

Where parameters are:

  • x(array_data): A 1-D array of real values.
  • y(array_data): An N-D array of real values. The length of y along the interpolation axis must match the length of x.
  • kind(str or int): Specifies the kind of interpolation as a string, or as an integer giving the order of the spline interpolator to use. The string must be one of: linear, nearest, nearest-up, zero, slinear, quadratic, cubic, previous, or next. “zero”, “slinear”, “quadratic”, and “cubic” refer to spline interpolation of zeroth, first, second, or third order. “previous” and “next” simply return the previous or next value of the point. “nearest-up” and “nearest” differ when interpolating half-integers (such as 0.5, 1.5): “nearest-up” rounds up and “nearest” rounds down. Linear is the default.
  • axis(int): The axis of y along which to interpolate. The default is the last axis of y.
  • copy(boolean): If True, the class makes internal copies of x and y. If False, references to x and y are used. Copying is the default.
  • bounds_error(boolean): If True, a ValueError is raised whenever interpolation is attempted on a value outside the range of x (where extrapolation would be necessary). If False, out-of-bounds values are assigned fill_value. By default an error is raised unless fill_value="extrapolate" is specified.
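The following sketch illustrates several of the parameters above on a tiny data set: a few values of kind, the default out-of-bounds error, a constant fill_value, and fill_value="extrapolate". The three-point data set is an assumption chosen so the results are easy to verify by hand.

```python
import numpy as np
from scipy.interpolate import interp1d

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 10.0, 20.0])

linear = interp1d(x, y)                      # kind='linear' is the default
nearest = interp1d(x, y, kind="nearest")
previous = interp1d(x, y, kind="previous")
following = interp1d(x, y, kind="next")

print(float(linear(0.5)))     # halfway between 0 and 10 -> 5.0
print(float(nearest(0.4)))    # closest sample is x=0 -> 0.0
print(float(previous(1.9)))   # previous sample is x=1 -> 10.0
print(float(following(1.1)))  # next sample is x=2 -> 20.0

# By default, asking for a point outside [x[0], x[-1]] raises ValueError.
try:
    linear(3.0)
    raised = False
except ValueError:
    raised = True

# Out-of-range points can instead be filled with a constant or extrapolated.
filled = interp1d(x, y, bounds_error=False, fill_value=-1.0)
extrap = interp1d(x, y, fill_value="extrapolate")
print(raised, float(filled(3.0)), float(extrap(3.0)))
```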

Let’s take an example by following the below steps:

Import the required libraries or methods using the below python code.

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

Create x and y data and interpolate using the below code.

x_ = np.arange(0, 15)
y_ = np.exp(-x_/4.0)
f_ = interpolate.interp1d(x_, y_)

Plot the computed values using the below code.

xnew_ = np.arange(0, 10, 0.1)
ynew_ = f_(xnew_) 
plt.plot(x_, y_, 'o', xnew_, ynew_, '-')
plt.show()

Figure: Python Scipy Smoothing 1d

This is how to use interp1d() of Python Scipy to interpolate 1-D data and compute values between the known points.

Python Scipy Smoothing Noisy Data

In Python Scipy, LSQUnivariateSpline() is another class for creating splines. As we shall see, it works in much the same way as UnivariateSpline().

The main difference from UnivariateSpline() is that it lets you directly control the number and position of the knots when creating the spline curve.
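As a sketch of what controlling the knots means in practice, the code below fits the same data with two user-chosen sets of interior knots and inspects the result. The knot positions and the fixed seed are assumptions made for this illustration.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(2)  # fixed seed, assumed for repeatability
x = np.linspace(-4, 4, 50)
y = np.exp(-x**2) + 0.1 * rng.standard_normal(50)

# Three interior knots versus five; the second set contains the first.
coarse = LSQUnivariateSpline(x, y, t=[-1, 0, 1])
fine = LSQUnivariateSpline(x, y, t=[-2, -1, 0, 1, 2])

# get_knots() reports the interior knots together with the two boundary knots.
coarse_knots = coarse.get_knots()
print("coarse knots:", coarse_knots)
print("fine knots:  ", fine.get_knots())

# More knots give the least-squares fit more freedom, so the residual
# (weighted sum of squared errors) cannot get worse when knots are added.
coarse_res = coarse.get_residual()
fine_res = fine.get_residual()
print("residuals:", coarse_res, fine_res)
```

Unlike UnivariateSpline(), no smoothing factor is involved here: the knots are fixed and the coefficients are found by least squares.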

The syntax is given below.

class scipy.interpolate.LSQUnivariateSpline(x, y, t, w=None, bbox=[None, None], k=3, ext=0, check_finite=False)

Where parameters are:

  • x(array_data): The x-coordinates of the data points. Must be increasing.
  • y(array_data): The y-values of the data points, with the same length as x.
  • t(array_data): Interior knots of the spline. Must be in ascending order.
  • w(array_data): Weights for the spline fit. Must be positive. If None (the default), all weights are equal.
  • bbox(array_data): 2-sequence specifying the boundary of the approximation interval. If None (the default), bbox is equal to [x[0], x[-1]].
  • k(int): Degree of the smoothing spline. Must be 1 <= k <= 5. The default is k = 3, a cubic spline.
  • ext(string, int): Controls the extrapolation mode for elements outside the interval defined by the knot sequence.
    • if ext=0 or ‘extrapolate’, return the extrapolated value.
    • if ext=1 or ‘zeros’, return 0
    • if ext=2 or ‘raise’, raise a ValueError
    • if ext=3 or ‘const’, return the boundary value.
  • check_finite(boolean): Whether to check that the input arrays contain only finite numbers. Disabling the check may improve performance, but if the inputs do contain infinities or NaNs, it may cause problems (crashes, non-termination, or nonsensical results). False is the default.
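The four ext modes listed above can be compared by evaluating the spline at a point outside the data range. This is a minimal sketch; the data, the knot positions, and the evaluation point 12.0 are assumptions chosen for illustration, and the integer form of ext is used throughout.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

x = np.linspace(0, 10, 30)
y = np.sqrt(x)
knots = [3, 6]
outside = 12.0  # lies beyond x[-1] = 10

extrap = LSQUnivariateSpline(x, y, knots, ext=0)  # same as 'extrapolate'
zeros = LSQUnivariateSpline(x, y, knots, ext=1)   # same as 'zeros'
strict = LSQUnivariateSpline(x, y, knots, ext=2)  # same as 'raise'
const = LSQUnivariateSpline(x, y, knots, ext=3)   # same as 'const'

print("extrapolate:", float(extrap(outside)))     # continues the last polynomial piece
print("zeros:      ", float(zeros(outside)))      # 0 outside the interval
print("const:      ", float(const(outside)))      # value at the boundary x = 10

try:
    strict(outside)
    raised = False
except ValueError:
    raised = True
print("raise mode raised ValueError:", raised)
```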

Let’s take an example by following the below steps:

Import the required libraries or methods using the below python code.

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

Create x and y, then plot them using the code below.

rng_ = np.random.default_rng()
x_ = np.linspace(-4, 4, 50)
y_ = np.exp(-x_**2) + 0.1 * rng_.standard_normal(50)
plt.plot(x_, y_, 'ro', ms=5)

Fit a smoothing spline with predetermined internal knots using the below code.

t_ = [-1, 0, 1]
spline = interpolate.LSQUnivariateSpline(x_, y_, t_)
xs_ = np.linspace(-4, 4, 1000)
plt.plot(x_, y_, 'ro', ms=5)
plt.plot(xs_, spline(xs_), 'g-', lw=3)
plt.show()

Figure: Python Scipy Smoothing Noisy Data

This is how to create a smooth curve by removing noise from noisy data using the method LSQUnivariateSpline() of Python Scipy.

Python Scipy Smoothing Factor

The class scipy.interpolate.UnivariateSpline() has a method set_smoothing_factor(s) that continues the spline computation with the given smoothing factor s and with the knots found at the previous call.
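To see how the smoothing factor drives the fit, the following sketch lowers s step by step on one spline object and records the number of knots and the residual after each call. The noise level, the seed, and the particular s values are assumptions chosen for this example.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(3)  # fixed seed, assumed for repeatability
x = np.linspace(-4, 4, 50)
y = np.sin(x) + 0.2 * rng.standard_normal(50)

spline = UnivariateSpline(x, y)  # initial fit with the default s

results = []
for s in (8.0, 4.0, 2.0):
    # Reuses the knots from the previous fit and refines them as needed.
    spline.set_smoothing_factor(s)
    n_knots = len(spline.get_knots())
    residual = spline.get_residual()  # weighted sum of squared errors
    results.append((s, n_knots, residual))
    print(f"s={s}: {n_knots} knots, residual={residual:.3f}")
```

A smaller s forces the residual down, which generally requires more knots and gives a curve that follows the data, including its noise, more closely.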

Let’s take an example and use the method set_smoothing_factor() by following the below steps:

Import the required libraries or methods using the below python code.

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

Generate x and y, and plot them using the below code.

x_ = np.linspace(-4, 4, 50)
y_ = np.sin(x_) + np.random.random(50) * 0.8
plt.plot(x_, y_, 'ro', ms=5)

Figure: Python Scipy Smoothing Factor Example

Smooth the data with UnivariateSpline() and its default parameter values using the below code.

spline = interpolate.UnivariateSpline(x_, y_)
xs_ = np.linspace(-4, 4, 500)
plt.plot(xs_, spline(xs_), 'g', lw=2)

Now use the method set_smoothing_factor(0.7) to adjust the smoothness of the data using the below code.

spline.set_smoothing_factor(0.7)
plt.plot(xs_, spline(xs_), 'b', lw=3)
plt.show()

Figure: Python Scipy Smoothing Factor

This is how to adjust the smoothness of the data using the method set_smoothing_factor of Python Scipy.

Python Scipy Smoothing 2d Data

Python Scipy has a class interp2d() in the module scipy.interpolate that interpolates over a 2-D grid. Arrays of values x, y, and z are used to approximate a function f: z = f(x, y) that returns a scalar value z. Note that interp2d() was deprecated in SciPy 1.10 and removed in SciPy 1.14, so the code in this section requires an older SciPy release.

This class returns a function whose call method uses spline interpolation to find the value at new points.

The minimum number of data points required along the interpolation axis is (k+1)**2, where k is 1 for linear interpolation, 3 for cubic interpolation, and 5 for quintic interpolation.

The interpolator is built with bisplrep, using a smoothing factor of 0. If more control over smoothing is required, bisplrep should be used directly.
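As a hedged sketch of using bisplrep directly with a non-zero smoothing factor, the code below smooths noisy samples of a 2-D function and evaluates the result on a finer grid with bisplev. The test function, the noise level, and the rule of thumb used to pick s are assumptions made for this illustration.

```python
import numpy as np
from scipy.interpolate import bisplrep, bisplev

rng = np.random.default_rng(4)  # fixed seed, assumed for repeatability
x = np.linspace(-2, 2, 20)
y = np.linspace(-2, 2, 20)
xx, yy = np.meshgrid(x, y, indexing="ij")

noise_std = 0.1
z_noisy = np.sin(xx) * np.cos(yy) + noise_std * rng.standard_normal(xx.shape)

# Starting guess for s: number of points times the noise variance (assumption).
s = z_noisy.size * noise_std**2

# bisplrep takes the data points as flat 1-D arrays.
tck = bisplrep(xx.ravel(), yy.ravel(), z_noisy.ravel(), kx=3, ky=3, s=s)

# bisplev evaluates on the grid spanned by two 1-D coordinate arrays.
x_new = np.linspace(-2, 2, 50)
y_new = np.linspace(-2, 2, 50)
z_smooth = bisplev(x_new, y_new, tck)
print(z_smooth.shape)
```

With s=0 the surface would pass through every noisy sample, which is what interp2d() does; raising s trades fidelity for smoothness.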

The syntax is given below.

class scipy.interpolate.interp2d(x, y, z, kind='linear', copy=True, bounds_error=False, fill_value=None)

Where parameters are:

  • x, y(array_data): Arrays defining the coordinates of the data points. If the points lie on a regular grid, x can specify the column coordinates and y the row coordinates.
  • z(array_data): The values of the function to interpolate at the data points. If z is a multidimensional array, it is flattened before use, assuming Fortran ordering (order=’F’). The length of the flattened z array is len(x)*len(y) if x and y specify the column and row coordinates, or len(z) == len(x) == len(y) if x and y specify coordinates for each point.
  • kind(linear, cubic, quintic): The kind of spline interpolation to use. The default is “linear”.
  • copy(boolean): If True, the class makes internal copies of x, y, and z. If False, references may be used. Copying is the default.
  • bounds_error(boolean): If True, a ValueError is raised whenever interpolated values are requested outside the domain of the input data (x, y). If False, fill_value is used.
  • fill_value(number): If provided, the value to use for points outside the interpolation domain. If omitted (None), values outside the domain are extrapolated using nearest-neighbor extrapolation.
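Because interp2d() is not available in SciPy 1.14 and later, the sketch below shows one possible substitute for gridded data, scipy.interpolate.RectBivariateSpline, applied to the same kind of grid used in this section. This is a swapped-in alternative and not the interp2d() API itself. Note that RectBivariateSpline expects z with shape (len(x), len(y)), which is why the grid is built with indexing="ij".

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

x = np.arange(-4.01, 4.01, 0.20)
y = np.arange(-4.01, 4.01, 0.20)
xx, yy = np.meshgrid(x, y, indexing="ij")  # z[i, j] corresponds to (x[i], y[j])
z = np.sin(xx**2 + yy**2)

# kx=ky=3 gives a bicubic spline; s=0 means interpolation without smoothing.
spline = RectBivariateSpline(x, y, z, kx=3, ky=3, s=0)

# Evaluate on a finer grid that stays inside the original data range.
x_new = np.linspace(x[0], x[-1], 400)
y_new = np.linspace(y[0], y[-1], 400)
z_new = spline(x_new, y_new)
print(z_new.shape)
```

Passing a positive s to RectBivariateSpline would smooth the surface instead of interpolating it exactly.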

Let’s take an example by following the below steps:

Import the required libraries or methods using the below python code.

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate

Create a 2-dimensional grid using the below code.

x_ = np.arange(-4.01, 4.01, 0.20)
y_ = np.arange(-4.01, 4.01, 0.20)
xx_, yy_ = np.meshgrid(x_, y_)
z_ = np.sin(xx_**2+yy_**2)

Interpolate the data created above using the below code.

f_ = interpolate.interp2d(x_, y_, z_, kind='cubic')

Plot the outcome using the interpolation function we just obtained using the below code:

xnew_ = np.arange(-4.01, 4.01, 1e-2)
ynew_ = np.arange(-4.01, 4.01, 1e-2)
znew_ = f_(xnew_, ynew_)
plt.plot(x_, z_[0, :], 'ro-', xnew_, znew_[0, :], 'b-')
plt.show()

Figure: Python Scipy Smoothing 2d Data

This is how to create smoothness in 2d data using the method interp2d() of Python Scipy.

In this Python tutorial, we learned how to make smooth curves using different filters and methods, and how to remove noise from data, by covering the following topics.

  • What is Data Smoothing?
  • Python Scipy Smoothing Spline
  • How to use the filter for smoothing
  • How to smooth the 1d data
  • How to remove noise from the data and make it smooth
  • How to control the smoothness using the method of smoothing factor
  • Python Scipy Smoothing 2d Data