As a Python developer with over a decade of experience, I’ve seen firsthand how crucial it is to handle complex, non-linear relationships in data. While linear models are easy and often a good starting point, real-world data, especially from diverse US markets like finance, healthcare, or marketing, rarely follows a simple straight line. That’s where non-linear modeling with Scikit-Learn comes in.
In this article, I’ll walk you through non-linear models using Scikit-Learn, sharing practical methods and insights that I’ve gathered over the years. Whether you’re working with housing prices in California or predicting customer churn for a telecom company in New York, understanding these techniques will elevate your machine-learning projects.
Non-Linear Models
Non-linear models are algorithms that can capture complex relationships between features and target variables that don’t fit a straight line. Unlike linear models, which assume a straight-line relationship between inputs and output, non-linear models can curve, twist, and adapt to the underlying data patterns.
In practical terms, think about predicting the value of a house. The relationship between house size and price might be linear up to a point, but then plateau or spike due to location, age, or other factors. Non-linear models help capture those nuances.
Common Non-Linear Models in Scikit-Learn
Let me introduce you to some powerful non-linear models I frequently use:
1. Decision Trees
Decision Trees are simple yet powerful models used for both classification and regression tasks. They split data based on feature thresholds to make predictions, making them easy to interpret and visualize.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Sample classification dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Split and scale
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train Decision Tree Classifier
model = DecisionTreeClassifier(max_depth=5, random_state=42)
model.fit(X_train_scaled, y_train)
# Predict and evaluate
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

I often use decision trees for customer segmentation in marketing campaigns because they naturally handle categorical and continuous data.
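Because a trained tree is just a series of threshold splits, you can print the learned rules directly. A minimal sketch using export_text on the same breast cancer dataset (a shallow depth-2 tree here, just to keep the printout readable):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(data.data, data.target)

# Print the learned if/else split rules as plain text
print(export_text(model, feature_names=list(data.feature_names)))
```

This is the kind of interpretability that makes trees easy to explain to non-technical stakeholders.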
2. Random Forests
Random Forests are powerful ensemble methods that combine multiple decision trees to produce more accurate and stable predictions.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load dataset
data = fetch_california_housing()
X = data.data
y = data.target
# Step 2: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 3: Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 4: Train Random Forest model
model = RandomForestRegressor(n_estimators=100, max_depth=7, random_state=42)
model.fit(X_train_scaled, y_train)
# Step 5: Predict and evaluate
predictions = model.predict(X_test_scaled)
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f"Mean Squared Error: {mse:.4f}")
print(f"R² Score: {r2:.4f}")

This Random Forest model learns from multiple housing features and generates robust house-value predictions by averaging the outcomes of 100 decision trees.
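Beyond raw predictions, random forests also expose feature_importances_, which I find useful for explaining what drives a model. A minimal sketch, using the bundled diabetes dataset so it runs without downloading anything:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Bundled dataset (no download needed), 10 numeric features
data = load_diabetes()
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(data.data, data.target)

# Rank features by how much they contribute to the trees' splits
for name, score in sorted(zip(data.feature_names, model.feature_importances_),
                          key=lambda p: p[1], reverse=True):
    print(f"{name:>6}: {score:.3f}")
```

The importances sum to 1, so the printout reads as a rough percentage breakdown of each feature's contribution.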
3. Support Vector Machines (SVM) with Non-Linear Kernels
SVMs classify data by finding the optimal separating boundary. With kernels like RBF (Radial Basis Function), they implicitly map data into a higher-dimensional space where non-linear patterns become separable. The same kernel trick powers SVR, the regression variant used below.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
# Load a real-world regression dataset
data = fetch_california_housing()
X = data.data
y = data.target
# Split and scale the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Apply SVM with RBF kernel
model = SVR(kernel='rbf', C=100, gamma=0.1)
model.fit(X_train_scaled, y_train)
# Predict and evaluate
predictions = model.predict(X_test_scaled)
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse:.2f}")

I’ve applied SVMs to detect fraudulent transactions where patterns are non-linear and subtle.
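To see why the kernel matters, here is a small sketch of my own (not part of the workflow above) comparing a linear and an RBF kernel on scikit-learn’s make_moons toy dataset, which no straight line can separate:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: a classic non-linearly separable dataset
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scores = {}
for kernel in ("linear", "rbf"):
    # Scaling inside the pipeline keeps train/test preprocessing consistent
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
    print(f"{kernel}: {scores[kernel]:.2f}")
```

The RBF kernel should score noticeably higher here, because the boundary it learns can bend around each half-moon.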
4. Polynomial Regression
Polynomial regression extends linear regression by adding polynomial terms, allowing the model to fit curves.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
# Reuses the train/test split from the Random Forest example above
degree = 3
model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
model.fit(X_train, y_train)
predictions = model.predict(X_test)

This method works well for modeling energy consumption patterns that fluctuate seasonally across US states.
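Rather than guessing the degree, I usually compare cross-validated scores. A small sketch on synthetic quadratic data (my own toy example, not the article’s dataset):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved data: y = x^2 plus noise
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

# Cross-validated R^2 exposes both underfitting (degree 1)
# and the diminishing returns of higher degrees
scores = {}
for degree in (1, 2, 5):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores[degree] = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree {degree}: R^2 = {scores[degree]:.3f}")
```

Degree 1 cannot fit the curve at all, while degree 2 captures it almost perfectly; higher degrees add little and risk overfitting.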
5. Gradient Boosting Machines (GBM)
GBM builds models sequentially to correct previous errors, excelling at capturing complex relationships.
from sklearn.ensemble import GradientBoostingRegressor
# Reuses the train/test split from the earlier examples
model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

I frequently use GBM for sales forecasting in retail chains across the US, where demand patterns are highly non-linear.
How to Choose the Right Non-Linear Model?
Choosing the right model depends on your dataset size, feature types, and interpretability needs.
- For interpretability, decision trees are great.
- For accuracy and robustness, random forests or gradient boosting are preferred.
- For smaller datasets with complex boundaries, SVMs shine.
- Polynomial regression is good when you suspect a smooth curve relationship.
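When I’m unsure which model fits a problem, I let cross-validation decide. A quick sketch comparing three of the models above on the breast cancer dataset (the hyperparameters are illustrative defaults, not tuned values):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "svm (rbf)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean 5-fold accuracy gives a fair, like-for-like comparison
results = {}
for name, model in candidates.items():
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.3f}")
```

On most tabular problems, the ensemble and kernel methods edge out the single tree, but the tree remains the easiest to explain.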
Tips for Working with Non-Linear Models in Scikit-Learn
- Feature Scaling: Algorithms like SVM require scaling features using StandardScaler or MinMaxScaler.
- Hyperparameter Tuning: Use GridSearchCV or RandomizedSearchCV to find the best parameters.
- Cross-Validation: Always validate your model with k-fold cross-validation to avoid overfitting.
- Interpretability Tools: Use SHAP or partial dependence plots to understand model predictions.
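Putting the scaling and tuning tips together, here is a minimal GridSearchCV sketch (the C/gamma grid is illustrative, not a recommendation):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline, so each CV fold is scaled
# using only its own training portion (no data leakage)
pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC(kernel="rbf"))])
param_grid = {"svm__C": [1, 10, 100], "svm__gamma": [0.01, 0.1]}

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, f"CV accuracy: {grid.best_score_:.3f}")
```

Note the `step__parameter` naming convention: because the SVC sits inside a pipeline under the name "svm", its parameters are addressed as svm__C and svm__gamma.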
Non-linear models are essential when dealing with real-world data complexities. Scikit-Learn makes it easy to implement these models with clean and consistent APIs. Whether you’re analyzing housing trends in Chicago or optimizing marketing strategies in Miami, mastering these techniques will boost your data science projects.
If you’re new to non-linear modeling, start experimenting with decision trees and random forests. As you grow comfortable, explore SVMs, polynomial regression, and gradient boosting to tackle more challenging problems.
Keep practicing, keep experimenting, and soon you’ll be harnessing the full power of non-linear modeling in Python with Scikit-Learn.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries like Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, etc., for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc. Check out my profile.