I’ve been working with Python for over a decade, and throughout my journey, I’ve explored numerous optimization techniques. One approach that has fascinated me is the genetic algorithm, a powerful method inspired by natural selection. When combined with Scikit-Learn, it offers a unique way to optimize machine learning models beyond traditional methods.
If you’re like me and want to explore genetic algorithms in Python, this tutorial will walk you through everything you need to know. We’ll keep it simple, practical, and focused on real-world applications.
Let’s begin.
What Is a Genetic Algorithm?
At its core, a genetic algorithm (GA) mimics biological evolution. Think of it as natural selection in code form, where potential solutions to a problem evolve. Instead of manually tuning parameters or relying solely on gradient-based methods, GAs explore a population of solutions, selecting the fittest, combining them, and introducing mutations to find optimal or near-optimal results.
I’ve found GAs especially useful when the search space is complex or non-differentiable, such as feature selection or hyperparameter tuning in machine learning.
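To make the select–combine–mutate loop concrete, here is a minimal, self-contained sketch (all names, ranges, and numbers are illustrative, not from any library) that evolves a population of numbers toward the maximum of a toy function f(x) = -(x - 3)^2:

```python
import random

random.seed(42)

# Toy fitness: maximize f(x) = -(x - 3)^2, so the optimum is x = 3
def fitness(x):
    return -(x - 3) ** 2

def genetic_algorithm(pop_size=20, generations=40):
    # Start with a random population of candidate solutions
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest half (elitism, so the best never degrades)
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Crossover: each child blends two randomly chosen parents
        children = [
            (random.choice(parents) + random.choice(parents)) / 2
            for _ in range(pop_size - len(parents))
        ]
        # Mutation: small random perturbation to keep exploring
        children = [c + random.gauss(0, 0.5) for c in children]
        population = parents + children
    return max(population, key=fitness)

best = genetic_algorithm()
print(f"Best x found: {best:.2f}")  # converges close to 3
```

Real GA libraries add tournament selection, configurable crossover operators, and multi-gene individuals, but the loop structure is the same.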
Why Use Genetic Algorithms with Scikit-Learn?
Scikit-Learn is my go-to Python library for machine learning because of its clean API and extensive functionality. However, it doesn’t natively support genetic algorithms for optimization. Integrating GAs lets you:
- Optimize hyperparameters more creatively than grid or random search.
- Perform feature selection automatically.
- Solve complex constrained optimization problems.
This combination can significantly improve model performance, especially in business scenarios like predicting customer churn in telecom or optimizing marketing campaigns.
Get Started: Installing Required Libraries
Before diving in, ensure you have Python installed (I recommend Python 3.8+). Then, install Scikit-Learn and a genetic algorithm library like DEAP or sklearn-genetic-opt that integrates well with Scikit-Learn:

pip install scikit-learn deap

Or, for a more Scikit-Learn-friendly genetic algorithm wrapper:

pip install sklearn-genetic-opt

I prefer sklearn-genetic-opt for its seamless integration; it provides the GAFeatureSelectionCV class used in Method 1 below.
Method 1: Use sklearn-genetic for Hyperparameter Optimization
This method is easy and feels native to Scikit-Learn users.
Step 1: Import Libraries and Prepare Data
Let’s say we want to optimize a Random Forest classifier. For the demo we’ll load the Adult income dataset from OpenML; the same workflow applies to something like a telecom customer-churn dataset.
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn_genetic import GAFeatureSelectionCV
from sklearn.metrics import accuracy_score
import warnings

warnings.filterwarnings("ignore")  # Optional: suppress warnings

# Load dataset (replace with your own CSV if needed)
data = fetch_openml(name='adult', version=2, as_frame=True)

# Select only numeric columns
X = data.data.select_dtypes(include=['float64', 'int64'])

# Set target column (income classification)
y = data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

Step 2: Set Up Genetic Algorithm Feature Selection
selector = GAFeatureSelectionCV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    cv=5,
    scoring='accuracy',
    population_size=10,  # You can set 50, but for faster testing, use 10
    generations=5,       # You can set 20, but for faster testing, use 5
    n_jobs=-1,
    verbose=True,
    keep_top_k=5
)

Step 3: Fit and Evaluate
# Fit the selector on training data
selector.fit(X_train, y_train)

# Get the names of selected features
selected_features = X.columns[selector.support_]
print("✅ Selected Features:", list(selected_features))

# Make predictions using selected features
y_pred = selector.predict(X_test)

# Evaluate the model
print("✅ Test Accuracy:", accuracy_score(y_test, y_pred))
This approach not only tunes hyperparameters but also selects the most relevant features, which is a big win for interpretability and performance.
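Under the hood, each GA individual is a boolean mask over the feature columns, and the `support_` attribute used above exposes the winning mask. Here is a minimal sketch of how such a mask filters columns; the column names, mask, and row values are made up for illustration:

```python
# Hypothetical feature columns and a mask the GA might have evolved
columns = ["age", "hours-per-week", "capital-gain", "education-num"]
support = [True, False, True, True]

# Keep only the column names flagged True
selected = [name for name, keep in zip(columns, support) if keep]
print(selected)  # ['age', 'capital-gain', 'education-num']

# Applying the same mask to a row of data drops the unselected values
row = [39, 40, 2174, 13]
reduced_row = [v for v, keep in zip(row, support) if keep]
print(reduced_row)  # [39, 2174, 13]
```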
Method 2: Custom Genetic Algorithm Using DEAP
For those who want more control, I’ve used the DEAP library to build custom GAs.
Step 1: Define the Problem
Suppose you want to optimize hyperparameters like n_estimators and max_depth of a Random Forest.
Step 2: Set Up DEAP Environment
import random
import numpy as np
from deap import base, creator, tools
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Evaluation function: cross-validated accuracy of a Random Forest
# built from an individual's [n_estimators, max_depth] genes.
# Uses the X_train and y_train created in Method 1.
def eval_rf(individual):
    n_estimators, max_depth = individual
    clf = RandomForestClassifier(
        n_estimators=int(n_estimators),
        max_depth=int(max_depth),
        random_state=42
    )
    score = cross_val_score(clf, X_train, y_train, cv=5, scoring='accuracy').mean()
    return score,  # DEAP expects fitness values as a tuple
Step 3: Configure GA Components
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("attr_n_estimators", random.randint, 10, 200)
toolbox.register("attr_max_depth", random.randint, 1, 30)
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.attr_n_estimators, toolbox.attr_max_depth), n=1)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", eval_rf)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=[10, 1], up=[200, 30], indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)
Step 4: Run the Genetic Algorithm
population = toolbox.population(n=50)
NGEN = 20

for gen in range(NGEN):
    # Selection
    offspring = toolbox.select(population, len(population))
    offspring = list(map(toolbox.clone, offspring))

    # Crossover on pairs of offspring
    for child1, child2 in zip(offspring[::2], offspring[1::2]):
        if random.random() < 0.5:
            toolbox.mate(child1, child2)
            del child1.fitness.values
            del child2.fitness.values

    # Mutation
    for mutant in offspring:
        if random.random() < 0.2:
            toolbox.mutate(mutant)
            del mutant.fitness.values

    # Re-evaluate only the individuals whose fitness was invalidated
    invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
    fitnesses = map(toolbox.evaluate, invalid_ind)
    for ind, fit in zip(invalid_ind, fitnesses):
        ind.fitness.values = fit

    population[:] = offspring

best_ind = tools.selBest(population, 1)[0]
print(f"Best Parameters: n_estimators={best_ind[0]}, max_depth={best_ind[1]}")
This method requires more setup but gives you full flexibility to tailor the GA to your problem.
Tips From My Experience
- Always start with a smaller population and fewer generations to test your GA setup.
- Use parallel processing (n_jobs=-1 in Scikit-Learn, or multiprocessing in DEAP) to speed up evaluation.
- Monitor convergence; if the fitness stops improving, it might be time to stop early.
- For business problems like predicting loan defaults or customer retention, GAs can uncover feature combinations that traditional methods might miss.
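The convergence tip above can be sketched as a simple early-stopping check: track the best score per generation and stop after a few generations with no meaningful improvement. The fitness history and thresholds below are made up for illustration:

```python
# Stop after `patience` generations without an improvement of at least `min_delta`
def run_with_early_stopping(fitness_per_generation, patience=3, min_delta=1e-4):
    best = float("-inf")
    stale = 0
    for gen, score in enumerate(fitness_per_generation):
        if score > best + min_delta:
            best, stale = score, 0  # improvement: reset the stall counter
        else:
            stale += 1
        if stale >= patience:
            return gen, best  # generation at which we stopped, and best score
    return len(fitness_per_generation) - 1, best

# Hypothetical best-accuracy-per-generation history that plateaus at 0.78
history = [0.71, 0.74, 0.78, 0.78, 0.78, 0.78, 0.78]
stopped_at, best = run_with_early_stopping(history)
print(stopped_at, best)  # stops at generation 5 with best score 0.78
```

In the DEAP loop from Method 2, you would compute the generation's best fitness after the evaluation step and break out of the loop when this check fires.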
Genetic algorithms are a fantastic addition to your Python toolkit, especially when paired with Scikit-Learn. Whether you choose a ready-made library like sklearn-genetic or build your own with DEAP, you’ll gain a powerful method to optimize complex models.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries, such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, TensorFlow, SciPy, Scikit-Learn, and more, working with various clients in the United States, Canada, the United Kingdom, Australia, and New Zealand. Check out my profile.