Scikit learn Genetic algorithm

In this Python tutorial, we will learn how genetic algorithms work with scikit learn, and we will walk through several examples related to genetic algorithms. We will cover the following topics.

  • Scikit learn genetic algorithm
  • Scikit learn genetic opt
  • Scikit learn genetic algorithm feature selection
  • Scikit learn genetic selection cv
  • Scikit learn genetic algorithm advantages and disadvantages

Scikit learn genetic algorithm

In this section, we will learn how a scikit learn genetic algorithm works in Python.

  • Before moving forward, we need some background on genetics. Genetics is the branch of biology concerned with heredity and genetic variation, and genetic algorithms borrow its vocabulary (genes, chromosomes, populations).
  • Genetic algorithms are modeled on natural selection and can be applied to both constrained and unconstrained optimization problems.
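The selection, crossover, and mutation loop described above can be sketched in a few lines of plain Python. This is only an illustration on a toy "OneMax" problem (maximize the number of 1s in a bit string). The function names, the parameters, and the use of elitism are assumptions made for this sketch and are not part of scikit learn or of any genetic algorithm library.

```python
import random


def fitness(bits):
    """OneMax toy objective: count the 1s (higher is better)."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Start from a random population of bit strings ("chromosomes").
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            # Selection: each parent is the fitter of two random individuals.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # Crossover: splice the two parents at a random cut point.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    winner = evolve()
    print(winner, fitness(winner))
```

Because the best individual is always copied into the next generation, the best fitness in this sketch can never decrease from one generation to the next.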

Code:

In the following code, we will import GeneticSelectionCV (provided by the sklearn-genetic package), which selects features from the dataset.

  • from __future__ import print_function makes print() behave as the Python 3 function under Python 2.6+; it has no effect on Python 3.
  • x = num.hstack((iris.data, e)) stacks the iris features and the noise columns column-wise.
  • selectors = selectors.fit(x, Y) fits the selector to the data.
  • print(selectors.support_) prints the boolean mask of the selected features.
from __future__ import print_function
import numpy as num
from sklearn import datasets, linear_model

from genetic_selection import GeneticSelectionCV


def main():
    iris = datasets.load_iris()

    # Some noisy data not correlated
    e = num.random.uniform(0, 0.2, size=(len(iris.data), 30))

    x = num.hstack((iris.data, e))
    Y = iris.target

    estimators = linear_model.LogisticRegression(solver="liblinear", multi_class="ovr")

    selectors = GeneticSelectionCV(estimators,
                                  cv=6,
                                  verbose=2,
                                  scoring="accuracy",
                                  max_features=6,
                                  n_population=60,
                                  crossover_proba=0.6,
                                  mutation_proba=0.2,
                                  n_generations=50,
                                  crossover_independent_proba=0.6,
                                  mutation_independent_proba=0.06,
                                  tournament_size=4,
                                  n_gen_no_change=20,
                                  caching=True,
                                  n_jobs=-2)
    selectors = selectors.fit(x, Y)

    print(selectors.support_)


if __name__ == "__main__":
    main()

Output:

After running the above code, we get the following output, in which the mask of selected features is printed on the screen.

scikit learn genetic algorithm
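The support_ attribute printed above is a boolean mask with one entry per input column. A minimal sketch of turning such a mask back into readable column names follows. The mask values and the column names here are hypothetical and chosen only for illustration; an actual run produces its own mask.

```python
# Hypothetical mask, for illustration only; a real run prints its own values.
support = [True, False, True, True] + [False] * 30

# Hypothetical column names: 4 iris measurements followed by 30 noise columns.
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
names += [f"noise_{i}" for i in range(30)]

# Keep only the names whose mask entry is True.
selected = [name for name, keep in zip(names, support) if keep]
print(selected)
```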


Scikit learn genetic opt

In this section, we will learn how genetic opt (the sklearn-genetic-opt package) works in Python.

  • Scikit learn genetic opt searches for the set of hyperparameters that optimizes a cross-validation metric.
  • It uses evolutionary algorithms for hyperparameter tuning and feature selection in classification or regression models.

Code:

In the following code, we will import several libraries and use genetic opt to tune a classifier on a classification problem.

  • GASearchCV runs the hyperparameter search using an evolutionary algorithm.
  • data = load_digits() loads the digits dataset.
  • x = data.images.reshape((nsample, -1)) flattens each image into a one-dimensional feature vector.
  • cv = StratifiedKFold(n_splits=3, shuffle=True) defines the cross-validation strategy; it could also be just an int.
  • evolved_estimator.fit(x_train, y_train) trains and optimizes the estimator.
import matplotlib.pyplot as plot
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Categorical, Integer, Continuous
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
data = load_digits()
nsample = len(data.images)
x = data.images.reshape((nsample, -1))
y = data['target']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

_, axes = plot.subplots(nrows=1, ncols=4, figsize=(10, 3))
for axis, image, label in zip(axes, data.images, data.target):
    axis.set_axis_off()
    axis.imshow(image, cmap=plot.cm.gray_r, interpolation='nearest')
    axis.set_title('Training: %i' % label)

# Search space for the random forest hyperparameters (defined outside the plotting loop)
param_grid = {'min_weight_fraction_leaf': Continuous(0.01, 0.5, distribution='log-uniform'),
              'bootstrap': Categorical([True, False]),
              'max_depth': Integer(2, 30),
              'max_leaf_nodes': Integer(2, 35),
              'n_estimators': Integer(100, 300)}

classifier = RandomForestClassifier()


cv = StratifiedKFold(n_splits=3, shuffle=True)

# The main class from sklearn-genetic-opt
evolved_estimator = GASearchCV(estimator=classifier,
                              cv=cv,
                              scoring='accuracy',
                              param_grid=param_grid,
                              n_jobs=-1,
                              verbose=True)

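evolved_estimator.fit(x_train, y_train)

# Evaluate the best estimator found on the held-out test set
y_predict = evolved_estimator.predict(x_test)
print(accuracy_score(y_test, y_predict))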

Output:

After running the above code, we get the following output, in which we can see the progress of the GASearchCV fit printed on the screen.

scikit learn genetic opt
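The param_grid in the code above mixes a log-uniform continuous range, integer ranges, and a categorical choice. As a rough illustration of what one candidate from such a space looks like, the sketch below draws random hyperparameter sets using only the standard library. The helper sample_candidate is hypothetical and is not how sklearn-genetic-opt samples internally; it only mirrors the ranges used above.

```python
import math
import random


def sample_candidate(rng):
    """Draw one hyperparameter set from ranges mirroring param_grid above."""
    # Log-uniform: sample uniformly in log space, then exponentiate.
    log_low, log_high = math.log(0.01), math.log(0.5)
    return {
        "min_weight_fraction_leaf": math.exp(rng.uniform(log_low, log_high)),
        "bootstrap": rng.choice([True, False]),
        "max_depth": rng.randint(2, 30),        # randint includes both ends
        "max_leaf_nodes": rng.randint(2, 35),
        "n_estimators": rng.randint(100, 300),
    }


rng = random.Random(0)
population = [sample_candidate(rng) for _ in range(5)]
for candidate in population:
    print(candidate)
```

An evolutionary search would score each such candidate with cross-validation and then recombine and mutate the better ones, instead of drawing every candidate independently as a random search does.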


Scikit learn genetic algorithm feature selection

In this section, we will learn how scikit learn genetic algorithm feature selection works in Python.

  • Feature selection is the process of reducing the number of input variables used when developing a predictive model.
  • A genetic algorithm mimics natural selection to search for an optimal (or near-optimal) solution to a problem.

Code:

In the following code, we will import some libraries with which we can select features using GeneticSelectionCV.

  • data = load_breast_cancer() loads the breast cancer dataset.
  • dataframe = pds.DataFrame(data.data, columns=data.feature_names) builds a DataFrame from the dataset, with one named column per feature.
  • models = models.fit(x, y) fits the selector to the data.
  • print('Feature Selection:', x.columns[models.support_]) prints the names of the selected features on the screen.
from sklearn.datasets import load_breast_cancer
from genetic_selection import GeneticSelectionCV
from sklearn.tree import DecisionTreeClassifier
import pandas as pds
import numpy as num
data = load_breast_cancer()
dataframe = pds.DataFrame(data.data, columns=data.feature_names)
dataframe['target'] = data.target
x = dataframe.drop(['target'], axis=1)
y = dataframe['target'].astype(float)
estimators = DecisionTreeClassifier()
models = GeneticSelectionCV(
    estimators, cv=5, verbose=0,
    scoring="accuracy", max_features=5,
    n_population=100, crossover_proba=0.5,
    mutation_proba=0.2, n_generations=50,
    crossover_independent_proba=0.5,
    mutation_independent_proba=0.04,
    tournament_size=3, n_gen_no_change=10,
    caching=True, n_jobs=-1)
models = models.fit(x, y)
print('Feature Selection:', x.columns[models.support_])

Output:

After running the above code, we get the following output, in which the names of the selected features are printed on the screen.

scikit learn genetic algorithm feature selection
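In genetic feature selection, each individual is a boolean mask over the columns, and new masks are produced by crossover and mutation. The sketch below shows uniform crossover and bit-flip mutation on such masks. It is an assumption, not something confirmed by this tutorial, that crossover_independent_proba and mutation_independent_proba in the call above play the role of the per-gene probabilities used here; the function names are hypothetical.

```python
import random


def uniform_crossover(mask_a, mask_b, swap_proba, rng):
    """Swap each gene between the two parents with probability swap_proba."""
    child_a, child_b = mask_a[:], mask_b[:]
    for i in range(len(child_a)):
        if rng.random() < swap_proba:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


def flip_mutation(mask, flip_proba, rng):
    """Flip each gene (feature on/off) with probability flip_proba."""
    return [(not bit) if rng.random() < flip_proba else bit for bit in mask]


rng = random.Random(42)
parent_a = [True] * 6    # selects every feature
parent_b = [False] * 6   # selects no feature
child_a, child_b = uniform_crossover(parent_a, parent_b, swap_proba=0.5, rng=rng)
mutant = flip_mutation(child_a, flip_proba=0.04, rng=rng)
print(child_a, child_b, mutant)
```

With a small per-gene mutation probability, most offspring differ from their parents in only a few features, which keeps the search local while still allowing occasional jumps.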


Scikit learn genetic selection cv

In this section, we will learn how scikit learn genetic selection cv works in Python.

GeneticSelectionCV applies a genetic algorithm, modeled on natural selection, to search for the feature subset that gives the best cross-validated score.

Code:

In the following code, we will import GeneticSelectionCV from genetic_selection, with which the selector picks the best features.

  • from __future__ import print_function makes print() behave as the Python 3 function under Python 2.6+; it has no effect on Python 3.
  • e = np.random.uniform(0, 0.1, size=(len(iris.data), 20)) generates 20 columns of uniformly distributed random noise.
  • x = np.hstack((iris.data, e)) stacks the iris features and the noise columns column-wise.
  • GeneticSelectionCV() creates the selector, which starts from randomly generated feature subsets and evolves them.
  • selectors = selectors.fit(x, y) fits the selector to the data.
  • print(selectors.support_) prints the boolean mask of the selected features.
from __future__ import print_function
import numpy as np
from sklearn import datasets, linear_model

from genetic_selection import GeneticSelectionCV


def main():
    iris = datasets.load_iris()
    e = np.random.uniform(0, 0.1, size=(len(iris.data), 20))

    x = np.hstack((iris.data, e))
    y = iris.target

    estimators = linear_model.LogisticRegression(solver="liblinear", multi_class="ovr")

    selectors = GeneticSelectionCV(estimators,
                                  cv=10,
                                  verbose=4,
                                  scoring="accuracy",
                                  max_features=8,
                                  n_population=70,
                                  crossover_proba=0.7,
                                  mutation_proba=0.4,
                                  n_generations=80,
                                  crossover_independent_proba=0.7,
                                  mutation_independent_proba=0.07,
                                  tournament_size=5,
                                  n_gen_no_change=20,
                                  caching=True,
                                  n_jobs=-4)
    selectors = selectors.fit(x, y)

    print(selectors.support_)


if __name__ == "__main__":
    main()

Output:

After running the above code, we get the following output, in which the mask of selected features is printed on the screen.

scikit learn genetic selection cv
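The tournament_size argument in the code above suggests that parents are chosen by tournament selection. A minimal sketch of that idea follows: pick a few individuals at random and keep the best-scoring one. The population, the scores, and the function name are hypothetical, and sampling entrants without replacement is a simplification chosen for this sketch; a library may sample differently.

```python
import random


def tournament_select(population, scores, tournament_size, rng):
    """Return the best-scoring individual among tournament_size random entrants."""
    entrants = rng.sample(range(len(population)), tournament_size)
    winner = max(entrants, key=lambda i: scores[i])
    return population[winner]


rng = random.Random(7)
# Hypothetical feature masks and cross-validation scores, for illustration only.
population = [[1, 0, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 1]]
scores = [0.91, 0.88, 0.79, 0.95, 0.60]
parent = tournament_select(population, scores, tournament_size=3, rng=rng)
print(parent)
```

A larger tournament makes it more likely that the strongest individuals win, which increases selection pressure; a tournament of size 1 is just a uniform random pick.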


Scikit learn genetic algorithm advantages and disadvantages

In this section, we will learn about the advantages and disadvantages of scikit learn genetic algorithms in Python.

Advantages:

  • Genetic algorithms are easy to understand: it is simple to follow what the algorithm does at each step.
  • Genetic algorithms work well in noisy environments.
  • A genetic algorithm searches from a population of points (a set of chromosomes) rather than from a single point.
  • Genetic algorithms use probabilistic transition rules rather than deterministic ones.
  • Genetic algorithms can easily be parallelized.
  • Genetic algorithms work well on both continuous and discrete problems.
  • Genetic algorithms support multi-objective optimization.
  • Genetic algorithms are probabilistic and can be applied to time-dependent, nonlinear, and non-stationary problems.
  • Genetic algorithms require little information about the problem.
  • Genetic algorithms encode candidate solutions as chromosomes.
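The parallelization point in the list above follows from the fact that each individual's fitness can be computed independently of the others. A minimal sketch using the standard library's thread pool is shown below. The toy fitness function and the helper name are assumptions for illustration; for CPU-bound scoring in real code, a process pool or the n_jobs argument used earlier in this tutorial would be the more typical choice.

```python
from concurrent.futures import ThreadPoolExecutor


def fitness(bits):
    """Toy objective: number of 1s in the chromosome."""
    return sum(bits)


def evaluate_population(population, max_workers=4):
    """Score every individual concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness, population))


population = [[1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
print(evaluate_population(population))  # [3, 1, 4]
```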

Disadvantages:

  • Genetic algorithms require a problem-specific definition (an encoding and a fitness function).
  • Although a genetic algorithm needs little information about the problem, designing the representation and writing the operators can be difficult.
  • Genetic algorithms are computationally expensive.
  • Genetic algorithms can be very time-consuming.


So, in this tutorial, we discussed scikit learn genetic algorithms and covered different examples related to their implementation. Here is the list of examples that we have covered.

  • Scikit learn genetic algorithm
  • Scikit learn genetic opt
  • Scikit learn genetic algorithm feature selection
  • Scikit learn genetic selection cv
  • Scikit learn genetic algorithm advantages and disadvantages