In this Python tutorial, we will learn how to create a scikit learn Random Forest in Python, and we will also cover different examples related to Random Forest. We will cover these topics:
- Scikit learn random forest
- Scikit learn random forest example
- Scikit learn random forest regression
- Scikit learn random forest hyperparameter tuning
- Scikit learn random forest feature importance
- Scikit learn random forest categorical variable
- Scikit learn random forest cross-validation
- Scikit learn random forest visualization
Scikit learn random forest
In this section, we will learn how to make a scikit learn random forest in Python.
- A Random Forest is a supervised machine learning model for solving regression or classification problems.
- It is an ensemble technique that combines many decision trees to solve difficult or complex problems.
- It generates its outcome from the predictions of all its decision trees: for regression it takes the mean of the trees' outputs, and for classification it takes a vote over the trees (scikit-learn averages the trees' predicted class probabilities).
- Compared with a single decision tree, it reduces overfitting and usually increases accuracy.
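To make the idea above concrete, the following code is a minimal sketch of bagging written by hand; it is not how scikit-learn implements RandomForestClassifier internally. The helper bagged_trees_predict is hypothetical: it trains each DecisionTreeClassifier on a bootstrap sample, lets each split consider a random subset of features with max_features="sqrt", and combines the trees by a majority vote. scikit-learn's own RandomForestClassifier instead averages the trees' predicted class probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def bagged_trees_predict(X_train, y_train, X_new, n_trees=25, seed=0):
    """Hypothetical helper: a hand-rolled forest using bagging + majority vote."""
    rng = np.random.default_rng(seed)
    n_samples = X_train.shape[0]
    all_preds = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n_samples row indices with replacement
        idx = rng.integers(0, n_samples, size=n_samples)
        # max_features="sqrt": each split looks at a random subset of features
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        tree.fit(X_train[idx], y_train[idx])
        all_preds.append(tree.predict(X_new))
    all_preds = np.array(all_preds)  # shape: (n_trees, n_new_samples)
    # Majority vote: the most frequent class label in each column
    return np.array([np.bincount(column).argmax() for column in all_preds.T])


X, y = load_iris(return_X_y=True)
preds = bagged_trees_predict(X, y, X)
print("training accuracy:", (preds == y).mean())
```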
Scikit learn random forest example
In this section, we will learn how to create a scikit learn random forest example in Python.
- Random Forest is a supervised machine learning model used for classification, regression, and other tasks, and it is built from decision trees.
- Random Forest builds a set of decision trees, each trained on a random (bootstrap) subset of the training set.
Code:
In the following code, we will import the dataset from sklearn and create a random forest classifier.
- iris = datasets.load_iris() is used to load the iris dataset.
- X, y = datasets.load_iris(return_X_y = True) is used to load the dataset as two parts: the feature matrix X and the target labels y.
- from sklearn.model_selection import train_test_split is used to split arrays into random train and test subsets.
- X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30) splits the data so that 70% is used for training and 30% for testing.
from sklearn import datasets
iris = datasets.load_iris()
print(iris.target_names)
print(iris.feature_names)
X, y = datasets.load_iris( return_X_y = True)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30)
After running the above code, we get the following output in which we can see that the target names and feature names of the dataset are shown on the screen.
print(data.head()) is used to print the first five rows of the dataset on the screen.
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
data = pd.DataFrame({'sepallength' : iris.data[:,0],'sepalwidth' : iris.data[:,1], 'petallength' : iris.data[:,2], 'petalwidth' : iris.data[:,3],'species' : iris.target})
print(data.head())
After running the above code we get the following output in which we can see that the first five rows are printed on the screen.
- classifier = RandomForestClassifier(n_estimators = 100) is used to create a random forest classifier with 100 trees.
- classifier.fit(X_train, y_train) is used to train the model on the training set.
- y_pred = classifier.predict(X_test) is used to perform the prediction on the test dataset.
- from sklearn import metrics is used to import the metrics module, which computes the accuracy or error.
- print("Accuracy of the model: ", metrics.accuracy_score(y_test, y_pred)) is used to print the accuracy of the model after calculation.
classifier = RandomForestClassifier(n_estimators = 100)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
from sklearn import metrics
print()
print("Accuracy of the model: ", metrics.accuracy_score(y_test, y_pred))
In the output, we can see the accuracy of the random forest model on the test set.
Scikit learn random forest regression
In this section, we will learn about scikit learn random forest regression in Python.
Random Forest is a supervised machine learning algorithm that combines many decision trees to solve hard problems. For regression, it is an ensemble method that averages the predictions of its trees.
Code:
In the following code, we will import the classes we need from the sklearn library to create a random forest regressor.
- x, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False) is used to generate a synthetic regression dataset with four features, two of which are informative.
- print(regression.predict([[0, 0, 0, 0]])) is used to predict the target value for a single sample whose four features are all 0.
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
x, y = make_regression(n_features=4, n_informative=2,
random_state=0, shuffle=False)
regression = RandomForestRegressor(max_depth=2, random_state=0)
regression.fit(x, y)
print(regression.predict([[0, 0, 0, 0]]))
Output:
After running the above code, we get the following output in which we can see the prediction of the random forest regressor.
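The forest's prediction is the average of the predictions of its individual trees. The following sketch reuses the same data and model settings as above and checks this by predicting with each fitted tree in regression.estimators_ and comparing the mean with regression.predict(); the names sample, per_tree and forest_pred are only illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

x, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)
regression = RandomForestRegressor(max_depth=2, random_state=0)
regression.fit(x, y)

sample = np.array([[0.0, 0.0, 0.0, 0.0]])
# Prediction of every individual fitted tree for the same sample
per_tree = np.array([tree.predict(sample)[0] for tree in regression.estimators_])
# Prediction of the whole forest
forest_pred = regression.predict(sample)[0]

print("mean of tree predictions:", per_tree.mean())
print("forest prediction:       ", forest_pred)
```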
Scikit learn random forest hyperparameter tuning
In this section, we will learn how to tune the hyperparameters of a scikit learn random forest in Python.
Random forest hyperparameter tuning involves choosing values such as the number of decision trees in the forest and the number of features considered when each node of a tree is split.
Code:
In the following code, we will import RandomForestRegressor from sklearn.ensemble and also import pprint from pprint.
pprint(rf.get_params()) is used to print the parameters currently used by the forest.
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(random_state = 42)
from pprint import pprint
print('Parameters currently in use:\n')
pprint(rf.get_params())
Output:
After running the above code, we get the following output in which we can see that the parameters currently in use are printed on the screen.
- n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)] is the number of trees in the random forest.
- max_features = [1.0, 'sqrt'] is the number of features to consider at every split. 'auto' is not used here because scikit-learn 1.3 and later reject it; for a regressor, 1.0 (all features) is the equivalent.
- max_depth = [int(x) for x in np.linspace(10, 110, num = 11)] is the maximum number of levels in each tree; None is appended so that unlimited depth is also tried.
- min_samples_split = [2, 5, 10] is the minimum number of samples required to split a node.
- min_samples_leaf = [1, 2, 4] is the minimum number of samples required at each leaf node.
- bootstrap = [True, False] sets whether bootstrap samples are used when building trees.
from sklearn.model_selection import RandomizedSearchCV
import numpy as np
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
max_features = [1.0, 'sqrt']
max_depth = [int(x) for x in np.linspace(10, 110, num = 11)]
max_depth.append(None)
min_samples_split = [2, 5, 10]
min_samples_leaf = [1, 2, 4]
bootstrap = [True, False]
random_grid = {'n_estimators': n_estimators,
'max_features': max_features,
'max_depth': max_depth,
'min_samples_split': min_samples_split,
'min_samples_leaf': min_samples_leaf,
'bootstrap': bootstrap}
pprint(random_grid)
After running the above code, we get the following output in which we can see that the random hyperparameter grid for the scikit learn random forest is printed on the screen.
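The grid above is only printed, not searched. As a sketch of the next step, the following code passes a smaller version of the grid to RandomizedSearchCV. It assumes a synthetic make_regression dataset in place of real data and uses fewer, smaller trees so it runs quickly; n_iter=10 and cv=3 are illustrative choices. It uses max_features=1.0 (all features) rather than 'auto', because recent scikit-learn releases no longer accept 'auto'.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Assumed synthetic dataset so the search finishes quickly
X, y = make_regression(n_samples=200, n_features=4, n_informative=2, random_state=0)

# A reduced version of random_grid (fewer and shallower trees) to keep runtime low
param_distributions = {
    "n_estimators": [50, 100],
    "max_features": [1.0, "sqrt"],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "bootstrap": [True, False],
}

rf = RandomForestRegressor(random_state=42)
search = RandomizedSearchCV(
    estimator=rf,
    param_distributions=param_distributions,
    n_iter=10,        # number of random parameter combinations to try
    cv=3,             # 3-fold cross-validation for each combination
    random_state=42,  # makes the sampled combinations reproducible
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best CV R^2:", search.best_score_)
```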
Scikit learn random forest feature importance
In this section, we will learn how to compute scikit learn random forest feature importance in Python.
- Feature importance scores how much each feature contributes to the model's predictions. It shows which features are relevant and which are not.
- It also helps us understand the problem better, and it can sometimes improve the model through feature selection, that is, by dropping features with low importance.
Code:
In the following code, we will import some libraries for making a random forest and describe the feature with the help of feature importance.
- classifier.predict([[3, 3, 2, 2]]) is used to predict which type of flower it is.
- classifier = RandomForestClassifier(n_estimators = 100) is used to create a random forest classifier.
- classifier.fit(X_train, y_train) is used to train the model on the training set.
- feature_imp = pd.Series(classifier.feature_importances_, index = iris.feature_names).sort_values(ascending = False) stores each feature's importance score in a pandas Series indexed by feature name and sorts it from most to least important.
from sklearn import datasets
iris = datasets.load_iris()
print(iris.target_names)
print(iris.feature_names)
X, y = datasets.load_iris( return_X_y = True)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30)
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
data = pd.DataFrame({'sepallength' : iris.data[:,0],'sepalwidth' : iris.data[:,1], 'petallength' : iris.data[:,2], 'petalwidth' : iris.data[:,3],'species' : iris.target})
print(data.head())
classifier = RandomForestClassifier(n_estimators = 100)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
from sklearn import metrics
print()
print("Accuracy of the model: ", metrics.accuracy_score(y_test, y_pred))
print(classifier.predict([[3, 3, 2, 2]]))
feature_imp = pd.Series(classifier.feature_importances_, index = iris.feature_names).sort_values(ascending = False)
print(feature_imp)
Output:
After running the above code, we get the following output in which we can see the features of the model ranked by their importance.
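Feature importance can also drive feature selection, as mentioned at the start of this section. The following sketch uses SelectFromModel on the iris data to keep only the features whose importance is at least the mean importance; the threshold "mean" and random_state=0 are illustrative choices, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

forest = RandomForestClassifier(n_estimators=100, random_state=0)
# threshold="mean": keep features whose importance is >= the mean importance
selector = SelectFromModel(forest, threshold="mean")
X_selected = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original features
kept = [name for name, keep in zip(iris.feature_names, selector.get_support()) if keep]
print("kept features:", kept)
print("shape before/after:", X.shape, X_selected.shape)
```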
Scikit learn random forest categorical variable
In this section, we will learn how to handle categorical variables with a scikit learn random forest.
- A categorical variable is a variable whose values assign each observation to a particular group or category.
- A categorical variable takes one of a limited set of values; a binary variable, which takes only two values such as 0 and 1, is a special case.
Code:
In the following code, we will import pandas as pd and use it to read the data from bank.csv.
import pandas as pd
df = pd.read_csv('bank.csv')
df.head()
Output:
After running the above code we get the following output in which we can see that the first five rows of the dataset are shown on the screen.
In the following code, we find the columns with the object data type, which hold the categorical variables, and print their names.
s = (df.dtypes == 'object')
object_cols = list(s[s].index)
print("Categorical variables:")
print(object_cols)
After running the above code, we get the following output in which we can see the names of the object columns, which are the categorical variables.
In the following code, we select some of these categorical columns as features and display their first five rows.
features = df[['Sex','Housing','Saving accounts']]
features.head()
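A scikit-learn random forest expects numeric input, so the categorical columns must be encoded before training. The following is a minimal sketch that assumes a small hypothetical DataFrame in place of bank.csv and a hypothetical binary target column named target. It one-hot encodes Sex, Housing, and Saving accounts with OneHotEncoder inside a ColumnTransformer and trains a RandomForestClassifier in a Pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for bank.csv; "target" is an assumed label column
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Housing": ["own", "rent", "own", "free", "own", "rent"],
    "Saving accounts": ["little", "moderate", "little", "rich", "little", "moderate"],
    "Age": [25, 40, 33, 51, 29, 46],
    "target": [0, 1, 0, 1, 0, 1],
})

categorical_cols = ["Sex", "Housing", "Saving accounts"]
X = df.drop(columns="target")
y = df["target"]

# One-hot encode the categorical columns; pass the numeric Age column through unchanged
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(X, y)

new_row = pd.DataFrame({"Sex": ["female"], "Housing": ["own"],
                        "Saving accounts": ["rich"], "Age": [38]})
prediction = model.predict(new_row)
print(prediction)
```

Keeping the encoder inside the Pipeline means the same encoding is applied automatically at prediction time.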
Scikit learn random forest cross-validation
In this section, we will learn about scikit learn random forest cross-validation in Python.
- Cross-validation is a process used to evaluate the performance or accuracy of a model. It also helps detect overfitting in a predictive model.
- In cross-validation, we split the data into a fixed number of folds and train and evaluate the model once per fold, each time holding that fold out for testing.
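The following sketch applies 5-fold cross-validation to a random forest on the iris data with cross_val_score. The fold count, the shuffled StratifiedKFold splitter, and the random_state values are illustrative choices. Each fold is held out once while the model trains on the other four, and the spread of the fold scores shows how stable the accuracy is.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
classifier = RandomForestClassifier(n_estimators=100, random_state=0)

# 5 stratified folds: each fold keeps the class proportions of the full dataset
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(classifier, X, y, cv=folds, scoring="accuracy")

print("accuracy per fold:", scores)
print("mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```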
Scikit learn random forest visualization
In this section, we will learn how to make a scikit learn random forest visualization in Python.
- A random forest is built from several decision trees, so we can plot individual trees to see how the model predicts the value of the target variable.
- Because a forest usually contains many trees, a common approach is to plot 2 or 3 of them, which gives a good intuition of the model.
Code:
In the following code, we will import libraries from which we can make random forest visualization.
- model = RandomForestClassifier(n_estimators=10) is used to create a model with 10 trees; a single decision tree could also be used instead.
- estimator = model.estimators_[5] is used to extract a single tree (the sixth one) from the forest.
- call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600']) is used to convert the .dot file to a PNG image with the Graphviz dot command, so Graphviz must be installed.
from sklearn.datasets import load_iris
iris = load_iris()
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=10)
model.fit(iris.data, iris.target)
estimator = model.estimators_[5]
from sklearn.tree import export_graphviz
export_graphviz(estimator, out_file='tree.dot',
feature_names = iris.feature_names,
class_names = iris.target_names,
rounded = True, proportion = False,
precision = 2, filled = True)
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])
from IPython.display import Image
Image(filename = 'tree.png')
Output:
After running the above code, we get the following output in which we can see one tree from the scikit learn random forest drawn on the screen.
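If the Graphviz dot command is not installed, trees can also be drawn directly with matplotlib. The following sketch, assuming matplotlib is available, uses sklearn.tree.plot_tree to draw the first three trees of a 10-tree forest side by side, limited to depth 2 for readability, and saves the figure to a hypothetical file named forest_trees.png.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import plot_tree

iris = load_iris()
model = RandomForestClassifier(n_estimators=10, random_state=0)
model.fit(iris.data, iris.target)

fig, axes = plt.subplots(1, 3, figsize=(18, 6))
for ax, index in zip(axes, [0, 1, 2]):
    # Draw only the top two levels of each tree so the plot stays readable
    plot_tree(model.estimators_[index],
              feature_names=iris.feature_names,
              class_names=list(iris.target_names),
              filled=True, rounded=True, max_depth=2, ax=ax)
    ax.set_title(f"Tree {index}")

fig.savefig("forest_trees.png", dpi=100)
```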
So, in this tutorial, we discussed Scikit learn random forest in Python, and we have also covered different examples related to its implementation. Here is the list of examples that we have covered.
- Scikit learn random forest
- Scikit learn random forest example
- Scikit learn random forest regression
- Scikit learn random forest hyperparameter tuning
- Scikit learn random forest feature importance
- Scikit learn random forest categorical variable
- Scikit learn random forest cross-validation
- Scikit learn random forest visualization
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, Machine learning, and artificial intelligence for the last 5 years. During this time I gained expertise in various Python libraries like Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, Scikit-Learn, etc., for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc.