In this Python tutorial, we will learn how to subset a pandas DataFrame. To understand the various approaches, we'll use some of the built-in pandas indexers and operators.
While working on a Python project as a developer, I needed to subset a DataFrame, and this tutorial walks through the approaches I used.
Here we will see:
- How to Subset a DataFrame in Python using loc()
- How to Subset a DataFrame in Python using iloc()
- How to Subset a DataFrame in Python using an indexing operator
- Select rows where the student_age is equal to or greater than 15
How to Subset a DataFrame in Python
In pandas, there are three commonly used ways to subset a dataframe: loc(), iloc(), and the indexing operator (square brackets).
How to Subset a DataFrame in Python using loc()
- In this section, we will discuss how to subset a DataFrame in pandas using loc().
- Subsetting is the process of selecting a desired set of rows and columns from a data frame.
- With the help of loc(), we can create a subset of a data frame based on particular rows, columns, or both.
- loc() relies on labels to build the subset, so we must provide it with the labels of the rows or columns we want. Although this tutorial writes it as loc(), it is an indexer used with square brackets, as in df.loc[...], rather than a function called with parentheses.
- In this example, we will first create a dataframe using the pd.DataFrame() constructor.
Note: We must create a dataframe before we can take subsets of it, so let's do that first.
import pandas as pd

# Sample student records used throughout this tutorial
Student_info = {
    'Student_id': [672, 345, 678, 123, 783],
    'Student_name': ['John', 'James', 'Potter', 'George', 'Micheal'],
    'Student_age': [17, 15, 14, 12, 11],
}
df = pd.DataFrame(Student_info, columns=['Student_id', 'Student_name', 'Student_age'])
print(df)
In this case, the pandas DataFrame() constructor was used to generate the data frame.
By providing the row labels and the column names, loc() can also be used to assign new values to selected cells.
Syntax:
Here is the syntax for assigning a value with loc() in pandas:
dataframe.loc[row_labels, ['column_names']] = value
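The assignment syntax above can be sketched as follows. This is a minimal illustration on the tutorial's sample data; the new age values are made up for the example.

```python
import pandas as pd

# Same sample data as in the tutorial.
df = pd.DataFrame({
    'Student_id': [672, 345, 678, 123, 783],
    'Student_name': ['John', 'James', 'Potter', 'George', 'Micheal'],
    'Student_age': [17, 15, 14, 12, 11],
})

# Set a single cell: row label 2, column 'Student_age'.
df.loc[2, 'Student_age'] = 16

# Set the same column for several row labels at once.
df.loc[[3, 4], ['Student_age']] = 13

print(df)
```

Because loc() modifies the dataframe in place here, it may be worth working on a copy (df.copy()) if the original values are still needed.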
Example:
Let's take an example and select the rows labeled 0, 1, and 3 from the dataframe using loc().
Source Code:
# Select the rows whose index labels are 0, 1 and 3 (all columns)
result = df.loc[[0, 1, 3]]
print(result)
This is how to Subset a DataFrame in Python using loc().
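loc() can also restrict the columns at the same time as the rows. The sketch below rebuilds the tutorial's dataframe and shows a list-based selection and a label slice; the variable names subset and sliced are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Student_id': [672, 345, 678, 123, 783],
    'Student_name': ['John', 'James', 'Potter', 'George', 'Micheal'],
    'Student_age': [17, 15, 14, 12, 11],
})

# Rows labeled 0, 1 and 3, restricted to two columns.
subset = df.loc[[0, 1, 3], ['Student_name', 'Student_age']]

# Label slices in loc include BOTH endpoints: rows 1, 2 and 3,
# and every column from 'Student_id' through 'Student_name'.
sliced = df.loc[1:3, 'Student_id':'Student_name']

print(subset)
print(sliced)
```

Note that a label slice such as 1:3 includes the end label, unlike ordinary Python slicing.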
How to Subset a DataFrame in Python using iloc()
- Now let us understand how to subset a DataFrame in pandas using iloc().
- iloc() allows us to construct subsets by selecting rows and columns by their integer positions.
- In other words, iloc() works with integer positions (starting at 0), whereas loc() works with labels. By passing the position numbers of the rows and columns, we can pick out a subset of a dataframe.
Example:
Let’s take an example and check how to Subset a DataFrame in Python using iloc().
Source Code:
# Rows at positions 0, 1 and 3; columns at positions 0 and 2
result = df.iloc[[0, 1, 3], [0, 2]]
print(result)
This is how to subset a DataFrame in pandas using iloc().
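The difference between positions and labels is easiest to see when the row labels are not 0, 1, 2 and so on. The following sketch assumes we use Student_id as the index; that choice is made only for illustration and is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Student_id': [672, 345, 678, 123, 783],
    'Student_name': ['John', 'James', 'Potter', 'George', 'Micheal'],
    'Student_age': [17, 15, 14, 12, 11],
})

# Use Student_id as the row labels so labels and positions differ.
by_id = df.set_index('Student_id')

# iloc: integer positions; a slice excludes its end, so this is rows 0 and 1.
first_two = by_id.iloc[0:2]

# loc: labels; these are the rows whose Student_id is 672 or 123.
by_label = by_id.loc[[672, 123]]

# Negative positions count from the end: last row, second column.
last_row_age = by_id.iloc[-1, 1]

print(first_two)
print(by_label)
print(last_row_age)
```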
How to Subset a DataFrame in Python using an indexing operator
- In this section, we will discuss how to subset a DataFrame in pandas using the indexing operator.
- We can quickly build a subset of the data by using the indexing operator, that is, square brackets.
- In plain Python, indexing refers to specific elements of a sequence by their position. On a DataFrame, the square brackets select columns by name instead: passing a list of column names returns a new dataframe containing only those columns.
Example:
Here we will take an example and check how to Subset a DataFrame in Python using an Indexing operator.
Source Code:
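# A list of column names inside the brackets returns a dataframe with those columns
result = df[['Student_id', 'Student_name']]
print(result)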
In this example, we have understood how to Subset a DataFrame in Python using the Indexing operator.
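The shape of what goes inside the brackets changes what comes back. A short sketch on the same sample data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Student_id': [672, 345, 678, 123, 783],
    'Student_name': ['John', 'James', 'Potter', 'George', 'Micheal'],
    'Student_age': [17, 15, 14, 12, 11],
})

# A single column name returns a Series.
name_series = df['Student_name']

# A list of names (double brackets) returns a DataFrame, even for one column.
name_frame = df[['Student_name']]

# A slice inside the brackets selects rows by position instead of columns.
first_rows = df[0:2]

print(type(name_series).__name__)
print(type(name_frame).__name__)
print(first_rows)
```

Because a slice switches the operator from columns to rows, loc() or iloc() is usually the clearer choice when rows are being selected.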
Select rows where the student_age is equal to or greater than 15
- In this section, we will discuss how to select rows where the student_age is equal to or greater than 15.
- To get all the rows where the student_age is equal to or greater than 15, we will use loc() with a Boolean condition. The comparison df['Student_age'] >= 15 produces a True or False value for each row, and loc() keeps only the rows where it is True.
Example:
Let’s take an example and check how to select rows where the student_age is equal to or greater than 15.
Source Code:
# Keep only the rows where Student_age is 15 or more
new_result = df.loc[df['Student_age'] >= 15]
print(new_result)
This is how to select rows where the student_age is equal to or greater than 15.
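Conditions can also be combined, and loc() can restrict the columns in the same step. The sketch below uses the tutorial's sample data; the second age range is an arbitrary example.

```python
import pandas as pd

df = pd.DataFrame({
    'Student_id': [672, 345, 678, 123, 783],
    'Student_name': ['John', 'James', 'Potter', 'George', 'Micheal'],
    'Student_age': [17, 15, 14, 12, 11],
})

# The comparison yields one True/False value per row.
mask = df['Student_age'] >= 15
older = df.loc[mask]

# Combine conditions with & (and) or | (or); each one needs its own parentheses.
in_range = df.loc[
    (df['Student_age'] >= 12) & (df['Student_age'] < 17),
    ['Student_name', 'Student_age'],
]

print(older)
print(in_range)
```

Python's own and / or keywords do not work on a pandas Series, which is why the element-wise operators & and | are used here.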
In this article, we discussed how to subset a DataFrame in Python and covered the following topics:
- How to Subset a DataFrame in Python using loc()
- How to Subset a DataFrame in Python using iloc()
- How to Subset a DataFrame in Python using an indexing operator
- Select rows where the student_age is equal to or greater than 15
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working with Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn, working for clients in the United States, Canada, the United Kingdom, Australia, New Zealand, and elsewhere.