In this Python NumPy tutorial, I will explain what np.where() in Python is, its syntax, the parameters it requires, and its return value. I will also explain how to apply np.where in Pandas Python with different examples.
To create or modify columns conditionally, we can use the np.where() function with Pandas in Python. np.where() takes a condition as an argument and returns an output accordingly. We can also use it to filter rows on a condition and to handle missing data or NaN values in a Pandas DataFrame.
np.where Python
The np.where in Python can be thought of as a vectorized form of the ternary x if condition else y. It takes a condition and two options as arguments: if the condition is met, it outputs the first option; otherwise, it returns the second.
Let’s see a basic example of the use of the np.where() function in Python.
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
indices = np.where(arr > 3)
print(indices)
Output: With only a condition passed, np.where returns a tuple holding the indices of the elements that are greater than 3:
(array([3, 4], dtype=int64),)
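The example above passes only a condition. The three-argument form, which matches the ternary analogy, might look like the following minimal sketch. The labels "high" and "low" are illustrative choices, not part of the original example.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Three-argument form: take "high" where arr > 3, otherwise "low".
labels = np.where(arr > 3, "high", "low")

# The same logic written element by element with the plain ternary.
labels_loop = np.array(["high" if value > 3 else "low" for value in arr])

print(labels)  # ['low' 'low' 'low' 'high' 'high']
```

Both versions produce the same labels; np.where simply evaluates the whole array at once instead of looping in Python.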
np.where in Python syntax
The basic syntax of np.where in Python is:
np.where(condition, [x, y])
numpy.where in Python parameter
Here, the basic parameters required in the NumPy where() function are:
Name | Description |
---|---|
condition | An array-like of Boolean values, usually produced by an expression such as arr > 3. Where an element is True, the result takes its value from x; where it is False, the result takes its value from y. |
x | (Optional) Array or scalar value to use where the condition is True. It must be supplied together with y. |
y | (Optional) Array or scalar value to use where the condition is False. It must be supplied together with x. |
np.where return value
If both x and y are provided, np.where returns a NumPy array that contains elements of x where the condition is True and elements of y otherwise. If only the condition is given, numpy.where returns a tuple of index arrays, one per dimension, that locate the elements where the condition is True.
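The two return shapes can be compared on a small 2-D array. This is a minimal sketch; the array grid and the cap value 5 are made up for illustration.

```python
import numpy as np

grid = np.array([[1, 8],
                 [9, 2]])

# Condition only: a tuple with one index array per dimension.
rows, cols = np.where(grid > 5)
print(rows, cols)  # [0 1] [1 0]

# Condition plus x and y: the scalar 5 is broadcast against the condition,
# so values above 5 are replaced and the rest are taken from grid.
capped = np.where(grid > 5, 5, grid)
print(capped.tolist())  # [[1, 5], [5, 2]]
```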
np.where in Pandas Python
Pandas is built on top of NumPy in Python, which means that we can often use NumPy functions directly on Pandas Series and DataFrames. This interoperability extends to np.where, which allows for seamless conditional operations within a DataFrame.
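One detail worth knowing when mixing the two libraries: np.where called on a Series returns a plain NumPy array, so the index is not carried over. The sketch below uses a made-up Series named ages to show this, and also shows the related Series.where method, which keeps values where the condition is True and replaces the others.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 12, 32], index=["a", "b", "c"])

# np.where hands back a NumPy array, not a Series.
result = np.where(ages < 18, "child", "adult")
print(type(result).__name__)  # ndarray

# Rewrap the array if the original index should be kept.
labeled = pd.Series(result, index=ages.index)

# Series.where keeps values where the condition holds and
# replaces the rest, returning a Series with the same index.
capped = ages.where(ages < 18, 18)
print(capped.tolist())  # [18, 12, 18]
```

Assigning the array to a DataFrame column, as in the use cases that follow, works because the values are matched by position.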
np.where Pandas Python use cases
Let’s see some use cases of np.where in Pandas Python:
Case 1: np where Pandas with Conditional Column Creation or Modification
We can use np.where to create a new column in a Pandas DataFrame based on a condition, or to modify an existing one.
Example: Suppose we have a Pandas DataFrame df with a column age in Python. We want to create a new column age_group that labels each person as either ‘adult’ or ‘child’ based on their age through np.where() in Python.
import pandas as pd
import numpy as np
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 12, 32]})
df['age_group'] = np.where(df['age'] < 18, 'child', 'adult')
print(df)
Output: Running the code prints the following:
name age age_group
0 Alice 25 adult
1 Bob 12 child
2 Charlie 32 adult
This way we can use the np.where in Pandas to create a single column in a Dataframe in Python.
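The example above creates a new column. Modifying an existing column follows the same pattern: pass the column itself as the fallback value. The sketch below caps ages at 30; the cap is an arbitrary value chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 12, 32]})

# Overwrite 'age' in place: values above 30 become 30,
# all other values are taken unchanged from the column itself.
df['age'] = np.where(df['age'] > 30, 30, df['age'])

print(df['age'].tolist())  # [25, 12, 30]
```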
Case 2: np.where on dataframe multiple columns
The real power of np.where in Pandas is observed when we apply it across multiple columns of a dataframe in Python.
Example: Consider a DataFrame in Python Pandas with data, including columns for ‘Revenue’ and ‘Cost’. We want to add a new column ‘Profitable’ that indicates ‘Yes’ if the ‘Revenue’ exceeds ‘Cost’ and ‘No’ otherwise.
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Revenue': [1500, 800, 1200],
    'Cost': [1000, 700, 500]
})
df['Profitable'] = np.where(df['Revenue'] > df['Cost'], 'Yes', 'No')
print(df)
Output: Running the code prints the following:
Product Revenue Cost Profitable
0 Widget A 1500 1000 Yes
1 Widget B 800 700 Yes
2 Widget C 1200 500 Yes
This way we can use the np.where in Pandas Python to manipulate multiple columns in a Dataframe.
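The x and y arguments do not have to be fixed labels; they can be computed from the columns as well. The following sketch assumes hypothetical data in which one product loses money, and a hypothetical column name Profit, so that both branches are exercised.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Widget B costs more than it earns.
df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Revenue': [1500, 600, 1200],
    'Cost': [1000, 700, 500]
})

# Where revenue exceeds cost, store the difference; otherwise store 0.
df['Profit'] = np.where(
    df['Revenue'] > df['Cost'],
    df['Revenue'] - df['Cost'],
    0
)

print(df['Profit'].tolist())  # [500, 0, 700]
```

Note that np.where evaluates both the x and y expressions for every row before choosing between them, so the subtraction is computed for Widget B too and then discarded.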
Case 3: np.where in Pandas for Filtering Rows
While Pandas already has powerful querying capabilities, np.where in Pandas Python can be used for indexing to filter rows in certain scenarios.
Example: Let’s say we have a Pandas DataFrame in Python containing names and ages, and we want to keep only the rows where the age is at least 18, with the help of the numpy.where() function.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 17, 32, 15, 22]
})
indices = np.where(df['age'] >= 18)[0]
adults_df = df.iloc[indices]
print(adults_df)
Output: Running the code prints the following:
name age
0 Alice 25
2 Charlie 32
4 Eva 22
This way we can use the np.where in Pandas Python to filter rows accordingly in a Dataframe in Python.
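For comparison, the same rows can be selected with a plain Boolean mask, which is the more common Pandas idiom. The sketch below runs both approaches on the data from the example and checks that they agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 17, 32, 15, 22]
})

mask = df['age'] >= 18

# Boolean indexing: keep the rows where the mask is True.
adults_mask = df[mask]

# np.where: convert the mask to integer positions, then use iloc.
adults_where = df.iloc[np.where(mask)[0]]

print(adults_mask.equals(adults_where))  # True
```

The np.where route is mainly useful when the integer positions themselves are needed, for example to pass them to another function.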
Case 4: np.where dataframe in Python Pandas to handle missing data
The np.where in Pandas Python is also a useful tool for dealing with missing or NaN values by conditionally replacing them with a specific value.
Example: Here, we use np.where() in Pandas Python to replace the missing values in column A of a DataFrame with 0, leaving column B untouched.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 1, 2]})
print("Before the Dataframe was:\n", df)
df['A'] = np.where(df['A'].isna(), 0, df['A'])
print("After the Operation, Dataframe is:\n", df)
Output: Running the code prints the following:
Before the Dataframe was:
A B
0 1.0 NaN
1 2.0 1.0
2 NaN 2.0
After the Operation, Dataframe is:
A B
0 1.0 NaN
1 2.0 1.0
2 0.0 2.0
This way we can use the np.where in Pandas to replace the missing data in a Dataframe in Python.
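For a constant replacement like the one above, Series.fillna gives the same result. np.where becomes more useful when the replacement depends on another column. The sketch below shows both; the column name A_or_B is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 1, 2]})

# Constant replacement: np.where and fillna agree.
filled_where = np.where(df['A'].isna(), 0, df['A'])
filled_fillna = df['A'].fillna(0)
print(np.array_equal(filled_where, filled_fillna.to_numpy()))  # True

# Conditional replacement: fall back to column B where A is missing.
df['A_or_B'] = np.where(df['A'].isna(), df['B'], df['A'])
print(df['A_or_B'].tolist())  # [1.0, 2.0, 2.0]
```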
Case 5: np.where multiple conditions pandas
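We can also use np.where in Pandas Python to evaluate multiple conditions. For this purpose, we combine Boolean Series with the element-wise operators & (and), | (or), and ~ (not). Each comparison must be wrapped in parentheses, because these operators bind more tightly than comparison operators such as > and <.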
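As a quick illustration of the three operators, the sketch below uses a made-up Series s and checks whether each value lies strictly between 10 and 30.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 15, 25, 35])

# Parentheses around each comparison are required.
in_range = (s > 10) & (s < 30)     # and
outside = (s <= 10) | (s >= 30)    # or
not_in_range = ~in_range           # not

print(in_range.tolist())                  # [False, True, True, False]
print((not_in_range == outside).all())    # True

# The combined condition plugs into np.where like any other.
labels = np.where(in_range, 'inside', 'outside')
print(labels.tolist())  # ['outside', 'inside', 'inside', 'outside']
```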
Example: Say, we have a Python Pandas Dataframe, and we want to label our data with some values based on other columns. We can use the np.where() in Pandas Python to do so.
import numpy as np
import pandas as pd
df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Revenue': [1500, 800, 1200],
    'Cost': [1000, 700, 500]
})
condition_high_profit = (df['Revenue'] > 1000) & (df['Cost'] < 500)
condition_low_profit = (df['Revenue'] <= 1000) & (df['Cost'] < 500)
condition_high_loss = (df['Revenue'] > 1000) & (df['Cost'] >= 500)
df['Profit_Category'] = np.where(
    condition_high_profit, 'High Profit',
    np.where(
        condition_low_profit, 'Low Profit',
        np.where(
            condition_high_loss, 'High Loss', 'Low Loss'
        )
    )
)
print(df)
Output: Running the code prints the following:
Product Revenue Cost Profit_Category
0 Widget A 1500 1000 High Loss
1 Widget B 800 700 Low Loss
2 Widget C 1200 500 High Loss
This way we can use the np.where in Pandas Python to apply multiple conditions in a Dataframe.
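Nested np.where calls become hard to read as the number of conditions grows. One alternative, not used in the example above, is np.select, which takes a list of conditions and a matching list of choices and picks the first condition that is True for each row. The sketch below reproduces the same categories with it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Revenue': [1500, 800, 1200],
    'Cost': [1000, 700, 500]
})

# Conditions are checked in order; the first True one wins.
conditions = [
    (df['Revenue'] > 1000) & (df['Cost'] < 500),
    (df['Revenue'] <= 1000) & (df['Cost'] < 500),
    (df['Revenue'] > 1000) & (df['Cost'] >= 500),
]
choices = ['High Profit', 'Low Profit', 'High Loss']

# 'Low Loss' is used for rows where no condition matches.
df['Profit_Category'] = np.select(conditions, choices, default='Low Loss')

print(df['Profit_Category'].tolist())  # ['High Loss', 'Low Loss', 'High Loss']
```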
Conclusion
The np.where function from NumPy, used together with Pandas, is an invaluable tool for performing conditional logic on DataFrame columns in Python. It enables data analysts and scientists to efficiently apply single or multiple conditions to DataFrames, enhancing data manipulation and analysis tasks.
By mastering np.where in Pandas, we can maintain cleaner code, improve performance, and gain the flexibility needed to handle a wide array of data conditioning scenarios in Python.
I am Bijay Kumar, a Microsoft MVP in SharePoint. Apart from SharePoint, I have been working on Python, machine learning, and artificial intelligence for the last 5 years. During this time I have gained expertise in various Python libraries such as Tkinter, Pandas, NumPy, Turtle, Django, Matplotlib, Tensorflow, Scipy, and Scikit-Learn for various clients in the United States, Canada, the United Kingdom, Australia, New Zealand, etc.