np.where in Pandas Python [4+ Examples]

In this Python NumPy tutorial, I will explain what the np.where() in Python is, its syntax, the parameters required, and its return value. I will explain how to apply np.where in Pandas Python with different examples.

To create or modify columns conditionally, we can use the np.where() function with Pandas in Python. np.where() takes a condition as an argument and returns an output accordingly. We can also use it to filter rows on a condition and to handle missing data or NaN values in a Pandas DataFrame.

np.where Python

The np.where in Python can be thought of as a vectorized form of the ternary x if condition else y. It takes a condition and two options as arguments: if the condition is met, it outputs the first option; otherwise, it returns the second.
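
To make the ternary analogy concrete, here is a minimal sketch that compares a plain Python list comprehension with the equivalent three-argument np.where call. The scores array and the 'pass'/'fail' labels are made up for illustration.

```python
import numpy as np

# Illustrative data (assumed for this sketch)
scores = np.array([35, 80, 50, 92])

# Element-by-element ternary in plain Python
labels_loop = ["pass" if s >= 50 else "fail" for s in scores]

# The same logic, vectorized with np.where
labels_vec = np.where(scores >= 50, "pass", "fail")

print(labels_loop)
print(labels_vec)
```

Both versions produce the same labels; the np.where form evaluates the whole array at once instead of looping in Python.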

Let’s see a basic example of the use of the np.where() function in Python.

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
indices = np.where(arr > 3)
print(indices)

Output: The implementation of the code is given below

(array([3, 4], dtype=int64),)

np.where in Python syntax

The basic syntax of np.where in Python is:

np.where(condition[, x, y])

numpy.where in Python parameter

Here, the basic parameters required in the NumPy where() function are:

Name | Description
condition | An array-like structure (or Boolean expression) that is evaluated element by element. Where it is True, the result takes its value from x; where it is False, from y.
x | (Optional) Array or scalar value to be returned where the condition is True.
y | (Optional) Array or scalar value to be returned where the condition is False. x and y must be passed together or both omitted.
List of parameters of the Python NumPy where() function.

np.where return value

If both x and y are provided, the output is a NumPy array that contains elements of x where the condition is True and elements of y otherwise. If only the condition is given, numpy.where returns a tuple of index arrays, one per dimension, giving the positions of the elements that are True.
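
The two return forms can be sketched on a small 2-D array. The grid values below are illustrative only.

```python
import numpy as np

# Illustrative 2-D data (assumed for this sketch)
grid = np.array([[1, 8], [9, 2]])

# Condition only: a tuple with one index array per dimension
rows, cols = np.where(grid > 5)
print(rows, cols)

# Condition plus x and y: an array with the same shape as the condition.
# The scalar 5 is broadcast against the array.
clipped = np.where(grid > 5, 5, grid)
print(clipped)
```

Because the condition-only form returns a tuple, code that works with a 1-D input usually takes the first element with [0].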

np.where in Pandas Python

Pandas is built on top of NumPy in Python, which means that we can often use NumPy functions directly on Pandas Series and DataFrames. This interoperability extends to np.where, allowing for seamless conditional operations within a DataFrame.

np.where Pandas Python use cases

Let’s see some use cases of np.where in Pandas Python:

Case 1: np where Pandas with Conditional Column Creation or Modification

We can use np.where to create a new column in a Pandas DataFrame based on a condition, or to modify an existing one.

Example: Suppose we have a Pandas DataFrame df with a column age in Python. We want to create a new column age_group that labels each person as either ‘adult’ or ‘child’ based on their age through np.where() in Python.

import pandas as pd
import numpy as np

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 12, 32]})
df['age_group'] = np.where(df['age'] < 18, 'child', 'adult')
print(df)

Output: The result after running the code is shown below:

      name  age age_group
0    Alice   25     adult
1      Bob   12     child
2  Charlie   32     adult

This way we can use the np.where in Pandas to create a single column in a Dataframe in Python.
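
The example above creates a new column. Modifying an existing column works the same way: pass the column itself as the fallback value. The sketch below reuses the same DataFrame, and the cap of 30 is an arbitrary value chosen for illustration.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'age': [25, 12, 32]})

# Overwrite 'age' in place: values above 30 become 30, the rest are kept
df['age'] = np.where(df['age'] > 30, 30, df['age'])
print(df)
```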

Case 2: np.where on dataframe multiple columns

The real power of np.where in Pandas is observed when we apply it across multiple columns of a dataframe in Python.

Example: Consider a DataFrame in Python Pandas with data, including columns for ‘Revenue’ and ‘Cost’. We want to add a new column ‘Profitable’ that indicates ‘Yes’ if the ‘Revenue’ exceeds ‘Cost’ and ‘No’ otherwise.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Revenue': [1500, 800, 1200],
    'Cost': [1000, 700, 500]
})

df['Profitable'] = np.where(df['Revenue'] > df['Cost'], 'Yes', 'No')
print(df)

Output: Running the code prints:

    Product  Revenue  Cost Profitable
0  Widget A     1500  1000        Yes
1  Widget B      800   700        Yes
2  Widget C     1200   500        Yes

This way we can use the np.where in Pandas Python to manipulate multiple columns in a Dataframe.
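
The values passed as x and y can also be computed from columns rather than fixed labels. As a rough sketch, the hypothetical column Profit_Over_500 below (not part of the original example) keeps the profit where it is at least 500 and writes 0 otherwise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Revenue': [1500, 800, 1200],
    'Cost': [1000, 700, 500]
})

# Profit computed from two columns
profit = df['Revenue'] - df['Cost']

# Hypothetical column: keep the profit when it reaches 500, else 0
df['Profit_Over_500'] = np.where(profit >= 500, profit, 0)
print(df)
```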

Case 3: np.where in Pandas for Filtering Rows

While Pandas already has powerful querying capabilities, np.where in Pandas Python can be used for indexing to filter rows in certain scenarios.

Example: Let’s say we have a Pandas DataFrame in Python containing names and ages, and we want to keep only the rows where the age is at least 18, with the help of the numpy.where() function.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 17, 32, 15, 22]
})

indices = np.where(df['age'] >= 18)[0]
adults_df = df.iloc[indices]
print(adults_df)

Output: After the implementation, we get:

      name  age
0    Alice   25
2  Charlie   32
4      Eva   22

This way we can use the np.where in Pandas Python to filter rows accordingly in a Dataframe in Python.
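
For comparison, the same rows can be selected with a plain Boolean mask, which is the more common Pandas idiom. The sketch below builds both results from the same data and checks that they match.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'age': [25, 17, 32, 15, 22]
})

# Positional indices from np.where, used with iloc
by_position = df.iloc[np.where(df['age'] >= 18)[0]]

# Boolean mask passed straight to the DataFrame
by_mask = df[df['age'] >= 18]

print(by_mask.equals(by_position))
```

The np.where route is mainly useful when the integer positions themselves are needed, for example to index another array of the same length.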

Case 4: np.where dataframe in Python Pandas to handle missing data

The np.where in Pandas Python is also a useful tool for dealing with missing or NaN values by conditionally replacing them with a specific value.

Example: Here, we use np.where() in Pandas Python to conditionally replace missing values in a DataFrame.

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 1, 2]})
print("Before the Dataframe was:\n", df)
df['A'] = np.where(df['A'].isna(), 0, df['A'])
print("After the Operation, Dataframe is:\n", df)

Output: Running the code prints:

Before the Dataframe was:
      A    B
0  1.0  NaN
1  2.0  1.0
2  NaN  2.0
After the Operation, Dataframe is:
      A    B
0  1.0  NaN
1  2.0  1.0
2  0.0  2.0

This way we can use the np.where in Pandas to replace the missing data in a Dataframe in Python.
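
When the replacement is a single constant, Pandas' own fillna gives the same result. np.where becomes more useful when the replacement depends on another column. The sketch below shows both. The variable b_filled is a hypothetical name introduced here for illustration.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan], 'B': [np.nan, 1, 2]})

# Constant replacement: np.where and fillna agree
with_where = np.where(df['A'].isna(), 0, df['A'])
with_fillna = df['A'].fillna(0)
print(with_where)
print(with_fillna.to_numpy())

# Conditional replacement: where B is missing, take the value from A
b_filled = np.where(df['B'].isna(), df['A'], df['B'])
print(b_filled)
```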

Case 5: np.where multiple conditions pandas

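We can also use np.where in Pandas Python to evaluate multiple conditions. For this purpose, we combine conditions with the element-wise operators & (and), | (or), and ~ (not). Each condition must be wrapped in parentheses, because these operators bind more tightly than comparisons such as > and <.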
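
As a brief sketch of the | and ~ operators, the snippet below uses a small DataFrame with assumed thresholds (1400 and 600) and labels chosen purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Revenue': [1500, 800, 1200], 'Cost': [1000, 700, 500]})

# | (or): flag rows where either threshold is crossed
either = np.where((df['Revenue'] > 1400) | (df['Cost'] < 600), 'Flag', 'OK')
print(either)

# ~ (not): invert a condition
not_cheap = np.where(~(df['Cost'] < 600), 'Costly', 'Cheap')
print(not_cheap)
```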

Example: Say we have a Pandas DataFrame with ‘Revenue’ and ‘Cost’ columns, and we want to label each row with a category based on both of them. We can nest np.where() calls to do so.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Revenue': [1500, 800, 1200],
    'Cost': [1000, 700, 500]
})

condition_high_profit = (df['Revenue'] > 1000) & (df['Cost'] < 500)
condition_low_profit = (df['Revenue'] <= 1000) & (df['Cost'] < 500)
condition_high_loss = (df['Revenue'] > 1000) & (df['Cost'] >= 500)

df['Profit_Category'] = np.where(
    condition_high_profit, 'High Profit',
    np.where(
        condition_low_profit, 'Low Profit',
        np.where(
            condition_high_loss, 'High Loss', 'Low Loss'
        )
    )
)
print(df)

Output: The implementation of the code is given below:

    Product  Revenue  Cost Profit_Category
0  Widget A     1500  1000       High Loss
1  Widget B      800   700        Low Loss
2  Widget C     1200   500       High Loss

This way we can use the np.where in Pandas Python to apply multiple conditions in a Dataframe.
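
Nested np.where calls get harder to read as the number of conditions grows. One alternative, not used in the original example, is NumPy's np.select, which takes a list of conditions, a list of matching choices, and a default. The sketch below reproduces the same categories under that assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Revenue': [1500, 800, 1200],
    'Cost': [1000, 700, 500]
})

# Conditions are checked in order; the first match wins
conditions = [
    (df['Revenue'] > 1000) & (df['Cost'] < 500),
    (df['Revenue'] <= 1000) & (df['Cost'] < 500),
    (df['Revenue'] > 1000) & (df['Cost'] >= 500),
]
choices = ['High Profit', 'Low Profit', 'High Loss']

# Rows matching none of the conditions get the default label
df['Profit_Category'] = np.select(conditions, choices, default='Low Loss')
print(df)
```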

Conclusion

Used with Pandas, NumPy's np.where is an invaluable tool for performing conditional logic on DataFrame columns in Python. It enables data analysts and scientists to efficiently apply single or multiple conditions to DataFrames, enhancing data manipulation and analysis tasks.

By mastering np.where in Pandas, we can maintain cleaner code, improve performance, and gain the flexibility needed to handle a wide array of data conditioning scenarios in Python.
