The Python pandas DataFrame.where() function is designed to help with conditional data manipulation in DataFrames. It allows programmers to replace or mask values in a pandas DataFrame based on a condition.

Web Hosting
Secure, reliable hosting for your website
  • 99.9% uptime and super-fast loading
  • Advanced security features
  • Domain and email included

What is the syntax for pandas DataFrame.where()?

The where() function can accept up to five parameters and has the following syntax:

DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None)
python

When applied to a DataFrame, only the values that meet the specified condition (cond) will remain as they are. Any other values will be replaced with what you specify in the other parameter.

What parameters can be used with pandas DataFrame.where()?

Pandas DataFrame.where() accepts different parameters that facilitate flexible data management and modification:

Parameter Description Default Value
cond Condition that must be met in order for DataFrame values to remain unchanged
other The value that replaces values that do not meet the condition
inplace When set to True, the operation will directly modify the DataFrame False
axis Specifies which axis the condition applies to (axis=0 for rows, axis=1 for columns) None
level Defines the level of a multi-index where the condition should be applied None

How to use pandas DataFrame.where()

The where() function can be useful in various scenarios that require conditional data manipulation. This could include data cleaning or creating new columns based on conditions.

Replacing values based on conditions

Suppose you have a DataFrame with a company’s sales results, and you only want to display the positive results. Negative results, on the other hand, should be replaced with 0. You can do this with pandas DataFrame.where(). First, let’s create a DataFrame:

import pandas as pd
# Create a sample DataFrame
data = {
    'Region': ['North', 'South', 'East', 'West'],
    'Sales_Q1': [15000, -5000, 3000, -1000],
    'Sales_Q2': [20000, 25000, -7000, 5000]
}
df = pd.DataFrame(data)
print(df)
python

The code above produces the following DataFrame:

Region    Sales_Q1    Sales_Q2
0    North         15000         20000
1    South         -5000         25000
2     East            3000         -7000
3     West         -1000            5000

Using where(), you can replace all negative values with 0. Make sure that only columns with numeric values are included, otherwise, it won’t work.

# Replacing values using conditions
df_positive = df.copy()
df_positive[['Sales_Q1', 'Sales_Q2']] = df[['Sales_Q1', 'Sales_Q2']].where(df[['Sales_Q1', 'Sales_Q2']] > 0, 0)
print(df_positive)
python

The resulting DataFrame, df_positive, only display positive sales results and replaces negative values with 0:

Region    Sales_Q1    Sales_Q2
0    North         15000         20000
1    South                 0         25000
2     East            3000                 0
3     West                 0            5000

Masking values using conditions

Pandas DataFrame.where() can also be used to mask values. This means only specific parts of a DataFrame will be displayed. In the following example, we only want to show values that are above a certain threshold (in this case, 10000). Remember, you need to make sure that you’re only evaluating columns with numerical data:

# Only display values over 10000
df_masked = df.copy()
df_masked[['Sales_Q1', 'Sales_Q2']] = df[['Sales_Q1', 'Sales_Q2']].where(df[['Sales_Q1', 'Sales_Q2']] > 10000)
print(df_masked)
python

The resulting DataFrame, df_masked, only displays values that are greater than 10000. NaN is used in place of all other values:

Region    Sales_Q1    Sales_Q2
0    North     15000.0     20000.0
1    South             NaN     25000.0
2     East             NaN             NaN
3     West             NaN             NaN
Was this article helpful?
Go to Main Menu