How to apply conditions in pandas DataFrames with where()
The Python pandas DataFrame.where()
function is designed to help with conditional data manipulation in DataFrames. It allows programmers to replace or mask values in a pandas DataFrame based on a condition.
- 99.9% uptime and super-fast loading
- Advanced security features
- Domain and email included
What is the syntax for pandas DataFrame.where()
?
The where()
function can accept up to five parameters and has the following syntax:
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None)
pythonWhen applied to a DataFrame, only the values that meet the specified condition (cond
) will remain as they are. Any other values will be replaced with what you specify in the other
parameter.
What parameters can be used with pandas DataFrame.where()
?
Pandas DataFrame.where()
accepts different parameters that facilitate flexible data management and modification:
Parameter | Description | Default Value |
---|---|---|
cond
|
Condition that must be met in order for DataFrame values to remain unchanged | |
other
|
The value that replaces values that do not meet the condition | |
inplace
|
When set to True , the operation will directly modify the DataFrame
|
False
|
axis
|
Specifies which axis the condition applies to (axis=0 for rows, axis=1 for columns)
|
None
|
level
|
Defines the level of a multi-index where the condition should be applied | None
|
How to use pandas DataFrame.where()
The where()
function can be useful in various scenarios that require conditional data manipulation. This could include data cleaning or creating new columns based on conditions.
Replacing values based on conditions
Suppose you have a DataFrame with a company’s sales results, and you only want to display the positive results. Negative results, on the other hand, should be replaced with 0
. You can do this with pandas DataFrame.where()
. First, let’s create a DataFrame:
import pandas as pd
# Create a sample DataFrame
data = {
'Region': ['North', 'South', 'East', 'West'],
'Sales_Q1': [15000, -5000, 3000, -1000],
'Sales_Q2': [20000, 25000, -7000, 5000]
}
df = pd.DataFrame(data)
print(df)
pythonThe code above produces the following DataFrame:
Region Sales_Q1 Sales_Q2
0 North 15000 20000
1 South -5000 25000
2 East 3000 -7000
3 West -1000 5000
Using where()
, you can replace all negative values with 0
. Make sure that only columns with numeric values are included, otherwise, it won’t work.
# Replacing values using conditions
df_positive = df.copy()
df_positive[['Sales_Q1', 'Sales_Q2']] = df[['Sales_Q1', 'Sales_Q2']].where(df[['Sales_Q1', 'Sales_Q2']] > 0, 0)
print(df_positive)
pythonThe resulting DataFrame, df_positive
, only display positive sales results and replaces negative values with 0
:
Region Sales_Q1 Sales_Q2
0 North 15000 20000
1 South 0 25000
2 East 3000 0
3 West 0 5000
Masking values using conditions
Pandas DataFrame.where()
can also be used to mask values. This means only specific parts of a DataFrame will be displayed. In the following example, we only want to show values that are above a certain threshold (in this case, 10000
). Remember, you need to make sure that you’re only evaluating columns with numerical data:
# Only display values over 10000
df_masked = df.copy()
df_masked[['Sales_Q1', 'Sales_Q2']] = df[['Sales_Q1', 'Sales_Q2']].where(df[['Sales_Q1', 'Sales_Q2']] > 10000)
print(df_masked)
pythonThe resulting DataFrame, df_masked
, only displays values that are greater than 10000
. NaN
is used in place of all other values:
Region Sales_Q1 Sales_Q2
0 North 15000.0 20000.0
1 South NaN 25000.0
2 East NaN NaN
3 West NaN NaN