How to search DataFrames using pandas isin()
The Python pandas function DataFrame.isin() is designed to quickly and efficiently check whether certain values exist in a DataFrame. It returns Boolean values that show where matches were found, which makes it particularly useful for checking for multiple values at once.
What is the syntax for pandas isin()?
Pandas isin() takes one parameter and looks like this:
DataFrame.isin(values)
The values parameter can be a Python list (or another iterable), a pandas Series, a Python dictionary or another DataFrame. It contains the values you want to search for in the DataFrame.
If you’re working with pandas Series instead of DataFrames, you can use the equivalent function Series.isin().
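As a rough illustration of how the different input types behave, here is a minimal sketch. The DataFrame df, its columns a and b, and all variable names are made up for this example and are not part of the data used later in this article.

```python
import pandas as pd

# Hypothetical example data, not taken from the article
df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 4, 5]})

# List: every cell in the DataFrame is compared against the same values
from_list = df.isin([3])

# Dictionary: keys are column names, each with its own list of values;
# columns missing from the dictionary come back as all False
from_dict = df.isin({'a': [1, 2]})

# DataFrame: a cell only matches if the same value sits at the same
# index label and column label in the other DataFrame
other = pd.DataFrame({'a': [1, 9, 3]})
from_frame = df.isin(other)

# Series.isin() applies the same check to a single column
from_series = df['b'].isin([4, 5])

print(from_list)
print(from_dict)
print(from_frame)
print(from_series)
```

Note that with a DataFrame as input, position matters: a value only counts as a match if it appears under the same index and column labels.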
How to use isin() with DataFrames in pandas
You can use isin() for different purposes. In addition to checking for values, it can also be used to filter DataFrames.
Checking for values in a column
First, let’s take a look at a DataFrame that contains information about different people and where they live.
import pandas as pd

# Creating a DataFrame
data = {
    'Name': ['Amir', 'Bella', 'Charlize', 'David'],
    'City': ['Nottingham', 'London', 'Cardiff', 'Hull']
}
df = pd.DataFrame(data)
print(df)
The DataFrame looks like this:
       Name        City
0      Amir  Nottingham
1     Bella      London
2  Charlize     Cardiff
3     David        Hull
Now, we want to use pandas isin() to check whether the cities in the City column appear in a separate list of reference cities. Once we’ve created the list, we’ll call the method on the City column of the DataFrame:
# Cities for the list to be compared to
cities_to_check_against = ['Cardiff', 'Hull', 'Middlesbrough']
# Using the isin() method
result = df['City'].isin(cities_to_check_against)
print(result)
The result is a Series of Boolean values indicating whether each city in the City column is present in the cities_to_check_against list:
0    False
1    False
2     True
3     True
Name: City, dtype: bool
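Because the result is a Boolean Series, it can also be summarised directly. The following is a minimal sketch that reuses the data from above; the variable names match_count, has_match and all_match are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amir', 'Bella', 'Charlize', 'David'],
    'City': ['Nottingham', 'London', 'Cardiff', 'Hull']
})
cities_to_check_against = ['Cardiff', 'Hull', 'Middlesbrough']

result = df['City'].isin(cities_to_check_against)

# True counts as 1, so sum() gives the number of matching rows
match_count = int(result.sum())
# any() and all() report whether at least one row or every row matched
has_match = bool(result.any())
all_match = bool(result.all())

print(match_count, has_match, all_match)
```

For this data, two of the four cities are in the list, so the sketch prints 2 True False.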
Filtering a DataFrame using isin()
You can also use pandas isin()
to filter a DataFrame, keeping only the rows with cities that appear in the cities_to_check_against
list.
# Filtering a DataFrame using isin()
filtered_df = df[df['City'].isin(cities_to_check_against)]
print(filtered_df)
The result is a DataFrame that contains only the rows with cities that are also in the cities_to_check_against list:
       Name     City
2  Charlize  Cardiff
3     David     Hull
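The same approach can be turned around to exclude rows instead of keeping them. The sketch below is one possible way to do this, using the ~ operator to invert the Boolean Series; the name excluded_df is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amir', 'Bella', 'Charlize', 'David'],
    'City': ['Nottingham', 'London', 'Cardiff', 'Hull']
})
cities_to_check_against = ['Cardiff', 'Hull', 'Middlesbrough']

# ~ flips True and False, so only rows whose city is NOT in the list remain
excluded_df = df[~df['City'].isin(cities_to_check_against)]
print(excluded_df)
```

Here, only the rows for Amir (Nottingham) and Bella (London) are kept.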
Checking multiple columns in a DataFrame
For more complex filtering operations, you can also use pandas isin()
with dictionaries. In the following example, you’ll see how you can use a dictionary to simultaneously check multiple columns of a DataFrame. First, we’ll add a column to the original DataFrame and then use isin()
:
# Creating a DataFrame
data = {
    'Name': ['Amir', 'Bella', 'Charlize', 'David'],
    'City': ['Nottingham', 'London', 'Cardiff', 'Hull'],
    'Age': [25, 30, 35, 40]
}
df = pd.DataFrame(data)

# Dictionary with values that the DataFrame should be checked against
values_to_check_against = {
    'City': ['Cardiff', 'Hull'],
    'Age': [30, 40]
}

# Using isin() with a dictionary
result = df.isin(values_to_check_against)
print(result)
In this case, calling isin() returns a DataFrame with Boolean values, which indicate whether the conditions have been met in each column. The Name column is not included in the dictionary, so all of its values are False:
    Name   City    Age
0  False  False  False
1  False  False   True
2  False   True  False
3  False   True   True
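To turn this Boolean DataFrame into a row filter, the per-column results need to be combined into a single value per row. The following sketch shows one way this might be done with all() and any(); the names both_match and either_match are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amir', 'Bella', 'Charlize', 'David'],
    'City': ['Nottingham', 'London', 'Cardiff', 'Hull'],
    'Age': [25, 30, 35, 40]
})
values_to_check_against = {
    'City': ['Cardiff', 'Hull'],
    'Age': [30, 40]
}

result = df.isin(values_to_check_against)

# Only look at the columns that were actually checked,
# because Name is always False and would spoil all()
checked = result[list(values_to_check_against)]

# Rows where City AND Age both matched
both_match = df[checked.all(axis=1)]
# Rows where City OR Age matched
either_match = df[checked.any(axis=1)]

print(both_match)
print(either_match)
```

With this data, only David matches on both columns, while Bella, Charlize and David each match on at least one.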