How to identify missing values with the pandas isna() function
The Python pandas function DataFrame.isna() helps users identify missing data (NaN or None) in a DataFrame. This can be especially useful for seeing if data needs to be cleaned up before beginning analysis.
What is the syntax for pandas isna()?
Since pandas isna() doesn’t take any parameters, its syntax is quite straightforward:
DataFrame.isna()
How to use the pandas isna() function
When isna() is applied to a DataFrame, it creates a new DataFrame with Boolean values. If a value in the original DataFrame is missing (e.g., marked as NaN or None), isna() will show True where the value is located. Otherwise, the function will display False.
If, in addition to identifying NaN or None values, you also want to remove them, check out the pandas dropna() function. If you don’t want to remove these values, but instead systematically replace them, the fillna() function is a useful tool for doing so.
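As a rough illustration of those two alternatives, the sketch below applies dropna() and fillna() to a small DataFrame. The sample data and the fill values ('Unknown' and 0) are assumptions chosen for the example, not recommendations.

```python
import pandas as pd

# Small illustrative DataFrame (sample data is an assumption for this sketch)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 35, 40],
})

# Remove every row that contains at least one missing value
dropped = df.dropna()

# Replace missing values column by column; the fill values are placeholders
filled = df.fillna({'Name': 'Unknown', 'Age': 0})

print(dropped)
print(filled)
```

Neither call changes the original DataFrame; each returns a new one, so df still contains its missing values afterwards.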
Identifying missing values in a DataFrame
The following example uses a DataFrame with data about different individuals, where some information is missing.
import pandas as pd
# Create DataFrame example
data = {
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 35, 40],
    'City': ['Nottingham', 'London', 'Cardiff', None]
}
df = pd.DataFrame(data)
print(df)
The DataFrame looks like this:
    Name   Age        City
0  Alice  25.0  Nottingham
1    Bob   NaN      London
2   None  35.0     Cardiff
3  David  40.0        None
The information that is missing has been marked as None or NaN. To see exactly which values are missing, you can call isna() on the DataFrame.
# Applying pandas isna()
missing_values = df.isna()
print(missing_values)
The function call returns a new DataFrame, where missing values from the original data are marked as True, while values that are present are marked as False. Here’s the output:
    Name    Age   City
0  False  False  False
1  False   True  False
2   True  False  False
3  False  False   True
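The Boolean result can also be used as a mask to pull out the affected rows. The following is a minimal sketch that rebuilds the example DataFrame and combines isna() with any(axis=1); the variable names rows_with_missing and incomplete are chosen here for illustration.

```python
import pandas as pd

# Same example data as above, repeated so the snippet runs on its own
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 35, 40],
    'City': ['Nottingham', 'London', 'Cardiff', None],
})

# True for each row that has at least one missing value
rows_with_missing = df.isna().any(axis=1)

# Boolean indexing keeps only the rows flagged True
incomplete = df[rows_with_missing]
print(incomplete)
```

With this data, rows 1, 2 and 3 are selected, since each of them is missing exactly one value.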
Counting the number of missing values per column
It can also be useful to know how many values are missing in each column to help you decide how to handle them. You can chain isna() with the pandas sum() method to count the number of missing values in each column. Because True counts as 1 and False as 0, summing each Boolean column gives the number of missing entries in it.
# Count missing values per column
missing_count = df.isna().sum()
print(missing_count)
This shows you the number of missing values in each column:
Name    1
Age     1
City    1
dtype: int64
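The same idea extends to other summaries. As a hedged sketch on the example data, the snippet below counts missing values per row, totals them for the whole DataFrame, and uses mean() to get the share of missing values per column; the variable names are illustrative.

```python
import pandas as pd

# Same example data as above, repeated so the snippet runs on its own
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 35, 40],
    'City': ['Nottingham', 'London', 'Cardiff', None],
})

# Missing values per row (axis=1 sums across the columns)
missing_per_row = df.isna().sum(axis=1)

# Total number of missing values in the whole DataFrame
total_missing = int(df.isna().sum().sum())

# Share of missing values per column (mean of True/False values)
missing_share = df.isna().mean()

print(missing_per_row)
print(total_missing)
print(missing_share)
```

A per-column share can be easier to judge than a raw count when columns are long, for example when deciding whether a column is complete enough to keep.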