The pandas DataFrame.dropna() function removes rows or columns that contain missing values (NaN) from a DataFrame. This makes it a key tool for preparing and cleaning data.

What is the syntax for pandas dropna()?

The dropna() function accepts up to six parameters. Here’s its syntax:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False, ignore_index=False)

Important parameters for dropna()

You can use parameters to influence the behaviour of the pandas DataFrame.dropna() function. Here’s an overview of the most important ones:

Parameter | Description | Default value
axis | Determines whether rows (0 or 'index') or columns (1 or 'columns') are removed | 0
how | Specifies whether a row or column is removed when at least one value is NaN ('any') or only when all of its values are NaN ('all') | 'any'
thresh | Specifies the minimum number of non-NaN values a row or column must have to avoid being removed; cannot be combined with how | None
subset | Limits the check for missing values to the given labels on the other axis (for example, a list of columns when removing rows) | None
inplace | Determines whether the operation modifies the original DataFrame instead of returning a new one | False
ignore_index | If True, the remaining axis is relabelled from 0 to n-1 | False
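The difference between the two values of how is easiest to see side by side. The following is a minimal sketch; the DataFrame df_example and its values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 is entirely NaN, row 2 only partly
df_example = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, np.nan, np.nan],
})

# how='any' (the default) drops every row with at least one NaN value
rows_any = df_example.dropna(how='any')

# how='all' drops a row only if all of its values are NaN
rows_all = df_example.dropna(how='all')

print(rows_any)
print(rows_all)
```

With how='any', only row 0 remains. With how='all', rows 0 and 2 remain, because row 2 still holds one valid value.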

How to use pandas DataFrame.dropna()

Pandas dropna() is used to clean data before it’s analysed. The removal of rows or columns with missing values helps to prevent biases in statistical evaluations. Since missing values can also lead to problems with data visualisation, using the function is also advantageous when creating charts and reports.
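Before removing anything, it can help to check how much data is actually missing. The sketch below uses made-up data and the pandas isna() method to count missing values per column and the number of rows that a default dropna() call would remove:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df_check = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
})

# Number of missing values per column
missing_per_column = df_check.isna().sum()
print(missing_per_column)

# Number of rows that dropna() would remove with its default settings
rows_lost = len(df_check) - len(df_check.dropna())
print(rows_lost)
```

If a large share of rows would be lost, a narrower call using thresh or subset may be a better fit than the default.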

Removing rows with missing values

In the following example, we’ll take a look at a DataFrame containing NaN values:

import pandas as pd
import numpy as np
# Creating a DataFrame with sample data
data = {
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
print(df)

The DataFrame looks like this:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
2  NaN  NaN  11
3  4.0  8.0  12

Next, we’re going to apply the pandas dropna() function:

# Remove all rows that contain at least one NaN value
df_cleaned = df.dropna()
print(df_cleaned)

Running the code above produces the following result:

     A    B   C
0  1.0  5.0   9
3  4.0  8.0  12

Since rows 1 and 2 contain NaN values, only the rows with index 0 and 3 remain.
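The remaining rows keep their original index labels. If you’d rather have them renumbered, the ignore_index parameter from the syntax above can do that. This sketch rebuilds the same sample DataFrame and assumes pandas 2.0 or later, where dropna() accepts ignore_index:

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Assumption: pandas 2.0 or later, where dropna() accepts ignore_index
df_renumbered = df.dropna(ignore_index=True)
print(df_renumbered)
```

The two remaining rows are now labelled 0 and 1. On older pandas versions, df.dropna().reset_index(drop=True) should give the same result.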

Removing columns with missing values

Similarly, you can remove columns with missing values by setting the axis parameter to 1:

# Remove all columns that contain at least one NaN value
df_cleaned_columns = df.dropna(axis=1)
print(df_cleaned_columns)

Column C is the only column that remains, since it’s the only one that doesn’t contain NaN values:

    C
0   9
1  10
2  11
3  12

Using thresh

If you want to remove rows that contain fewer than two non-NaN values, you can use the thresh parameter:

# Only keeps rows that have 2 or more non-NaN values
df_thresh = df.dropna(thresh=2)
print(df_thresh)

Running the code produces the following output:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
3  4.0  8.0  12

Row 1 is not removed because it contains two non-NaN values (2.0 and 10). Row 2 is removed because its only non-NaN value is 11.
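The thresh parameter can also be combined with axis=1 to filter columns instead of rows. The following sketch rebuilds the same sample DataFrame; the threshold of 3 is chosen purely for illustration:

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Only keeps columns that have 3 or more non-NaN values
df_thresh_columns = df.dropna(axis=1, thresh=3)
print(df_thresh_columns)
```

Column B has only two non-NaN values and is removed, while columns A (three values) and C (four values) remain.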

Using subset

The subset parameter allows you to specify the columns where the program should look for missing values. Only rows that contain missing values in the columns that have been specified will be removed.

# Removes all rows where column A contains a NaN value
df_subset = df.dropna(subset=['A'])
print(df_subset)

Here, only the row with index 2 is removed. The NaN value in row 1 is ignored because it’s in column B, and the subset parameter restricts the check to column A:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
3  4.0  8.0  12
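The subset parameter also accepts several columns and can be combined with how. As a minimal sketch on the same sample data, the two calls below check columns A and B together, first with the default how='any' and then with how='all':

```python
import numpy as np
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Removes rows where column A or column B contains a NaN value
df_subset_any = df.dropna(subset=['A', 'B'])

# Removes rows only where both column A and column B are NaN
df_subset_all = df.dropna(subset=['A', 'B'], how='all')

print(df_subset_any)
print(df_subset_all)
```

The first call keeps rows 0 and 3. The second also keeps row 1, since only row 2 is missing values in both columns.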