How to clean data in pandas with dropna()

IONOS editorial team | 26/06/2025 | 3 mins

The Python pandas DataFrame.dropna() function is used to remove all rows or columns containing missing values (NaN) from a DataFrame. This makes it especially crucial for preparing and cleaning data.

What is the syntax for pandas `dropna()`?

The dropna() function accepts up to six parameters. Here’s its syntax:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False, ignore_index=False)

Important parameters for `dropna()`

You can use parameters to influence the behaviour of the pandas DataFrame.dropna() function. Here’s an overview of the most important ones:

| Parameter | Description | Default value |
| --- | --- | --- |
| `axis` | Determines whether rows (0 or `'index'`) or columns (1 or `'columns'`) are removed | `0` |
| `how` | Specifies whether a row or column is removed when at least one value is NaN (`'any'`) or only when all of its values are NaN (`'all'`) | `'any'` |
| `thresh` | Specifies the minimum number of non-NaN values a row or column must have to avoid being removed; cannot be combined with `how` | `None` |
| `subset` | Specifies the labels along the other axis that are checked for missing values, for example the columns to consider when removing rows | `None` |
| `inplace` | If `True`, the operation is performed on the original DataFrame and `None` is returned | `False` |
| `ignore_index` | If `True`, the resulting axis is labelled from 0 to n-1 | `False` |
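
To make the difference between the two `how` settings and the effect of `inplace` more concrete, here is a minimal sketch. The DataFrame `df_how` and its values are made up for this illustration and are not part of the sample data used later in the article.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 is entirely NaN, row 2 is only partly NaN
df_how = pd.DataFrame({
    'A': [1.0, np.nan, 3.0],
    'B': [4.0, np.nan, np.nan],
})

# how='any' (the default) drops every row with at least one NaN
rows_any = df_how.dropna(how='any')
print(rows_any.index.tolist())  # [0]

# how='all' drops a row only if all of its values are NaN
rows_all = df_how.dropna(how='all')
print(rows_all.index.tolist())  # [0, 2]

# inplace=True modifies df_how itself and returns None
result = df_how.dropna(how='all', inplace=True)
print(result)                   # None
print(df_how.index.tolist())    # [0, 2]
```

Because `inplace=True` returns `None`, assigning its result back to the original variable would overwrite the DataFrame with `None`.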

How to use pandas `DataFrame.dropna()`

Pandas dropna() is used to clean data before it’s analysed. The removal of rows or columns with missing values helps to prevent biases in statistical evaluations. Since missing values can also lead to problems with data visualisation, using the function is also advantageous when creating charts and reports.
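
Before removing anything, it can help to check how much data is actually missing. The following sketch is one possible way to do that with `isna()`; the variable names `df_check`, `missing_per_column` and `rows_with_nan` are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame with a few missing values
df_check = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Number of NaN values in each column
missing_per_column = df_check.isna().sum()
print(missing_per_column.to_dict())  # {'A': 1, 'B': 2, 'C': 0}

# Number of rows that contain at least one NaN value
rows_with_nan = int(df_check.isna().any(axis=1).sum())
print(rows_with_nan)  # 2
```

A check like this shows in advance how many rows a plain `dropna()` call would remove.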

Removing rows with missing values

In the following example, we’ll take a look at a DataFrame containing NaN values:

import pandas as pd
import numpy as np
# Creating a DataFrame with sample data
data = {
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
print(df)

The DataFrame looks like this:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
2  NaN  NaN  11
3  4.0  8.0  12

Next, we’re going to apply the pandas dropna() function:

# Remove all rows that contain at least one NaN value
df_cleaned = df.dropna()
print(df_cleaned)

Running the code above produces the following result:

     A    B   C
0  1.0  5.0   9
3  4.0  8.0  12

Since rows 1 and 2 each contain at least one NaN value, only the rows with index 0 and 3 remain. Note that the remaining rows keep their original index labels.
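
If you would rather have the remaining rows numbered consecutively, one option is to reset the index after dropping. The sketch below uses `reset_index(drop=True)`, which is a different pandas method from the one covered in this article; on pandas versions that support the `ignore_index` parameter, `df.dropna(ignore_index=True)` is intended to give the same labelling.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Drop rows with NaN values, then relabel the remaining rows 0..n-1
df_renumbered = df.dropna().reset_index(drop=True)
print(df_renumbered.index.tolist())  # [0, 1]
print(df_renumbered['C'].tolist())   # [9, 12]
```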

Removing columns with missing values

Similarly, you can remove columns with missing values by setting the axis parameter to 1:

# Remove all columns that contain at least one NaN value
df_cleaned_columns = df.dropna(axis=1)
print(df_cleaned_columns)

Column C is the only column that remains, since it’s the only one that doesn’t contain NaN values:

    C
0   9
1  10
2  11
3  12
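
The column-wise variant can also be combined with the other parameters. As a small sketch using the same sample data, the string alias `'columns'` can stand in for `1`, and `thresh` then counts non-NaN values per column instead of per row. The variable names are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# axis='columns' is an alias for axis=1
df_cols = df.dropna(axis='columns')
print(df_cols.columns.tolist())  # ['C']

# Keep only columns with at least 3 non-NaN values
# (A has 3, B has 2, C has 4)
df_cols_thresh = df.dropna(axis=1, thresh=3)
print(df_cols_thresh.columns.tolist())  # ['A', 'C']
```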

Using `thresh`

If you want to remove rows that contain fewer than two non-NaN values, you can use the thresh parameter:

# Keep only rows that have 2 or more non-NaN values
df_thresh = df.dropna(thresh=2)
print(df_thresh)

Running the code produces the following output:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
3  4.0  8.0  12

Row 1 is not removed from the output because it contains 2 non-NaN values (2.0 and 10). Row 2 is removed because it contains only one non-NaN value (11).
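
Since `thresh` expects an absolute count, a rule such as "at least half of the columns must be filled" has to be converted first. The following sketch shows one way to do this; the 50% rule and the names `min_fraction` and `min_non_nan` are assumptions made for this example.

```python
import math

import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Assumed rule for this sketch: at least half of the columns must be non-NaN
min_fraction = 0.5
min_non_nan = math.ceil(min_fraction * df.shape[1])  # ceil(0.5 * 3) = 2

df_half_filled = df.dropna(thresh=min_non_nan)
print(df_half_filled.index.tolist())  # [0, 1, 3]
```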

Using `subset`

The subset parameter allows you to specify the columns where the program should look for missing values. Only rows that contain missing values in the columns that have been specified will be removed.

# Remove all rows where column A contains a NaN value
df_subset = df.dropna(subset=['A'])
print(df_subset)

Here, only the row with index 2 is removed. The NaN value in column B of row 1 is ignored due to the subset parameter, which only takes column A into consideration:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
3  4.0  8.0  12
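
The subset parameter also accepts several columns and can be combined with `how`. As a brief sketch with the same sample data (the variable names are illustrative), the first call removes a row if either listed column is NaN, while the second removes it only if both are NaN.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12],
})

# Remove rows where A or B (or both) contain NaN
df_subset_any = df.dropna(subset=['A', 'B'])
print(df_subset_any.index.tolist())  # [0, 3]

# Remove rows only where both A and B contain NaN
df_subset_all = df.dropna(subset=['A', 'B'], how='all')
print(df_subset_all.index.tolist())  # [0, 1, 3]
```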
