How to calculate averages with pandas mean()

The pandas DataFrame.mean() function calculates the average of the values along an axis of a DataFrame, either per column or per row. It is one of the basic tools for analysing numerical data, and the averages it returns give a quick indication of the central tendency of each column or row.

What is the syntax for `DataFrame.mean()`?

The pandas mean() function accepts up to three parameters and has the following syntax:

DataFrame.mean(axis=0, skipna=True, numeric_only=False)

What parameters can be used with pandas `DataFrame.mean()`?

You can use different parameters to customise how pandas DataFrame.mean() works.

Parameter	Description	Default Value
`axis`	Specifies the direction of the calculation: `axis=0` averages down the rows and returns one mean per column, `axis=1` averages across the columns and returns one mean per row	`0`
`skipna`	If set to `True`, NaN values will be ignored	`True`
`numeric_only`	If set to `True`, only numeric data types will be included in the calculation	`False`
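The effect of numeric_only is easiest to see on a DataFrame that mixes text and numbers. The following is a minimal sketch; the products DataFrame and its column names are hypothetical and only serve as an illustration:

```python
import pandas as pd

# Hypothetical data: one text column alongside two numeric columns
products = pd.DataFrame({
    'name': ['pen', 'notebook', 'stapler'],
    'price': [1.5, 3.0, 7.5],
    'quantity': [10, 4, 1],
})

# numeric_only=True restricts the calculation to the numeric columns,
# so the text column 'name' is left out of the result
numeric_means = products.mean(numeric_only=True)
print(numeric_means)
```

The result only contains entries for price and quantity. Depending on the pandas version, calling products.mean() without numeric_only=True on this data may raise an error instead, because the text column cannot be averaged.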

How to use pandas `mean()`

You can apply the pandas DataFrame.mean() function to both columns and rows.

Calculating average values for columns

First, we’re going to create a pandas DataFrame with some numerical data:

import pandas as pd

# Three numeric columns with four rows each
data = {
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10]
}
df = pd.DataFrame(data)
print(df)

The resulting DataFrame looks like this:

   A  B   C
0  1  4   7
1  2  5   8
2  3  6   9
3  4  7  10

To calculate the average of each column, you can use the pandas mean() function. By default, the axis parameter is set to 0, which means the values are averaged down the rows and you get one mean per column.

# axis=0 is the default: one mean per column
column_means = df.mean()
print(column_means)

The code above calculates the mean for each column (A, B and C) by finding the sum of the elements in the respective column and then dividing it by the number of elements in the column. The result is the following pandas Series:

A    2.5
B    5.5
C    8.5
dtype: float64
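The sum-divided-by-count description can be checked by hand. The following is a minimal sketch that rebuilds the same DataFrame and compares the manual calculation with df.mean(); the variable name manual_means is only illustrative, and the comparison assumes there are no missing values in the data:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10]
})

# Sum of each column divided by the number of rows
manual_means = df.sum() / len(df)
print(manual_means)

# Compare the manual result with the built-in calculation
print(manual_means.equals(df.mean()))
```

Both approaches give the same Series here. In practice, df.mean() is the better choice because it also takes care of missing values.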

Calculating average values for rows

If you want to find the average for rows, simply set the axis parameter to 1:

# axis=1: one mean per row
row_means = df.mean(axis=1)
print(row_means)

Pandas mean() calculates row averages by dividing the sum of elements in a row by the number of elements it has. Calling the function above produces the following output:

0    4.0
1    5.0
2    6.0
3    7.0
dtype: float64
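The same calls also work on a selection of the DataFrame. The following is a minimal sketch, with illustrative variable names, that averages a single column and then calculates row means for columns A and B only:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10]
})

# Selecting one column returns a Series, which has its own mean() method
mean_a = df['A'].mean()
print(mean_a)

# Row means calculated over columns A and B only
row_means_ab = df[['A', 'B']].mean(axis=1)
print(row_means_ab)
```

Selecting the columns first is a simple way to keep identifiers or unrelated measurements out of a row average.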

Handling NaN values

In this example, we’ll use a different DataFrame, which contains NaN values:

import pandas as pd
import numpy as np

# Each column contains exactly one missing value (np.nan)
data = {
    'A': [1, 2, np.nan, 4],
    'B': [4, np.nan, 6, 7],
    'C': [7, 8, 9, np.nan]
}
df = pd.DataFrame(data)
print(df)

The code above produces the following DataFrame:

     A    B    C
0  1.0  4.0  7.0
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  4.0  7.0  NaN

When calculating averages, the skipna parameter determines whether NaN values are ignored. By default, skipna is set to True, so df.mean() leaves NaN values out of both the sum and the count. If you pass skipna=False instead, NaN values are kept in the calculation, and any column with at least one NaN returns NaN as its mean.

# skipna=True is the default, so NaN values are ignored
mean_with_nan = df.mean()
print(mean_with_nan)

Calling df.mean() produces the following output:

A    2.333333
B    5.666667
C    8.000000
dtype: float64
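For comparison, here is a minimal sketch of the same data with skipna=False, followed by row means with the default setting. The variable names strict_means and row_means are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [4, np.nan, 6, 7],
    'C': [7, 8, 9, np.nan]
})

# skipna=False keeps NaN values in the calculation, so every column
# that contains a NaN also gets NaN as its mean
strict_means = df.mean(skipna=False)
print(strict_means)

# With the default skipna=True, each row is averaged over the
# values that are actually present in that row
row_means = df.mean(axis=1)
print(row_means)
```

Since every column in this DataFrame contains one NaN, all three column means come out as NaN with skipna=False. In the row-wise result, a row with one missing value is divided by two rather than three, because only the values that are present are counted.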
