How to calculate averages with pandas mean()
The DataFrame.mean() function in Python pandas calculates averages along an axis of a DataFrame. Pandas mean() is essential for analysing numerical data: the average summarises the central tendency of a column or row and is a common starting point for exploring how the data is distributed.
What is the syntax for DataFrame.mean()?
The pandas mean() function accepts up to three parameters and has the following syntax:
DataFrame.mean(axis=0, skipna=True, numeric_only=False)
What parameters can be used with pandas DataFrame.mean()?
You can use different parameters to customise how pandas DataFrame.mean() works.
Parameter | Description | Default Value |
---|---|---|
axis | Axis along which the mean is calculated: axis=0 returns one mean per column, axis=1 returns one mean per row | 0 |
skipna | If set to True, NaN values are ignored | True |
numeric_only | If set to True, only numeric data types are included in the calculation | False |
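The numeric_only parameter is not used in the examples that follow, so here is a minimal sketch of what it does. The DataFrame and its column names (name, A, B) are made up purely for illustration:

```python
import pandas as pd

# Hypothetical mixed-type data: one text column and two numeric columns
mixed = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'A': [1, 2, 3],
    'B': [4.0, 5.0, 6.0],
})

# numeric_only=True restricts the calculation to numeric columns,
# so the text column 'name' does not appear in the result
numeric_means = mixed.mean(numeric_only=True)
print(numeric_means)
```

In recent pandas versions, leaving out numeric_only=True on a DataFrame like this one typically raises a TypeError, because the text column cannot be averaged.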
How to use pandas mean()
You can apply the pandas DataFrame.mean() function to both columns and rows.
Calculating average values for columns
First, we’re going to create a pandas DataFrame with some numerical data:
import pandas as pd
data = {
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10]
}
df = pd.DataFrame(data)
print(df)
The resulting DataFrame looks like this:
   A  B   C
0  1  4   7
1  2  5   8
2  3  6   9
3  4  7  10
To calculate the average of each column, you can use the pandas mean() function. By default, the axis parameter is set to 0, which produces one mean per column.
column_means = df.mean()
print(column_means)
The code above calculates the mean for each column (A, B and C) by finding the sum of the elements in the respective column and then dividing it by the number of elements in the column. The result is the following pandas Series:
A 2.5
B 5.5
C 8.5
dtype: float64
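To make the sum-divided-by-count description concrete, the following sketch recomputes the column means by hand and compares them with the result of mean(). The variable name manual_means is our own choice and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10],
})

# Sum of each column divided by the number of rows
manual_means = df.sum() / len(df)
print(manual_means)

# Largest absolute difference between the manual result and df.mean()
print((manual_means - df.mean()).abs().max())
```

This shortcut only matches mean() when the data has no missing values, because len(df) counts every row, including rows that contain NaN.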
Calculating average values for rows
If you want to find the average for rows, simply set the axis parameter to 1:
row_means = df.mean(axis=1)
print(row_means)
Pandas mean() calculates row averages by dividing the sum of elements in a row by the number of elements it has. Calling the function above produces the following output:
0 4.0
1 5.0
2 6.0
3 7.0
dtype: float64
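Row averages are often more useful when they sit next to the data they describe. One possible approach, sketched below, is to attach them as an extra column with assign(). The column name row_mean is an assumption chosen for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 5, 6, 7],
    'C': [7, 8, 9, 10],
})

# assign() returns a new DataFrame with the extra column;
# the original df is left unchanged
df_with_mean = df.assign(row_mean=df.mean(axis=1))
print(df_with_mean)
```

Because the row means are calculated before the new column is added, the row_mean column itself does not influence the averages.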
Handling NaN values
In this example, we’ll use a different DataFrame, which contains NaN values:
import pandas as pd
import numpy as np
data = {
    'A': [1, 2, np.nan, 4],
    'B': [4, np.nan, 6, 7],
    'C': [7, 8, 9, np.nan]
}
df = pd.DataFrame(data)
print(df)
The code above produces the following DataFrame:
     A    B    C
0  1.0  4.0  7.0
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  4.0  7.0  NaN
When calculating the averages for columns, the skipna parameter determines whether NaN values are skipped or allowed to propagate into the result. By default, skipna is set to True, so df.mean() automatically ignores NaN values. If you want NaN values to be taken into account, you need to pass skipna=False. Doing so will cause any column with at least one NaN to return NaN as its mean.
mean_with_nan = df.mean()
print(mean_with_nan)
Calling df.mean() produces the following output:
A 2.333333
B 5.666667
C 8.000000
dtype: float64
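For comparison, here is a short sketch of the same calculation with skipna=False. It rebuilds the DataFrame from this section so that it can be run on its own; the variable name mean_strict is our own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [4, np.nan, 6, 7],
    'C': [7, 8, 9, np.nan],
})

# skipna=False lets NaN propagate: every column here contains
# one NaN, so every column mean becomes NaN
mean_strict = df.mean(skipna=False)
print(mean_strict)

# Number of NaN values per column
print(df.isna().sum())
```

Using skipna=False can be a helpful safeguard when a missing value should invalidate the average rather than be skipped silently.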