The Python pandas function DataFrame.describe() generates a statistical summary of a DataFrame, covering only the numerical columns by default. The summary includes the count, mean, standard deviation, minimum, maximum and selected percentiles of each column.

What is the syntax for pandas’ describe() function?

The basic syntax of describe() for DataFrames is simple. It looks like this:

DataFrame.describe(percentiles=None, include=None, exclude=None)

Important parameters for pandas’ DataFrame.describe()

Using the following parameters, you can adjust the output of describe():

Parameter | Description | Default value
percentiles | Lists the percentiles to include in the summary, given as values between 0 and 1 | [.25, .5, .75]
include | Specifies which data types to include in the description; accepts 'all', a list-like of dtypes (for example numpy.number or object), or None (numerical columns only) | None
exclude | Specifies which data types to exclude from the description; accepts a list-like of dtypes or None | None
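
As a rough illustration of how include and exclude change the result, the sketch below runs describe() on a small hypothetical DataFrame. The name demo, its columns and its values are made up for this example, and a reasonably recent pandas version is assumed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column and one numeric column
demo = pd.DataFrame({
    'Product': ['A', 'B', 'A'],
    'Quantity': [10, 20, 15],
})

# Default (include=None): only the numeric column is summarised
numeric_only = demo.describe()

# include='all': every column is summarised; statistics that do not
# apply to a column's type are filled with NaN
everything = demo.describe(include='all')

# exclude=[np.number]: numeric columns are dropped, so only the text
# column remains, summarised with count, unique, top and freq
text_only = demo.describe(exclude=[np.number])

print(numeric_only)
print(everything)
print(text_only)
```

For non-numerical columns, the summary rows change to count, unique, top (the most frequent value) and freq (how often that value occurs).
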
Definition

Percentiles are values that divide a sorted dataset into 100 equal parts: the p-th percentile is the value below which p percent of the data points fall. Commonly used percentiles are the 25th percentile, the median (50th percentile) and the 75th percentile. Together they give a clearer picture of how the data is distributed.
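
To make the definition more concrete, here is a minimal sketch of how a percentile can be calculated by hand and with pandas. It assumes linear interpolation between neighbouring values, which is the default method of Series.quantile(). The sample values and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample values, used only for this illustration
values = pd.Series([5, 10, 15, 20, 30])

q = 0.10  # the 10th percentile

# Manual calculation with linear interpolation between neighbours
ordered = sorted(values)
position = q * (len(ordered) - 1)   # fractional index into the sorted data
lower = int(position)               # index of the value just below
fraction = position - lower         # distance towards the next value
manual = ordered[lower] + (ordered[lower + 1] - ordered[lower]) * fraction

# pandas equivalent; 'linear' is the default interpolation method
from_pandas = values.quantile(q)

print(round(manual, 6), round(from_pandas, 6))
```

Both approaches should agree. With only five values, the 10th percentile falls between the two smallest entries rather than on an actual data point.
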

Examples of how to use pandas describe()

If you need a quick overview of the key statistical metrics of a dataset, the pandas DataFrame.describe() function is extremely useful.

Example 1: Statistical summary of numerical data

In the following example, we take a look at the DataFrame df, which contains different types of sales data.

import pandas as pd
import numpy as np
# Example DataFrame with sales data
data = {
    'Product': ['A', 'B', 'C', 'D', 'E'],
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
    'Revenue': [1000, 3000, 3000, 400, 3600]
}
df = pd.DataFrame(data)
print(df)

Now, you can use pandas describe() to get a statistical summary of the numerical data in the columns:

summary = df.describe()
print(summary)

The output of the pandas DataFrame.describe() function is as follows:

        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
25%    10.000000  100.000000  1000.000000
50%    15.000000  120.000000  3000.000000
75%    20.000000  150.000000  3000.000000
max    30.000000  200.000000  3600.000000

The key metrics shown in the output are:

  • count: Number of non-NaN (Not a Number) entries
  • mean: Average of the values (also accessible via DataFrame.mean())
  • std: Sample standard deviation of the values (normalised by N-1)
  • min, 25%, 50%, 75%, max: Minimum, 25th percentile, median (50th percentile), 75th percentile, and maximum values
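
Each row of the summary can also be reproduced with an individual pandas method. The following sketch compares the two for the Quantity column. It rebuilds a reduced version of the example data so that it runs on its own, and the checks dictionary is just a helper for this illustration.

```python
import pandas as pd

# Reduced copy of the sales data so the snippet is self-contained
df = pd.DataFrame({
    'Quantity': [10, 20, 15, 5, 30],
    'Price': [100, 150, 200, 80, 120],
})
summary = df.describe()

col = df['Quantity']
checks = {
    'count': col.count(),
    'mean': col.mean(),
    'std': col.std(),            # sample standard deviation (ddof=1)
    'min': col.min(),
    '25%': col.quantile(0.25),
    '50%': col.median(),
    '75%': col.quantile(0.75),
    'max': col.max(),
}

# Print the describe() value next to the individually computed value
for label, value in checks.items():
    print(label, summary.loc[label, 'Quantity'], value)
```
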

Example 2: Customising percentiles

You can customise the percentiles in the pandas DataFrame.describe() output with the percentiles parameter:

# Statistical summary with custom percentiles
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom_summary)

This function call provides the following output:

        Quantity       Price      Revenue
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
10%     7.000000   88.000000   640.000000
50%    15.000000  120.000000  3000.000000
90%    26.000000  180.000000  3360.000000
max    30.000000  200.000000  3600.000000

In the output, the 10th, 50th and 90th percentiles replace the default 25th, 50th and 75th percentiles shown in the previous example.
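
Because describe() returns a regular DataFrame, individual statistics can be looked up by row label and column name. The sketch below assumes the row labels are the requested percentiles formatted as percentages, as in the output above. The variable revenue_p90 is a hypothetical name.

```python
import pandas as pd

# Reduced copy of the sales data so the snippet is self-contained
df = pd.DataFrame({
    'Quantity': [10, 20, 15, 5, 30],
    'Revenue': [1000, 3000, 3000, 400, 3600],
})
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])

# Row labels are the percentiles formatted as percentages
print(list(custom_summary.index))

# Look up one statistic by row label and column name
revenue_p90 = custom_summary.loc['90%', 'Revenue']
print(round(revenue_p90, 2))
```
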
