What is pandas groupby() and how to use it

With the Python pandas DataFrame.groupby() function, you can group data based on specific criteria and then apply aggregations and transformations to each group.

What is the syntax for pandas `DataFrame.groupby()`?

Pandas groupby() accepts several optional parameters. The basic syntax, showing the most commonly used ones, is as follows:

DataFrame.groupby(by=None, level=None, as_index=True, sort=True, group_keys=True, dropna=True)

Important parameters for `groupby`

Parameter	Description	Default Value
`by`	Key or Python list of keys to group by; not to be combined with `level`	`None`
`level`	Used for MultiIndex to specify one or more levels for grouping	`None`
`as_index`	If `True`, the group keys are set as the index of the resulting DataFrame	`True`
`sort`	If `True`, the group keys are sorted in the result	`True`
`group_keys`	If `True`, the group keys are added to the index of the result when calling `apply()`	`True`
`dropna`	If `True`, rows whose group key is NaN are excluded from the result	`True`
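
The effect of `as_index` and `dropna` is easiest to see side by side. The following is a minimal sketch; the small dataset, including the missing product name, is made up for illustration and is not part of the sales example used later:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one row has a missing group key (NaN)
df = pd.DataFrame({
    'Product': ['A', 'B', np.nan, 'A'],
    'Quantity': [10, 20, 5, 15]
})

# Defaults: group keys become the index, rows with a NaN key are dropped
default_result = df.groupby('Product')['Quantity'].sum()

# as_index=False keeps 'Product' as a regular column
flat_result = df.groupby('Product', as_index=False)['Quantity'].sum()

# dropna=False keeps rows with a NaN key as a group of their own
with_nan = df.groupby('Product', dropna=False)['Quantity'].sum()

print(default_result)
print(flat_result)
print(with_nan)
```

With the defaults, only the groups A and B appear. With `dropna=False`, a third group for the missing key is added to the result.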

How to use pandas `DataFrame.groupby()`

The pandas groupby() function is particularly useful for analysing and summarising large datasets, helping to identify patterns or anomalies.

Grouping and aggregating

Below is an example sales dataset containing information about the sale date, product sold and quantity sold:

import pandas as pd
# Sample sales dataset
data = {
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Quantity': [10, 20, 15, 25, 10]
}
df = pd.DataFrame(data)
print(df)

The resulting DataFrame looks like this:

         Date Product  Quantity
0  2021-01-01       A        10
1  2021-01-01       B        20
2  2021-01-02       A        15
3  2021-01-02       B        25
4  2021-01-03       A        10

Next, we’ll group the dataset by product using pandas groupby(). Then, we’ll calculate the total quantity sold for each product using the sum() function:

# Group by product and calculate the sum of the quantity sold
total = df.groupby('Product')['Quantity'].sum()
print(total)

The result shows the total number of units sold for each product:

Product
A    35
B    45
Name: Quantity, dtype: int64
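
The same grouping can be reused with other aggregation functions. As a short sketch based on the dataset above, the following calculates the average, the maximum and the number of sales per product; the variable names are chosen for illustration only:

```python
import pandas as pd

# Same sample sales dataset as above
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Quantity': [10, 20, 15, 25, 10]
})

# Create the grouping once and reuse it for several aggregations
grouped = df.groupby('Product')['Quantity']

average = grouped.mean()   # average quantity per sale
largest = grouped.max()    # largest single sale
sales = grouped.count()    # number of sales

print(average)
print(largest)
print(sales)
```

Nothing is calculated when the grouping is created. The work happens when an aggregation such as mean() or max() is called on it.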

Multiple aggregations

In the following example, we’re going to use an extended dataset that also includes revenue:

data = {
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Quantity': [10, 20, 15, 25, 10],
    'Revenue': [100, 200, 150, 250, 100]
}
df = pd.DataFrame(data)
print(df)

The DataFrame looks like this:

         Date Product  Quantity  Revenue
0  2021-01-01       A        10      100
1  2021-01-01       B        20      200
2  2021-01-02       A        15      150
3  2021-01-02       B        25      250
4  2021-01-03       A        10      100

Using pandas DataFrame.groupby(), we’re going to group the data by product and then use the agg() function to calculate the total quantity and revenue, as well as the average revenue per product.

# Group by product and apply multiple aggregations
groups = df.groupby('Product').agg({
    'Quantity': 'sum',
    'Revenue': ['sum', 'mean']
})
print(groups)

Here’s the result:

        Quantity Revenue
             sum     sum        mean
Product
A             35     350  116.666667
B             45     450  225.000000
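
Passing a dictionary to agg() produces two-level column headers, as seen above. If flat column names are preferred, pandas also supports named aggregation, where each output column is defined as a (column, function) pair. The following sketch assumes the same dataset; the output column names `total_quantity`, `total_revenue` and `average_revenue` are chosen freely for this example:

```python
import pandas as pd

# Same extended sales dataset as above
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Quantity': [10, 20, 15, 25, 10],
    'Revenue': [100, 200, 150, 250, 100]
})

# Named aggregation: output_name=(source_column, aggregation_function)
summary = df.groupby('Product').agg(
    total_quantity=('Quantity', 'sum'),
    total_revenue=('Revenue', 'sum'),
    average_revenue=('Revenue', 'mean')
)
print(summary)
```

The values are the same as in the dictionary version, but each column has a single descriptive name, which can make the result easier to use in later steps.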
