With the Python pandas DataFrame.groupby() function, you can group data based on specific criteria and perform various ag­greg­a­tions and trans­form­a­tions to the data.

Web hosting
The hosting your website deserves at an un­beat­able price
  • Loading 3x faster for happier customers
  • Rock-solid 99.99% uptime and advanced pro­tec­tion
  • Only at IONOS: up to 500 GB included

What is the syntax for pandas DataFrame.groupby()?

Pandas groupby() accepts up to four para­met­ers. The basic syntax is as follows:

DataFrame.groupby(by=None, level=None, as_index=True, sort=True, group_keys=True, dropna=True)
python

Important para­met­ers for groupby

Parameter De­scrip­tion Default Value
by Key or Python list of keys to group by; not to be combined with level None
level Used for Mul­ti­In­dex to specify one or more levels for grouping None
as_index If True, the group keys are set as the index of the resulting DataFrame True
group_keys If True, the group keys are included in the index of the groups True
dropna Specifies whether to exclude groups with NaN values True

How to use pandas DataFrame.groupby()

The pandas groupby() function is par­tic­u­larly useful for analysing and sum­mar­ising large datasets, helping to identify patterns or anomalies.

Grouping and ag­greg­at­ing

Below is an example sales dataset con­tain­ing in­form­a­tion about the sale date, product sold and quantity sold:

import pandas as pd
# Sample sales dataset
data = {
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Quantity': [10, 20, 15, 25, 10]
}
df = pd.DataFrame(data)
print(df)
python

The resulting DataFrame looks like this:

Date Product  Quantity
0  2021-01-01       A       10
1  2021-01-01       B       20
2  2021-01-02       A       15
3  2021-01-02       B       25
4  2021-01-03       A       10

Next, we’ll group the dataset by product using pandas groupby(). Then, we’ll calculate the total quantity sold for each product using the sum() function:

# Group by product and calculate the sum of the quantity sold
summe = df.groupby('Product')['Quantity'].sum()
print(total)

The result shows the total number of units sold for each product:

Product
A    35
B    45
Name: Quantity, dtype: int64

Multiple ag­greg­a­tions

In the following example, we’re going to use an extended dataset that also includes revenue:

data = {
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Product': ['A', 'B', 'A', 'B', 'A'],
    'Quantity': [10, 20, 15, 25, 10],
    'Revenue': [100, 200, 150, 250, 100]
}
df = pd.DataFrame(data)
print(df)
python

The DataFrame looks like this:

Date Product  Quantity  Revenue
0  2021-01-01       A       10      100
1  2021-01-01       B       20      200
2  2021-01-02       A       15      150
3  2021-01-02       B       25      250
4  2021-01-03       A       10      100

Using pandas DataFrame.groupby(), we’re going to group the data by product and then use the agg() function to calculate the total quantity and revenue, as well as the average revenue per product.

# Group by product and apply multiple aggregations
groups = df.groupby('Product').agg({
    'Quantity': 'sum',
    'Revenue': ['sum', 'mean']
})
print(groups)

Here’s the result:

Quantity Revenue        
          sum    sum    mean
Product                  
A          35    350  116.666667
B          45    450  225.000000
Go to Main Menu