With the pandas function DataFrame.merge(), you can combine Data­Frames using keys. This allows you to ef­fi­ciently combine data from various sources, enabling you to carry out more com­pre­hens­ive analyses.

Web hosting
The hosting your website deserves at an un­beat­able price
  • Loading 3x faster for happier customers
  • Rock-solid 99.99% uptime and advanced pro­tec­tion
  • Only at IONOS: up to 500 GB included

What is the syntax for pandas merge()?

The Python pandas DataFrame merge() method can accept a range of different para­met­ers, allowing de­velopers to specify how Data­Frames should be combined. The general syntax of the merge() function is as follows:

DataFrame.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
python
Note

The pandas merge() function is similar to the JOIN operation in SQL. Having ex­per­i­ence with re­la­tion­al databases like SQL can make it easier to un­der­stand how the pandas DataFrame merge() method words. Keep in mind, though, that there are some dif­fer­ences. In pandas, if both key columns have values that evaluate to null, those values will also be merged.

What para­met­ers can be used with pandas merge?

The various para­met­ers accepted by merge() allow you to not only specify which pandas Data­Frames to combine but also which type of join to use as well as other details.

Parameter De­scrip­tion Default Value
left The first DataFrame to be merged
right The second DataFrame to be merged
how The type of join operation to perform (inner, outer, left, right) inner
on The column(s) or index level(s) to use as keys; must be present in both Data­Frames
left_on The column(s) or index level(s) of the left DataFrame that should be used as the key(s)
right_on The column(s) or index level(s) of the right DataFrame that should be used as the key(s)
left_index If True, the index of the left DataFrame is used as the key False
right_index If True, the index of the right DataFrame is used as the key False
sort If True, the resulting DataFrame’s keys are sorted lex­ico­graph­ic­ally False
suffixes Suffixes used to dis­tin­guish columns with the same name ("_x", "_y")
copy If False, a copy is avoided True

How to use pandas merge()

Below are some examples to help il­lus­trate how to use pandas merge() ef­fect­ively.

INNER JOIN

An INNER JOIN combines two pandas Data­Frames and only returns the rows where the keys match in both Data­Frames. To get a better idea of how this works, let’s create two Data­Frames:

import pandas as pd
# Sample DataFrames
df1 = pd.DataFrame({
    'Key': ['A', 'B', 'C'],
    'Value1': [1, 2, 3]
})
df2 = pd.DataFrame({
    'Key': ['B', 'C', 'D'],
    'Value2': [4, 5, 6]
})
print(df1)
print(df2)
python

The two resulting Data­Frames look like this:

Key    Value1
0     A            1
1     B            2
2     C            3
    Key    Value2
0     B            4
1     C            5
2     D            6

Now, we can perform an INNER JOIN using the merge() function:

# INNER JOIN
result = pd.merge(df1, df2, how='inner', on='Key')
print(result)
python

The output shows that only the rows that have keys B and C are included in the DataFrame. This is because these two keys exist in both of the original Data­Frames.

Key    Value1    Value2
0     B            2            4
1     C            3            5

OUTER JOIN

An OUTER JOIN also merges two Data­Frames, but unlike INNER JOIN, it returns all the rows and fills in missing values with NaN.

# OUTER JOIN
result = pd.merge(df1, df2, how='outer', on='Key')
print(result)
python

The DataFrame below includes all the rows from both Data­Frames. NaN is used for the missing values in key A, which is only in df1 and key D, which is only in df2.

Key    Value1    Value2
0     A        1.0        NaN
1     B        2.0        4.0
2     C        3.0        5.0
3     D        NaN        6.0
Note

The other standard variants of JOIN work in a similar manner.

Using left_on and right_on

Sometimes, two Data­Frames have different column names for their keys. In this case, you can use the left_on and right_on para­met­ers to specify which columns to use. First, let’s create two new Data­Frames:

df3 = pd.DataFrame({
    'Key': ['A', 'B', 'C'],
    'Value1': [1, 2, 3]
})
df4 = pd.DataFrame({
    'Key2': ['B', 'C', 'D'],
    'Value2': [4, 5, 6]
})
print(df3)
print(df4)
python

The two Data­Frames look like this:

Key    Value1
0     A            1
1     B            2
2     C            3
    Key2    Value2
0        B            4
1        C            5
2        D            6

We can use the left_on and right_on para­met­ers to perform the JOIN operation using different keys:

# Join with different key column names
result = pd.merge(df3, df4, how='inner', left_on='Key', right_on='Key2')
print(result)
python

By ex­pli­citly using left_on='Key' and right_on='Key2', the cor­res­pond­ing key columns are utilised for the merge.

Key    Value1 Key2    Value2
0     B            2        B            4
1     C            3        C            5

Using indices as keys

You can also use the indices of Data­Frames as keys by setting the left_index and right_index para­met­ers to True. First, let’s create two new Data­Frames with indices:

df5 = pd.DataFrame({
    'Value1': [1, 2, 3]
}, index=['A', 'B', 'C'])
df6 = pd.DataFrame({
    'Value2': [4, 5, 6]
}, index=['B', 'C', 'D'])
print(df5)
print(df6)
python

Here are the Data­Frames:

Value1
A        1
B        2
C        3
    Value2
B        4
C        5
D        6

Now, we can perform a JOIN operation using the indices:

# JOIN with indices
result = pd.merge(df5, df6, how='inner', left_index=True, right_index=True)
print(result)
python

The result is a JOIN that uses the indices from the Data­Frames:

Value1  Value2
B        2        4
C        3        5
Go to Main Menu