How to loop through DataFrames with pandas iterrows()
The pandas function DataFrame.iterrows() is used to iterate over rows in a pandas DataFrame. For each row, it yields a Python tuple that contains the row index and a Series object with the row’s data.
What is the syntax for pandas iterrows()?
The basic syntax of pandas DataFrame.iterrows() is simple since the function doesn’t take any parameters:
df.iterrows()
In this code example, df is the DataFrame you want to iterate through.
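To make the return value more concrete, here is a minimal sketch that pulls a single item out of the iterator. The two-row DataFrame is a hypothetical one made up for illustration; it is not part of the examples in this article.

```python
import pandas as pd

# Hypothetical two-row DataFrame, used only for illustration
df = pd.DataFrame({'Name': ['Anna', 'Ben'], 'Age': [23, 35]})

# iterrows() returns a generator; next() pulls the first (index, Series) pair
index, row = next(df.iterrows())

print(index)        # row label of the first row
print(type(row))    # the row data arrives as a pandas Series
print(row['Name'])  # values are accessed by column name
```

Unpacking the tuple into index and row is the same pattern a for-loop over iterrows() uses on every iteration.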
How to use the pandas iterrows() function
The DataFrame.iterrows() function is typically used when you need to process data row by row. It’s often combined with Python for-loops.
Adding up values in a column
Let’s look at an example DataFrame that contains the columns Name, Age and Score:
import pandas as pd
# Creating an example DataFrame
data = {'Name': ['Anna', 'Ben', 'Clara'],
        'Age': [23, 35, 29],
        'Score': [88, 92, 85]}
df = pd.DataFrame(data)
print(df)
The code above results in the following DataFrame:
    Name  Age  Score
0   Anna   23     88
1    Ben   35     92
2  Clara   29     85
Now, let’s calculate the sum of the scores. We can use pandas DataFrame.iterrows() to do this:
# Calculating the total score
total_score = 0
for index, row in df.iterrows():
    # Add the current row's Score value to the running total
    total_score += row['Score']
print(f"The total score is: {total_score}")
In this example, we used the pandas iterrows() function to loop through each row, adding up the values in the Score column one by one. This produces the following result:
The total score is: 265
When using pandas iterrows(), it’s important not to directly modify the data you’re iterating over. Depending on the data types, the iterator may return a copy of the row rather than a view, so writing to the row can leave the DataFrame unchanged or lead to other unexpected results.
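The following sketch illustrates this pitfall using the example DataFrame from above. It assumes a DataFrame with mixed column types, in which case each row is handed out as a copy. The column name BonusScore and the 5-point bonus are hypothetical and only serve the illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Ben', 'Clara'],
                   'Age': [23, 35, 29],
                   'Score': [88, 92, 85]})

# Writing to `row` changes a temporary Series, not the DataFrame itself
for index, row in df.iterrows():
    row['Score'] = row['Score'] + 5

print(df['Score'].tolist())  # the original scores are still in place

# Safer pattern: collect the new values, then assign them after the loop
bonus_scores = []
for index, row in df.iterrows():
    bonus_scores.append(row['Score'] + 5)
df['BonusScore'] = bonus_scores  # hypothetical column name
print(df)
```

Collecting results in a list and assigning them once the loop has finished keeps the iteration read-only, which avoids depending on whether a row happens to be a copy or a view.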
Processing rows using conditions
The iterrows() function can also be used to apply conditions to individual rows in your DataFrame. For example, let’s say you want to retrieve the names of everyone over 30 years old in the DataFrame from the last example:
# Retrieving names of people over 30 years old
names = []
for index, row in df.iterrows():
    # Keep the name only if this row's Age is greater than 30
    if row['Age'] > 30:
        names.append(row['Name'])
print(f"People over 30 years old: {names}")
In this example, we used DataFrame.iterrows() to go through each row of data. Inside the for-loop, the code checks the value in the Age column and stores only the names of people over 30 years old in the Python list names, using the list’s append() method. Here’s the result:
People over 30 years old: ['Ben']
While DataFrame.iterrows() is easy to use, keep in mind that it may not run efficiently on large DataFrames. In many cases, other options like apply() or vectorised calculations can be used to achieve better performance.
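As a rough sketch of what these alternatives can look like, the two examples from this article can be rewritten without an explicit loop. The variable name labels and the label texts are hypothetical and only demonstrate apply(). Actual performance gains depend on the size and content of your data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'Ben', 'Clara'],
                   'Age': [23, 35, 29],
                   'Score': [88, 92, 85]})

# Vectorised sum over the whole Score column
total_score = df['Score'].sum()
print(f"The total score is: {total_score}")

# Boolean mask selects the rows where Age is over 30, then picks the Name column
names = df.loc[df['Age'] > 30, 'Name'].tolist()
print(f"People over 30 years old: {names}")

# apply() calls a function on each value of a column (hypothetical labels)
labels = df['Age'].apply(lambda age: 'over 30' if age > 30 else '30 or under')
print(labels.tolist())
```

The first two statements produce the same results as the iterrows() loops shown earlier, while letting pandas handle the iteration internally.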