The itertuples() method in Pandas is used to iterate over the rows of a DataFrame.
Example
import pandas as pd
# create a DataFrame
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
# use itertuples() to iterate over rows
for row in df.itertuples():
    print(row)
'''
Output
Pandas(Index=0, A=1, B=3)
Pandas(Index=1, A=2, B=4)
'''
itertuples() Syntax
The syntax of the itertuples() method in Pandas is:
df.itertuples(index=True, name='Pandas')
Note: A typical use of itertuples() would look like this:
for row in df.itertuples(index=True, name='Pandas'):
    # do something using row
itertuples() Arguments
The itertuples() method takes the following arguments:
index (optional) - a boolean that specifies whether to include the index as the first element of each tuple
name (optional) - specifies the name of the namedtuple to be returned. If set to None, a regular tuple is returned instead.
Note: A namedtuple is a tuple subclass with named fields, so its values can be accessed by name as well as by position. It is created with the namedtuple() factory function from Python's collections module.
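To make the note above concrete, here is a minimal sketch of a namedtuple built directly with the collections module. The type name Point and the fields x and y are hypothetical, chosen only for illustration; they are not part of Pandas.

```python
from collections import namedtuple

# Hypothetical type and field names, used only for illustration
Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)

# fields can be read by name
print(p.x, p.y)

# fields can also be read by position, like a regular tuple
print(p[0], p[1])

# a namedtuple instance is still a tuple
print(isinstance(p, tuple))
```

The rows produced by itertuples() behave the same way: each column value can be read as an attribute or by position.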
itertuples() Return Value
The itertuples() method returns an iterator that yields one namedtuple per row of the DataFrame (or one regular tuple per row if name is set to None).
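The effect of name=None on the return value can be checked with a short sketch. It reuses the small DataFrame from the first example; the variable name rows is our own choice.

```python
import pandas as pd

# same small DataFrame as in the first example
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# with name=None, each row comes back as a regular tuple
rows = list(df.itertuples(name=None))

print(rows)
print(type(rows[0]))
```

Because regular tuples have no field names, the values in each row can then only be accessed by position, with the index first.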
Example 1: Basic Iteration Using itertuples()
import pandas as pd
# create a DataFrame
data = {'Column1': [1, 2, 3],
        'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
# use itertuples() to iterate over rows
for row in df.itertuples():
    # access data from each row
    print(row.Column1, row.Column2)
Output
1 A
2 B
3 C
In the above example, we have used the itertuples() method to loop over the rows of the df DataFrame.
In each iteration of the loop, the code retrieves the values from Column1 and Column2 of the DataFrame using row.Column1 and row.Column2.
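As a small follow-up sketch, the same attribute access can be used to build new values from each row rather than just printing them. The variable names labels and first_values are our own, picked for illustration.

```python
import pandas as pd

# same data as in Example 1
df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']})

# combine the two column values of every row into one string
labels = [f"{row.Column2}{row.Column1}" for row in df.itertuples()]
print(labels)

# the same values are also reachable by position:
# position 0 is the index, position 1 is Column1, position 2 is Column2
first_values = [row[1] for row in df.itertuples()]
print(first_values)
```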
Example 2: Using itertuples() with and without the index Argument
import pandas as pd
# create a DataFrame
data = {'Column1': [10, 20, 30], 'Column2': ['A', 'B', 'C']}
df = pd.DataFrame(data)
# iterating with index=True
print("Iterating with index=True:")
for row in df.itertuples(index=True):
    print(row)
# iterating with index=False
print("\nIterating with index=False:")
for row in df.itertuples(index=False):
    print(row)
Output
Iterating with index=True:
Pandas(Index=0, Column1=10, Column2='A')
Pandas(Index=1, Column1=20, Column2='B')
Pandas(Index=2, Column1=30, Column2='C')

Iterating with index=False:
Pandas(Column1=10, Column2='A')
Pandas(Column1=20, Column2='B')
Pandas(Column1=30, Column2='C')
Here,
index=True - includes the df DataFrame's index as the first element of each tuple
index=False - excludes the index from the tuples, showing only the data from the DataFrame's columns
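The index included with index=True holds the DataFrame's actual index labels, which need not be the positions 0, 1, 2. The sketch below assumes a DataFrame with string labels 'x' and 'y', made up for illustration.

```python
import pandas as pd

# hypothetical DataFrame with string index labels instead of 0, 1, ...
df = pd.DataFrame({'Column1': [10, 20]}, index=['x', 'y'])

# row.Index carries the label of each row
index_labels = [row.Index for row in df.itertuples(index=True)]
print(index_labels)

# with index=False, each row has one element per column, so it can be unpacked directly
values = [value for (value,) in df.itertuples(index=False)]
print(values)
```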
Example 3: Provide Custom Name for the NamedTuple
import pandas as pd
# create a DataFrame
df = pd.DataFrame({'Column1': [1, 2, 3], 'Column2': ['A', 'B', 'C']})
# use itertuples() with a custom name for the namedtuple
for row in df.itertuples(name='RowData'):
    # access elements in the namedtuple
    print(f"Index: {row.Index}, Column1: {row.Column1}, Column2: {row.Column2}")
Output
Index: 0, Column1: 1, Column2: A
Index: 1, Column1: 2, Column2: B
Index: 2, Column1: 3, Column2: C
In this example, we have used itertuples() to iterate over the rows of the df DataFrame.
The name argument is set to RowData. This means each row is represented as a namedtuple called RowData.
Inside the loop, we accessed the index of the row with row.Index, and the data in Column1 and Column2 with row.Column1 and row.Column2, respectively.
Note: Naming the namedtuple RowData in itertuples() can make the code more readable, since it signals that each iteration deals with row data.
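One caveat with attribute access is worth a short sketch: a column name that is not a valid Python identifier cannot be used as a field name, so it is replaced with a positional name. The column name 'my col' below is a hypothetical example, and the standard namedtuple helpers _fields and _asdict() are used to inspect the result.

```python
import pandas as pd

# 'my col' contains a space, so it is not a valid Python identifier
df = pd.DataFrame({'Column1': [1, 2], 'my col': ['A', 'B']})

# take only the first row for inspection
first = next(df.itertuples(name='RowData'))

# the custom name is used for the row type
print(type(first).__name__)

# field names actually used; the invalid one is replaced by a positional name
print(first._fields)

# the row converted to a dictionary of field name to value
print(first._asdict())

# positional access still works regardless of the column name
print(first[2])
```

If columns have such names, accessing values by position, or renaming the columns before iterating, keeps the loop body predictable.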