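The first() method in Pandas selects leading data. Called on a grouped DataFrame, it returns the first non-null value of each column within each group. Called on a DataFrame with a datetime index, it returns the rows that fall within an initial period of the time series.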
Example
import pandas as pd
# sample DataFrame
data = {
'Group': ['A', 'B', 'A', 'B'],
'Data': [1, 2, 3, 4]
}
df = pd.DataFrame(data)
# group by 'Group' and get the first row for each group
first_rows = df.groupby('Group').first()
print(first_rows)
'''
Output

       Data
Group
A         1
B         2
'''
first() Syntax
The syntax of the first() method in Pandas is:
df.first(offset)
For grouped data, it is called on the groupby object instead:
df.groupby(by).first()
first() Arguments
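The first() method takes the following argument:
offset - the length of the initial period to select, given as an offset string such as '3D' (three days); it applies to a DataFrame with a datetime index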
first() Return Value
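The first() method in Pandas returns a DataFrame. Called on a groupby object, the result contains the first non-null value of each column for each group. Called on a time series DataFrame with an offset, the result contains the rows that fall within the initial period, assuming the datetime index is sorted.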
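The difference between "first non-null value" and "first row" only shows up when a group has missing data. The following is a minimal sketch with made-up sample values (the variable names first_values and first_rows are illustrative), comparing first() with head(1) on a grouped DataFrame:

```python
import numpy as np
import pandas as pd

# sample data where group A starts with a missing value
df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B'],
    'Value': [np.nan, 2.0, 3.0, 4.0],
})

# first() skips NaN and returns the first non-null value per column
first_values = df.groupby('Group').first()

# head(1) returns the first row of each group unchanged, NaN included
first_rows = df.groupby('Group').head(1)

print(first_values)
print(first_rows)
```

Here first() reports 2.0 for group A, while head(1) keeps the original first row with its missing value. Which one is appropriate depends on whether missing values should be skipped.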
Example 1: Use first() for Grouped Data Selection
import pandas as pd
# create a sample DataFrame
data = {
'Group': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
'Value': [1, 2, 3, 4, 5, 6, 7],
'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-03-01', '2021-01-02', '2022-01-01', '2022-01-02']
}
df = pd.DataFrame(data)
# group by 'Group' column
grouped = df.groupby('Group')
# use first() to get the first entry for each group
first_entries = grouped.first()
print(first_entries)
Output
       Value        Date
Group
A          1  2021-01-01
B          4  2021-03-01
C          6  2022-01-01
In the above example, we have created the df DataFrame and grouped it by the Group column using the groupby() method.
Then the first() method is applied to the grouped object, and the result is printed, showing the first entry of each Group along with the corresponding Value and Date.
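In the output above, Group becomes the index of the result. If it should stay a regular column, one option is to pass as_index=False to groupby(). A minimal sketch reusing the same sample data (the variable name first_entries_flat is illustrative):

```python
import pandas as pd

data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [1, 2, 3, 4, 5, 6, 7],
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-03-01',
             '2021-01-02', '2022-01-01', '2022-01-02']
}
df = pd.DataFrame(data)

# as_index=False keeps 'Group' as an ordinary column in the result
first_entries_flat = df.groupby('Group', as_index=False).first()

print(first_entries_flat)
```

Calling reset_index() on the result of the original example should give an equivalent DataFrame.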
Example 2: First Entries of a Time Series Dataframe
import pandas as pd
# create a sample time series data
dates = pd.date_range('20210101', periods=6, freq='D')
data = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=dates)
# get the first three days of data
first_days = data.first('3D')
print(first_days)
Output
            A
2021-01-01  1
2021-01-02  2
2021-01-03  3
Here, we first created a range of 6 consecutive dates starting from January 1st, 2021, with a daily frequency D, using pd.date_range().
Then we created the data DataFrame with a single column A and set its index to dates.
Finally, we used data.first('3D') to select the first three days of the time series.
Note: To learn more about how to create date ranges, please visit Pandas date_range().
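DataFrame.first() is deprecated as of pandas 2.1, so newer versions emit a warning or may not provide the method at all. The sketch below shows one possible replacement using a boolean mask with .loc; the cutoff variable is our own name, and the approach assumes the datetime index is sorted:

```python
import pandas as pd

# same sample time series as above
dates = pd.date_range('20210101', periods=6, freq='D')
data = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=dates)

# keep rows whose timestamp is less than three days after the first index entry
cutoff = data.index[0] + pd.Timedelta(days=3)
first_days = data.loc[data.index < cutoff]

print(first_days)
```

For this daily data the mask selects January 1st through January 3rd, the same rows as in the example above.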
Example 3: first() on Sorted Groups
import pandas as pd
data = {
'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
'Value': [2, 1, 4, 3, 6, 5],
'Date': ['2021-01-02', '2021-01-01', '2021-01-02', '2021-01-01', '2021-01-02', '2021-01-01']
}
# create DataFrame and sort using sort_values()
df = pd.DataFrame(data).sort_values(by=['Group', 'Date'])
# group by 'Group' and get first entry per group
grouped = df.groupby('Group')
first_entries = grouped.first()
print(first_entries)
Output
       Value        Date
Group
A          1  2021-01-01
B          3  2021-01-01
C          5  2021-01-01
In the above example, the data is first sorted by the Group and Date columns using sort_values(). It is then grouped by the Group column.
The first() method is applied to each group and selects the first non-null value in each column, following the sorted order. Since this data has no missing values, the result is the earliest-dated row of each group.
Note: To learn more about how we sort values, please visit Pandas sort_values().
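When the goal is the complete first row per group, missing values included, sorting followed by drop_duplicates() is one alternative to groupby().first(). This is a sketch on the same sample data, not the only way to do it; the variable name earliest_rows is illustrative:

```python
import pandas as pd

data = {
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [2, 1, 4, 3, 6, 5],
    'Date': ['2021-01-02', '2021-01-01', '2021-01-02',
             '2021-01-01', '2021-01-02', '2021-01-01']
}

# sort so the earliest date comes first within each group
df = pd.DataFrame(data).sort_values(by=['Group', 'Date'])

# keep only the first row seen for each group, exactly as it appears
earliest_rows = df.drop_duplicates(subset='Group', keep='first')

print(earliest_rows)
```

On this data the selected values match the groupby().first() result, but Group stays a column and the original row labels are preserved.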