The loc[] property in Pandas is used to select data from a DataFrame based on labels or conditions.
Example
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
# select the row with label 1 (second row)
selected_row = df.loc[1]
print(selected_row)
'''
Output
Name            Bob
Age              30
City    Los Angeles
Name: 1, dtype: object
'''
loc[] Syntax
The syntax of the loc[] property in Pandas is:
loc[rows, columns]
loc[] Arguments
The loc[] property takes the following arguments:
- rows - specifies the selection criteria for rows, which can be labels, boolean conditions, or slices
- columns (optional) - specifies the selection criteria for columns, which can be labels, boolean conditions, or slices.
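As a rough illustration of these selector types, the sketch below passes a label, a slice, and a boolean condition in different combinations. The DataFrame and its values are made up for this illustration only.

```python
import pandas as pd

# hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'],
     'Age': [25, 30, 35],
     'City': ['New York', 'Los Angeles', 'Chicago']},
    index=['A', 'B', 'C'])

# rows as a single label, columns as a list of labels
by_label = df.loc['A', ['Name', 'City']]

# rows as a slice of labels, columns as a slice of labels
by_slice = df.loc['A':'B', 'Name':'Age']

# rows as a boolean condition, columns as a list of labels
by_condition = df.loc[df['Age'] > 25, ['Name']]

# all rows, columns as a boolean mask (every column except City)
by_column_mask = df.loc[:, df.columns != 'City']

print(by_label)
print(by_slice)
print(by_condition)
print(by_column_mask)
```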
loc[] Return Value
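The loc[] property in Pandas returns a single value, a Series, or a DataFrame, depending on what we select. A single label for both rows and columns gives a single value, a single label for only the rows or only the columns gives a Series, and anything else (lists, slices, or boolean conditions) gives a DataFrame.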
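The type of the result depends on the selectors we pass. A minimal sketch on made-up data, comparing a single value, a single row, and a one-element list of rows:

```python
import pandas as pd

# hypothetical sample data, used only for illustration
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob'], 'Age': [25, 30]},
    index=['A', 'B'])

single_value = df.loc['A', 'Name']   # one row label and one column label
single_row = df.loc['A']             # one row label, all columns
one_row_frame = df.loc[['A']]        # a list holding one row label

# print the type of each result
print(type(single_value).__name__)
print(type(single_row).__name__)
print(type(one_row_frame).__name__)
```

Wrapping a label in a list, as in df.loc[['A']], is a common way to keep the result two-dimensional even when only one row matches.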
Example 1: Select Single Row by Label
import pandas as pd
# create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
# select a single row by label
selected_row = df.loc['A']
print(selected_row)
Output
Name       Alice
Age           25
City    New York
Name: A, dtype: object
In the above example, we have used the loc[] property to select a single row from the DataFrame.
The index parameter specifies custom row labels A, B, C, and D for the DataFrame.
The A label is then used as the argument to loc[], which specifies that we want to select the row with the label A.
Example 2: Select Multiple Rows by Labels
import pandas as pd
# create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
# select multiple rows by labels
selected_rows = df.loc[['A', 'C']]
print(selected_rows)
Output
      Name  Age      City
A    Alice   25  New York
C  Charlie   35   Chicago
Here, we first created the df DataFrame with row labels A, B, C, and D. Then, we used loc[['A', 'C']] to select multiple rows by the labels A and C.
Hence, the selected_rows DataFrame contains the rows with labels A and C.
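Two details of list-based selection are worth seeing in code: rows come back in the order the labels are listed, and in current pandas versions a label that is not in the index raises a KeyError. The sketch below reuses the same sample data; the label Z is a hypothetical missing label.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

# rows are returned in the order given in the list, not in index order
reordered = df.loc[['C', 'A']]
print(reordered)

# 'Z' is not a row label, so the lookup fails
try:
    df.loc[['A', 'Z']]
    missing_label_error = False
except KeyError:
    missing_label_error = True

print(missing_label_error)
```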
Example 3: Select Specific Rows and Columns
import pandas as pd
# create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
# select specific rows and columns
selected_data = df.loc[['A', 'C'], ['Name', 'Age']]
print(selected_data)
Output
      Name  Age
A    Alice   25
C  Charlie   35
In the above example, we first created the df DataFrame with row labels A, B, C, and D and columns Name, Age, and City.
- To select specific rows, we pass a list of row labels A and C as the first argument to the loc[] property.
- To select specific columns, we pass a list of column names Name and Age as the second argument to the loc[] property.
Hence, the output shows the selected rows A and C with the columns Name and Age.
Example 4: Slice Rows and Select Specific Columns
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
# slice rows and select specific columns
selected_data = df.loc['B':'C', ['Name', 'Age']]
print(selected_data)
Output
      Name  Age
B      Bob   30
C  Charlie   35
In this example, loc['B':'C', ['Name', 'Age']] slices rows from B to C, with both endpoints included, and selects the columns Name and Age.
So, selected_data includes rows B and C and only the columns Name and Age.
Note: To learn more about how slicing works, please visit Pandas Indexing and Slicing.
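Label slices in loc[] include the end label, unlike ordinary Python slices and the position-based iloc[] property. A short sketch of the difference, using the same sample data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

# label slice: both 'B' and 'C' are included
label_slice = df.loc['B':'C', ['Name', 'Age']]

# position slice: position 1 is included, position 2 is not
position_slice = df.iloc[1:2, [0, 1]]

print(label_slice)
print(position_slice)
```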
Example 5: Select all Rows for Specific Columns
import pandas as pd
# create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])
# select all rows for specific columns
selected_columns = df.loc[:, ['Name', 'Age']]
print(selected_columns)
Output
      Name  Age
A    Alice   25
B      Bob   30
C  Charlie   35
D    David   28
Here, loc[:, ['Name', 'Age']] selects all rows from the DataFrame with only the specified columns Name and Age. The colon (:) in the rows position stands for every row.
This gives us a DataFrame containing all rows for the Name and Age columns.
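For comparison, selecting whole columns this way gives the same result as plain bracket indexing with a list of column names. A small sketch, assuming the same sample data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data, index=['A', 'B', 'C', 'D'])

via_loc = df.loc[:, ['Name', 'Age']]   # all rows, two columns through loc[]
via_brackets = df[['Name', 'Age']]     # the same columns through plain indexing

# check that both selections hold identical data
same_result = via_loc.equals(via_brackets)
print(same_result)
```

The loc[] form becomes the more convenient one as soon as the rows need to be restricted as well.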
Example 6: Select Specific Rows for all Columns
import pandas as pd
data = {'Student_ID': [101, 102, 103, 104, 105],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Score': [85, 92, 78, 88, 95]}
df = pd.DataFrame(data)
# select rows 1 and 3 for all columns
selected_rows = df.loc[[1, 3], :]
print(selected_rows)
Output
   Student_ID   Name  Score
1         102    Bob     92
3         104  David     88
Here, loc[[1, 3], :] selects the rows labeled 1 and 3 while including all columns. Since the default index starts at 0, these are the second and fourth rows of the DataFrame.
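Because loc[] matches index labels rather than positions, the same call can pick different rows once the index is no longer in its default order. The sketch below sorts the data by Score (an extra step added purely for illustration) and compares loc[] with the position-based iloc[] property.

```python
import pandas as pd

data = {'Student_ID': [101, 102, 103, 104, 105],
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Score': [85, 92, 78, 88, 95]}
df = pd.DataFrame(data)

# sorting keeps the original labels, so they are no longer in 0..4 order
ranked = df.sort_values('Score', ascending=False)

by_label = ranked.loc[[1, 3], :]      # rows whose labels are 1 and 3
by_position = ranked.iloc[[1, 3], :]  # second and fourth rows of ranked

print(by_label)
print(by_position)
```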
Example 7: Select Rows by Boolean Condition
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Denver', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
# select rows where Age is greater than or equal to 30
selected_rows = df.loc[df['Age'] >= 30]
print(selected_rows)
Output
      Name  Age     City
1      Bob   30   Denver
2  Charlie   35  Chicago
Here, we have used loc[df['Age'] >= 30] to select rows from the df DataFrame where the Age column has a value greater than or equal to 30.
The expression df['Age'] >= 30 produces a boolean Series, and loc[] keeps only the rows where it is True. The original row labels 1 and 2 are preserved in selected_rows.
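Boolean conditions can also be combined and paired with a column selection. The following sketch, using the same sample data, joins two conditions with & (each wrapped in parentheses) and keeps only two columns; the second condition on City is an assumption added for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 28],
        'City': ['New York', 'Denver', 'Chicago', 'Houston']}
df = pd.DataFrame(data)

# rows where Age is at least 30 and City is not Denver
mask = (df['Age'] >= 30) & (df['City'] != 'Denver')

# apply the mask to the rows and keep only Name and Age
selected = df.loc[mask, ['Name', 'Age']]
print(selected)
```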