Pandas Select

Pandas select refers to the process of extracting specific portions of data from a DataFrame.

Data selection involves choosing specific rows and columns based on labels, positions, or conditions.

Pandas provides several ways to do this, including basic indexing, slicing, boolean indexing, and querying. These let us efficiently extract and filter the data that is relevant to an analysis.

Select Data Using Indexing and Slicing

In Pandas, we can use square brackets with column labels, or with a slice of row positions, to select the data we want.

Let's look at an example.

import pandas as pd

# create a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 22, 27, 29],
    'Salary': [50000, 60000, 45000, 55000, 52000]
}

df = pd.DataFrame(data)

# selecting a single column
name_column = df['Name']

print("Selecting single column: Name")
print(name_column)
print()

# selecting multiple columns
age_salary_columns = df[['Age', 'Salary']]

print("Selecting multiple columns: Age and Salary")
print(age_salary_columns.to_string(index=False))
print()

# selecting rows using slicing
selected_rows = df[1:4]

print("Selecting rows 1 to 3")
print(selected_rows.to_string(index=False))
print()

Output

Selecting single column: Name
0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: Name, dtype: object

Selecting multiple columns: Age and Salary
Age  Salary
25   50000
30   60000
22   45000
27   55000
29   52000

Selecting rows 1 to 3
Name      Age  Salary
Bob        30    60000
Charlie    22    45000
David      27    55000

In the above example, we created a DataFrame named df from a dictionary with three columns: Name, Age, and Salary. Each column is represented by a list of values.

Then we,

selected a single column Name using df['Name']
selected multiple columns Age and Salary using df[['Age', 'Salary']]
selected rows 1 to 3 using the slice df[1:4] (the stop position 4 is not included)

Note: .to_string(index=False) is used to print the values without the index.
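One detail of bracket selection that often trips people up: a single label returns a Series, while a list of labels returns a DataFrame, even when the list holds only one column. The minimal sketch below reuses the same example data to show this, and also checks that a row slice stops before its end position.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 22, 27, 29],
    'Salary': [50000, 60000, 45000, 55000, 52000],
})

# a single label inside [] returns a Series
name_series = df['Name']

# a list of labels inside [] returns a DataFrame, even for one column
name_frame = df[['Name']]

print(type(name_series).__name__)  # Series
print(type(name_frame).__name__)   # DataFrame
print(name_frame.shape)            # (5, 1)

# a row slice stops before the end position, so df[1:4] holds 3 rows
print(len(df[1:4]))                # 3
```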

Using loc and iloc to Select Data

The loc and iloc indexers in Pandas let us access data by label or by integer position.

loc selects rows and columns by their labels; when slicing, the end label is included
iloc selects rows and columns by their integer positions; when slicing, the end position is excluded

Let's take a look at an example.

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 22, 27, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'San Francisco']
}

df = pd.DataFrame(data)
print(f"Original DataFrame \n {df} \n") 

# loc to select rows and columns by labels
# select rows 1 to 3 and columns Name and Age
selected_data_loc = df.loc[1:3, ['Name', 'Age']]

print(selected_data_loc.to_string(index=False))
print() 

# iloc to select rows and columns by index
# select rows 1 to 3 and columns 0 and 2 
selected_data_iloc = df.iloc[1:4, [0, 2]]

print(selected_data_iloc.to_string(index=False))

Output

Original DataFrame 
       Name  Age           City
0    Alice   25       New York
1      Bob   30    Los Angeles
2  Charlie   22        Chicago
3    David   27        Houston
4    Emily   29  San Francisco 

Name     Age
Bob       30
Charlie   22
David     27

Name        City
Bob      Los Angeles
Charlie  Chicago
David    Houston

Here,

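df.loc[1:3, ['Name', 'Age']] selects the rows labeled 1 through 3 and the columns Name and Age (the end label 3 is included)
df.iloc[1:4, [0, 2]] selects the rows at positions 1 through 3 and the columns at positions 0 and 2, that is, Name and City (the end position 4 is excluded)

Because df uses the default index 0 to 4, its labels and positions coincide, so both calls select the same rows.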
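The difference between loc and iloc only becomes visible when the index labels differ from the positions. The sketch below assumes a custom index of 10, 20, 30, 40, 50, chosen purely for illustration, and selects the same two rows once by label and once by position.

```python
import pandas as pd

# custom index labels (hypothetical, for illustration) so labels differ from positions
df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Age': [25, 30, 22, 27, 29],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'San Francisco'],
    },
    index=[10, 20, 30, 40, 50],
)

# loc is label-based: the end label 30 is included
by_label = df.loc[20:30, 'Name']

# iloc is position-based: the end position 3 is excluded
by_position = df.iloc[1:3, 0]

print(by_label.tolist())     # ['Bob', 'Charlie']
print(by_position.tolist())  # ['Bob', 'Charlie']

# label 10 and position 0 refer to the same first row
print(df.loc[10, 'City'])    # New York
print(df.iloc[0, 2])         # New York
```

Here loc needs the labels 20 and 30, while iloc needs the positions 1 and 3 to reach the same rows.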

Select Rows Based on Specific Criteria

In Pandas, we can use boolean conditions to filter rows based on specific criteria. For example,

import pandas as pd

# creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
    'Gender': ['Female', 'Male', 'Male', 'Male', 'Female']
}

df = pd.DataFrame(data)

# select rows where Age is greater than 25
selected_rows = df[df['Age'] > 25]

print(selected_rows)

Output

     Name  Age Gender
1    Bob   30   Male
3  David   28   Male

In this example, we have selected the rows where the age is greater than 25.

The boolean indexing uses the condition

df['Age'] > 25

This creates a boolean mask, a Series of True and False values with one entry per row. When this mask is applied to the DataFrame, only the rows where the condition is True are selected.
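The sketch below prints the mask itself, combines two conditions, and passes a mask to loc so that rows and a column are selected in one step. Note that conditions are combined with | (or) and & (and) rather than Python's or/and, and each condition must be wrapped in parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
    'Gender': ['Female', 'Male', 'Male', 'Male', 'Female'],
})

# the mask is a boolean Series with one value per row
mask = df['Age'] > 25
print(mask.tolist())  # [False, True, False, True, False]

# combine conditions with | (or) or & (and); wrap each condition in parentheses
older_or_female = df[(df['Age'] > 25) | (df['Gender'] == 'Female')]
print(older_or_female['Name'].tolist())  # ['Alice', 'Bob', 'David', 'Emily']

# loc takes the mask for rows and a label for the column
older_names = df.loc[mask, 'Name']
print(older_names.tolist())  # ['Bob', 'David']
```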

query() to Select Data

The query() method in Pandas lets us select rows by writing the condition as a string, in a syntax similar to an SQL WHERE clause.

Let's take a look at an example.

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 22, 28, 35],
    'Score': [85, 90, 75, 80, 95]
}

df = pd.DataFrame(data)

# select the rows where the age is greater than 25
selected_rows = df.query('Age > 25')

print(selected_rows.to_string(index=False))

Output


Name  Age  Score
Bob    30     90
David  28     80
Eva    35     95

In this example, the query Age > 25 selects the rows where the Age column's values are greater than 25.
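A query string can also combine several conditions with and/or, and can refer to an ordinary Python variable by prefixing its name with @. In this minimal sketch, min_score is a variable introduced for illustration; the last line checks that the query matches the equivalent boolean indexing expression.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 22, 28, 35],
    'Score': [85, 90, 75, 80, 95],
})

min_score = 85  # hypothetical threshold held in a normal Python variable

# 'and' combines conditions; @min_score refers to the local variable
result = df.query('Age > 25 and Score >= @min_score')
print(result['Name'].tolist())  # ['Bob', 'Eva']

# the same selection written with boolean indexing
same = df[(df['Age'] > 25) & (df['Score'] >= min_score)]
print(result.equals(same))      # True
```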

Select Rows Based on a List of Values

Pandas provides the isin() method to filter rows based on a list of values. For example,

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24]
}

df = pd.DataFrame(data)

# create a list of names to select
names_to_filter = ['Bob', 'David']

# use isin() to select rows based on the 'Name' column
selected_rows = df[df['Name'].isin(names_to_filter)]

print(selected_rows.to_string(index=False))

Output

Name  Age
Bob   30
David 28

In this example, we want to select only the rows where the name is either Bob or David.

We created a list names_to_filter with the names we want to filter by and then used the isin() method to filter the rows based on the values in the Name column.
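Because isin() returns a boolean mask, it can be inverted with ~ to keep the rows whose value is not in the list. The same membership test can also be written with query() using the in operator and an @ reference to the list. A short sketch using the example data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [25, 30, 22, 28, 24],
})

names_to_filter = ['Bob', 'David']

# ~ inverts the mask: keep rows whose Name is NOT in the list
others = df[~df['Name'].isin(names_to_filter)]
print(others['Name'].tolist())     # ['Alice', 'Charlie', 'Emily']

# the same membership test written with query(); @ refers to the Python list
same_rows = df.query('Name in @names_to_filter')
print(same_rows['Name'].tolist())  # ['Bob', 'David']
```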
