Pandas sort_values()

The sort_values() method in Pandas is used to sort a DataFrame by one or more columns.

Example

import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28]}

df = pd.DataFrame(data)

# sort df by 'Age' column in ascending order
df_sorted = df.sort_values(by='Age')

print(df_sorted)

'''
Output

      Name  Age
2  Charlie   22
0    Alice   25
3    David   28
1      Bob   30
'''

sort_values() Syntax

The syntax of the sort_values() method in Pandas is:

df.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)

sort_values() Arguments

The sort_values() method takes the following arguments:

  • by - column name or list of column names to sort by (when axis=1, row index labels instead)
  • axis (optional) - the axis to sort along: 0 (default) reorders the rows based on column values, 1 reorders the columns based on row values
  • ascending (optional) - True (default) for ascending order, False for descending; pass a list of booleans to set the order separately for each entry in by
  • inplace (optional) - if True, sorts the DataFrame in place and returns None; if False (default), returns a new sorted DataFrame
  • kind (optional) - the sorting algorithm: 'quicksort' (default), 'mergesort', 'heapsort', or 'stable'
  • na_position (optional) - where to place NaN values: 'last' (default) or 'first'
  • ignore_index (optional) - if True, the sorted axis is relabeled 0, 1, ..., n-1; defaults to False

sort_values() Return Value

By default, the sort_values() method returns a new DataFrame containing the sorted data, and the original DataFrame is left unchanged. If inplace=True, the DataFrame is sorted in place and the method returns None.
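The difference between the default behavior and inplace=True is easy to miss, so here is a minimal sketch that reuses the Name/Age data from the first example. It checks the original DataFrame after each call and captures what sort_values() itself returns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 22, 28]})

# default (inplace=False): a new, sorted DataFrame is returned
df_sorted = df.sort_values(by='Age')
print(df_sorted['Name'].tolist())  # ['Charlie', 'Alice', 'David', 'Bob']
print(df['Name'].tolist())         # ['Alice', 'Bob', 'Charlie', 'David'] (unchanged)

# inplace=True: df itself is reordered and the method returns None
result = df.sort_values(by='Age', inplace=True)
print(result)                      # None
print(df['Name'].tolist())         # ['Charlie', 'Alice', 'David', 'Bob']
```

A common mistake is writing df = df.sort_values(by='Age', inplace=True), which replaces df with None.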


Example 1: Sort Column in Descending Order

import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28]}

df = pd.DataFrame(data)

# sort df by 'Age' column in descending order
df_sorted = df.sort_values(by='Age', ascending=False)

print(df_sorted)

Output

      Name  Age
1      Bob   30
3    David   28
0    Alice   25
2  Charlie   22

In the above example, we have used the sort_values() method to sort the df DataFrame by the Age column in descending order (ascending=False).

This means that the individuals will be arranged in the DataFrame with the oldest person at the top.


Example 2: Sort DataFrame by Multiple Columns

import pandas as pd

data = {'Name': ['Eve', 'Frank', 'Grace', 'Hank'],
        'Age': [28, 22, 30, 25],
        'Score': [75, 80, 85, 90]}

df = pd.DataFrame(data)

# sort DataFrame by 'Age' and then by 'Score' (both in ascending order)
df1 = df.sort_values(by=['Age', 'Score'])

print("Sorting by 'Age' (ascending) and then by 'Score' (ascending):\n")
print(df1.to_string(index=False))
print()

# sort DataFrame by 'Age' in ascending order, and then by 'Score' in descending order
df2 = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

print("Sorting by 'Age' (ascending) and then by 'Score' (descending):\n")
print(df2.to_string(index=False))

Output

Sorting by 'Age' (ascending) and then by 'Score' (ascending):

Name  Age  Score
Frank   22     80
 Hank   25     90
  Eve   28     75
Grace   30     85

Sorting by 'Age' (ascending) and then by 'Score' (descending):

Name  Age  Score
Frank   22     80
 Hank   25     90
  Eve   28     75
Grace   30     85

Here,

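  1. df1 shows the default sorting behavior (both Age and Score in ascending order).
  2. df2 shows custom sorting, where Age is in ascending order and Score is in descending order.

Because every Age value in this DataFrame is unique, the secondary sort on Score never comes into play, so df1 and df2 come out identical. The order of Score only matters among rows that share the same Age.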
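To see ascending=[True, False] actually change the result, the data needs ties in the first sort column. The sketch below uses a hypothetical variation of the Eve/Frank/Grace/Hank data in which two pairs of people share the same Age, so the Score direction decides the order within each pair.

```python
import pandas as pd

# hypothetical data: Frank and Hank are both 22, Eve and Grace are both 28
df = pd.DataFrame({'Name': ['Eve', 'Frank', 'Grace', 'Hank'],
                   'Age': [28, 22, 28, 22],
                   'Score': [75, 80, 85, 90]})

# Age ascending, then Score ascending within each Age
df1 = df.sort_values(by=['Age', 'Score'])
print(df1['Name'].tolist())  # ['Frank', 'Hank', 'Eve', 'Grace']

# Age ascending, then Score descending within each Age
df2 = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print(df2['Name'].tolist())  # ['Hank', 'Frank', 'Grace', 'Eve']
```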

Example 3: Sort DataFrame Based on Rows or Columns

import pandas as pd

data = {'A': [3, 1, 2, 4],
        'B': [9, 7, 8, 6]}
df = pd.DataFrame(data)

# sort the DataFrame by rows based on column 'A' values in ascending order
df_sorted_rows = df.sort_values(by='A', axis=0)

print("Sorted by rows based on 'A' values:")
print(df_sorted_rows)

# sort the DataFrame by columns based on the values in the first row (index 0)
df_sorted_columns = df.sort_values(by=0, axis=1, ascending=False, ignore_index=True)

print("\nSorted by columns based on values in the first row:")
print(df_sorted_columns)

Output

Sorted by rows based on 'A' values:
   A  B
1  1  7
2  2  8
0  3  9
3  4  6

Sorted by columns based on values in the first row:
   0  1
0  9  3
1  7  1
2  8  2
3  6  4

In the above example, we first sorted the df DataFrame along the rows (axis=0) based on the values in column A in ascending order.

Then, we sorted the same DataFrame along the columns (axis=1) based on the values in the first row (the row with index label 0) in descending order. Since that row holds A=3 and B=9, column B is moved before column A.

Here, the ignore_index=True parameter is used in the second call, which sorts along axis=1. Because that call reorders the columns, it is the column labels that get reset: the original labels (A, B) are discarded, and the sorted DataFrame has new sequential column labels (0, 1). The row index is left as it was.

This can be helpful when you want to keep clean, sequential labels after sorting your DataFrame.
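When sorting along the rows (the default axis=0), ignore_index=True resets the row index instead. A minimal sketch with the same A/B data, comparing the index with and without the parameter:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2, 4],
                   'B': [9, 7, 8, 6]})

# axis=0 (default): rows are reordered, so the row index is what gets reset
kept = df.sort_values(by='A')
reset = df.sort_values(by='A', ignore_index=True)

print(kept.index.tolist())   # [1, 2, 0, 3]  (original labels travel with their rows)
print(reset.index.tolist())  # [0, 1, 2, 3]  (fresh sequential labels)
print(reset['A'].tolist())   # [1, 2, 3, 4]
```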


Example 4: Specify Sorting Algorithm to Sort DataFrame

Pandas uses the quicksort algorithm by default for the sort_values() method. If we want to use a different sorting algorithm, we can specify it with the kind parameter.

import pandas as pd

# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 28]}

df = pd.DataFrame(data)

# sort df by 'Age' column using merge sort algorithm
df_sorted = df.sort_values(by='Age', kind='mergesort')

print(df_sorted)

Output

      Name  Age
2  Charlie   22
0    Alice   25
3    David   28
1      Bob   30

Here, the kind='mergesort' parameter is used to specify the merge sort algorithm for sorting the DataFrame by the Age column in ascending order.

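Note: We can replace 'mergesort' with any of the other supported values: 'quicksort', 'heapsort', or 'stable'. Only 'mergesort' and 'stable' are stable sorts, and for DataFrames the kind option is only applied when sorting on a single column or label.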
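Stability matters when rows tie on the sort key: a stable algorithm keeps tied rows in their original relative order. A minimal sketch using hypothetical data in which two pairs of people share the same Age:

```python
import pandas as pd

# hypothetical data: Alice and Eve are both 25, Bob and David are both 30
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [25, 30, 22, 30, 25]})

# 'stable' guarantees tied rows keep their original relative order
df_sorted = df.sort_values(by='Age', kind='stable')
print(df_sorted['Name'].tolist())  # ['Charlie', 'Alice', 'Eve', 'Bob', 'David']
```

With 'quicksort' or 'heapsort', the relative order of Alice/Eve and Bob/David is not guaranteed.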


Example 5: Determine the Placement of Missing Values During Sorting Operation

The na_position argument is used to determine the placement of missing values during the sorting operation.

  • na_position='last' (default) - missing values are placed at the end of the sorted column
  • na_position='first' - missing values are placed at the beginning of the sorted column

Let's look at an example.

import pandas as pd

data = {'A': [3, 1, 2, None, 4],
        'B': [9, None, 8, 6, 7]}
df = pd.DataFrame(data)

# sort df by column 'A' in ascending order with missing values at the end
df_sorted_last = df.sort_values(by='A', na_position='last')

print("Sorted by 'A' with missing values at the end:")
print(df_sorted_last)

# sort df by column 'B' in ascending order with missing values at the beginning
df_sorted_first = df.sort_values(by='B', na_position='first')

print("\nSorted by 'B' with missing values at the beginning:")
print(df_sorted_first)

Output

Sorted by 'A' with missing values at the end:
     A    B
1  1.0  NaN
2  2.0  8.0
0  3.0  9.0
4  4.0  7.0
3  NaN  6.0

Sorted by 'B' with missing values at the beginning:
     A    B
1  1.0  NaN
3  NaN  6.0
4  4.0  7.0
2  2.0  8.0
0  3.0  9.0

Here,

  • df.sort_values(by='A', na_position='last') - missing values are placed at the end of column A
  • df.sort_values(by='B', na_position='first') - missing values are placed at the beginning of column B
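The na_position argument works independently of ascending: missing values go wherever na_position says, whichever direction the other values are sorted in. A short sketch with the same A/B data, sorted by column A in descending order:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2, None, 4],
                   'B': [9, None, 8, 6, 7]})

# descending order, NaN still placed last
desc_last = df.sort_values(by='A', ascending=False, na_position='last')
print(desc_last.index.tolist())   # [4, 0, 2, 1, 3]

# descending order, NaN placed first
desc_first = df.sort_values(by='A', ascending=False, na_position='first')
print(desc_first.index.tolist())  # [3, 4, 0, 2, 1]
```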