The sort_values() method in Pandas is used to sort a DataFrame by one or more columns.
Example
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 28]}
df = pd.DataFrame(data)
# sort df by 'Age' column in ascending order
df_sorted = df.sort_values(by='Age')
print(df_sorted)
'''
Output
      Name  Age
2  Charlie   22
0    Alice   25
3    David   28
1      Bob   30
'''
sort_values() Syntax
The syntax of the sort_values() method in Pandas is:
df.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
sort_values() Arguments
The sort_values() method takes the following arguments:
by - column name or a list of column names by which we want to sort the DataFrame
axis (optional) - specifies whether to sort the rows (axis=0, the default) or the columns (axis=1)
ascending (optional) - boolean or a list of booleans that determines the sorting order
inplace (optional) - boolean that determines whether to sort the DataFrame in place or return a new sorted DataFrame
kind (optional) - specifies the sorting algorithm to use
na_position (optional) - determines where NaN values should be placed during sorting
ignore_index (optional) - boolean that determines whether to reset the index of the resulting DataFrame
sort_values() Return Value
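By default, the sort_values() method in Pandas returns a new DataFrame that contains the sorted data based on the specified criteria. If inplace=True is passed, the original DataFrame is sorted instead and the method returns None.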
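The following is a minimal sketch of how the inplace argument affects the return value. It reuses the sample data from the first example; the variable names original_order and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 22, 28]})

# default (inplace=False): a new DataFrame is returned, df is left unchanged
df_sorted = df.sort_values(by='Age')
original_order = df['Age'].tolist()   # still [25, 30, 22, 28]

# inplace=True: df itself is reordered and the call returns None
result = df.sort_values(by='Age', inplace=True)
print(result)              # None
print(df['Age'].tolist())  # [22, 25, 28, 30]
```

Because the in-place call returns None, assigning its result back to df (df = df.sort_values(..., inplace=True)) would discard the DataFrame.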
Example 1: Sort Column in Descending Order
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 28]}
df = pd.DataFrame(data)
# sort df by 'Age' column in descending order
df_sorted = df.sort_values(by='Age', ascending=False)
print(df_sorted)
Output
      Name  Age
1      Bob   30
3    David   28
0    Alice   25
2  Charlie   22
In the above example, we have used the sort_values() method to sort the df DataFrame by the Age column in descending order (ascending=False).
This means that the individuals are arranged in the DataFrame with the oldest person at the top.
Example 2: Sort DataFrame by Multiple Columns
import pandas as pd
data = {'Name': ['Eve', 'Frank', 'Grace', 'Hank'],
'Age': [28, 22, 30, 25],
'Score': [75, 80, 85, 90]}
df = pd.DataFrame(data)
# sort DataFrame by 'Age' and then by 'Score' (Both in ascending order)
df1 = df.sort_values(by=['Age', 'Score'])
print("Sorting by 'Age' (ascending) and then by 'Score' (ascending):\n")
print(df1.to_string(index=False))
print()
# sort DataFrame by 'Age' in ascending order, and then by 'Score' in descending order
df2 = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print("Sorting by 'Age' (ascending) and then by 'Score' (descending):\n")
print(df2.to_string(index=False))
Output
Sorting by 'Age' (ascending) and then by 'Score' (ascending):

 Name  Age  Score
Frank   22     80
 Hank   25     90
  Eve   28     75
Grace   30     85

Sorting by 'Age' (ascending) and then by 'Score' (descending):

 Name  Age  Score
Frank   22     80
 Hank   25     90
  Eve   28     75
Grace   30     85
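Here,
- df1 shows the default sorting behavior (both columns Age and Score are in ascending order).
- df2 shows custom sorting, where Age is in ascending and Score is in descending order.
Because every Age value in this data is unique, Score never has to break a tie, so both calls produce the same output.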
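The second column in by only matters when the first column contains ties. The sketch below uses made-up data in which ages repeat (an assumption for illustration, not part of the example above) so that the effect of ascending=[True, False] becomes visible.

```python
import pandas as pd

# hypothetical data: two people share each age, so 'Score' breaks the ties
df = pd.DataFrame({'Name': ['Eve', 'Frank', 'Grace', 'Hank'],
                   'Age': [25, 22, 25, 22],
                   'Score': [75, 80, 85, 90]})

# both columns ascending
asc = df.sort_values(by=['Age', 'Score'])

# 'Age' ascending, 'Score' descending within each age
mixed = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

print(asc['Name'].tolist())    # ['Frank', 'Hank', 'Eve', 'Grace']
print(mixed['Name'].tolist())  # ['Hank', 'Frank', 'Grace', 'Eve']
```

Within each age group, the order of names flips between the two results, while the age groups themselves stay in ascending order.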
Example 3: Sort DataFrame Based on Rows or Columns
import pandas as pd
data = {'A': [3, 1, 2, 4],
'B': [9, 7, 8, 6]}
df = pd.DataFrame(data)
# sort the DataFrame by rows based on column 'A' values in ascending order
df_sorted_rows = df.sort_values(by='A', axis=0)
print("Sorted by rows based on 'A' values:")
print(df_sorted_rows)
# sort the DataFrame by columns based on the values in the first row (index 0)
df_sorted_columns = df.sort_values(by=0, axis=1, ascending=False, ignore_index=True)
print("\nSorted by columns based on values in the first row:")
print(df_sorted_columns)
Output
Sorted by rows based on 'A' values:
   A  B
1  1  7
2  2  8
0  3  9
3  4  6

Sorted by columns based on values in the first row:
   0  1
0  9  3
1  7  1
2  8  2
3  6  4
In the above example, we first sorted the df DataFrame by rows (axis=0) based on the values in column A in ascending order.
Then, we sorted the same DataFrame by columns (axis=1) based on the values in the first row (index 0) in descending order.
Here, the ignore_index=True parameter is used in the second call, where we sort the columns. As a result, the original column labels (A, B) are discarded and the sorted DataFrame has new sequential column labels (0, 1), as the output shows. The row index (0, 1, 2, 3) is unchanged because the rows were not reordered.
When sorting rows (axis=0), ignore_index=True resets the row index in the same way, which can be helpful when you want a clean, sequential index after sorting your DataFrame.
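For a row sort (axis=0), ignore_index=True replaces the row labels of the result with 0, 1, 2, and so on. A minimal sketch, reusing the data from this example (the names kept and reset are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2, 4],
                   'B': [9, 7, 8, 6]})

# default: each row keeps its original index label
kept = df.sort_values(by='A')

# ignore_index=True: the row labels are replaced by 0..n-1
reset = df.sort_values(by='A', ignore_index=True)

print(kept.index.tolist())   # [1, 2, 0, 3]
print(reset.index.tolist())  # [0, 1, 2, 3]
print(reset['A'].tolist())   # [1, 2, 3, 4]
```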
Example 4: Specify Sorting Algorithm to Sort DataFrame
By default, Pandas uses the quicksort algorithm for the sort_values() method. If we want to specify a different sorting algorithm, we can use the kind parameter.
import pandas as pd
# create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 28]}
df = pd.DataFrame(data)
# sort df by 'Age' column using merge sort algorithm
df_sorted = df.sort_values(by='Age', kind='mergesort')
print(df_sorted)
Output
Name Age 2 Charlie 22 0 Alice 25 3 David 28 1 Bob 30
Here, the kind='mergesort' parameter is used to specify the merge sort algorithm for sorting the DataFrame by the Age column in ascending order.
Note: We can replace 'mergesort' with other available sorting algorithms like 'quicksort', 'heapsort', or 'stable' as needed.
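The choice of algorithm mostly matters when the sort column contains equal values: 'mergesort' and 'stable' are stable sorts, meaning rows with equal keys keep their original relative order, while 'quicksort' and 'heapsort' make no such guarantee. A minimal sketch with made-up data containing repeated ages (an assumption for illustration):

```python
import pandas as pd

# hypothetical data with repeated ages
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 22, 25, 22]})

# a stable sort keeps tied rows in their original order:
# Bob before David (both 22), Alice before Charlie (both 25)
stable_sorted = df.sort_values(by='Age', kind='stable')

print(stable_sorted['Name'].tolist())  # ['Bob', 'David', 'Alice', 'Charlie']
```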
Example 5: Determine the Placement of Missing Values During Sorting Operation
The na_position argument is used to determine the placement of missing values during the sorting operation.
na_position='last' (default) - missing values are placed at the end of the sorted column
na_position='first' - missing values are placed at the beginning of the sorted column
Let's look at an example.
import pandas as pd
data = {'A': [3, 1, 2, None, 4],
'B': [9, None, 8, 6, 7]}
df = pd.DataFrame(data)
# sort df by column 'A' in ascending order with missing values at the end
df_sorted_last = df.sort_values(by='A', na_position='last')
print("Sorted by 'A' with missing values at the end:")
print(df_sorted_last)
# sort df by column 'B' in ascending order with missing values at the beginning
df_sorted_first = df.sort_values(by='B', na_position='first')
print("\nSorted by 'B' with missing values at the beginning:")
print(df_sorted_first)
Output
Sorted by 'A' with missing values at the end:
     A    B
1  1.0  NaN
2  2.0  8.0
0  3.0  9.0
4  4.0  7.0
3  NaN  6.0

Sorted by 'B' with missing values at the beginning:
     A    B
1  1.0  NaN
3  NaN  6.0
4  4.0  7.0
2  2.0  8.0
0  3.0  9.0
Here,
df.sort_values(by='A', na_position='last') - missing values are placed at the end of column A
df.sort_values(by='B', na_position='first') - missing values are placed at the beginning of column B
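The placement of missing values is controlled only by na_position and does not depend on the sort direction. The sketch below, using column A from this example, sorts in descending order with both settings (the names desc_last and desc_first are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2, None, 4]})

# descending order: NaN still goes to the end by default
desc_last = df.sort_values(by='A', ascending=False)

# descending order with NaN moved to the beginning
desc_first = df.sort_values(by='A', ascending=False, na_position='first')

print(desc_last.index.tolist())   # [4, 0, 2, 1, 3]
print(desc_first.index.tolist())  # [3, 4, 0, 2, 1]
```

Row 3 holds the missing value, so it appears last in the first result and first in the second.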