Pandas Sort

Sorting is a fundamental operation in data manipulation and analysis that involves arranging data in a specific order.

Sorting is crucial for tasks such as organizing data for better readability, identifying patterns, making comparisons, and facilitating further analysis.


Sort DataFrame in Pandas

In Pandas, we can use the sort_values() method to sort a DataFrame by the values in one or more of its columns. For example,

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
df = pd.DataFrame(data)

# sort DataFrame by Age in ascending order
sorted_df = df.sort_values(by='Age')

print(sorted_df.to_string(index=False))

Output

Name    Age
Bob     22
Charlie 25
Alice   28

In the above example, df.sort_values(by='Age') sorts the rows of the df DataFrame by the values in the Age column in ascending order, which is the default. The result is stored in the sorted_df variable, and df itself is left unchanged.

To sort values in descending order, we set the ascending parameter to False:

sorted_df = df.sort_values(by='Age', ascending=False)

The output would be:

Name     Age
Alice    28
Charlie  25
Bob      22

Note: The .to_string(index=False) call is used to display the DataFrame without its index.
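
To make the descending sort concrete, here is a minimal self-contained sketch that reuses the same data. It also checks that sort_values() returns a new DataFrame rather than reordering df. The variable name desc_df is only illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
df = pd.DataFrame(data)

# sort DataFrame by Age in descending order
desc_df = df.sort_values(by='Age', ascending=False)

print(desc_df.to_string(index=False))

# the original DataFrame keeps its original row order
print(df['Name'].tolist())
```

If the sorted order should replace the original, one common pattern is to assign the result back to the same variable, as in df = df.sort_values(by='Age').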


Sort Pandas DataFrame by Multiple Columns

We can also sort a DataFrame by multiple columns in Pandas. In that case, the rows are ordered by the first column listed, and each later column is used only to break ties among rows that have equal values in the columns before it.

To sort by multiple columns, we pass the desired columns as a list to the by parameter of the sort_values() method. Here's how we do it.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 22, 30, 22],
        'Score': [85, 90, 75, 80]}

df = pd.DataFrame(data)

# 1. Sort DataFrame by 'Age' and then by 'Score' (Both in ascending order)
df1 = df.sort_values(by=['Age', 'Score'])

print("Sorting by 'Age' (ascending) and then by 'Score' (ascending):\n")
print(df1.to_string(index=False))

print()
# 2. Sort DataFrame by 'Age' in ascending order, and then by 'Score' in descending order
df2 = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

print("Sorting by 'Age' (ascending) and then by 'Score' (descending):\n")
print(df2.to_string(index=False))

Output

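Sorting by 'Age' (ascending) and then by 'Score' (ascending):

Name    Age  Score
David   22    80
Bob     22    90
Alice   25    85
Charlie 30    75

Sorting by 'Age' (ascending) and then by 'Score' (descending):

Name    Age  Score
Bob     22    90
David   22    80
Alice   25    85
Charlie 30    75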

Here,

  1. df1 shows the default sorting behavior (both columns in ascending order).
  2. df2 shows custom sorting, where Age is in ascending and Score is in descending order.
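
The priority given to the order of the columns can be seen by swapping them. The following is a small sketch on the same data. The names by_age_first and by_score_first are made up for this illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 22, 30, 22],
        'Score': [85, 90, 75, 80]}
df = pd.DataFrame(data)

# 'Age' decides the order; 'Score' only breaks the tie between Bob and David
by_age_first = df.sort_values(by=['Age', 'Score'])

# 'Score' decides the order; 'Age' would only break ties in 'Score' (none here)
by_score_first = df.sort_values(by=['Score', 'Age'])

print(by_age_first['Name'].tolist())
print(by_score_first['Name'].tolist())
```

Because every Score in this data is unique, the second sort is determined entirely by Score, and Age never comes into play.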

Sort Pandas Series

In Pandas, we can use the sort_values() method to sort a Series. Since a Series holds a single column of values, no by argument is needed. For example,

import pandas as pd

ages = pd.Series([28, 22, 25], name='Age')

# sort Series in ascending order
sorted_ages = ages.sort_values()

print(sorted_ages.to_string(index=False))

Output

22
25
28

Here, ages.sort_values() sorts the ages Series in ascending order. The sorted result is assigned to the sorted_ages variable.
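
As with a DataFrame, a Series can be sorted in descending order with ascending=False. The sketch below, which uses the illustrative name desc_ages, also shows that each value keeps its original index label after sorting.

```python
import pandas as pd

ages = pd.Series([28, 22, 25], name='Age')

# sort Series in descending order
desc_ages = ages.sort_values(ascending=False)

# values are reordered, and each value keeps its original index label
print(desc_ages.tolist())
print(desc_ages.index.tolist())
```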


Sort Pandas DataFrame Using sort_index()

We can also sort a DataFrame or Series by its index in Pandas using the sort_index() method.

This is useful for organizing data in a logical order and ensuring consistent data representation. A sorted index can also make label-based lookups and slicing faster.

Let's look at an example.

import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
# create a DataFrame with a non-sequential index
df = pd.DataFrame(data, index=[2, 0, 1])

print("Original DataFrame:")
print(df.to_string(index=True))
print()

# sort DataFrame by index in ascending order
sorted_df = df.sort_index()

print("Sorted DataFrame by index:")
print(sorted_df.to_string(index=True))

Output

Original DataFrame:
     Name    Age
2    Alice   28
0    Bob     22
1    Charlie 25

Sorted DataFrame by index:
    Name    Age
0   Bob     22
1   Charlie 25
2   Alice   28

In the above example, we have created the df DataFrame with a non-sequential index from the data dictionary.

The index parameter is set to [2, 0, 1], so the rows carry these labels instead of the default sequential index (0, 1, 2).

Then we sorted the df DataFrame by its index in ascending order using the sort_index() method.
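
The sort_index() method accepts the same ascending parameter as sort_values(), and it can also be called on a Series. Here is a brief sketch using the same data. The names desc_df and sorted_ages are chosen only for this example.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [28, 22, 25]}
# create a DataFrame with a non-sequential index
df = pd.DataFrame(data, index=[2, 0, 1])

# sort DataFrame by index in descending order
desc_df = df.sort_index(ascending=False)
print(desc_df.to_string(index=True))

# sort_index() also works on a Series, such as a single column
sorted_ages = df['Age'].sort_index()
print(sorted_ages.to_string())
```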