The where() method in Pandas keeps the values of a DataFrame where a condition is True and replaces them where the condition is False.
Example
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': [2, 3, 4, 5]
})
# use where() to replace values less than 3 with 0
df_modified = df.where(df >= 3, other=0)
print(df_modified)
'''
Output
A B
0 0 0
1 0 3
2 3 4
3 4 5
'''
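In other words, where() works element by element: it keeps the original value where cond is True and takes other where cond is False. The sketch below checks this reading against NumPy's np.where, which selects between two inputs the same way. The comparison with NumPy is only an illustration, not a description of how pandas implements where().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [2, 3, 4, 5]
})

# pandas: keep values >= 3, use 0 everywhere else
pandas_result = df.where(df >= 3, other=0)

# NumPy: pick from df where the condition holds, otherwise pick 0
numpy_result = np.where(df >= 3, df, 0)

# both approaches give the same values element by element
print(np.array_equal(pandas_result.to_numpy(), numpy_result))  # True
```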
where() Syntax
The syntax of the where() method in Pandas is:
df.where(cond, other=NaN, inplace=False, axis=None, level=None)
where() Arguments
The where() method takes the following arguments:
- cond - the condition to check, usually a boolean Series or DataFrame.
- other (optional) - the value to use where the condition is False. By default, it is NaN.
- inplace (optional) - if True, the DataFrame is modified in place. By default, it's False, which means a new DataFrame is returned.
- axis (optional) - the axis along which other is aligned when it is a Series: 0 matches the row index and 1 matches the column labels.
- level (optional) - the alignment level if other is a Series or DataFrame.
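The pandas documentation also lists a callable as a valid cond (and other): pandas calls it with the DataFrame and uses whatever it returns. A minimal sketch, reusing the data from the first example; the lambdas are illustrative choices, not part of the original tutorial:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [2, 3, 4, 5]
})

# cond as a callable: it receives df and returns a boolean DataFrame
result_callable = df.where(lambda d: d >= 3, other=0)

# cond as a boolean DataFrame, the form used in the first example
result_mask = df.where(df >= 3, other=0)

print(result_callable.equals(result_mask))  # True

# other as a callable: values below 3 are replaced with 10 times themselves
result_other = df.where(df >= 3, other=lambda d: d * 10)
print(result_other)
```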
where() Return Value
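The where() method returns an object of the same type as the caller (a DataFrame for a DataFrame, a Series for a Series), holding the original data where the condition is True and the replacement value where the condition is False. If inplace=True, the caller is modified and the method returns None.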
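A minimal sketch comparing the two settings of inplace on a small, illustrative DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4]})

# default (inplace=False): a new DataFrame is returned, df is untouched
new_df = df.where(df >= 3, other=0)
print(df['A'].tolist())      # [1, 2, 3, 4]
print(new_df['A'].tolist())  # [0, 0, 3, 4]

# inplace=True: df itself is modified and the call returns None
returned = df.where(df >= 3, other=0, inplace=True)
print(returned)              # None
print(df['A'].tolist())      # [0, 0, 3, 4]
```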
Example 1: Use where() to Conditionally Replace Values
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9],
})
# replace values in column 'A'
# where the condition is False (if the values are not equal to 2)
result = df['A'].where(df['A'] == 2, other=-1)
print(result)
Output
0   -1
1    2
2   -1
Name: A, dtype: int64
In this example, we use the where() method to replace values in column A.
Only the value that equals 2 remains unchanged, while every other value in the column is replaced with -1.
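Because where() returns a new Series, df itself is not changed by the call above. If the goal is to update the column, one simple approach, sketched here under that assumption, is to assign the result back to df['A']:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
})

# where() on a single column returns a new Series; df is unchanged so far
result = df['A'].where(df['A'] == 2, other=-1)
print(df['A'].tolist())  # [1, 2, 3]

# assign the Series back to update column 'A' in the DataFrame
df['A'] = result
print(df)
```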
If we omit the other argument:
# without other argument
result = df['A'].where(df['A'] == 2)
All the values in result that do not meet the condition (df['A'] == 2) will be replaced with NaN by default.
Hence, the output will be:
0    NaN
1    2.0
2    NaN
Name: A, dtype: float64
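NaN is a floating-point value, so filling with the default turns the int64 column into float64. The sketch below checks that and shows one way to keep integers, assuming pandas' nullable Int64 dtype is acceptable for your data: missing entries then appear as <NA> instead of NaN.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='A')

# default other=NaN: NaN is a float, so int64 becomes float64
default_fill = s.where(s == 2)
print(default_fill.dtype)  # float64

# alternative (not used in the example above): nullable Int64 dtype,
# where the missing values become <NA> and the dtype stays Int64
nullable_fill = s.astype('Int64').where(s == 2)
print(nullable_fill.dtype)            # Int64
print(nullable_fill.isna().tolist())  # [True, False, True]
```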
Example 2: Use of axis Argument in where()
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# define a condition
condition = df > 2
# create a Series for replacing
replacement_series = pd.Series([-1, -2, -3])
# axis=0: align the Series with the row index (one replacement value per row)
result_axis_0 = df.where(condition, other=replacement_series, axis=0)
# axis=1: align the Series with the column labels (one replacement value per column)
result_axis_1 = df.where(condition, other=replacement_series, axis=1)
print("Replacement with axis=0:")
print(result_axis_0)
print("\nReplacement with axis=1:")
print(result_axis_1)
Output
Replacement with axis=0:
A B C
0 -1 4 7
1 -2 5 8
2 3 6 9
Replacement with axis=1:
A B C
0 NaN 4 7
1 NaN 5 8
2 3.0 6 9
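Here,
- With axis=0, the Series is aligned with the row index, so -1 replaces the values in row 0 that are not > 2, -2 replaces those in row 1, and -3 those in row 2. Only column A has such values (1 in row 0 and 2 in row 1), so they become -1 and -2.
- With axis=1, the Series is aligned with the column labels. Its index labels (0, 1, 2) do not match any column (A, B, C), so all aligned replacement values are NaN and the values in column A that are not > 2 become NaN. This also converts column A to float64.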
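For axis=1 to supply real replacement values, the Series index has to use the column labels. A minimal sketch that rebuilds df and condition from this example, with a hypothetical replacement_by_column Series indexed by 'A', 'B' and 'C':

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})
condition = df > 2

# hypothetical Series indexed by column labels so that axis=1 can match them
replacement_by_column = pd.Series([-1, -2, -3], index=['A', 'B', 'C'])

# axis=1: each column takes its replacement value from the matching label;
# only column A has values that are not > 2, so only -1 is actually used
result = df.where(condition, other=replacement_by_column, axis=1)
print(result)
```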
Example 3: Use of level Argument in where()
import pandas as pd
# create a DataFrame with a MultiIndex
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['Upper', 'Lower'])
df = pd.DataFrame({'Value': [10, 1, 20, 2]}, index=index)
# define a condition that keeps every value not equal to 1;
# it is evaluated for all rows, whatever their 'Upper' label
condition = df['Value'] != 1
# pass level='Upper'; with a scalar other, it does not change the result
level_example = df.where(condition, other=-1, level='Upper')
print(level_example)
Output
             Value
Upper Lower
A     1         10
      2         -1
B     1         20
      2          2
In the above example, the where() method replaces values with -1 wherever the condition df['Value'] != 1 is False.
The condition is checked for every row of the MultiIndex. The level argument does not filter rows; it only controls how other is aligned when other is a Series or DataFrame, so with the scalar -1 it has no effect here.
Thus, all occurrences of 1 in the DataFrame are replaced by -1.
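level does its real work when other is a Series indexed by only one level of the MultiIndex: pandas then matches that index against the chosen level and broadcasts it across the other level. A minimal sketch on the Value column, using a hypothetical per-group Series named replacement_by_upper (not part of the original example):

```python
import pandas as pd

# same MultiIndex DataFrame as in Example 3
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['Upper', 'Lower'])
df = pd.DataFrame({'Value': [10, 1, 20, 2]}, index=index)

# hypothetical replacement values, one per 'Upper' label
replacement_by_upper = pd.Series({'A': -100, 'B': -200})

# level='Upper' matches the Series index against the 'Upper' level,
# so rows in group 'A' can draw -100 and rows in group 'B' can draw -200
result = df['Value'].where(df['Value'] != 1, other=replacement_by_upper, level='Upper')
print(result)
```

Only the row (A, 2) fails the condition, so it is the only value replaced, and it takes -100 from group 'A'.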