The where() method in Pandas keeps the values of a DataFrame where a condition is True and replaces them where the condition is False.
Example
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': [2, 3, 4, 5]
})
# use where() to replace values less than 3 with 0
df_modified = df.where(df >= 3, other=0)
print(df_modified)
'''
Output
   A  B
0  0  0
1  0  3
2  3  4
3  4  5
'''
where() Syntax
The syntax of the where() method in Pandas is:
df.where(cond, other=NaN, inplace=False, axis=None, level=None)
where() Arguments
The where() method takes the following arguments:
- cond - the condition to check; values are kept where it is True
- other (optional) - the value to use where the condition is False. By default, it is NaN.
- inplace (optional) - if True, the DataFrame is modified in place. By default, it is False, which means a new DataFrame is returned.
- axis (optional) - the axis along which other is aligned with the DataFrame (0 for the row index, 1 for the column labels) when other is a Series
- level (optional) - the MultiIndex level used for alignment when other is a Series or DataFrame
where() Return Value
The where() method returns a new DataFrame with the original data where the condition is True and the specified replacement value where the condition is False. If inplace=True is passed, the original DataFrame is modified instead and None is returned.
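As a quick check of the return behavior, here is a minimal sketch (the variable names are illustrative and not part of the examples on this page). It assumes a small DataFrame like the one in the introduction and compares the default call with inplace=True.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [2, 3, 4, 5]})

# default: a new DataFrame is returned and df is left unchanged
new_df = df.where(df >= 3, other=0)

# inplace=True: the DataFrame itself is modified and the call returns None
df_inplace = df.copy()
returned = df_inplace.where(df_inplace >= 3, other=0, inplace=True)

print(df)          # still holds the original values
print(new_df)      # holds the replaced values
print(returned)    # the return value of the in-place call
print(df_inplace)  # modified in place
```

In practice, the default form is usually easier to reason about because the original data stays available for comparison.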
Example 1: Use where() to Conditionally Replace Values
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9],
})
# replace values in column 'A'
# where the condition is False (if the values are not equal to 2)
result = df['A'].where(df['A'] == 2, other=-1)
print(result)
Output
0   -1
1    2
2   -1
Name: A, dtype: int64
In this example, we use the where() method to replace values in the A column.
Only the value in column A that equals 2 remains unchanged, while all other values in the same column are replaced with -1.
If we omit the other argument:
# without other argument
result = df['A'].where(df['A'] == 2)
All the values in result that do not meet the condition (df['A'] == 2) are replaced with NaN by default. Because NaN is a floating-point value, the data type changes from int64 to float64.
Hence, the output will be
0    NaN
1    2.0
2    NaN
Name: A, dtype: float64
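The dtype difference between the two outputs above can be checked directly. The following is a minimal sketch that uses a standalone Series named s (an illustrative name, standing in for df['A']).

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='A')

# with an integer replacement, the integer dtype is kept
with_other = s.where(s == 2, other=-1)

# without other, NaN is inserted and the values become floats
without_other = s.where(s == 2)

print(with_other.dtype)
print(without_other.dtype)
```

If the integer dtype matters downstream, passing an explicit integer for other avoids the conversion to float.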
Example 2: Use of axis Argument in where()
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# define a condition
condition = df > 2
# create a Series for replacing
replacement_series = pd.Series([-1, -2, -3])
# axis=0: align the Series index (0, 1, 2) with the DataFrame's row index
result_axis_0 = df.where(condition, other=replacement_series, axis=0)
# axis=1: align the Series index (0, 1, 2) with the DataFrame's column labels
result_axis_1 = df.where(condition, other=replacement_series, axis=1)
print("Replacement with axis=0:")
print(result_axis_0)
print("\nReplacement with axis=1:")
print(result_axis_1)
Output
Replacement with axis=0:
   A  B  C
0 -1  4  7
1 -2  5  8
2  3  6  9

Replacement with axis=1:
     A  B  C
0  NaN  4  7
1  NaN  5  8
2  3.0  6  9
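Here,
- With axis=0, the Series index (0, 1, 2) is aligned with the row index of the DataFrame, so -1 is the replacement for row 0, -2 for row 1, and -3 for row 2. Only column A has values that are not > 2 (in rows 0 and 1), so those become -1 and -2.
- With axis=1, the Series index is aligned with the column labels. Since the labels 0, 1, and 2 do not match any of the columns A, B, and C, there is no replacement value for any column, and we get NaN wherever the condition is False.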
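To get actual replacement values with axis=1, the Series needs labels that match the column names. The sketch below assumes a hypothetical Series named by_column, indexed by 'A', 'B', and 'C'; it is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# hypothetical replacement Series labeled with the column names
by_column = pd.Series([-1, -2, -3], index=['A', 'B', 'C'])

# axis=1 aligns the Series index with the column labels,
# so column A uses -1, column B uses -2, and column C uses -3
result = df.where(df > 2, other=by_column, axis=1)
print(result)
```

Only column A contains values that fail the condition, so only the replacement for A (-1) ends up in the result.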
Example 3: Use of level argument in where()
import pandas as pd
# create a DataFrame with a MultiIndex
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['Upper', 'Lower'])
df = pd.DataFrame({'Value': [10, 1, 20, 2]}, index=index)
# define the condition: keep the values that are not 1
# (the comparison is made on every row, whatever its index level)
condition = df['Value'] != 1
# call where() with level='Upper'
level_example = df.where(condition, other=-1, level='Upper')
print(level_example)
Output
             Value
Upper Lower
A     1         10
      2         -1
B     1         20
      2          2
In the above example, the where() method replaces values with -1 in the DataFrame where the condition df['Value'] != 1 is False.
The condition is checked on every row of the MultiIndex. The level argument does not affect it, because level is only used to align other when other is a Series or DataFrame, and here other is a scalar.
Thus, all occurrences of 1 in the DataFrame are replaced by -1.
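One way to confirm that level plays no role with a scalar replacement is to run the same call with and without it. This is a minimal sketch; unlike the example above, it assumes a DataFrame-shaped condition (df != 1) so that the condition has the same shape as df.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['Upper', 'Lower'])
df = pd.DataFrame({'Value': [10, 1, 20, 2]}, index=index)

# DataFrame condition with the same shape as df (an assumption of this sketch)
condition = df != 1

with_level = df.where(condition, other=-1, level='Upper')
without_level = df.where(condition, other=-1)

# compare the two results
print(with_level.equals(without_level))
```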