The fillna() method in Pandas is used to fill missing (NaN) values in a DataFrame.
Example
import pandas as pd
# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)
# fill missing values with a constant value say 0
df_filled = df.fillna(0)
print(df_filled)
'''
Output
A B
0 1.0 0.0
1 2.0 2.0
2 0.0 3.0
3 4.0 0.0
4 5.0 5.0
'''
fillna() Syntax
The syntax of the fillna() method in Pandas is:
df.fillna(value, method=None, axis=None, inplace=False, limit=None)
fillna() Arguments
The fillna() method takes the following arguments:
- value - specifies the value that we want to use for filling missing values
- method (optional) - allows us to specify a method for filling missing values
- axis (optional) - specifies the axis along which the filling should be performed
- inplace (optional) - if set to True, it will modify the original DataFrame. If False (default), it will return a new DataFrame with missing values filled
- limit (optional) - limits the number of replacements for forward and backward filling
fillna() Return Value
By default, the fillna() method returns a new DataFrame with missing values filled according to the specified parameters. If inplace=True is passed, the original DataFrame is modified directly instead.
Example 1: Fill Missing Values With Constant Value
import pandas as pd
# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)
constant_value = 0
# fill missing values with a constant value
df_filled = df.fillna(constant_value)
print(df_filled)
Output
A B
0 10.0 0.0
1 20.0 2.0
2 0.0 13.0
3 25.0 0.0
4 55.0 65.0
In the above example, we have set constant_value to 0. The fillna() method replaces all missing values in the df DataFrame with this value, so every NaN becomes 0 in the resulting DataFrame.
Note: We can replace 0 with any other constant value of our choice to fill missing values with that value in our DataFrame.
Example 2: Fill Missing Values With a Dictionary
import pandas as pd
# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)
# define a dictionary with values for filling missing values
# replace 'A' column missing values with 0 and 'B' missing values with 42
fill_values = {'A': 0, 'B': 42}
# fill missing values with the values from the dictionary
df_filled = df.fillna(fill_values)
print(df_filled)
Output
A B
0 10.0 42.0
1 20.0 2.0
2 0.0 13.0
3 25.0 42.0
4 55.0 65.0
Here, we have replaced the missing values of the 'A' column with 0 and the missing values of the 'B' column with 42.
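The dictionary does not have to name every column. As a minimal sketch using the same data, the code below lists only column 'A', so the missing values in column 'B' are left as NaN. The variable name partial_fill is only illustrative.

```python
import pandas as pd

# same data as in the example above
data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)

# only column 'A' is listed, so only its missing values are filled
partial_fill = df.fillna({'A': 0})

print(partial_fill)
```

This can be useful when only some columns have a sensible default value.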
Note: In this example, fillna() returns a new DataFrame and df itself is unchanged. If we pass inplace=True instead, the df DataFrame is updated directly, eliminating the need for a new DataFrame to hold the changes.
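To illustrate inplace=True, here is a minimal sketch that reuses the data and the fill dictionary from this example. It assumes we no longer need the original values, because df is overwritten.

```python
import pandas as pd

data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)

# modify df itself; the return value is not used
df.fillna({'A': 0, 'B': 42}, inplace=True)

print(df)
```

After this call, df no longer contains any missing values, so there is no separate df_filled variable.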
Example 3: Use Different Methods for Filling Missing Values
We can use the method parameter to specify a method for filling missing values. If we set
- method='ffill' - it implements forward filling, where missing values are filled with the preceding non-missing value
- method='bfill' - it implements backward filling, where missing values are filled with the next non-missing value
Let's look at an example.
import pandas as pd
data = {'A': [1, 2, None, 4, 5],
'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)
# forward fill missing values
df_ffill = df.fillna(method='ffill')
print(df_ffill)
# backward fill missing values
df_bfill = df.fillna(method='bfill')
print(df_bfill)
Output
A B
0 1.0 NaN
1 2.0 2.0
2 2.0 3.0
3 4.0 3.0
4 5.0 5.0
A B
0 1.0 2.0
1 2.0 2.0
2 4.0 3.0
3 4.0 5.0
4 5.0 5.0
Here, while forward filling missing values using method='ffill',
- For column 'A', it fills the missing value in row 2 with the previous non-missing value (2.0 from row 1).
- For column 'B', it fills the missing value in row 3 with the previous non-missing value (3.0 from row 2). The missing value in row 0 stays NaN because there is no earlier value to carry forward.
And, while backward filling missing values using method='bfill',
- For column 'A', it fills the missing value in row 2 with the next non-missing value (4.0 from row 3).
- For column 'B', it fills the missing values in rows 0 and 3 with the next non-missing values (2.0 from row 1 and 5.0 from row 4).
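Recent pandas releases (2.1 and later) mark the method parameter of fillna() as deprecated and suggest the dedicated ffill() and bfill() methods instead. The following is a minimal sketch of the same example written with those methods. It should produce the same two tables as above.

```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)

# forward fill: counterpart of df.fillna(method='ffill')
df_ffill = df.ffill()
print(df_ffill)

# backward fill: counterpart of df.fillna(method='bfill')
df_bfill = df.bfill()
print(df_bfill)
```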
Example 4: Specify Axis Along Which Filling Should be Performed
- To fill missing values along rows (column-wise), we set axis=0 (or we can omit the axis parameter since axis=0 is the default behavior).
- To fill missing values along columns (row-wise), we can set axis=1.
Let's look at an example.
import pandas as pd
# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)
# fill missing values along rows (column-wise)
df_filled_rows = df.fillna(101, axis=0)
print("Filled along rows (column-wise):\n", df_filled_rows)
# fill missing values along columns (row-wise)
df_filled_columns = df.fillna(202, axis=1)
print("\nFilled along columns (row-wise):\n", df_filled_columns)
Output
Filled along rows (column-wise):
A B
0 1.0 101.0
1 2.0 2.0
2 101.0 3.0
3 4.0 101.0
4 5.0 5.0
Filled along columns (row-wise):
A B
0 1.0 202.0
1 2.0 2.0
2 202.0 3.0
3 4.0 202.0
4 5.0 5.0
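In the above example, we use the fillna() method to fill missing values with 101 and 202 along rows (column-wise) and along columns (row-wise) respectively. Because a constant fill value does not depend on neighboring cells, the same positions are filled in both cases. The axis parameter only changes the result for directional fills such as forward or backward filling.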
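The axis parameter makes a visible difference once the fill depends on neighboring values. As a minimal sketch, the code below forward fills the same data with the ffill() method (the counterpart of method='ffill'), first down each column and then across each row. The variable names df_down and df_across are only illustrative.

```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)

# axis=0: each missing value takes the value from the row above, within its column
df_down = df.ffill(axis=0)
print("Forward fill down the columns (axis=0):\n", df_down)

# axis=1: each missing value takes the value from the column to its left, within its row
df_across = df.ffill(axis=1)
print("\nForward fill across the rows (axis=1):\n", df_across)
```

With axis=1, the missing values in column 'B' are filled from column 'A' of the same row, while the missing value in column 'A' stays NaN because there is no column to its left.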
Example 5: Use of limit Argument in fillna()
The limit argument is used to specify the maximum number of consecutive missing values that are filled during forward or backward filling.
If a run of missing values is longer than limit, the remaining values in that run are left as NaN.
Let's look at an example.
import pandas as pd
# create a sample DataFrame with missing values
data = {'A': [1, 2, None, None, 5, None],
'B': [None, 10, 11, None, 14, 15]}
df = pd.DataFrame(data)
# fill missing values forward with a limit of 1
df_filled_forward = df.fillna(method='ffill', limit=1)
print("DataFrame filled forward:")
print(df_filled_forward)
# fill missing values backward with a limit of 1
df_filled_backward = df.fillna(method='bfill', limit=1)
print("\nDataFrame filled backward:")
print(df_filled_backward)
Output
DataFrame filled forward:
A B
0 1.0 NaN
1 2.0 10.0
2 2.0 11.0
3 NaN 11.0
4 5.0 14.0
5 5.0 15.0
DataFrame filled backward:
A B
0 1.0 10.0
1 2.0 10.0
2 NaN 11.0
3 5.0 14.0
4 5.0 14.0
5 NaN 15.0
In this example, the limit parameter is set to 1 for both forward and backward filling.
As a result, at most one missing value in each consecutive run is filled. When forward filling, it is the first NaN after a valid value. When backward filling, it is the last NaN before a valid value.
This allows us to control how many missing values are replaced in a consecutive sequence while leaving the rest of the missing values unchanged.
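To see how the limit value changes the result, here is a minimal sketch that forward fills the same data with two different limits and counts the missing values that remain. It assumes the ffill() method, which accepts the same limit argument. The dictionary name remaining is only illustrative.

```python
import pandas as pd

data = {'A': [1, 2, None, None, 5, None],
        'B': [None, 10, 11, None, 14, 15]}
df = pd.DataFrame(data)

# count the NaN values that remain after forward filling with each limit
remaining = {}
for limit in (1, 2):
    filled = df.ffill(limit=limit)
    remaining[limit] = int(filled.isna().sum().sum())

print(remaining)
```

A larger limit fills more of the two-value gap in column 'A', while the NaN in the first row of column 'B' is never forward filled because nothing precedes it.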