The fillna() method in Pandas is used to fill missing (NaN) values in a DataFrame.
Example
import pandas as pd
# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)
# fill missing values with a constant value say 0
df_filled = df.fillna(0)
print(df_filled)
'''
Output
     A    B
0  1.0  0.0
1  2.0  2.0
2  0.0  3.0
3  4.0  0.0
4  5.0  5.0
'''
fillna() Syntax
The syntax of the fillna() method in Pandas is:
df.fillna(value, method=None, axis=None, inplace=False, limit=None)
fillna() Arguments
The fillna() method takes the following arguments:
value - specifies the value to use for filling missing values. This can be a scalar, or a dict, Series, or DataFrame that gives a separate fill value for each column
method (optional) - specifies a method for filling missing values, such as 'ffill' or 'bfill'
axis (optional) - specifies the axis along which the filling should be performed
inplace (optional) - if set to True, it modifies the original DataFrame. If False (default), it returns a new DataFrame with missing values filled
limit (optional) - when method is given, the maximum number of consecutive missing values to fill; without method, the maximum number of missing values to fill along the axis
fillna() Return Value
The fillna() method returns a new DataFrame with missing values filled according to the specified parameters. If inplace=True, it modifies the original DataFrame instead and returns None.
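Because fillna() returns a new DataFrame by default, the original is left unchanged. A minimal sketch that checks this by counting the remaining missing values with isna().sum() before and after the call:

```python
import pandas as pd

# a small DataFrame with two missing values
df = pd.DataFrame({'A': [1, None, 3], 'B': [None, 5, 6]})

# fillna() returns a new DataFrame; df itself is not modified
result = df.fillna(0)

print(df.isna().sum().sum())      # missing values left in the original: 2
print(result.isna().sum().sum())  # missing values left in the result: 0
```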
Example 1: Fill Missing Values With Constant Value
import pandas as pd
# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)
constant_value = 0
# fill missing values with a constant value
df_filled = df.fillna(constant_value)
print(df_filled)
Output
      A     B
0  10.0   0.0
1  20.0   2.0
2   0.0  13.0
3  25.0   0.0
4  55.0  65.0
In the above example, we have set constant_value to 0. The fillna() method replaces every missing value in the df DataFrame with this constant, so all the NaN entries become 0.0 in the resulting DataFrame.
Note: We can replace 0 with any other constant value of our choice to fill missing values with that value in our DataFrame.
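Instead of a fixed constant, a common choice is a per-column statistic such as the mean. The sketch below assumes that approach (it is not part of the example above): df.mean() returns a Series indexed by column name, and fillna() uses each entry as the fill value for the matching column.

```python
import pandas as pd

data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)

# df.mean() skips NaN values and returns one mean per column,
# so each column's gaps are filled with that column's own mean
column_means = df.mean()
df_filled = df.fillna(column_means)
print(df_filled)
```

Here the missing value in column 'A' becomes 27.5 (the mean of 10, 20, 25, and 55), and the missing values in column 'B' become 80/3, roughly 26.67.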
Example 2: Fill Missing Values With a Dictionary
import pandas as pd
# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)
# define a dictionary with values for filling missing values
# replace 'A' column missing values with 0 and 'B' missing values with 42
fill_values = {'A': 0, 'B': 42}
# fill missing values with the values from the dictionary
df_filled = df.fillna(fill_values)
print(df_filled)
Output
      A     B
0  10.0  42.0
1  20.0   2.0
2   0.0  13.0
3  25.0  42.0
4  55.0  65.0
Here, we have replaced the missing values in column 'A' with 0 and the missing values in column 'B' with 42.
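The dictionary does not need an entry for every column. In this minimal sketch, only column 'A' is listed, so the missing values in column 'B' are left as NaN:

```python
import pandas as pd

data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)

# only column 'A' has a fill value; column 'B' keeps its NaN entries
df_filled = df.fillna({'A': 0})
print(df_filled)
```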
If we pass inplace=True, the df DataFrame is updated directly and fillna() returns None, so there is no need for a new DataFrame to hold the changes.
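A minimal sketch of inplace=True, reusing the data and fill values from Example 2. Note that the return value is None, so assigning it back to df would lose the DataFrame:

```python
import pandas as pd

data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)

# with inplace=True, df is modified directly and fillna() returns None
returned = df.fillna({'A': 0, 'B': 42}, inplace=True)

print(returned)  # None
print(df)        # df now has no missing values
```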
Example 3: Use Different Methods for Filling Missing Values
We can use the method parameter to specify a method for filling missing values. If we set
- method='ffill', it implements forward filling, where missing values are filled with the preceding non-missing value
- method='bfill', it implements backward filling, where missing values are filled with the next non-missing value
Note: Starting with pandas 2.1, the method parameter of fillna() is deprecated, and pandas emits a FutureWarning suggesting df.ffill() or df.bfill() instead.
Let's look at an example.
import pandas as pd
data = {'A': [1, 2, None, 4, 5],
'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)
# forward fill missing values
df_ffill = df.fillna(method='ffill')
print(df_ffill)
# backward fill missing values
df_bfill = df.fillna(method='bfill')
print(df_bfill)
Output
     A    B
0  1.0  NaN
1  2.0  2.0
2  2.0  3.0
3  4.0  3.0
4  5.0  5.0
     A    B
0  1.0  2.0
1  2.0  2.0
2  4.0  3.0
3  4.0  5.0
4  5.0  5.0
Here, while forward filling missing values using method='ffill',
- For column 'A', it fills the missing value in row 2 with the previous non-missing value (2.0 from row 1).
- For column 'B', it fills the missing value in row 3 with the previous non-missing value (3.0 from row 2). The missing value in row 0 stays NaN because there is no earlier value to copy.
And, while backward filling missing values using method='bfill',
- For column 'A', it fills the missing value in row 2 with the next non-missing value (4.0 from row 3).
- For column 'B', it fills the missing values in rows 0 and 3 with the next non-missing values (2.0 from row 1 and 5.0 from row 4).
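The same two fills can be written with the dedicated DataFrame.ffill() and DataFrame.bfill() methods, which pandas recommends over fillna(method=...) from version 2.1 onward. A minimal sketch that should reproduce the two outputs above without a deprecation warning:

```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)

# same result as df.fillna(method='ffill')
df_ffill = df.ffill()
print(df_ffill)

# same result as df.fillna(method='bfill')
df_bfill = df.bfill()
print(df_bfill)
```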
Example 4: Specify Axis Along Which Filling Should be Performed
- To fill missing values along rows (column-wise), we set axis=0. We can also omit the axis parameter, because the default (axis=None) behaves like axis=0 for a DataFrame.
- To fill missing values along columns (row-wise), we set axis=1.
Let's look at an example.
import pandas as pd
# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)
# fill missing values along rows (column-wise)
df_filled_rows = df.fillna(101, axis=0)
print("Filled along rows (column-wise):\n", df_filled_rows)
# fill missing values along columns (row-wise)
df_filled_columns = df.fillna(202, axis=1)
print("\nFilled along columns (row-wise):\n", df_filled_columns)
Output
Filled along rows (column-wise):
        A      B
0    1.0  101.0
1    2.0    2.0
2  101.0    3.0
3    4.0  101.0
4    5.0    5.0

Filled along columns (row-wise):
        A      B
0    1.0  202.0
1    2.0    2.0
2  202.0    3.0
3    4.0  202.0
4    5.0    5.0
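In the above example, we use the fillna() method to fill missing values with 101 along rows (column-wise) and with 202 along columns (row-wise). Because both calls fill with a constant, every NaN is replaced either way. The axis only changes which cells get filled, and with what, when values are copied from neighboring cells, as in forward or backward filling.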
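To see where axis does make a difference, here is a minimal sketch that forward fills the same DataFrame in both directions. It uses the ffill() method (which accepts an axis argument) rather than fillna(method='ffill'):

```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)

# axis=0: each missing value takes the value above it in the same column
down_columns = df.ffill(axis=0)

# axis=1: each missing value takes the value to its left in the same row
across_rows = df.ffill(axis=1)

print(down_columns)
print(across_rows)
```

With axis=1, the missing value in row 2 of column 'A' stays NaN because it has no value to its left, while rows 0 and 3 of column 'B' copy 1.0 and 4.0 from column 'A'.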
Example 5: Use of limit Argument in fillna()
The limit argument sets the maximum number of consecutive missing values that are filled when forward or backward filling.
Let's look at an example.
import pandas as pd
# create a sample DataFrame with missing values
data = {'A': [1, 2, None, None, 5, None],
'B': [None, 10, 11, None, 14, 15]}
df = pd.DataFrame(data)
# fill missing values forward with a limit of 1
df_filled_forward = df.fillna(method='ffill', limit=1)
print("DataFrame filled forward:")
print(df_filled_forward)
# fill missing values backward with a limit of 1
df_filled_backward = df.fillna(method='bfill', limit=1)
print("\nDataFrame filled backward:")
print(df_filled_backward)
Output
DataFrame filled forward:
     A     B
0  1.0   NaN
1  2.0  10.0
2  2.0  11.0
3  NaN  11.0
4  5.0  14.0
5  5.0  15.0

DataFrame filled backward:
     A     B
0  1.0  10.0
1  2.0  10.0
2  NaN  11.0
3  5.0  14.0
4  5.0  14.0
5  NaN  15.0
In this example, the limit parameter is set to 1 for both forward and backward filling.
As a result, at most one missing value in each run of consecutive missing values is filled. For example, rows 2 and 3 of column 'A' are both missing: forward filling fills only row 2 (with 2.0), and backward filling fills only row 3 (with 5.0).
This allows us to control how many missing values are replaced in a consecutive sequence while leaving the rest of the missing values unchanged.
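The limit parameter also works without a fill method. According to the pandas documentation, it then caps the number of missing entries filled along the axis, so with the default axis only the first limit missing values in each column are replaced. A minimal sketch using the DataFrame from this example:

```python
import pandas as pd

data = {'A': [1, 2, None, None, 5, None],
        'B': [None, 10, 11, None, 14, 15]}
df = pd.DataFrame(data)

# fill at most one missing value per column with 0
df_limited = df.fillna(0, limit=1)
print(df_limited)
```

Only the first NaN in each column (row 2 in 'A', row 0 in 'B') becomes 0; the later missing values stay NaN.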