Pandas fillna()

The fillna() method in Pandas is used to fill missing (NaN) values in a DataFrame.

Example

import pandas as pd

# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}

df = pd.DataFrame(data)

# fill missing values with a constant value, say 0
df_filled = df.fillna(0)
print(df_filled)

'''
Output

     A    B
0  1.0  0.0
1  2.0  2.0
2  0.0  3.0
3  4.0  0.0
4  5.0  5.0
'''

fillna() Syntax

The syntax of the fillna() method in Pandas is:

df.fillna(value, method=None, axis=None, inplace=False, limit=None)

fillna() Arguments

The fillna() method takes the following arguments:

  • value - specifies the value that we want to use for filling missing values
  • method (optional) - allows us to specify a method for filling missing values
  • axis (optional) - specifies the axis along which the filling should be performed
  • inplace (optional) - if set to True, it will modify the original DataFrame. If False (default), it will return a new DataFrame with missing values filled
  • limit (optional) - limits the number of replacements for forward and backward filling

fillna() Return Value

By default, the fillna() method returns a new DataFrame with missing values filled according to the specified parameters. If inplace=True is set, the original DataFrame is modified instead.


Example 1: Fill Missing Values With Constant Value

import pandas as pd

# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}

df = pd.DataFrame(data)

constant_value = 0

# fill missing values with a constant value
df_filled = df.fillna(constant_value)
print(df_filled)

Output

     A     B
0  10.0   0.0
1  20.0   2.0
2   0.0  13.0
3  25.0   0.0
4  55.0  65.0

In the above example, we have set constant_value to 0. The fillna() method replaces every missing value in the df DataFrame with this constant, so each NaN appears as 0.0 in the resulting DataFrame.

Note: We can replace 0 with any other constant value of our choice to fill missing values with that value in our DataFrame.
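
The constant does not have to be hard-coded. As a minimal sketch, the constant below is computed from the data itself (the mean of column 'A') and applied to that column only. The names mean_a and filled_a are illustrative and are not part of the original example.

```python
import pandas as pd

# same data as in Example 1
data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)

# compute a constant from the data: the mean of column 'A'
# (mean() skips missing values by default)
mean_a = df['A'].mean()

# fill only column 'A' with that constant; column 'B' is not touched
filled_a = df['A'].fillna(mean_a)
print(filled_a)
```

Here the single missing entry in column 'A' is replaced with the mean of the four values that are present.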


Example 2: Fill Missing Values With a Dictionary

import pandas as pd

# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}

df = pd.DataFrame(data)

# define a dictionary with values for filling missing values
# replace 'A' column missing values with 0 and 'B' missing values with 42
fill_values = {'A': 0, 'B': 42}  

# fill missing values with the values from the dictionary
df_filled = df.fillna(fill_values)
print(df_filled)

Output

      A     B
0  10.0  42.0
1  20.0   2.0
2   0.0  13.0
3  25.0  42.0
4  55.0  65.0

Here, we have replaced the missing values of column 'A' with 0 and the missing values of column 'B' with 42.

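Note that fillna() returns a new DataFrame here, so df itself still contains the missing values. If we pass inplace=True instead, the df DataFrame is updated directly, eliminating the need for a new DataFrame to hold the changes.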
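
To make the inplace behavior concrete, here is a small sketch that reuses the data and dictionary from this example but passes inplace=True. It is only an illustration of the parameter; assigning the result to a new variable, as done above, works equally well.

```python
import pandas as pd

# same data and fill values as in Example 2
data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)

fill_values = {'A': 0, 'B': 42}

# modify df directly instead of assigning the result to a new variable
df.fillna(fill_values, inplace=True)

# df itself no longer contains missing values
print(df)
```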


Example 3: Use Different Methods for Filling Missing Values

We can use the method parameter to specify a method for filling missing values. It accepts:

  • method='ffill' - it implements forward filling, where missing values are filled with the preceding non-missing value
  • method='bfill' - it implements backward filling, where missing values are filled with the next non-missing value

Let's look at an example.

import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}

df = pd.DataFrame(data)

# forward fill missing values
df_ffill = df.fillna(method='ffill')
print(df_ffill)

# backward fill missing values
df_bfill = df.fillna(method='bfill')
print(df_bfill)

Output

    A    B
0  1.0  NaN
1  2.0  2.0
2  2.0  3.0
3  4.0  3.0
4  5.0  5.0
     A    B
0  1.0  2.0
1  2.0  2.0
2  4.0  3.0
3  4.0  5.0
4  5.0  5.0

Here, while forward filling missing values using method='ffill',

  1. For column 'A', it fills the missing value in row 2 with the previous non-missing value (2.0 from row 1).
  2. For column 'B', it fills the missing value in row 3 with the previous non-missing value (3.0 from row 2). The missing value in row 0 stays NaN because no value precedes it.

And, while backward filling missing values using method='bfill',

  1. For column 'A', it fills the missing value in row 2 with the next non-missing value (4.0 from row 3).
  2. For column 'B', it fills the missing values in rows 0 and 3 with the next non-missing values (2.0 from row 1 and 5.0 from row 4).
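
Pandas also provides dedicated ffill() and bfill() methods that perform the same forward and backward filling. To the best of our knowledge, newer pandas releases (2.1 and later) emit a deprecation warning when method is passed to fillna(), so the sketch below uses the dedicated methods. The name df_both is illustrative; chaining the two calls is one possible way to also fill a leading NaN that forward filling cannot reach.

```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)

# forward fill: same result as the forward fill shown above
df_ffill = df.ffill()

# backward fill: same result as the backward fill shown above
df_bfill = df.bfill()

# forward fill first, then backward fill whatever is still missing
# (here: row 0 of column 'B', which has no preceding value)
df_both = df.ffill().bfill()

print(df_ffill)
print(df_bfill)
print(df_both)
```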

Example 4: Specify Axis Along Which Filling Should be Performed

The axis parameter sets the direction along which filling is performed:

  1. To fill missing values along rows (down each column), we set axis=0. We can also omit the axis parameter, since this is the default behavior.
  2. To fill missing values along columns (across each row), we set axis=1.

Let's look at an example.

import pandas as pd

# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)

# fill missing values along rows (column-wise)
df_filled_rows = df.fillna(101, axis=0)
print("Filled along rows (column-wise):\n", df_filled_rows)

# fill missing values along columns (row-wise)
df_filled_columns = df.fillna(202, axis=1)
print("\nFilled along columns (row-wise):\n", df_filled_columns)

Output

Filled along rows (column-wise):
      A      B
0    1.0  101.0
1    2.0    2.0
2  101.0    3.0
3    4.0  101.0
4    5.0    5.0
Filled along columns (row-wise):
       A      B
0    1.0  202.0
1    2.0    2.0
2  202.0    3.0
3    4.0  202.0
4    5.0    5.0

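In the above example, we use the fillna() method to fill missing values with 101 along rows (axis=0) and with 202 along columns (axis=1). Because a constant replaces every missing value regardless of direction, both calls fill exactly the same cells. The axis parameter only changes the result for directional filling, such as forward or backward filling.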
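
To see the axis parameter change the result, it helps to combine it with a directional fill. The following is a minimal sketch using the ffill() method on the same data; the names down and across are illustrative.

```python
import pandas as pd

data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}
df = pd.DataFrame(data)

# axis=0: each gap takes the value from the row above, in the same column
down = df.ffill(axis=0)

# axis=1: each gap takes the value from the column to the left, in the same row
across = df.ffill(axis=1)

print("Forward fill along axis=0:\n", down)
print("\nForward fill along axis=1:\n", across)
```

With axis=0, row 0 of column 'B' stays NaN because there is no row above it. With axis=1, the same cell is filled from column 'A', while row 2 of column 'A' stays NaN because there is no column to its left.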


Example 5: Use of limit Argument in fillna()

The limit argument specifies the maximum number of consecutive missing values to fill when forward or backward filling.

  • limit=None (default) - every missing value that has a value to propagate is filled
  • limit=1 - in each run of consecutive missing values, only the one nearest to the value being propagated is filled

Let's look at an example.

import pandas as pd

# create a sample DataFrame with missing values
data = {'A': [1, 2, None, None, 5, None],
        'B': [None, 10, 11, None, 14, 15]}
df = pd.DataFrame(data)

# fill missing values forward with a limit of 1
df_filled_forward = df.fillna(method='ffill', limit=1)
print("DataFrame filled forward:")
print(df_filled_forward)

# fill missing values backward with a limit of 1
df_filled_backward = df.fillna(method='bfill', limit=1)
print("\nDataFrame filled backward:")
print(df_filled_backward)

Output

DataFrame filled forward:
     A     B
0  1.0   NaN
1  2.0  10.0
2  2.0  11.0
3  NaN  11.0
4  5.0  14.0
5  5.0  15.0

DataFrame filled backward:
      A     B
0  1.0  10.0
1  2.0  10.0
2  NaN  11.0
3  5.0  14.0
4  5.0  14.0
5  NaN  15.0

In this example, the limit parameter is set to 1 for both forward and backward filling.

As a result, each non-missing value is propagated to at most one adjacent missing value. In column 'A', rows 2 and 3 are both missing, so forward filling fills only row 2 and backward filling fills only row 3.

This allows us to control how many missing values are replaced in a consecutive sequence while leaving the rest of the missing values unchanged.
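
One way to check the effect of limit is to count how many missing values remain after filling. The sketch below does this with the ffill() method and two different limits; the variable names are illustrative.

```python
import pandas as pd

data = {'A': [1, 2, None, None, 5, None],
        'B': [None, 10, 11, None, 14, 15]}
df = pd.DataFrame(data)

# forward fill at most one consecutive missing value
limited_1 = df.ffill(limit=1)

# forward fill at most two consecutive missing values
limited_2 = df.ffill(limit=2)

# count the missing values left in each column
remaining_1 = limited_1.isna().sum()
remaining_2 = limited_2.isna().sum()

print(remaining_1)
print(remaining_2)
```

With limit=1, one value in column 'A' is still missing (row 3). With limit=2, column 'A' is filled completely. In both cases row 0 of column 'B' remains missing, since forward filling has no earlier value to use.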