Pandas dropna()

The dropna() method in Pandas removes rows or columns that contain missing values (NaN or None) from a DataFrame.

Example

import pandas as pd

# create a DataFrame with missing values
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5]}

df = pd.DataFrame(data)

# drop missing values
df_dropped = df.dropna()

print(df_dropped)

'''
Output
     A    B
1  2.0  2.0
4  5.0  5.0
'''

dropna() Syntax

The syntax of the dropna() method in Pandas is:

df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

dropna() Arguments

The dropna() method takes the following arguments:

  • axis (optional) - specifies whether to drop rows or columns
  • how (optional) - determines the condition for dropping
  • thresh (optional) - specifies a minimum number of non-null values required to keep the row/column
  • subset (optional) - allows us to specify a subset of columns to consider when dropping rows with missing values
  • inplace (optional) - If True, modifies the original DataFrame in place; if False, returns a new DataFrame.

dropna() Return Value

The dropna() method returns a new DataFrame with missing values dropped according to the specified parameters. If inplace=True, it modifies the original DataFrame and returns None instead.
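The difference between the two inplace settings can be seen in a minimal sketch like the one below. The small DataFrame and the names new_df and result are illustrative only and do not come from the examples in this tutorial.

```python
import pandas as pd

# illustrative data: only the first row is free of missing values
df = pd.DataFrame({'A': [1, None, 3],
                   'B': [4, 5, None]})

# inplace=False (default): df is left untouched, a new DataFrame is returned
new_df = df.dropna()
print(len(df), len(new_df))

# inplace=True: df itself is modified and the call returns None
result = df.dropna(inplace=True)
print(result)
print(len(df))
```

Because the in-place call returns None, assigning its result back to df (df = df.dropna(inplace=True)) would replace the DataFrame with None.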


Example 1: Drop Missing Values

import pandas as pd

# create a DataFrame with missing values
data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}

df = pd.DataFrame(data)

# drop missing values
df_dropped = df.dropna()

print(df_dropped)

Output

      A     B
1  20.0   2.0
4  55.0  65.0

In the above example, we have used the dropna() method to remove rows containing missing values from the df DataFrame and store the result in the df_dropped DataFrame.

df_dropped contains only the rows from df that don't have any missing values. Note that the remaining rows keep their original index labels (1 and 4).
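The rows that survive keep their original index labels. If consecutive labels are preferred, one possible approach is to chain reset_index(drop=True) after dropna(). The following is a minimal sketch using the same data; the name df_renumbered is illustrative.

```python
import pandas as pd

data = {'A': [10, 20, None, 25, 55],
        'B': [None, 2, 13, None, 65]}
df = pd.DataFrame(data)

# rows 1 and 4 survive and keep their original labels
df_dropped = df.dropna()
print(list(df_dropped.index))

# renumber the remaining rows from 0; drop=True discards the old labels
df_renumbered = df_dropped.reset_index(drop=True)
print(list(df_renumbered.index))
```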


Example 2: Use axis Argument to Drop Rows and Columns Containing Missing Values

import pandas as pd

# create a DataFrame with missing values
data = {'A': [1, 2, None, 4],
        'B': [None, 2, 3, 4],
        'C': [1, 2, 3, 4]}
df = pd.DataFrame(data)

# display the original DataFrame
print("Original DataFrame:")
print(df)
print()

# drop rows with any missing values and create a new DataFrame
df_rows_dropped = df.dropna(axis=0, inplace=False)

print("DataFrame with rows dropped:")
print(df_rows_dropped)
print()

# drop columns with any missing values and create a new DataFrame
df_columns_dropped = df.dropna(axis=1, inplace=False)

print("DataFrame with columns dropped:")
print(df_columns_dropped)

Output

Original DataFrame:
     A    B  C
0  1.0  NaN  1
1  2.0  2.0  2
2  NaN  3.0  3
3  4.0  4.0  4

DataFrame with rows dropped:
     A    B  C
1  2.0  2.0  2
3  4.0  4.0  4

DataFrame with columns dropped:
   C
0  1
1  2
2  3
3  4

Here,

  • Rows with any missing values are dropped using axis=0, and the result is stored in df_rows_dropped.
  • Columns with any missing values are dropped using axis=1, and the result is stored in df_columns_dropped.

Also, inplace=False (the default) ensures that the original DataFrame remains unchanged and the results are returned as new DataFrames.


Example 3: Determine Condition for Dropping

import pandas as pd

data = {'A': [1, 2, None, 4],
        'B': [None, 2, None, 4]}

df = pd.DataFrame(data)

# drop rows with any missing values
result_any = df.dropna(how='any')

print("Using how='any':")
print(result_any)
print()

# drop rows with all missing values
result_all = df.dropna(how='all')

print("Using how='all':")
print(result_all)

Output

Using how='any':
     A    B
1  2.0  2.0
3  4.0  4.0

Using how='all':
     A    B
0  1.0  NaN
1  2.0  2.0
3  4.0  4.0

Here,

  • how='any' (default) - a row is dropped if it contains at least one missing value, leaving only the rows where both columns 'A' and 'B' have non-null values.
  • how='all' - a row is dropped only if all of its values are missing (here, the row at index 2), so rows with at least one non-null value are kept.
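One way to reason about the two conditions is to express them as boolean masks built with notna(). The sketch below is only meant to illustrate the logic and is not how dropna() is implemented internally; the names mask_any and mask_all are illustrative.

```python
import pandas as pd

data = {'A': [1, 2, None, 4],
        'B': [None, 2, None, 4]}
df = pd.DataFrame(data)

# how='any' keeps a row only if every value in it is present
mask_any = df.notna().all(axis=1)

# how='all' keeps a row if at least one value in it is present
mask_all = df.notna().any(axis=1)

same_as_any = df[mask_any].equals(df.dropna(how='any'))
same_as_all = df[mask_all].equals(df.dropna(how='all'))
print(same_as_any, same_as_all)
```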

Example 4: Drop Rows Based on Threshold

import pandas as pd

# creating a DataFrame with some NaN values
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)
print()

# use dropna() with thresh parameter
# keep only those rows which have at least 3 non-NaN values
cleaned_df_rows = df.dropna(thresh=3)

print("DataFrame after dropping rows with less than 3 non-NaN values:")
print(cleaned_df_rows)

Output

Original DataFrame:
     A    B   C     D
0  1.0  5.0   9  13.0
1  2.0  NaN  10  14.0
2  NaN  NaN  11  15.0
3  4.0  8.0  12   NaN

DataFrame after dropping rows with less than 3 non-NaN values:
     A    B   C     D
0  1.0  5.0   9  13.0
1  2.0  NaN  10  14.0
3  4.0  8.0  12   NaN

In the above example, we have used the dropna(thresh=3) method to remove rows which do not have at least 3 non-NaN values.

Hence, the row at index 2 is removed because it has only 2 non-NaN values (in columns C and D).
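Since thresh applies to rows or columns depending on axis, the same threshold can also be used to filter columns. The following is a minimal sketch with the same data; the name cleaned_df_cols is illustrative.

```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)

# keep only those columns which have at least 3 non-NaN values
# column B has just 2 non-NaN values, so it is dropped
cleaned_df_cols = df.dropna(axis=1, thresh=3)

print(list(cleaned_df_cols.columns))
```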


Example 5: Selectively Remove Rows Containing Missing Data

import pandas as pd

# creating a DataFrame with some NaN values
data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)

# use dropna() with subset parameter
# drop the rows where NaN appears in column 'B' or 'D'
cleaned_df = df.dropna(subset=['B', 'D'])

print("Original DataFrame:")
print(df)
print("\nDataFrame after dropping rows with NaN in columns 'B' or 'D':")
print(cleaned_df)

Output

Original DataFrame:
     A    B   C     D
0  1.0  5.0   9  13.0
1  2.0  NaN  10  14.0
2  NaN  NaN  11  15.0
3  4.0  8.0  12   NaN

DataFrame after dropping rows with NaN in columns 'B' or 'D':
     A    B  C     D
0  1.0  5.0  9  13.0

Here, when we apply dropna(subset=['B', 'D']), it checks only columns B and D for missing values.

If any missing value is found in these columns, the corresponding row is removed.
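The subset argument can also be combined with how. As a rough sketch, assuming we only want to drop rows in which both 'A' and 'B' are missing, how='all' can be passed together with subset. The column choice and the name cleaned_both are illustrative.

```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, 11, 12],
    'D': [13, 14, 15, None]
}
df = pd.DataFrame(data)

# drop a row only if BOTH 'A' and 'B' are missing in it
# only the row at index 2 meets that condition
cleaned_both = df.dropna(subset=['A', 'B'], how='all')

print(list(cleaned_both.index))
```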
