Pandas mean()

The mean() method in Pandas is used to compute the arithmetic mean of a set of numbers.

Example

import pandas as pd

# sample DataFrame
data = {
    'Math': [85, 90, 78],
    'Physics': [92, 88, 84]
}

df = pd.DataFrame(data)

# compute the mean for each subject (column)
mean_scores = df.mean()
print(mean_scores)

'''
Output

Math       84.333333
Physics    88.000000
dtype: float64
'''

mean() Syntax

The syntax of the mean() method in Pandas is:

df.mean(axis=0, skipna=True, numeric_only=False)

mean() Arguments

The mean() method takes the following arguments:

  • axis (optional) - the axis along which the mean is computed: 0 for each column (default), 1 for each row
  • skipna (optional) - whether to exclude missing values (NaN) from the computation (default True)
  • numeric_only (optional) - if True, only numeric columns are included in the computation (default False)

Note: Pandas versions before 2.0 also accepted a level argument for computing the mean at a particular level of a MultiIndex. It was removed in Pandas 2.0; use groupby(level=...).mean() instead.
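For data with a MultiIndex, a per-level mean can be computed by grouping on an index level. The following is a minimal sketch; the index names class and student and the scores are hypothetical, chosen only for illustration.

```python
import pandas as pd

# hypothetical two-level index: class and student
index = pd.MultiIndex.from_tuples(
    [('X', 'Alice'), ('X', 'Bob'), ('Y', 'Charlie'), ('Y', 'David')],
    names=['class', 'student']
)

df = pd.DataFrame({'Math': [80, 90, 70, 60]}, index=index)

# compute the mean of each class (the first index level)
mean_by_class = df.groupby(level='class').mean()
print(mean_by_class)
```

Here, class X averages 85.0 and class Y averages 65.0.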

mean() Return Value

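When called on a DataFrame, the mean() method returns a Series that holds the average value of each column (axis=0) or each row (axis=1). When called on a Series, such as a single column, it returns a single number.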
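As a quick check of the return types, the following sketch reuses the scores from the first example and prints the type of each result. The variable names column_means and math_mean are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Math': [85, 90, 78],
    'Physics': [92, 88, 84]
})

# mean() on a DataFrame returns a Series with one value per column
column_means = df.mean()
print(type(column_means))

# mean() on a single column (a Series) returns a single number
math_mean = df['Math'].mean()
print(type(math_mean))
```

The first result is a Pandas Series, while the second is a floating-point number.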


Example 1: Compute mean() Along Different Axis

import pandas as pd

# sample DataFrame
data = {
    'Math': [85, 90, 78],
    'Physics': [92, 88, 84]
}

df = pd.DataFrame(data, index=['Alice', 'Bob', 'Charlie'])

print("DataFrame:")
print(df)
print()

# compute the mean for each subject (column)
mean_scores = df.mean()
print("Mean scores for each subject:")
print(mean_scores)
print()

# compute the mean score for each student (row)
mean_scores_by_student = df.mean(axis=1)
print("Mean scores for each student:")
print(mean_scores_by_student)

Output

DataFrame:
         Math  Physics
Alice      85       92
Bob        90       88
Charlie    78       84

Mean scores for each subject:
Math       84.333333
Physics    88.000000
dtype: float64

Mean scores for each student:
Alice      88.5
Bob        89.0
Charlie    81.0
dtype: float64

In the above example,

  1. The mean() method without any arguments computes the mean for each column (i.e., the average score for each subject).
  2. The mean(axis=1) computes the mean across each row (i.e., the average score for each student).

Note: We can also pass axis=0 inside mean() to compute the mean of each column.
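To confirm that axis=0 matches the default behavior, we can compare the results directly. This sketch also uses the string aliases 'index' and 'columns', which Pandas accepts in place of 0 and 1.

```python
import pandas as pd

df = pd.DataFrame(
    {'Math': [85, 90, 78], 'Physics': [92, 88, 84]},
    index=['Alice', 'Bob', 'Charlie']
)

# axis=0 and axis='index' both give the mean of each column
same_as_default = df.mean(axis=0).equals(df.mean())
same_with_alias = df.mean(axis='index').equals(df.mean())

# axis=1 and axis='columns' both give the mean of each row
same_for_rows = df.mean(axis='columns').equals(df.mean(axis=1))

print(same_as_default, same_with_alias, same_for_rows)
```

All three comparisons print True.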


Example 2: Calculate Mean of a Specific Column

import pandas as pd

# sample DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [1, 2, 3, 4]
})

# calculate the mean of column 'A'
mean_A = df['A'].mean()
print(f"Mean of column 'A': {mean_A}")

Output

Mean of column 'A': 25.0

In this example, we've created the df DataFrame with three columns: A, B, and C.

Then, we used df['A'].mean() to compute the average of the values in column A, which resulted in a mean of 25.0.
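The same idea extends to several columns at once. As a small sketch, we can pass a list of column names to select a subset of the DataFrame and then call mean() on it; the name mean_A_C is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 15, 25, 35],
    'C': [1, 2, 3, 4]
})

# select columns 'A' and 'C', then compute the mean of each
mean_A_C = df[['A', 'C']].mean()
print(mean_A_C)
```

Because the selection is a DataFrame rather than a single column, the result is a Series with the values 25.0 for A and 2.5 for C.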


Example 3: Use of numeric_only Argument in mean()

import pandas as pd

# sample DataFrame with a mix of numeric and non-numeric columns
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Admin']
})

# compute mean with numeric_only set to True
mean_values_numeric = df.mean(numeric_only=True)
print("Mean with numeric_only=True:")
print(mean_values_numeric)
print()

# try to compute mean with numeric_only set to False
try:
    mean_values_all = df.mean(numeric_only=False)
    print("Mean with numeric_only=False:")
    print(mean_values_all)
except TypeError as e:
    print(f"Error: {e}")

Output

Mean with numeric_only=True:
Age          32.5
Salary    65000.0
dtype: float64

Error: Could not convert ['AliceBobCharlieDavid' 'HRITFinanceAdmin'] to numeric

Here,

  • When numeric_only=True, mean() only computes the mean for the numeric columns, ignoring the non-numeric ones.
  • When numeric_only=False, mean() attempts to compute the mean for all columns, including the non-numeric ones. This raises a TypeError because the text values in the Name and Department columns cannot be converted to numbers.
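An alternative way to get the same result is to select the numeric columns explicitly before calling mean(). The following sketch uses select_dtypes(include='number') for this; for the DataFrame above, it gives the same values as numeric_only=True.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000],
    'Department': ['HR', 'IT', 'Finance', 'Admin']
})

# keep only the numeric columns, then compute their means
numeric_means = df.select_dtypes(include='number').mean()
print(numeric_means)

# compare with the result of numeric_only=True
matches = numeric_means.equals(df.mean(numeric_only=True))
print(matches)
```

Selecting the columns first can be handy when the numeric subset is reused for other computations as well.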

Note: To learn more about exception handling, please visit Python Exception Handling.


Example 4: Effect of the skipna Argument on Calculating Averages

import pandas as pd

# sample DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, 3, None],
    'B': [4, 5, None, 8],
    'C': [7, 10, 13, 19],
    'D': [None, 10, 11, 12]
})

# compute mean with skipna set to True (default behavior)
mean_values_skipna_true = df.mean(skipna=True)
print("Mean with skipna=True:")
print(mean_values_skipna_true)
print()

# compute mean with skipna set to False
mean_values_skipna_false = df.mean(skipna=False)
print("Mean with skipna=False:")
print(mean_values_skipna_false)

Output

Mean with skipna=True:
A     2.000000
B     5.666667
C    12.250000
D    11.000000
dtype: float64

Mean with skipna=False:
A      NaN
B      NaN
C    12.25
D      NaN
dtype: float64

In this example,

  • With skipna=True, the missing values are ignored, so the averages of columns A, B, and D are computed from their three valid numbers each. Column C has no missing values, so all four of its numbers are used.
  • With skipna=False, columns A, B, and D each contain a missing value (None is stored as NaN), so their means are NaN. Column C has no missing values, so its average is calculated as usual.
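To see what skipping missing values means in practice, we can reproduce the mean of column B by hand. This is a minimal sketch: it divides the sum of the valid values by the number of valid values, and then shows that filling the gap with 0 (an illustrative choice, not something mean() does) gives a different answer.

```python
import pandas as pd

df = pd.DataFrame({
    'B': [4, 5, None, 8]
})

# sum() skips missing values and count() counts only non-missing values
total = df['B'].sum()      # 4 + 5 + 8
valid = df['B'].count()    # 3 non-missing values
manual_mean = total / valid

print(manual_mean)
print(df['B'].mean())

# replacing the missing value with 0 changes the result
filled_mean = df['B'].fillna(0).mean()
print(filled_mean)
```

The first two values are both about 5.666667, while the filled version is 4.25 because the 0 counts as a fourth value.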
