The sum() method in Pandas calculates the sum of a DataFrame's values along a specified axis.
Example
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6]
})
# calculate the sum of each column
column_sum = df.sum()
print(column_sum)
'''
Output
A 6
B 15
dtype: int64
'''
sum() Syntax
The syntax of the sum() method in Pandas is:
df.sum(axis=None, skipna=True, numeric_only=None, min_count=0)
sum() Arguments
The sum() method takes the following arguments:
- axis (optional) - specifies the axis along which the sum is computed
- skipna (optional) - determines whether to include or exclude missing values
- numeric_only (optional) - specifies whether to include only numeric columns in the computation
- min_count (optional) - the minimum number of valid (non-missing) values required to perform the operation; if fewer are present, the result is NaN
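These arguments can be combined in a single call. The sketch below uses a small made-up DataFrame. It computes one sum per row over the numeric columns only, skips missing values, and requires at least two valid values in each row:

```python
import pandas as pd

# hypothetical example data: one string column and one missing value
df = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'A': [1.0, None, 3.0],
    'B': [4.0, 5.0, 6.0],
})

# sum across each row (axis=1), ignoring the non-numeric 'name' column,
# skipping NaN, and requiring at least 2 valid values per row
row_sum = df.sum(axis=1, numeric_only=True, skipna=True, min_count=2)
print(row_sum)
```

Row 1 has only one valid numeric value (5.0), so its result is NaN. Rows 0 and 2 sum to 5.0 and 9.0.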
sum() Return Value
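The sum() method returns the sum of the values along the specified axis. Called on a DataFrame, it returns a Series with one sum per column (or one per row with axis=1). Called on a single column (a Series), it returns a scalar.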
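The type of the result depends on what sum() is called on. A DataFrame produces a Series of per-column sums, and a single column (a Series) produces one scalar. Here is a small check using a throwaway DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# summing a DataFrame gives one value per column, packed in a Series
per_column = df.sum()
print(type(per_column).__name__)  # Series

# summing a single column (a Series) gives one scalar value
total_a = df['A'].sum()
print(total_a)  # 6
```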
Example 1: Compute Sum Along Different Axes
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# calculate the sum of each column
column_sum = df.sum()
# calculate the sum of each row
row_sum = df.sum(axis=1)
print("Sum of each column:")
print(column_sum)
print("\nSum of each row:")
print(row_sum)
Output
Sum of each column:
A     6
B    15
C    24
dtype: int64

Sum of each row:
0    12
1    15
2    18
dtype: int64
In the above example,
- column_sum = df.sum() calculates the sum of values in each column of the df DataFrame. The default axis=0 means it operates column-wise.
- row_sum = df.sum(axis=1) calculates the sum of values in each row of df. Setting axis=1 means it operates row-wise.
Note: We can also pass axis=0 inside sum() to compute the sum of each column.
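In addition to 0 and 1, axis accepts the string aliases 'index' and 'columns'. This quick check reuses the Example 1 data and confirms that the aliases give the same results as the numeric values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# 'index' is the same as axis=0: collapse the rows, one result per column
by_index = df.sum(axis='index')
# 'columns' is the same as axis=1: collapse the columns, one result per row
by_columns = df.sum(axis='columns')

print(by_index.equals(df.sum(axis=0)))    # True
print(by_columns.equals(df.sum(axis=1)))  # True
```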
Example 2: Calculate Sum of a Specific Column
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# calculate the sum of column 'A'
sum_A = df['A'].sum()
# calculate the sum of column 'B'
sum_B = df['B'].sum()
print("sum of column A:", sum_A)
print("sum of column B:", sum_B)
Output
sum of column A: 6
sum of column B: 15
In this example, df['A'] selects column A of the df DataFrame, and sum() calculates the sum of its values. The same is done for column B.
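The same pattern works for several columns at once. You can also chain two sum() calls to get a single grand total. Here is a minimal sketch that reuses the Example 2 DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# select a subset of columns with a list, then sum each of them
sum_AB = df[['A', 'B']].sum()
print(sum_AB)

# sum each column, then add those column totals into one number
grand_total = df.sum().sum()
print(grand_total)  # 45
```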
Example 3: Use of numeric_only Argument in sum()
import pandas as pd
# create a DataFrame with both numeric and non-numeric columns
data = {
'A': [10, 20, 30, 40],
'B': [5, 3, 2, 1],
'C': ['a', 'b', 'c', 'd'],
'D': [1.5, 2.5, 3.5, 4.5]
}
df = pd.DataFrame(data)
# sum only the numeric columns
summed = df.sum(numeric_only=True)
print(summed)
Output
A    100.0
B     11.0
D     12.0
dtype: float64
Here, with numeric_only=True, the sum is calculated only for columns A, B, and D. Column C is excluded because it contains string data.
If we hadn't specified numeric_only, as in
summed_all = df.sum()
the output would be:
A     100
B      11
C    abcd
D    12.0
dtype: object
In that case, the strings in column C are concatenated ('a' + 'b' + 'c' + 'd'). Because the result mixes numbers and a string, its dtype is object.
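Another way to restrict the sum to numeric columns is to select them first with select_dtypes(). This sketch reuses the Example 3 data and checks that both approaches agree:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [5, 3, 2, 1],
    'C': ['a', 'b', 'c', 'd'],
    'D': [1.5, 2.5, 3.5, 4.5],
})

# keep only numeric columns (ints and floats), then sum them
numeric_sum = df.select_dtypes(include='number').sum()

# compare with passing numeric_only=True
print(numeric_sum.equals(df.sum(numeric_only=True)))  # True
```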
Example 4: Effect of the skipna Argument on sum()
import pandas as pd
# create a DataFrame with NaN values
df = pd.DataFrame({
'A': [1, None, 3],
'B': [4, 5, None],
'C': [7, 8, 9]
})
# calculate the sum of each column, ignoring NaN values
sum_skipna_true = df.sum()
# calculate the sum of each column, including NaN values
sum_skipna_false = df.sum(skipna=False)
print("sum with skipna=True (default):")
print(sum_skipna_true)
print("\nsum with skipna=False:")
print(sum_skipna_false)
Output
sum with skipna=True (default):
A     4.0
B     9.0
C    24.0
dtype: float64

sum with skipna=False:
A     NaN
B     NaN
C    24.0
dtype: float64
In this example,
- With skipna=True (the default), the sums of columns A, B, and C are 4.0, 9.0, and 24.0, respectively. The missing values (None, stored as NaN) are ignored.
- With skipna=False, the sums of columns A and B are NaN because those columns contain missing values. The sum of C is still 24.0.
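skipna works the same way when summing row-wise. Here is a short sketch that reuses the Example 4 data with axis=1:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, None, 3],
    'B': [4, 5, None],
    'C': [7, 8, 9],
})

# row sums that ignore NaN: [1+4+7, 5+8, 3+9]
row_sums = df.sum(axis=1)
print(row_sums.tolist())         # [12.0, 13.0, 12.0]

# row sums that propagate NaN: only row 0 has no missing value
row_sums_strict = df.sum(axis=1, skipna=False)
print(row_sums_strict.tolist())  # [12.0, nan, nan]
```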
Example 5: Calculate Sums With a Minimum Value Count
import pandas as pd
# create a DataFrame with some missing values
df = pd.DataFrame({
'A': [1, None, 3],
'B': [4, 5, None],
'C': [None, None, 9]
})
# calculate the sum of each column with min_count set to 1
sum_min_count_1 = df.sum(min_count=1)
# calculate the sum of each column with min_count set to 2
sum_min_count_2 = df.sum(min_count=2)
# calculate the sum of each column with min_count set to 3
sum_min_count_3 = df.sum(min_count=3)
print("sum with min_count=1:")
print(sum_min_count_1)
print("\nsum with min_count=2:")
print(sum_min_count_2)
print("\nsum with min_count=3:")
print(sum_min_count_3)
Output
sum with min_count=1:
A    4.0
B    9.0
C    9.0
dtype: float64

sum with min_count=2:
A    4.0
B    9.0
C    NaN
dtype: float64

sum with min_count=3:
A   NaN
B   NaN
C   NaN
dtype: float64
Here,
- When min_count=1, the sum is calculated if there is at least one non-missing value in the column. All columns meet this criterion.
- When min_count=2, the sum is calculated only if there are at least two non-missing values in the column. Columns A and B each have two, but C has only one, so its result is NaN.
- When min_count=3, the sum is calculated only if there are at least three non-missing values in the column. None of the columns meets this criterion, so all results are NaN.
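The default min_count=0 has one consequence worth knowing: a column with no valid values sums to 0 rather than NaN. Setting min_count=1 changes that. Here is a small sketch in which the all-missing column is made up for illustration:

```python
import numpy as np
import pandas as pd

# hypothetical data: column 'empty' contains only missing values
df = pd.DataFrame({
    'A': [1.0, 2.0],
    'empty': [np.nan, np.nan],
})

# default min_count=0: the all-NaN column sums to 0.0
default_sum = df.sum()
# min_count=1: at least one valid value is required, so 'empty' becomes NaN
strict_sum = df.sum(min_count=1)

print(default_sum['empty'])  # 0.0
print(strict_sum['empty'])   # nan
```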