The cumsum() method in Pandas computes the cumulative (running) sum of elements along a particular axis.
Example
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40],
    'B': [1, 2, 3, 4]
})
# compute the cumulative sum down the rows of each column (default axis)
df_cumsum = df.cumsum()
print(df_cumsum)
'''
Output
     A   B
0   10   1
1   30   3
2   60   6
3  100  10
'''
cumsum() Syntax
The syntax of the cumsum() method in Pandas is:
cumsum(axis=None, skipna=True, *args, **kwargs)
cumsum() Arguments
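The cumsum() method takes the following arguments:
axis (optional) - specifies the axis along which the cumulative sum is computed: 0 or 'index' accumulates down the rows of each column (the default behavior), 1 or 'columns' accumulates across the columns of each row
skipna (optional) - specifies whether to exclude null values or not (True by default)
*args and **kwargs (optional) - additional arguments and keyword arguments that can be passed to the function; they are accepted for compatibility with NumPy and do not change the result.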
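As a small sketch of the axis argument, the snippet below checks that the numeric and string forms of axis give the same result. The DataFrame and variable names are made up for illustration.

```python
import pandas as pd

# illustrative data, not taken from the examples in this article
df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [1, 2, 3]
})

# axis=0 and axis='index' both accumulate down the rows of each column
down_by_number = df.cumsum(axis=0)
down_by_name = df.cumsum(axis='index')

# axis=1 and axis='columns' both accumulate left to right within each row
across_by_number = df.cumsum(axis=1)
across_by_name = df.cumsum(axis='columns')

print(down_by_number.equals(down_by_name))      # True
print(across_by_number.equals(across_by_name))  # True
```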
cumsum() Return Value
The cumsum() method returns a DataFrame or Series of the same size as the original, containing the cumulative sum of elements along the given axis.
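To make the return value concrete, here is a minimal sketch that inspects the result of cumsum() on a DataFrame and on a single column. The data and the index labels 'x', 'y', 'z' are assumptions chosen for illustration.

```python
import pandas as pd

# illustrative data with a custom index
df = pd.DataFrame({'A': [10, 20, 30], 'B': [1, 2, 3]}, index=['x', 'y', 'z'])

# on a DataFrame, the result is a DataFrame with the same shape and index
result = df.cumsum()
print(type(result).__name__)          # DataFrame
print(result.shape == df.shape)       # True
print(result.index.equals(df.index))  # True

# on a single column (a Series), the result is a Series
series_result = df['A'].cumsum()
print(type(series_result).__name__)   # Series
print(series_result.tolist())         # [10, 30, 60]
```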
Example 1: Get Cumulative Sum Using cumsum()
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
    'Sales': [150, 200, 250, 300],
    'Expenses': [50, 60, 70, 80]
})
print("Original DataFrame:")
print(df)
print()
# compute the cumulative sum down the rows of each column (default behavior)
df_cumsum = df.cumsum()
print("DataFrame after cumsum:")
print(df_cumsum)
Output
Original DataFrame:
   Sales  Expenses
0    150        50
1    200        60
2    250        70
3    300        80

DataFrame after cumsum:
   Sales  Expenses
0    150        50
1    350       110
2    600       180
3    900       260
In the above example, we created the df DataFrame, which represents sales and expenses over four time periods.
Calling cumsum() on df computes the running total of each column, Sales and Expenses, from the first row down.
Note: The cumsum() method is useful when we want to see the accumulated values over time.
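One common way to use accumulated values over time is to store the running totals next to the original figures. The sketch below does this for the sales data; the column names Sales_to_date, Expenses_to_date and Profit_to_date are hypothetical names chosen for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Sales': [150, 200, 250, 300],
    'Expenses': [50, 60, 70, 80]
})

# add running totals as new columns (hypothetical column names)
df['Sales_to_date'] = df['Sales'].cumsum()
df['Expenses_to_date'] = df['Expenses'].cumsum()

# running profit is the difference between the two running totals
df['Profit_to_date'] = df['Sales_to_date'] - df['Expenses_to_date']

print(df['Profit_to_date'].tolist())  # [100, 240, 420, 640]
```

Keeping the original columns untouched makes it easy to compare the value for each period with the total accumulated so far.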
Example 2: Compute Cumulative Sum Across Columns
import pandas as pd
# create a DataFrame
df = pd.DataFrame({
    'Price': [10, 20, 30],
    'Tax': [2, 4, 6],
    'Discount': [1, 2, 3]
})
print("Original DataFrame:")
print(df)
print()
# compute the cumulative sum across columns
df_cumsum_col = df.cumsum(axis=1)
print("\nDataFrame after cumulative sum over columns:")
print(df_cumsum_col)
Output
Original DataFrame:
   Price  Tax  Discount
0     10    2         1
1     20    4         2
2     30    6         3


DataFrame after cumulative sum over columns:
   Price  Tax  Discount
0     10   12        13
1     20   24        26
2     30   36        39
Here, we have used df.cumsum(axis=1) to compute the cumulative sum over the columns.
This means for each row, we're adding the values from left to right (across the columns).
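A quick way to check the left-to-right behavior is to compare the last column of the result with the plain row totals. The following is a minimal sketch using the same data; the variable names row_running and row_totals are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': [10, 20, 30],
    'Tax': [2, 4, 6],
    'Discount': [1, 2, 3]
})

# running sum across the columns of each row
row_running = df.cumsum(axis=1)

# plain sum of each row, for comparison
row_totals = df.sum(axis=1)

# the last column of the running sum holds the full row total
print((row_running['Discount'] == row_totals).all())  # True
print(row_totals.tolist())                            # [13, 26, 39]
```

Note that the column labels are kept as they are, so after cumsum(axis=1) the Tax column holds Price + Tax and the Discount column holds the sum of all three.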
Example 3: Handle Missing Data with skipna
In pandas, the skipna parameter in cumsum() determines whether to exclude missing values when performing the cumulative sum operation.
Let's look at an example.
import pandas as pd
# create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})
print("Original DataFrame:")
print(df)
# cumulative sum with skipna=True (default)
cumsum_skipna_true = df.cumsum(skipna=True)
print("\nCumulative sum with skipna=True:")
print(cumsum_skipna_true)
# cumulative sum with skipna=False
cumsum_skipna_false = df.cumsum(skipna=False)
print("\nCumulative sum with skipna=False:")
print(cumsum_skipna_false)
Output
Original DataFrame:
     A    B
0  1.0  NaN
1  2.0  2.0
2  NaN  3.0
3  4.0  4.0

Cumulative sum with skipna=True:
     A    B
0  1.0  NaN
1  3.0  2.0
2  NaN  5.0
3  7.0  9.0

Cumulative sum with skipna=False:
     A    B
0  1.0  NaN
1  3.0  NaN
2  NaN  NaN
3  NaN  NaN
Here, when
skipna=True (default) - cumsum() skips the missing values during its computation: each NaN stays NaN in the result, but the running total continues with the next valid value
skipna=False - cumsum() sets the value at the first NaN and all subsequent values in that column to NaN.
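With either skipna setting, the positions that were NaN in the input remain NaN in the output. If a running total without gaps is preferred, one possible approach (an alternative to skipna, not part of cumsum() itself) is to replace missing values with 0 using fillna() before accumulating. A minimal sketch, assuming that treating a missing value as 0 is acceptable for the data:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# assumption: a missing value contributes nothing to the running total
filled_cumsum = df.fillna(0).cumsum()

print(filled_cumsum['A'].tolist())  # [1.0, 3.0, 3.0, 7.0]
print(filled_cumsum['B'].tolist())  # [0.0, 2.0, 5.0, 9.0]
```

Compared with the skipna=True result, the totals at the valid positions are the same, and the former NaN positions now carry the total accumulated so far.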