The var() method in pandas computes the variance of a dataset. Variance measures how far a set of data points is spread out around their mean. By default, var() returns the sample variance (ddof=1).
Example
import pandas as pd
# sample DataFrame
data = {'A': [1, 20, 333],
'B': [4, 5, 7]}
df = pd.DataFrame(data)
# calculate the variance
variance = df.var()
print(variance)
'''
Output
A    34759.000000
B        2.333333
dtype: float64
'''
var() Syntax
The syntax of the var() method in Pandas is:
df.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)
var() Arguments
The var() method includes the following arguments:
- axis (optional): specifies the axis to compute the variance along
- skipna (optional): whether to exclude null values when computing the result
- ddof (optional): Delta Degrees of Freedom (the divisor used in calculations is N - ddof, where N represents the number of elements)
- numeric_only (optional): whether to include only float, int, boolean columns
- **kwargs: additional keyword arguments.
var() Return Value
The var() method returns:
- a scalar for a Series
- a Series for a DataFrame (one variance per column, or one per row when axis=1)
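The two return types can be checked directly. The following is a minimal sketch; the variable names are illustrative and do not come from the examples in this article.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 6, 8, 10],
                   'B': [1, 3, 5, 7, 8]})

# var() on a single column (a Series) returns one scalar value
series_variance = df['A'].var()
print(pd.api.types.is_scalar(series_variance))

# var() on the whole DataFrame returns a Series indexed by column name
frame_variance = df.var()
print(isinstance(frame_variance, pd.Series))
print(list(frame_variance.index))
```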
Example 1: Simple Variance Calculation
import pandas as pd
data = {'A': [2, 4, 6, 8, 10],
'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)
# calculate the variance
variance = df.var()
print(variance)
Output
A    10.0
B     8.2
dtype: float64
In this example, we calculated the variance for each column. The output is a Series containing variance values for each column of the df DataFrame.
Example 2: Variance with Different ddof
import pandas as pd
data = {'A': [2, 4, 6, 8, 10],
'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)
# calculate the variance with ddof=0
variance = df.var(ddof=0)
print(variance)
Output
A    8.00
B    6.56
dtype: float64
In this example, we calculated the variance with ddof=0 instead of the default ddof=1. This gives the population variance rather than the sample variance.
In statistical calculations, ddof is a parameter that affects the divisor used in the calculation. For example,
- when ddof=0, the divisor is N
- when ddof=1, the divisor is N−1
where N is the number of data points.
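To make the role of the divisor concrete, the sketch below recomputes the variance by hand and compares it with var(). The helper manual_variance is a hypothetical function written for this illustration; it is not part of pandas.

```python
import pandas as pd

values = pd.Series([2, 4, 6, 8, 10])

def manual_variance(series, ddof):
    """Sum of squared deviations from the mean, divided by N - ddof."""
    n = series.count()  # N: number of non-null data points
    squared_deviations = (series - series.mean()) ** 2
    return squared_deviations.sum() / (n - ddof)

# compare the hand-written formula with pandas for both divisors
for ddof in (0, 1):
    print(ddof, manual_variance(values, ddof), values.var(ddof=ddof))
```

The data is the same as column A in Examples 1 and 2, so the two computations should agree with each other and with the values shown there (8.0 for ddof=0 and 10.0 for ddof=1).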
Example 3: Variance Excluding Null Values for Numeric Columns Only
import pandas as pd
data = {'A': [2, None, 6, 8, 10],
'B': [1, 3, 5, None, 8],
'C': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)
# calculate the variance excluding NA values
# for numeric columns only
variance = df.var(skipna=True, numeric_only=True)
print(variance)
Output
A    11.666667
B     8.916667
dtype: float64
Here, we calculated the variance while excluding null values using the skipna=True argument. Since True is the default for skipna, passing it explicitly only makes the intent visible.
We also excluded the non-numeric column C using numeric_only=True.
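For contrast, the sketch below runs the same data with skipna=False. It assumes the same DataFrame as Example 3; the variable names with_skip and without_skip are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, None, 6, 8, 10],
                   'B': [1, 3, 5, None, 8],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# skipna=True (the default) ignores missing values in each column
with_skip = df.var(skipna=True, numeric_only=True)

# skipna=False propagates missing values:
# any column that contains a null value yields NaN
without_skip = df.var(skipna=False, numeric_only=True)

print(with_skip)
print(without_skip)
```

Because both A and B contain a missing value, the second result should be NaN for both columns, which is why skipping nulls is usually the more useful behavior.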
Example 4: Variance of Rows
import pandas as pd
data = {'A': [2, 4, 6, 8, 10],
'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)
# calculate the variance with axis=1
variance = df.var(axis=1)
print(variance)
Output
0    0.5
1    0.5
2    0.5
3    0.5
4    2.0
dtype: float64
In this example, we calculated the variance of each row (across columns A and B) using the axis=1 argument. The result has one value per row, indexed by the row labels.
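One way to check what axis=1 does is to compare it with equivalent spellings. The sketch below is illustrative only: it passes the axis by name and also transposes the DataFrame so that rows become columns before applying the default axis=0.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 6, 8, 10],
                   'B': [1, 3, 5, 7, 8]})

row_variance = df.var(axis=1)        # one variance per row
by_name = df.var(axis='columns')     # same axis, given by name
via_transpose = df.T.var()           # transpose, then use the default axis=0

print(row_variance)
print(by_name)
print(via_transpose)
```

All three results should hold the same per-row values, which shows that axis=1 reduces across the columns of each row.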