The var() method in Pandas computes the variance of a dataset. Variance is a measure of the dispersion of a set of data points around their mean value.
Example
import pandas as pd
# sample DataFrame
data = {'A': [1, 20, 333],
        'B': [4, 5, 7]}
df = pd.DataFrame(data)
# calculate the variance
variance = df.var()
print(variance)
'''
Output
A 34759.000000
B 2.333333
dtype: float64
'''
var() Syntax
The syntax of the var() method in Pandas is:
df.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)
var() Arguments
The var() method includes the following arguments:
- axis (optional): specifies the axis to compute the variance along
- skipna (optional): whether to exclude null values when computing the result
- ddof (optional): Delta Degrees of Freedom (the divisor used in calculations is N - ddof, where N represents the number of elements)
- numeric_only (optional): whether to include only float, int, and boolean columns
- **kwargs: additional keyword arguments
var() Return Value
The var() method returns:
- a scalar for a Series
- a Series for a DataFrame, with one variance per column (or per row when axis=1)
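The two return types can be checked directly. The following is a minimal sketch that reuses the sample data from the examples in this article; the variable names series_result and frame_result are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 6, 8, 10],
                   'B': [1, 3, 5, 7, 8]})

# var() on a single column (a Series) gives one number
series_result = df['A'].var()

# var() on the whole DataFrame gives a Series indexed by column name
frame_result = df.var()

print(pd.api.types.is_scalar(series_result))
print(isinstance(frame_result, pd.Series))
print(list(frame_result.index))
```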
Example 1: Simple Variance Calculation
import pandas as pd
data = {'A': [2, 4, 6, 8, 10],
        'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)
# calculate the variance
variance = df.var()
print(variance)
Output
A    10.0
B     8.2
dtype: float64
In this example, we calculated the variance for each column. The output is a Series containing variance values for each column of the df DataFrame.
Example 2: Variance with Different ddof
import pandas as pd
data = {'A': [2, 4, 6, 8, 10],
        'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)
# calculate the variance with ddof=0
variance = df.var(ddof=0)
print(variance)
Output
A    8.00
B    6.56
dtype: float64
In this example, we calculated the variance with a non-default Delta Degrees of Freedom value (ddof=0).
In statistical calculations, ddof is a parameter that affects the divisor used in the calculation. For example,
- when ddof=0, the divisor is N (population variance)
- when ddof=1, the divisor is N−1 (sample variance, the default)
where N is the number of data points.
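To see the effect of the divisor, the variance can also be computed by hand and compared with var(). This is a minimal sketch using column A from the example; the names squared_deviations, population_variance, and sample_variance are illustrative and not part of Pandas.

```python
import pandas as pd

values = pd.Series([2, 4, 6, 8, 10])
n = len(values)

# sum of squared deviations from the mean
squared_deviations = ((values - values.mean()) ** 2).sum()

# divide by N, matching ddof=0
population_variance = squared_deviations / n

# divide by N - 1, matching ddof=1
sample_variance = squared_deviations / (n - 1)

print(population_variance, values.var(ddof=0))
print(sample_variance, values.var(ddof=1))
```

Each printed pair should agree, since var() divides the same sum of squared deviations by N - ddof.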
Example 3: Variance Excluding Null Values for Numeric Columns Only
import pandas as pd
data = {'A': [2, None, 6, 8, 10],
        'B': [1, 3, 5, None, 8],
        'C': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)
# calculate the variance excluding NA values
# for numeric columns only
variance = df.var(skipna=True, numeric_only=True)
print(variance)
Output
A    11.666667
B     8.916667
dtype: float64
Here, we calculated the variance while excluding null values using the skipna=True argument, which is also the default.
We also excluded the non-numeric column C using numeric_only=True.
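For contrast, the sketch below shows what happens when null values are not skipped. It uses a smaller, purely numeric DataFrame made up for this illustration, in which only column A contains a missing value.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, None, 6, 8, 10],
                   'B': [1, 3, 5, 7, 8]})

# null values are ignored (default behaviour)
with_skip = df.var(skipna=True)

# null values are kept, so any column containing one yields NaN
without_skip = df.var(skipna=False)

print(with_skip)
print(without_skip)
```

With skipna=False, column A should come back as NaN because it contains a missing value, while column B is unaffected.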
Example 4: Variance of Rows
import pandas as pd
data = {'A': [2, 4, 6, 8, 10],
        'B': [1, 3, 5, 7, 8]}
df = pd.DataFrame(data)
# calculate the variance with axis=1
variance = df.var(axis=1)
print(variance)
Output
0    0.5
1    0.5
2    0.5
3    0.5
4    2.0
dtype: float64
In this example, we calculated the variance of each row (across columns A and B) using the axis=1 argument.
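One way to check a row-wise result is to transpose the DataFrame and use the default axis. The following is a minimal sketch on the same data; the names row_variance and transposed_variance are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 6, 8, 10],
                   'B': [1, 3, 5, 7, 8]})

# variance of each row
row_variance = df.var(axis=1)

# transposing turns rows into columns, so the default axis=0
# computes the variance over the same groups of values
transposed_variance = df.T.var()

print(row_variance)
print(transposed_variance)
```

Both results should hold the same values for row labels 0 to 4. The string alias axis='columns' can be used in place of axis=1.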