The std() method in Pandas computes the standard deviation of the numeric values in a Series or in each column of a DataFrame.
The standard deviation measures how much a set of values varies, or is dispersed, around its mean.
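Written as a formula, with N values x_1, ..., x_N and mean \bar{x}, the quantity std() returns is shown below. Here ddof is the method's Delta Degrees of Freedom argument, which defaults to 1 in Pandas. The worked line plugs in the column A values 1, 2, 3, 4 from the first example.

```latex
s = \sqrt{\frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2}

% worked example: x = (1, 2, 3, 4), \bar{x} = 2.5, ddof = 1
s = \sqrt{\frac{(1-2.5)^2 + (2-2.5)^2 + (3-2.5)^2 + (4-2.5)^2}{4 - 1}}
  = \sqrt{\frac{5}{3}} \approx 1.290994
```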
Example
import pandas as pd
# sample DataFrame
data = {'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8]}
df = pd.DataFrame(data)
# calculate the standard deviation
std_dev = df.std()
print(std_dev)
'''
Output
A 1.290994
B 1.290994
dtype: float64
'''
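The DataFrame above contains only numbers. As a sketch of what happens when a text column is present (the name column here is a hypothetical addition), passing numeric_only=True restricts the calculation to the numeric columns. Without it, recent pandas versions may raise a TypeError when they try to compute the standard deviation of strings.

```python
import pandas as pd

# hypothetical DataFrame mixing numeric and text columns
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'name': ['w', 'x', 'y', 'z'],
})

# numeric_only=True skips the non-numeric 'name' column
std_dev = df.std(numeric_only=True)
print(std_dev)
```

The result has entries only for A and B, with the same values as in the example above.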
std() Syntax
The syntax of the std() method in Pandas is:
df.std(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)
In pandas versions before 2.0, the method also accepted a level argument, and numeric_only defaulted to None.
std() Arguments
The std() method in Pandas has the following arguments:
- axis (optional): the axis to operate on; 0 or 'index' (the default) computes one result per column, while 1 or 'columns' computes one result per row
- skipna (optional): whether to exclude NA/null values; default is True
- ddof (optional): Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements; default is 1
- numeric_only (optional): include only float, int, and boolean columns
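To see how ddof enters the calculation, here is a minimal sketch that computes the standard deviation by hand with the N - ddof divisor and compares it with Pandas. The manual_std function is a hypothetical helper written for illustration, not part of Pandas.

```python
import math
import pandas as pd

values = [1, 2, 3, 4]
s = pd.Series(values)

def manual_std(xs, ddof=1):
    """Hypothetical helper: standard deviation using the N - ddof divisor."""
    n = len(xs)
    mean = sum(xs) / n
    squared_dev = sum((x - mean) ** 2 for x in xs)
    return math.sqrt(squared_dev / (n - ddof))

# compare the hand-rolled value with Series.std() for both divisors
for ddof in (0, 1):
    print(ddof, manual_std(values, ddof), s.std(ddof=ddof))
```

Both columns of the printout agree for each ddof, which shows that ddof only changes the divisor, not the squared deviations.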
std() Return Value
The std() method returns:
- A scalar, if called on a Series (a single column of data).
- A Series, if called on a DataFrame, with one value per column (or per row when axis=1).
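A quick way to confirm the two return types is to inspect them directly. This sketch reuses the small DataFrame from the examples that follow; the exact scalar type is a NumPy float, which is a subclass of Python's float.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [2, 4, 6, 8]})

col_result = df['A'].std()   # Series.std() gives a single number
frame_result = df.std()      # DataFrame.std() gives a Series indexed by column name

print(type(col_result))
print(type(frame_result))
print(frame_result.index.tolist())
```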
Example 1: Standard Deviation on a Single Column
import pandas as pd
data = {'A': [1, 3, 5, 7],
'B': [2, 4, 6, 8]}
df = pd.DataFrame(data)
# calculate the standard deviation of one column
std_dev_column_a = df['A'].std()
print(std_dev_column_a)
Output
2.581988897471611
In this example, we calculated the standard deviation of the values in column A.
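The standard deviation is the square root of the variance, and Series.var() uses the same ddof=1 default. A short sketch confirming the relationship for column A:

```python
import math
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [2, 4, 6, 8]})

variance = df['A'].var()           # sample variance, ddof=1 by default
std_from_var = math.sqrt(variance)  # square root of the variance

print(variance, std_from_var, df['A'].std())
```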
Example 2: Standard Deviation with Non-default ddof
import pandas as pd
data = {'A': [1, 3, 5, 7],
'B': [2, 4, 6, 8]}
df = pd.DataFrame(data)
# calculate the standard deviation with ddof=0
std_dev_ddof_0 = df.std(ddof=0)
print(std_dev_ddof_0)
Output
A 2.236068
B 2.236068
dtype: float64
In this example, we set ddof (Delta Degrees of Freedom) to 0, which changes the divisor in the calculation from N - 1 to N, where N is the number of elements. The result is the population standard deviation rather than the sample standard deviation.
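This difference matters when comparing Pandas with NumPy: numpy.std() defaults to ddof=0, while Pandas defaults to ddof=1. A small sketch comparing the two on column A:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [2, 4, 6, 8]})
a = df['A'].to_numpy()  # plain NumPy array of the column values

print(np.std(a))             # NumPy default: ddof=0
print(df['A'].std(ddof=0))   # same divisor N, so same value
print(np.std(a, ddof=1))     # same divisor N - 1 as df['A'].std()
print(df['A'].std())
```

Passing ddof explicitly on both sides avoids surprises when moving code between the two libraries.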
Example 3: Standard Deviation on DataFrame with NA Values
import pandas as pd
data = {'A': [1, 3, 5, None],
'B': [2, 4, None, 8]}
df = pd.DataFrame(data)
# calculate the standard deviation while skipping NA values
std_dev_skipna = df.std(skipna=True)
print(std_dev_skipna)
Output
A 2.00000
B 3.05505
dtype: float64
Here, skipna=True (the default) makes the method ignore any NaN values when calculating the standard deviation. Column A's result, for example, is computed from 1, 3, and 5 only.
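For contrast, here is a sketch with skipna=False, using the same data plus a hypothetical column C that has no missing values. Any NaN in a column then makes that column's result NaN, while complete columns are unaffected.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 3, 5, None],
    'B': [2, 4, None, 8],
    'C': [1, 2, 3, 4],  # hypothetical column with no missing values
})

# with skipna=False, NaN values propagate into the result
std_no_skip = df.std(skipna=False)
print(std_no_skip)
```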
Example 4: Standard Deviation of Rows
import pandas as pd
data = {'A': [1, 3, 5, 7],
'B': [2, 4, 6, 8]}
df = pd.DataFrame(data)
# calculate the standard deviation with axis=1
std_dev_axis1 = df.std(axis=1)
print(std_dev_axis1)
Output
0 0.707107
1 0.707107
2 0.707107
3 0.707107
dtype: float64
In this example, the axis=1 argument calculates the standard deviation of each row, that is, across the values of columns A and B in that row.
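The axis can also be given by name: axis='columns' is equivalent to axis=1. This sketch checks that both spellings agree and that each row's value matches the hand calculation: each row holds two values one unit apart, so the squared deviations sum to 0.5 and, with ddof=1, the standard deviation is the square root of 0.5.

```python
import math
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5, 7], 'B': [2, 4, 6, 8]})

by_name = df.std(axis='columns')  # named form of axis=1
by_number = df.std(axis=1)

print(by_name.equals(by_number))
print(by_number.tolist())
print(math.sqrt(0.5))  # hand-calculated value for every row
```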