Pandas corr()

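The corr() method in Pandas computes the pairwise correlation coefficients between the columns of a DataFrame, ignoring missing values.

A correlation coefficient is a statistical measure of how strongly two variables are related. With the built-in methods, it ranges from -1 (perfect negative relationship) to 1 (perfect positive relationship), with values near 0 indicating little or no relationship of the kind being measured.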

Example

import pandas as pd

# sample DataFrame with numeric data
data = {'A': [3, 2, 1],
        'B': [4, 6, 5],
        'C': [7, 18, 91]}

df = pd.DataFrame(data)

# compute correlation matrix
correlation_matrix = df.corr()

print(correlation_matrix)

'''
Output

          A        B         C
A  1.000000 -0.50000 -0.919953
B -0.500000  1.00000  0.120470
C -0.919953  0.12047  1.000000
'''

corr() Syntax

The syntax of the corr() method in Pandas is:

df.corr(method='pearson', min_periods=1, numeric_only=False)

corr() Arguments

The corr() method takes the following arguments:

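  • method (optional): method used to calculate correlation; one of 'pearson' (default), 'kendall', 'spearman', or a callable that takes two 1-D arrays and returns a float
  • min_periods (optional): minimum number of observations required per pair of columns to have a valid result; pairs with fewer observations produce NaN
  • numeric_only (optional): whether to include only float, int, and boolean columns (default False in pandas 2.0 and later)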
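As a rough illustration of the method argument, the sketch below computes the same matrix with two of the built-in methods and compares one coefficient. The variable names pearson and spearman are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1],
                   'B': [4, 6, 5],
                   'C': [7, 18, 91]})

# same data, two different correlation methods
pearson = df.corr(method='pearson')
spearman = df.corr(method='spearman')

# A falls while C rises in every row, so the rank-based
# Spearman coefficient is exactly -1; Pearson is weaker
# because the relationship is not linear
print(pearson.loc['A', 'C'])
print(spearman.loc['A', 'C'])
```

Spearman works on the ranks of the values, so it reports a perfect monotonic relationship here even though the points do not lie on a straight line.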

corr() Return Value

The corr() method returns a DataFrame containing correlation coefficients between columns.
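The returned DataFrame can be inspected like any other. A minimal sketch, assuming the same sample data used in the examples on this page, shows that its rows and columns carry the same labels, that it is symmetric, and that a single coefficient can be looked up by label:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1],
                   'B': [4, 6, 5],
                   'C': [7, 18, 91]})

corr = df.corr()

# rows and columns are both labeled by the original column names
same_labels = list(corr.index) == list(corr.columns)

# corr(A, C) equals corr(C, A), so the matrix is symmetric
is_symmetric = np.allclose(corr.to_numpy(), corr.to_numpy().T)

# look up one coefficient by row and column label
a_c = corr.loc['A', 'C']

print(same_labels, is_symmetric, a_c)
```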


Example 1: Default Pearson Correlation Coefficient

import pandas as pd

# sample DataFrame with numeric data
data = {'A': [3, 2, 1],
        'B': [4, 6, 5],
        'C': [7, 18, 91]}

df = pd.DataFrame(data)

# compute correlation matrix
correlation_matrix = df.corr()

print(correlation_matrix)

Output

          A        B         C
A  1.000000 -0.50000 -0.919953
B -0.500000  1.00000  0.120470
C -0.919953  0.12047  1.000000

In this example, we called corr() with no arguments, which calculates the Pearson correlation coefficient for each pair of columns. The diagonal values are 1 because every column is perfectly correlated with itself.
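To see where one of these numbers comes from, the following sketch recomputes the A and C coefficient directly from the Pearson formula (the sum of products of deviations divided by the square root of the product of the summed squared deviations) and compares it with the value from corr(). The names manual_r and pandas_r are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1],
                   'B': [4, 6, 5],
                   'C': [7, 18, 91]})

x = df['A'].to_numpy(dtype=float)
y = df['C'].to_numpy(dtype=float)

# deviations of each value from its column mean
dx = x - x.mean()
dy = y - y.mean()

# Pearson r computed by hand
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# the same coefficient taken from the correlation matrix
pandas_r = df.corr().loc['A', 'C']

print(round(manual_r, 6), round(pandas_r, 6))
```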


Example 2: Kendall Tau Correlation Coefficient

import pandas as pd

# sample DataFrame with numeric data
data = {'A': [3, 2, 1],
        'B': [4, 6, 5],
        'C': [7, 18, 91]}

df = pd.DataFrame(data)

# compute correlation matrix
correlation_matrix = df.corr(method='kendall')

print(correlation_matrix)

Output

          A         B         C
A  1.000000 -0.333333 -1.000000
B -0.333333  1.000000  0.333333
C -1.000000  0.333333  1.000000

In this example, we calculated the Kendall Tau correlation coefficient for each pair of columns using method='kendall'.

To learn about correlation and different correlation methods in detail, please visit Pandas Correlation.
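The Kendall values above can be reproduced by hand. The sketch below is a simplified version that assumes the data has no tied values: it counts concordant pairs (both columns move in the same direction) and discordant pairs (they move in opposite directions). The helper kendall_tau is a hypothetical name, not a Pandas function.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'A': [3, 2, 1],
                   'B': [4, 6, 5],
                   'C': [7, 18, 91]})

def kendall_tau(x, y):
    """Kendall tau for data without ties (illustrative helper)."""
    concordant = 0
    discordant = 0
    # compare every pair of rows once
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

tau_ab = kendall_tau(df['A'].tolist(), df['B'].tolist())
tau_ac = kendall_tau(df['A'].tolist(), df['C'].tolist())
tau_bc = kendall_tau(df['B'].tolist(), df['C'].tolist())

print(tau_ab, tau_ac, tau_bc)
```

With three rows there are only three pairs to compare, which is why every off-diagonal value in the output is a multiple of one third.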


Example 3: Specify Minimum Number of Observations

import pandas as pd

# sample DataFrame with numeric data
data = {'A': [1, 2, 3, None, None],
        'B': [4, 7, None, None, None],
        'C': [7, 9, 8, None, None]}

df = pd.DataFrame(data)

# specify minimum number of observations required to perform computation
correlation_matrix = df.corr(min_periods=3)

print(correlation_matrix)

Output

     A   B    C
A  1.0 NaN  0.5
B  NaN NaN  NaN
C  0.5 NaN  1.0

In this example, the DataFrame df contains None values representing missing data. By setting min_periods=3, we specified that a correlation coefficient is computed for a pair of columns only if at least three rows have non-null values in both columns.

Here, since the B column contains only two non-null values, every pair involving B (including B with itself) falls short of that minimum, so those entries are NaN.
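One way to check how many usable observations each pair of columns has is to count the rows where both columns are non-null. The sketch below does this with a matrix product of the non-null mask; pair_counts and relaxed are illustrative names, and the approach is just one possible way to inspect the data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, None, None],
                   'B': [4, 7, None, None, None],
                   'C': [7, 9, 8, None, None]})

# 1 where a value is present, 0 where it is missing
valid = df.notna().astype(int)

# entry (X, Y) counts the rows where both X and Y are present
pair_counts = valid.T @ valid
print(pair_counts)

# with a lower threshold, the pairs involving B are computed too
relaxed = df.corr(min_periods=2)
print(relaxed)
```

Any pair whose count is below min_periods shows up as NaN in the correlation matrix. Note that a coefficient based on only two observations is always 1 or -1, so a low threshold can produce misleading values.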


Example 4: Calculate Correlation for Numeric Data Only

import pandas as pd

# sample DataFrame
data = {'A': [3, 2, 'A', 1],
        'B': [4, 6, 5, 7],
        'C': [7, 18.5, 91, 55]}

df = pd.DataFrame(data)

# compute correlation matrix
correlation_matrix = df.corr(numeric_only=True)

print(correlation_matrix)

Output

         B        C
B  1.00000  0.24257
C  0.24257  1.00000

In this example, we used the numeric_only=True argument to skip the columns with non-numeric data. Column A contains the string 'A', so it is stored with the object data type and excluded from the computation.

Without this argument, corr() in pandas 2.0 and later raises a ValueError because the string cannot be converted to a number.
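If a mostly numeric column should be kept rather than dropped, one possible alternative (an assumption about what the reader may want, not part of corr() itself) is to convert it with pd.to_numeric() first, turning unparseable entries into NaN. The names numeric_cols, cleaned, and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 2, 'A', 1],
                   'B': [4, 6, 5, 7],
                   'C': [7, 18.5, 91, 55]})

# columns that numeric_only=True would keep
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# convert column A to numbers; the string 'A' becomes NaN
cleaned = df.copy()
cleaned['A'] = pd.to_numeric(cleaned['A'], errors='coerce')

# A is now numeric, so it takes part in the computation,
# using only the rows where it has a value
result = cleaned.corr()
print(result)
```

This keeps the three valid values in column A at the cost of silently discarding the bad entry, so it is worth checking how many values were coerced to NaN before relying on the result.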
