The Linear Correlation Coefficient

Concept sheet | Mathematics

One of the uses of a scatter plot is to predict future results. To quantify the accuracy of these estimates, we calculate the linear correlation coefficient.

Definition

The linear correlation coefficient, generally denoted by |r|, quantifies the strength of the linear relationship between the two variables of a distribution. It can be estimated from a graph or calculated using a mathematical formula.

The correlation coefficient always has a value in the interval |[-1, 1]|.

The linear correlation coefficient of a distribution gives an idea of what the scatter plot looks like, and vice versa. First, the sign of the coefficient, positive or negative, indicates the direction of the slope of the regression line. To illustrate, here are three scatter plots showing the two extreme values, |-1| and |1|, as well as the value |0|.

This graph shows a scatter plot whose linear correlation is perfect and negative.
This graph shows a scatter plot with zero linear correlation.
This graph shows a scatter plot whose linear correlation is perfect and positive.

In other words, the closer the value of the linear correlation coefficient is to |1| or |-1|, the stronger the linear relationship between the two variables.

Conversely, the closer the value is to |0|, the weaker the linear relationship between the two variables.

The Linear Correlation Coefficient - Explanation


Qualitative Assessment According to a Scatter Plot

To obtain a value for |r|, estimate it from a graph or calculate it with a formula. However, to simply compare the linearity of one scatter plot with another, it is enough to look at how closely the points are aligned.

Examples
This graph shows a scatter plot with a strong and positive linear correlation.
This graph shows a scatter plot with a moderate and positive linear correlation.

Looking closely at these graphs, we see that the points are more dispersed in the second scatter plot. Thus, the linear correlation coefficient is lower (closer to |0|) in this plot than in the first.

The difference between correlation coefficients can be seen clearly in the following scatter plots.

Negative Linear Correlations

This graph shows a scatter plot with a strong and negative linear correlation.
This graph shows a scatter plot with a negative and moderate linear correlation.
This graph shows a scatter plot with a negative and weak linear correlation.

Positive Linear Correlations

This graph shows a scatter plot with a strong and positive linear correlation.
This graph shows a scatter plot with a moderate and positive linear correlation.
This graph shows a scatter plot with a weak and positive linear correlation.

As the absolute value of the correlation coefficient decreases, the points of the scatter plot become increasingly dispersed. However, the direction of the scatter plot (positive or negative) can still be identified. When the points are so widely dispersed that no direction can be identified, the linear correlation coefficient is zero (or close to zero).

Qualitative Assessment Using a Double Entry (Two-Variable) Table

To simplify the visual representation of the collected data, the data is sometimes grouped into classes and placed in a double entry (two-variable) table. 

Important!

To go from a scatter plot to a double entry (two-variable) table, divide the scatter plot into a grid so that each cell clearly corresponds to one class of |x| values and one class of |y| values.

So, this scatter plot...

This image represents a scatter plot whose correlation is positive and strong.

... becomes the following double entry (two-variable) table.

This image shows a double-entry table with a strong and positive correlation, since the data is clustered near the diagonal.

Once this table is obtained, it is possible to assess the correlation of the data.

Example

According to the previous double entry (two-variable) table, the correlation is strong and positive.

It is positive because, as the values of |x| increase, the values of |y| also increase.

It is strong because the data is grouped near the diagonal of the double-entry table.

This image shows a double-entry (two-variable) table with a strong and positive correlation, because the data is clustered near the diagonal.

Note: if the data clusters around the other diagonal, i.e., the diagonal that starts at the bottom left and ends at the top right, then the correlation will be negative. 
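The grouping of points into classes described in this section can be sketched in Python. This is a minimal sketch, assuming that classes are given as sorted lists of boundaries, that each class includes its lower bound, and that the last class also includes its upper bound; the names `double_entry_table`, `x_bounds` and `y_bounds` are hypothetical. Rows are listed from the lowest |y| class to the highest.

```python
def double_entry_table(points, x_bounds, y_bounds):
    """Count the points that fall in each (y class, x class) cell.

    Hypothetical helper. x_bounds = [0, 5, 10] defines the classes
    [0, 5[ and [5, 10] (the last class includes its upper bound).
    """

    def class_index(value, bounds):
        # Index of the class containing value, or None if it is outside all classes.
        for i in range(len(bounds) - 1):
            low, high = bounds[i], bounds[i + 1]
            is_last = i == len(bounds) - 2
            if low <= value < high or (is_last and value == high):
                return i
        return None

    # One row per y class (lowest first), one column per x class.
    table = [[0] * (len(x_bounds) - 1) for _ in range(len(y_bounds) - 1)]
    for x, y in points:
        col = class_index(x, x_bounds)
        row = class_index(y, y_bounds)
        if col is not None and row is not None:
            table[row][col] += 1
    return table


# Hypothetical data: low x with low y, high x with high y.
print(double_entry_table([(1, 2), (4, 3), (6, 7), (8, 9)], [0, 5, 10], [0, 5, 10]))
# prints [[2, 0], [0, 2]]
```

With the rows ordered this way (an assumption; a table can also be laid out with the highest |y| class at the top), data with a positive correlation fill the diagonal running from the top left to the bottom right.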

Calculating the Linear Correlation Coefficient

Determining the value of the linear correlation coefficient more precisely makes it easier to quantify the correlation between two variables.

Formula

||r\approx\pm\left(1-\dfrac{w}{L}\right)||where

|L\!:| the length of the rectangle outlining the scatter plot
|w\!:| the width of the rectangle outlining the scatter plot

The sign of |r| is determined by the direction of the scatter plot: positive if the points rise from left to right, and negative if they fall.

This formula generally gives a reasonable approximation of the linear correlation coefficient. However, more sophisticated tools can calculate its exact value.
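The approximation formula |r\approx\pm\left(1-\dfrac{w}{L}\right)| can be written as a short Python function. This is a minimal sketch; the function name `estimate_r` and its parameters are hypothetical, and the direction of the scatter plot is passed in by hand, exactly as it would be read from the graph.

```python
def estimate_r(w, L, positive):
    """Approximate r with the rectangle method: r = +/-(1 - w / L).

    Hypothetical helper.
    w: width of the rectangle outlining the scatter plot
    L: length of that rectangle (the longer side, must be > 0)
    positive: True if the scatter plot rises from left to right
    """
    if L <= 0 or w < 0 or w > L:
        raise ValueError("expected L > 0 and 0 <= w <= L")
    magnitude = 1 - w / L
    # The sign comes from the direction of the scatter plot.
    return magnitude if positive else -magnitude


print(round(estimate_r(2.4, 6.2, True), 2))  # prints 0.61
```

For instance, with a width of 2.4 and a length of 6.2, the function returns about 0.613, which rounds to 0.61. A very thin rectangle (|w| close to |0|) gives a value close to |\pm 1|.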

The following values of |r| are generally used to describe the strength of the linear relationship.

  • Close to |0|: none
  • Near |\pm\, 0{.}50|: weak
  • Near |\pm\, 0{.}75|: moderate
  • Near |\pm\, 0{.}87|: strong
  • Near |\pm\, 1|: very strong
  • Exactly |\pm\, 1|: perfect

Calculating the Linear Correlation Coefficient From a Graph

To associate a numerical value with the correlation coefficient, follow these 3 steps.

Rule
  1. Draw the scatter plot.

  2. Draw a rectangle and measure its length and width.

  3. Calculate the correlation coefficient using the formula.
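The three steps above can be imitated on a computer. This is a minimal sketch under a loud assumption: instead of finding the smallest rectangle by hand, it aligns the rectangle with the least-squares line through the points and measures the spread of the points along and across that line. This only approximates the hand-drawn rectangle, and, like a drawing, the result depends on the scale of the axes. The function name `rectangle_dimensions` and the data are hypothetical.

```python
import math


def rectangle_dimensions(xs, ys):
    """Return (length, width, positive) for a rectangle enclosing all points.

    ASSUMPTION: the rectangle is aligned with the least-squares line, which
    approximates the "smallest rectangle" drawn by hand; it is not guaranteed
    to be the minimum-area rectangle.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Angle of the least-squares line, whose slope is sxy / sxx.
    angle = math.atan2(sxy, sxx)
    c, s = math.cos(angle), math.sin(angle)
    # Coordinates of each point along the line (u) and across it (v).
    us = [(x - mean_x) * c + (y - mean_y) * s for x, y in zip(xs, ys)]
    vs = [-(x - mean_x) * s + (y - mean_y) * c for x, y in zip(xs, ys)]
    extent_along = max(us) - min(us)
    extent_across = max(vs) - min(vs)
    # The length is the longer side of the rectangle, the width the shorter one.
    length = max(extent_along, extent_across)
    width = min(extent_along, extent_across)
    return length, width, sxy >= 0


# Hypothetical data, then step 3: r is approximately +/-(1 - w / L).
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
length, width, positive = rectangle_dimensions(xs, ys)
magnitude = 1 - width / length
r = magnitude if positive else -magnitude
print(round(r, 2))
```

Points lying exactly on a line give a width of |0|, so the estimate is |\pm 1|, as expected for a perfect correlation.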

Example
  1. Draw the scatter plot

    By placing each of the points in a Cartesian plane, the following scatter plot is obtained.

This graph shows a scatter plot with a moderate and positive linear correlation.
  2. Draw a rectangle and measure its length and width

    The rectangle must contain every point and be as small as possible. When drawing the rectangle, use a set square and measure its length and width.

    Since there are no outliers, the following rectangle is obtained.

The graph shows a scatter plot with a positive correlation, outlined by a rectangle.
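  3. Calculate the correlation coefficient using the formula

With the measured width |w = 2{.}4| and length |L = 6{.}2|:
|r \approx \pm \left(1 - \dfrac{2{.}4}{6{.}2} \right)|
|r \approx \pm \left(1 - 0{.}39 \right)|
|r \approx \pm 0{.}61|
|r \approx 0{.}61|, since the scatter plot has a positive direction.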

Calculating the Linear Correlation Coefficient - Example


Moments in the video:

  • 00:00-Presentation of Problem
  • 01:25-Graph the Scatter Plot
  • 02:37-Draw a Rectangle and Measure its Dimensions
  • 04:07-Calculate the Linear Correlation Coefficient

Calculating the Linear Correlation Coefficient Using Technological Tools

With graphing calculators or software such as spreadsheets, the exact value of the correlation coefficient can be obtained. Just enter all the data in a table of values, select the appropriate function (for example, CORREL in most spreadsheet programs), and let the software do the calculations.

Find out more!

The formula for calculating the exact value of the linear correlation coefficient |r| is the following. ||r=\dfrac{\sum\left(x-\overline{x}\right)\left(y-\overline{y}\right)}{\sqrt{\sum\left(x-\overline{x}\right)^{2}}\sqrt{\sum\left(y-\overline{y}\right)^{2}}}||

where

|x\!:| a value in the first distribution
|\overline{x}\!:| the mean of the first distribution
|y\!:| a value in the second distribution
|\overline{y}\!:| the mean of the second distribution
|\sum\!:| symbol that signifies the sum of...
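The formula above translates term by term into Python. This is a minimal sketch; the function name `linear_correlation` is hypothetical, and the sums run over all pairs |(x, y)| of the distribution.

```python
import math


def linear_correlation(xs, ys):
    """Compute r with the formula above (hypothetical helper)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(xs) / n  # the mean x-bar
    mean_y = sum(ys) / n  # the mean y-bar
    # Numerator: sum of (x - x-bar)(y - y-bar)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of each sum of squares, multiplied together
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / (math.sqrt(sum_sq_x) * math.sqrt(sum_sq_y))


print(round(linear_correlation([1, 2, 3, 4], [2, 1, 4, 3]), 2))  # prints 0.6
```

In this made-up example, the means are both |2{.}5|, the numerator equals |3| and each sum of squares equals |5|, so |r = \dfrac{3}{\sqrt{5}\sqrt{5}} = 0{.}6|.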