in this video we're going to talk about

how to calculate the correlation

coefficient between two variables

but before we do that let's talk about

what the correlation coefficient tells

us

so let's say we have a straight line

and we have points on the line

the correlation coefficient

will be positive one

for this situation because all the

points are on a line and the line has a

positive slope

let's say this is x and this is y

it's a positive correlation because as x

increases y increases

there is a direct relationship between x

and y

now here's another scenario

here we have points

on the line but notice the line is going

down

and so the correlation coefficient will

be equal to negative one

as x increases y decreases

and so in that case we have an inverse

relationship

now in this example

the points are not necessarily on the

line

but they're close to it

and so these points they have somewhat

of a linear relationship

but not exactly one

so in this case r is going to be

somewhere between zero and one it's

positive because

this line is increasing it has a

positive slope

but it's not exactly one

if we were to put a number to it it

might be

0.8 or something

whereas let's say if you have

a similar line but let's say the points

are

more scattered

about that line

the r value will be less it might be 0.7

or 0.6

it could be completely different but

it's somewhere between zero and one

but the point is this though

the closer that the points are

next to the line

r is going to be closer to one

these points they're further away from

the line so

r is going to be closer to zero than one

relative to this number
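
To make the idea of scatter concrete, here is a minimal sketch that compares two made-up data sets around the same rising line, one with small offsets and one with large offsets. The helper name `pearson_r`, the line y = 2x, and the offset sizes are all assumptions chosen for illustration. It uses the sums formula that the video introduces later.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the column sums (formula shown later in the video)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5, 6]
signs = [1, -1, 1, -1, 1, -1]  # hypothetical alternating offsets above/below the line

# Same underlying line y = 2x, but the points sit at different distances from it.
tight = [2 * x + 0.5 * s for x, s in zip(xs, signs)]  # points close to the line
loose = [2 * x + 3.0 * s for x, s in zip(xs, signs)]  # points far from the line

r_tight = pearson_r(xs, tight)
r_loose = pearson_r(xs, loose)
print(r_tight, r_loose)  # both positive; the tighter set gives the larger r
```

Both values come out positive because the line rises, and the set that hugs the line more closely gets the r value nearer to one.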

now sometimes

there won't be any correlation let's say

if

you have

just random points everywhere

in this case

r could be very close to zero for a

situation like that when there's no

apparent correlation

so the correlation coefficient

really tells us the strength

of

the linear relation between two

variables

if these two variables have no linear

relationship r is going to be close to

zero if there is a strong linear

relationship r is going to be close to

either positive one or negative one

depending on the slope of the line
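
As a rough illustration of the three situations described so far, the sketch below computes r for points exactly on a rising line, points exactly on a falling line, and a small set with no linear trend. The data sets and the function name `pearson_r` are made up for this example, and the formula inside is the one the video derives next.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # every point on a line with positive slope
falling = [10 - 3 * x for x in xs]  # every point on a line with negative slope
zigzag = [1, 3, 2, 3, 1]            # hypothetical points with no linear trend

r_rising = pearson_r(xs, rising)    # direct relationship
r_falling = pearson_r(xs, falling)  # inverse relationship
r_zigzag = pearson_r(xs, zigzag)    # no linear relationship
print(r_rising, r_falling, r_zigzag)
```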

now let's take a minute and calculate

the correlation coefficient

so we're going to make a table

containing

the columns

x

y

and then the product

of x y

and then x squared

followed by y squared

so for x we have the numbers one

two

three

four

five and six

so let's extend this a bit

right now let's fill in this table

so next let's put the y values which are

2

4

7

9

12 and 14.

moving on to the next column we need to

multiply x and y

so one times two

that's going to be two

and then if we multiply

two and four

we're going to get eight

next we're going to multiply 3 and 7

which is 21

and then 4 times 9

that's 36

5 times 12 is 60

and then six times fourteen six times

ten is sixty

six times four is twenty four when you

add sixty and twenty four that gives you

eighty four

now moving on to

the next column

x squared so we're going to square the

values that we see

in the x column 1 squared is 1 2 squared

is 4

3 squared is 9

4 squared is 16

5 squared is 25 6 squared is 36

now for y squared all we're going to do

is square the values

in the y column

so 2 squared is 4 4 squared is 16

7 squared is 49 9 squared is 81

12 squared is 144

and 14 squared is 196.
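
The table built above can be reproduced with a short sketch like the following. The variable names (`x`, `y`, `table`) are just illustrative choices; the numbers are the ones from the video.

```python
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 7, 9, 12, 14]

# One row per data point: (x, y, x*y, x squared, y squared)
table = [(xi, yi, xi * yi, xi ** 2, yi ** 2) for xi, yi in zip(x, y)]

# Print the table with a header row, right-aligned in fixed-width columns.
print(f"{'x':>3} {'y':>3} {'xy':>4} {'x^2':>4} {'y^2':>4}")
for xi, yi, xy, x2, y2 in table:
    print(f"{xi:>3} {yi:>3} {xy:>4} {x2:>4} {y2:>4}")
```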

now our next step is to sum up each

column

so if we take the sum of the x values

it's going to be 1 plus 2 plus 3 plus 4

plus 5 plus 6

that's going to be 21.

let me put this in a different color

now let's take the sum of the y values 2

plus 4 plus 7

plus 9 plus 12 plus 14.

so that gives us a sum of 48.

now we need to determine the sum of the

product of x and y

so 2 plus 8 plus 21 plus 36 plus 60 plus

84.

so that's going to give us

211.

now the sum of the x squared values

1 plus 4 plus 9 plus 16 plus

25 plus 36

so that's going to be

91 and then the sum of the y squared

values 4 plus 16

plus 49 plus 81 plus 144 and then plus 196.

so that's going to be 490.

so 21 is the sum

of the x values

48 is the sum of the y values

and

211 that's the sum

of x y

91 is the sum

of

x squared and 490 is the sum

of y squared
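
The five column sums can be double-checked with a few lines of code. This is only a sketch; the variable names are assumptions, and the data is the same x and y list used in the table.

```python
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 7, 9, 12, 14]

sum_x = sum(x)                                  # sum of the x column
sum_y = sum(y)                                  # sum of the y column
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of the xy column
sum_x2 = sum(xi ** 2 for xi in x)               # sum of the x squared column
sum_y2 = sum(yi ** 2 for yi in y)               # sum of the y squared column

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
```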

so once we have those numbers in red

we can now plug in the information into

the formula to get the answer we need so

i'm going to delete

everything up to there

and here's the formula that we need

so the correlation coefficient r

which

in some equations is represented by the

greek symbol rho

it's equal to n

times the sum of

xy minus

the sum of x

times the sum of y

divided by

the square root

and inside the square root it's going to

be

n times the sum of

the x squared values

minus the sum of x values but

we're going to square that so be careful

with that difference

and then it's going to be n

times

the sum of the y squared values minus

the sum of the y values and then squared
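
Written out in symbols, the formula just described looks like this, where n is the number of data points and each sum runs over all of them:

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```

Note the difference between the sum of the squared values, \sum x^{2}, and the square of the sum, (\sum x)^{2}.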

so let's plug in everything into that

formula

so it's going to be

n

well we need to know what n is

and if you recall there were six x

values one two three four five six

so n is the number of values that we

have in one single column

so that's six in this example

and then times the sum of xy which is

211

minus the sum of the x values

so that's 21

times the sum of the y values that's

48.

so at this point we've just got to

plug everything into this formula

and then it's going to be n

times the sum of the x squared values

which is 91

minus the sum of the x values squared so

that's 21

squared

then it's going to be n times

the sum of the y squared values which is

490

and then

minus

the sum of

y which is 48 but squared
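
With n = 6 and the five column sums substituted, the expression at this stage reads:

```latex
r = \frac{6(211) - (21)(48)}
         {\sqrt{\left[\,6(91) - 21^{2}\right]\left[\,6(490) - 48^{2}\right]}}
```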

so that's what we have so far

in this example now let's plug

everything in

so we can get rid of these numbers

6 times 211

is 1266

and 21 times 48

that's one thousand

and eight

now six times ninety one that is five

hundred and forty six

twenty one squared is

four hundred and forty one

now six times four ninety

is 2940

and then 48 squared

that's 2304

now let's subtract

1266

by 1008

so that's 258.

and then we have 546 minus 441

which is 105

and then 2940 minus 2304

that's 636.

so far we have r is equal to 258

divided by the square root

and now let's multiply those two numbers

so 105 times 636

that's

sixty six thousand seven hundred and eighty

so if we take 258 and divide it by the

square root of sixty six thousand seven

hundred and eighty

we get an r value of point

nine nine eight
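
The arithmetic in this last stretch can be checked step by step with a small sketch. It starts from the column sums found earlier; the intermediate variable names are assumptions chosen to mirror the steps in the video.

```python
import math

n = 6
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 21, 48, 211, 91, 490

numerator = n * sum_xy - sum_x * sum_y  # 1266 - 1008
x_term = n * sum_x2 - sum_x ** 2        # 546 - 441
y_term = n * sum_y2 - sum_y ** 2        # 2940 - 2304
product = x_term * y_term               # value under the square root

r = numerator / math.sqrt(product)
print(numerator, x_term, y_term, product, round(r, 3))
```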

so this r value is very high

this indicates that

there is a very strong linear

relationship

between

the x and y variables that we have in

this problem

and the fact that it's positive

tells us that the slope is positive

that there's a direct relation between x

and y as x increases y increases

so that's basically it for this video

now you know how to calculate the

correlation coefficient between two

variables