in this video we're going to talk about
how to calculate the correlation
coefficient between two variables
but before we do that let's talk about
what the correlation coefficient tells
us
so let's say we have a straight line
and we have points on the line
the correlation coefficient
will be positive one
for this situation because all the
points are on a line and the line has a
positive slope
let's say this is x and this is y
it's a positive correlation because as x
increases y increases
there is a direct relationship between x
and y
now here's another scenario
here we have points
on the line but notice the line is going
down
and so the correlation coefficient will
be equal to negative one
as x increases y decreases
and so in that case we have an inverse
relationship
now in this example
the points are not necessarily on the
line
but they're close to it
and so these points they have somewhat
of a linear relationship
but not exactly one
so in this case r is going to be
somewhere between zero and one it's
positive because
this line is increasing it has a
positive slope
but it's not exactly one
if we were to put a number to it it
might be
0.8 or something
whereas let's say if you have
a similar line but let's say the points
are
more scattered
about that line
the r value will be less it might be 0.7
or 0.6
it could be completely different but
it's somewhere between zero and one
but the point is this
the closer that the points are
next to the line
r is going to be closer to one
these points are further away from
the line so
r is going to be closer to zero
relative to this number
now sometimes
there won't be any correlation let's say
if
you have
just random points everywhere
in this case
r could be very close to zero for a
situation like that when there's no
apparent correlation
so the correlation coefficient
really tells us the strength
of
the linear relation between two
variables
if these two variables have no linear
relationship r is going to be close to
zero if there is a strong linear
relationship r is going to be close to
either positive one or negative one
depending on the slope of the line
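To make those cases concrete, here is a minimal Python sketch. The helper name `pearson_r` and the sample points are made up for illustration and do not come from the video; the helper uses the sums-based formula that is worked through later on.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from the column sums (illustrative helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [1, 2, 3, 4, 5]
rising = pearson_r(xs, [3, 5, 7, 9, 11])      # points exactly on a rising line
falling = pearson_r(xs, [11, 9, 7, 5, 3])     # points exactly on a falling line
scattered = pearson_r(xs, [3, 6, 6, 10, 10])  # points near, but not on, a rising line

print(rising, falling, scattered)
```

With these made-up points, `rising` comes out as 1, `falling` as negative 1, and `scattered` lands between zero and one.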
now let's take a minute and calculate
the correlation coefficient
so we're going to make a table
containing
the columns
x
y
and then the product
of x y
and then x squared
followed by y squared
so for x we have the numbers one
two
three
four
five and six
so let's extend this a bit
right now let's fill in this table
so next let's put the y values which are
2
4
7
9
12 and 14
moving on to the next column we need to
multiply x and y
so one times two
that's going to be two
and then if we multiply
two and four
we're going to get eight
next we're going to multiply 3 and 7
which is 21
and then 4 times 9
that's 36
5 times 12 is 60
and then six times fourteen six times
ten is sixty
six times four is twenty four when you
add sixty and twenty four that gives you
eighty four
now moving on to
the next column
x squared so we're going to square the
values that we see
in the x column 1 squared is 1 2 squared
is 4
3 squared is 9
4 squared is 16
5 squared is 25 6 squared is 36
now for y squared all we're going to do
is square the values
in the y column
so 2 squared is 4 4 squared is 16
7 squared is 49 9 squared is 81
12 squared is 144
and 14 squared is 196.
now our next step is to sum up each
column
so if we take the sum of the x values
it's going to be 1 plus 2 plus 3 plus 4
plus 5 plus 6
that's going to be 21.
let me put this in a different color
now let's take the sum of the y values 2
plus 4 plus 7
plus 9 plus 12 plus 14.
so that gives us a sum of 48.
now we need to determine the sum of the
product of x and y
so 2 plus 8 plus 21 plus 36 plus 60 plus
84.
so that's going to give us
211.
now the sum of the x squared values
1 plus 4 plus 9 plus 16 plus
25 plus 36
so that's going to be
91 and then the sum of the y squared
values 4 plus 16
plus 49 plus 81 plus 144 and then plus 196
so that's going to be 490.
so 21 is the sum
of the x values
48 is the sum of the y values
and
211 that's the sum
of x y
91 is the sum
of
x squared and 490 is the sum
of y squared
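The table and its column sums can be double-checked with a short Python sketch. The variable names here are chosen for illustration only; the x and y values are the ones from the table.

```python
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 7, 9, 12, 14]

# Build the three derived columns of the table.
xy = [x * y for x, y in zip(xs, ys)]   # 2, 8, 21, 36, 60, 84
x2 = [x ** 2 for x in xs]              # 1, 4, 9, 16, 25, 36
y2 = [y ** 2 for y in ys]              # 4, 16, 49, 81, 144, 196

# Sum each column.
sum_x, sum_y = sum(xs), sum(ys)
sum_xy, sum_x2, sum_y2 = sum(xy), sum(x2), sum(y2)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 21 48 211 91 490
```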
so once we have those numbers in red
we can now plug in the information into
the formula to get the answer we need so
i'm going to delete
everything up to there
and here's the formula that we need
so the correlation coefficient r
which
in some equations is represented by the
greek symbol rho
it's equal to n
times the sum of
xy minus
the sum of x
times the sum of y
divided by
the square root
and inside the square root it's going to
be
n times the sum of
the x squared values
minus the sum of x values but
we're going to square that so be careful
with that difference
and then it's going to be n
times
the sum of the y squared values minus
the sum of the y values and then squared
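Written out, the formula just described would look roughly like this. The two bracketed terms under the square root are multiplied together, which matches the later step where 105 is multiplied by 636.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```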
so let's plug in everything into that
formula
so it's going to be
n
well we need to know what n is
and if you recall there were six x
values one two three four five six
so n is the number of values that we
have in one single column
so that's six in this example
and then times the sum of xy which is
211
minus the sum of the x values
so that's 21
times the sum of the y values that's
48
so at this point we just have to
plug everything into this formula
and then it's going to be n
times the sum of the x squared values
which is 91
minus the sum of the x values squared so
that's 21
squared
then it's going to be n times
the sum of the y squared values which is
490
and then
minus
the sum of
y which is 48 but squared
so that's what we have so far
in this example now let's plug
everything in
so we can get rid of these numbers
6 times 211
is 1266
and 21 times 48
that's one thousand
and eight
now six times ninety one that is five
hundred and forty six
twenty one squared is
four hundred and forty one
now six times four ninety
is 2940
and then 48 squared
that's 2304
now let's subtract
1008
from 1266
so that's 258
and then we have 546 minus 441
which is 105
and then 2940 minus 2304
that's 636.
so far we have r is equal to 258
divided by the square root
and now let's multiply those two numbers
so 105 times 636
that's
sixty six thousand seven hundred and eighty
so if we take 258 and divide it by the
square root of sixty six thousand seven
hundred and eighty
we get an r value of point
nine nine eight
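The arithmetic above can be reproduced in a few lines of Python, as a sketch. The sums are the ones from the table; the names `numerator`, `x_term`, and `y_term` are just labels picked for this example.

```python
import math

n = 6
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 21, 48, 211, 91, 490

numerator = n * sum_xy - sum_x * sum_y       # 1266 - 1008 = 258
x_term = n * sum_x2 - sum_x ** 2             # 546 - 441 = 105
y_term = n * sum_y2 - sum_y ** 2             # 2940 - 2304 = 636
r = numerator / math.sqrt(x_term * y_term)   # 258 / sqrt(66780)

print(round(r, 3))  # 0.998
```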
so this r value is very high
this indicates that
there is a very strong linear
relationship
between
the x and y variables that we have in
this problem
and the fact that it's positive
tells us that the slope is positive
that there's a direct relation between x
and y as x increases y increases
so that's basically it for this video
now you know how to calculate the
correlation coefficient between two
variables