#
PEP 6305 Measurement in
Health & Physical Education

##
Topic 6:
Correlation

##
Section 6.1

#

##
**n**
**This Topic has 2 Sections.
**

#

# Reading

##
n
Vincent & Weir, *Statistics in Kinesiology, 4th ed., *Chapter
8 “Correlation and Bivariate Regression” pp. 105-117

# Purpose

##
n
To discuss and demonstrate the concept and application of simple
bivariate correlation.

#

#
Correlation

#

## n
*Correlation* is a value that tells us **the degree to which two
variables are related**. If two variables are related, the value of one
variable can tell us something about the value of the other variable.

## n
Correlation analysis allows us to ask questions such as: How much are the values of one variable associated with
the values of another variable? How much does one variable change as another
variable changes?

###
¨
Correlation is measured by evaluating the extent to which the
deviations from the mean in one variable correspond to the deviations from the
mean in another variable.

###
¨
The *correlation coefficient* ranges from -1.00 to +1.00.
Correlation coefficients are interpreted by their ** ***magnitude* and *
***sign**,
discussed below.

##
n
Correlation has many uses, such as **identifying related characteristics**
(such as height and weight; taller people tend to be weigh more), **evaluating validity**
(whether a variable, such as skinfold thickneses, can be used as a measure of
another variable, such as body fat), or **predicting the value of one variable
using the values of a second variable** (such as using SAT scores as a
predictor of freshman-year college grades).

##
n
Just because two variables are correlated does __not__ mean
that a change in one variable *causes* a change in another variable.

###
¨
*Causality*, or determining if a factor caused an observed
effect, which is an important goal of science, is determined through __
__*
research design* and the systematic way the data are collected,
not by a statistical test.

#

#
Scatterplots

##
n
The concept of correlation can be demonstrated by using ** ***
scatterplots*.

###
¨
A *scatterplot* is a graph of data points for two variables,
with one variable on each axis.

###
¨
The data points are plotted in the field of the graph according to their values
for each variable. This produces a "scatter" of points; a more narrow scatter
pattern occurs when the correlation is high.

###
¨
This Web site has an applet that shows how the data points in a scatterplot
change with different values for the correlation coefficient.

##
n
Open R Commander and load the Dataset6305 file .

###
¨
Go to Graphs>Scatterplot...

###
¨
Highlight *height* in the *x-variable* box and
highlight *weight* in the *y-variable* box. UN-click the *
Marginal boxplots*, *Smooth line*, and *Show spread *boxes.
(we don't need that right now, and it clutters the graph) You can leave
everything else the way it is. Click OK.

###
¨
You should get a graph that looks something like this:

##

###

##
n
A correlation coefficient can have a positive or negative value, which is called the
sign
or
direction
of the correlation
and is one of the things used to interpret these coefficients.

##
n
The scatterplot of height and weight shows a **correlation **
with a** ***positive* sign indicating what is called a *
direct* association. This also shows that the *line of best fit*,
shown in green on this plot and which
describes
the linear relation between height and weight, has a **positive** slope (from
left to right, the line move up).

###
¨
As
the value of X (height) increases (i.e., as you move from left to right on the X
axis, getting taller), the value of Y (weight) also increases (i.e., moves toward the top on the
Y axis, getting heavier).

###
¨
Taller people tend to weigh more; shorter people tend to weigh less.
Weight is related to, or associated with, height.

##
n
A **correlation coefficient of +1.00** means that every subject’s
scores are *exactly* the same standardized distance and the same direction
from the means for both variables.

###
¨
Any **positive value less than 1.00** means that
most of the scores are in the same direction from the mean of both variables and
most of the standard distances from the means of both variables are similar. The
closer the value gets to 0, the less relation between variables and the more
random the points will be on the graph.

##

##
n
Use R Commander to create a scatterplot for prcntfat (percent body
fat) and aerobfit (aerobic
fitness):

##

##
n
This is an example of a *negative* correlation or **
***inverse* association. The line of best fit now has a **negative**
slope.

###
¨
As the value of X (aerobfit) increases (i.e., moves toward the
right on the X axis, getting more fit), the value of Y (prcntfat) decreases (i.e., moves toward the
bottom on the Y axis, getting less fat).

###
¨
Typically, fit people are more lean that unfit people, who tend to
have a higher relative amount of body fat.

##
n
A **correlation coefficient of -1.00** would mean that every subject’s
scores are the exactly same standardized distance but in opposite directions
from the means of both variables.

##
n
Note in both graphs that values of X are associated with more than one
value of Y. This spread or "scatter" of Y values around the line of best fit represents the
**magnitude** of the correlation, which in addition to the sign is
used to interpret the correlation.

###
¨
The closer that all of the points come to the line, the higher the
magnitude of the
correlation.

###
¨
For example,
some taller people weigh less that some shorter people. For
example, there is a person who is 180 cm tall who weighs only 60 kg; most people
who weigh 60 kg are considerably shorter--the line of best fit tells us that on
average they are 155 cm tall.

##
n
If the two variables are perfectly correlated (*r* = 1.00 or
-1.00) all of the points will lie exactly on the line of best fit.

##
n
If the two variables are completely uncorrelated, the points will
be scattered at random and the line of best fit will be perfectly flat (horizontal, no
slope). Use R Commander to create a scatterplot for height and age:

##

##
n
A correlation coefficient of 0 means that the two variables, age
and height, are unrelated to one another. As can be seen in this graph, older
people are not systematically taller or shorter than younger people.

##
n
A correlation coefficient provides the magnitude and direction of
a **linear association** between two variables, which means it is a
measure of whether a *line* can be used to depict the relation
between the two variables. Two variables, however, may be highly related, but in
a *non-linear* way.

##
n
For example, two variables may have a *curvilinear*
association:

##

##
n
Fitness level is relatively unrelated to body weight between
weights of 60 kg and 90 kg (the line is fairly horizontal or flat), but for body weight values
>90 kg fitness level decreases somewhat sharply. Higher values of X are
associated with lower values of Y, but the values of Y decrease **at a higher
rate** than accounted for by a simple line. A *curve* in this example is
a better depiction to the data points because it shows the accelerating rate of
decrease in Y as X increases.

##
n
A correlation coefficient __is not__ an accurate measure of the
association between two variables if the association is non-linear.

##
n
Non-linear associations can take on a number of forms, none of
which will be discussed here. The important thing to remember is that __if a
non-linear pattern is evident in the data, then the simple correlation
coefficient is __**inappropriate**. If you create and inspect a scatterplot
of your data, you can
evaluate whether the relation appears to be linear or non-linear.

##

__Click
to go to the next section (Section 6.2)__