Regression Analysis of Y on X shows the relationship between two variables X and Y as a straight line that minimizes, over a series of values of a predictive variable X, the sum of the squared differences between the expected and observed values of the response variable Y.
The slope of the line is the regression coefficient (r), which shows how well Y can be predicted from X. When X and Y are expressed on the same standardized scale, a slope of 1.0 indicates that Y is perfectly predictable, and a slope of 0.0 that there is no relationship. Here, the slope of 0.5 indicates a strong but imperfect relationship: small values of Y are typically associated with small values of X, and large values with large, but note, for example, that of the nine smallest values of X, three are associated with the highest values of Y.
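A minimal least-squares sketch in Python (standard library only) may make the fitting criterion concrete. The helper names and the five-point data set are hypothetical illustrations, not the data plotted in the figure. Because a raw slope depends on the units of X and Y, the example also fits the standardized variables: the raw slope here is 1.6 even though the relationship is imperfect, whereas the standardized slope is 0.8, which equals Pearson's r for the same data. That standardized scale is the one on which slopes of 1.0 and 0.0 mark perfect and absent predictability.

```python
from statistics import mean, stdev

def least_squares_fit(xs, ys):
    """Slope and intercept of the line of Y on X that minimizes the
    sum of squared residuals (ordinary least squares)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared differences between observed Y and the line's expected Y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def standardize(values):
    """Convert to standard units (mean 0, sample standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data for illustration only (not the data in the figure).
xs = [1, 2, 3, 4, 5]
ys = [4, 2, 8, 6, 10]

slope, intercept = least_squares_fit(xs, ys)
std_slope, _ = least_squares_fit(standardize(xs), standardize(ys))
print(f"raw slope = {slope:.2f}, intercept = {intercept:.2f}")  # raw slope = 1.60, intercept = 1.20
print(f"standardized slope = {std_slope:.2f}")                   # standardized slope = 0.80

# Nudging the slope away from the fitted value increases the sum of squared residuals.
best = sum_squared_residuals(xs, ys, slope, intercept)
worse = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
print(best < worse)  # True
```

The perturbation check only tests one alternative line; it illustrates, rather than proves, that the fitted line is the minimum.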
If the analysis is done as a test of association between X and Y rather than as a prediction of Y from X, the correlation coefficient (r²) should be used instead. The calculations are identical, but because 0 < |r| < 1 for any real but imperfect relationship, necessarily r² < |r|. A properly designed regression analysis requires that the predictive X variable be controlled, e.g., that the response Y be measured at discrete, pre-determined values of X.
A common analytical error is to present an association analysis between two uncontrolled variables as a prediction analysis: X is plausibly argued to cause Y, and the result is evaluated by r instead of r², so as to obtain a higher number and, by implication, a stronger relationship.
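The gap between r and r² can be checked numerically. This is a minimal sketch in Python (standard library only), assuming r is computed as the Pearson product-moment coefficient; `pearson_r` and the five-point data set are hypothetical illustrations, not taken from the figure.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment coefficient r between X and Y."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for illustration only (not the data in the figure).
xs = [1, 2, 3, 4, 5]
ys = [4, 2, 8, 6, 10]

r = pearson_r(xs, ys)
r_squared = r ** 2
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")  # r = 0.80, r^2 = 0.64

# For any relationship with 0 < |r| < 1, squaring shrinks the value,
# so reporting r instead of r^2 gives the larger-looking number.
assert 0 < abs(r) < 1 and r_squared < abs(r)
```

Here the same data yield 0.80 when reported as r but only 0.64 as r², which is the inflation the text warns against.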
Figure ©2002 by Griffiths et al.; all text material ©2015 by
Steven M. Carr