Regression Analysis of Y on X

Regression analysis describes the relationship between a predictor variable X and a response variable Y as the straight line that minimizes, over a series of values of X, the sum of the squared differences between the observed values of Y and the values expected from the line. The slope of this line is the regression coefficient (b). Because b depends on the units in which X and Y are measured, it can be read as the strength of predictability of Y from X only when both variables are standardized. In that case the slope equals the correlation coefficient (r): a value of 1.0 indicates that Y is perfectly predictable from X, and a value of 0.0 that there is no linear relationship. Here, the slope of 0.5 indicates a strong but imperfect relationship: small values of Y are typically associated with small values of X, and large values with large, but note, for example, that three of the nine smallest values of X are associated with the highest values of Y.
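As a sketch of the least-squares definition, and of why a slope between 0 and 1 reads as "strength of prediction" only on standardized data, the plain-Python code below fits a line to hypothetical paired data (not the data in the figure). The function names `least_squares`, `pearson_r`, `standardize` and `sse` are illustrative choices, not any library's API. Here b is the least-squares slope and r is the correlation coefficient. The raw slope depends on the units of Y, but after converting both variables to z-scores the fitted slope equals r.

```python
import math

# Hypothetical paired data (not the data in the figure).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [20.0, 10.0, 40.0, 30.0, 60.0, 50.0]


def mean(v):
    return sum(v) / len(v)


def least_squares(x, y):
    """Slope b and intercept a of the line y = a + b*x that minimizes the
    sum of squared differences between observed and expected y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return b, my - b * mx


def pearson_r(x, y):
    """Correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(v):
    """Convert to z-scores (mean 0, sample standard deviation 1)."""
    m = mean(v)
    sd = math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))
    return [(vi - m) / sd for vi in v]


def sse(x, y, a, b):
    """Sum of squared differences between observed y and the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


b, a = least_squares(xs, ys)
r = pearson_r(xs, ys)
b_std, _ = least_squares(standardize(xs), standardize(ys))

print(f"slope b = {b:.4f}, intercept a = {a:.4f}")      # slope b = 8.2857, intercept a = 6.0000
print(f"r = {r:.4f}, standardized slope = {b_std:.4f}")  # r = 0.8286, standardized slope = 0.8286
```

Dividing every Y value by 10 would shrink b tenfold but leave r unchanged, which is why only the standardized slope can be compared against the 0-to-1 scale.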

If the analysis is a test of association between X and Y rather than a prediction of Y from X, the coefficient of determination (r²), the proportion of the variance in Y accounted for by its linear relationship with X, should be reported instead. The calculations are identical (r² is simply the square of r), but because |r| ≤ 1, necessarily r² ≤ |r|, with equality only when r is 0 or ±1. A properly designed regression analysis requires that the predictor variable X be controlled, e.g., that the response Y be measured at discrete, predetermined values of X. A common analytical error is to present an association analysis between two uncontrolled variables as a prediction analysis: X is plausibly argued to cause Y, and the result is evaluated by r instead of r², so as to obtain a higher number and, by implication, a stronger prediction.
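The difference between association and prediction, and the size of the gap between r and r², can be checked numerically. The sketch below, again plain Python on hypothetical data with illustrative function names (`centered_sums`, `slope`, `variance_explained` are not from any library), shows three things. The correlation coefficient r is symmetric in X and Y, while the regression slope is directional. The square r² equals 1 − SSE/SST, the share of the variance in Y accounted for by the least-squares line. And r² is smaller than |r| whenever 0 < |r| < 1. For instance, r = 0.5 corresponds to r² = 0.25.

```python
import math


def mean(v):
    return sum(v) / len(v)


def centered_sums(x, y):
    """Return the means and the sums Sxy, Sxx, Syy of (cross-)products of deviations."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Correlation coefficient; symmetric in x and y."""
    _, _, sxy, sxx, syy = centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope for predicting y from x; not symmetric."""
    _, _, sxy, sxx, _ = centered_sums(x, y)
    return sxy / sxx


def variance_explained(x, y):
    """1 - SSE/SST: share of the variance in y accounted for by the fitted line."""
    mx, my, sxy, sxx, syy = centered_sums(x, y)
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / syy


# Hypothetical paired measurements, neither variable controlled.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [20.0, 10.0, 40.0, 30.0, 60.0, 50.0]

r = pearson_r(xs, ys)
print(f"r(X,Y) = {r:.4f}, r(Y,X) = {pearson_r(ys, xs):.4f}")
# r(X,Y) = 0.8286, r(Y,X) = 0.8286
print(f"slope Y on X = {slope(xs, ys):.4f}, slope X on Y = {slope(ys, xs):.4f}")
# slope Y on X = 8.2857, slope X on Y = 0.0829
print(f"r^2 = {r * r:.4f}, 1 - SSE/SST = {variance_explained(xs, ys):.4f}")
# r^2 = 0.6865, 1 - SSE/SST = 0.6865
```

For these made-up data, quoting r = 0.83 rather than r² = 0.69 is the kind of inflation the paragraph warns against.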

Figure ©2002 by Griffiths et al.; all text material ©2015 by Steven M. Carr