Limits of prediction from correlation & regression analyses

    The plot compares unadjusted final exam marks in two courses, Bio3250 (Principles of Genetics) and Bio3900 (Principles of Evolution), for all students who took both courses from the same instructor in successive Fall and Winter semesters. These marks are highly correlated: r = 0.7, so r² = 0.51, meaning that about half of the variance in 3900 marks is accounted for by 3250 marks. The regression line, with a slope of 0.7, uses 3250 marks (X) to predict performance in 3900 (Y) in the following semester.
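The relation between the two figures quoted above can be checked with a short calculation. The sketch below, in Python, takes r² = 0.51 from the text and recovers r. It also illustrates that the least-squares slope equals r only when X and Y have the same standard deviation. The standard deviations used here are hypothetical, since they are not reported in this note, and the function name is likewise an illustration only.

```python
import math

r_squared = 0.51          # coefficient of determination, from the text
r = math.sqrt(r_squared)  # correlation coefficient (positive: the trend is upward)

def regression_slope(r, sd_x, sd_y):
    """Least-squares slope of Y on X: b = r * (sd_y / sd_x)."""
    return r * sd_y / sd_x

# Hypothetical standard deviations of the marks (NOT given in the source).
slope_equal_sd = regression_slope(r, sd_x=10.0, sd_y=10.0)
slope_wider_y = regression_slope(r, sd_x=10.0, sd_y=12.0)

print(f"r = {r:.2f}")
print(f"slope with equal SDs = {slope_equal_sd:.2f}")
print(f"slope if 3900 marks are more spread out = {slope_wider_y:.2f}")
```

If the two courses had similar spreads of marks, a slope close to r would be expected; otherwise the slope and r would differ.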

Note that several broad trends are apparent:
    (1) No one with an A in 3250 got less than a B in 3900,
    (2) Most people with a B in 3250 got a B in 3900, and
    (3) No one who got less than a C in 3250 got better than a B in 3900.

    Despite the high correlation, there is considerable scatter about the regression line. Notice that persons who got a B in 3250 got marks ranging from A to D in 3900, and those who got a C ranged from B to D in 3900. That is, a strong correlation across the class as a whole still leaves a wide range of variation in individual performance, which limits how well any one student's mark can be predicted.
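The size of this scatter follows from r² itself. As a rough sketch in Python, the spread of marks about the regression line is approximately the overall spread of Y multiplied by the square root of (1 - r²). The standard deviation of 3900 marks used below is a hypothetical value, not one taken from the data, and the formula is the usual large-sample approximation.

```python
import math

def residual_sd(sd_y, r_squared):
    """Approximate SD of Y about the regression line: sd_y * sqrt(1 - r^2)."""
    return sd_y * math.sqrt(1.0 - r_squared)

r_squared = 0.51   # from the text
sd_y = 10.0        # hypothetical SD of 3900 marks (NOT given in the source)

spread = residual_sd(sd_y, r_squared)
reduction = 1.0 - spread / sd_y  # fraction by which the uncertainty shrinks

print(f"SD of 3900 marks overall: {sd_y:.1f}")
print(f"SD about the regression line: {spread:.1f}")
print(f"Reduction in uncertainty from knowing the 3250 mark: {reduction:.0%}")
```

Under these assumptions, knowing a student's 3250 mark narrows the expected spread of the 3900 mark by only about 30%, which is consistent with the wide ranges of letter grades seen in the plot.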


All text material © 2011 by Steven M. Carr