1.1 Visualization

The first five rows of dat_recs are shown below:

y      x2  x3  x4  x5  x6  x7  x8
7.540   5   3   8   1  39   5  15
8.193   1   2   2   0  85   5  14
8.678   3   1   1   0  71   5   8
7.846   4   3   5   1  39   5   8
9.755   1   3   3   0  57   0  10

1.1.1 Box Plot

For each level of x2, a box marks three quantiles (25%, 50%, 75%) of y. The medians show a tendency for y to decrease as x2 increases. The box sizes appear to vary across the levels of x2, and many observations have small values of x2. For now, however, the conditional variance is assumed to be constant; this assumption will be tested in section 4. Three data points with extreme values (36, 241 and 163) are discussed in sections 3 and 5.
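The numbers behind each box can be tabulated directly. The following is a minimal sketch in Python, not the code used for this analysis. It uses only the five rows of dat_recs printed in section 1.1, so its output illustrates the computation and says nothing about the full data set. The helper names quantile and box_summary are hypothetical, and the linear-interpolation rule for quantiles is an assumption; the plotting software used here may apply a different rule.

```python
from collections import defaultdict

# (y, x2) pairs taken from the five rows printed in section 1.1.
# Assumption: these stand in for the full dat_recs, which is not shown.
rows = [
    (7.540, 5),
    (8.193, 1),
    (8.678, 3),
    (7.846, 4),
    (9.755, 1),
]


def quantile(values, p):
    """Quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = p * (len(s) - 1)          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def box_summary(rows):
    """Group y by x2 and return the three box quantiles per level."""
    groups = defaultdict(list)
    for y, x2 in rows:
        groups[x2].append(y)
    summary = {}
    for level in sorted(groups):
        ys = groups[level]
        q25, q50, q75 = (quantile(ys, p) for p in (0.25, 0.50, 0.75))
        summary[level] = {
            "n": len(ys),            # number of observations at this level
            "q25": q25,
            "median": q50,
            "q75": q75,
            "iqr": q75 - q25,        # box height, a rough spread measure
        }
    return summary


summary = box_summary(rows)
for level, stats in summary.items():
    print(level, stats)
```

Comparing the median column across levels corresponds to reading the trend from the plot, and comparing the iqr column corresponds to judging whether the box sizes differ.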

The box plot of y by x6 is also given. The trend is not strictly linear, and the conditional variance is not stable. We therefore regress y on x2 first and introduce x6 as the second regressor in section 6.