This question involves the data set for Richmond townhouses obtained on 2014.11.03.
You will get a different subset than in the preceding WebWork question.
For your subset, the response variable is:
asking price divided by 10000:
askpr=c(53.9, 77.8, 73.9, 62.8888, 58.68, 60.8, 53.8, 41.99, 59.8, 47.8, 57.5, 74.8, 51.99, 65.99, 50.8, 25.9, 48.5, 50.8, 58.39, 71.99, 68.5, 79.8, 55.8, 61.5, 54.8, 44.8, 73.8, 26.99, 40.8, 33.7, 54.98, 81.9, 55.2, 53.8, 65.8, 46.8, 68.5, 40.8, 56.8, 78.8, 48.8, 47.9, 40.9, 56.88, 79.99, 50.5, 62.9, 51.68, 57.8, 52.4)
The explanatory variables are:
(i) finished floor area divided by 100
ffarea=c(11.84, 16.5, 15.15, 15.77, 13.96, 13.2, 10.95, 12.9, 17.63, 13.34, 13.46, 17.48, 12.09, 22.78, 16.6, 6.1, 14.8, 12.27, 15.09, 15.05, 13.59, 15.25, 13.06, 14.5, 15.46, 9.4, 17.54, 10.5, 12.26, 12, 13.06, 20.95, 15.3, 12.22, 13.45, 16.2, 15.76, 14, 15.5, 19.48, 14.8, 12.1, 16.06, 15.78, 22, 12.26, 14, 15.1, 13.84, 16.22)
(ii) age
age=c(15, 3, 0, 6, 9, 3, 18, 44, 26, 32, 10, 5, 7, 35, 23, 11, 24, 17, 8, 8, 2, 3, 0, 7, 41, 14, 9, 37, 29, 28, 1, 19, 9, 9, 1, 30, 4, 38, 23, 11, 50, 7, 25, 17, 20, 3, 5, 20, 10, 25)
(iii) monthly maintenance fee divided by 10
mfee=c(21, 25.4, 22.2, 35.7, 22, 18.9, 24.7, 23.2, 32, 24.5, 22.1, 29.7, 18.1, 57.4, 19.9, 17.1, 16.1, 25.2, 20.3, 22.3, 17, 35, 18.6, 18.7, 31, 23.3, 18.2, 28, 19.8, 25.9, 19.6, 34.8, 16.9, 18.5, 18.2, 16, 22.1, 23, 17.4, 20.4, 25, 18, 24.4, 17.3, 26.7, 18, 19.6, 24.5, 16, 36.4)
(iv) number of bedrooms
beds=c(2, 4, 4, 3, 3, 3, 2, 3, 5, 3, 3, 4, 3, 2, 4, 1, 3, 2, 4, 3, 3, 2, 3, 3, 3, 2, 4, 2, 3, 2, 3, 1, 3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 2, 4, 3, 3, 3, 3, 3, 3)
After copying the R vectors above into your R session, you can create a data frame with:
richmondtownh=data.frame(askpr,ffarea,age,mfee,beds)

The corresponding vectors for the holdout set are:
askpr.ho=c(54.8, 108.8, 86.8, 49.9, 45.99, 58.8, 57.8, 68.8, 68.8)
ffarea.ho=c(11.26, 23.98, 15.08, 15.6, 16.01, 17.37, 12.01, 16.9, 15.95)
age.ho=c(0, 16, 1, 20, 25, 26, 0, 8, 18)
mfee.ho=c(24.8, 36.9, 48.8, 27, 33.7, 31, 14.2, 19.4, 23.6)
beds.ho=c(2, 3, 3, 3, 3, 3, 3, 4, 3)

Create a second data frame, then give it the same variable names as richmondtownh:
holdout=data.frame(askpr.ho,ffarea.ho,age.ho,mfee.ho,beds.ho)
names(holdout)=names(richmondtownh)  # make the variable names the same as before


Use the ls.cvrmse() function from the course web site to fit regressions with 3 and 4 explanatory variables, with askpr as the response variable.
For 4 explanatory variables, all of the variables above are included.
For 3 explanatory variables, all of the variables above are included except beds.
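As a cross-check, the two models can also be written as formulas and fitted with base R's lm(). This is a minimal sketch, not the course's ls.cvrmse() function (whose interface is not reproduced here). So that it runs on its own, it uses simulated stand-in data with the same column names as richmondtownh; those values are hypothetical. With the real data, replace dat with richmondtownh.

```r
# The two candidate models; askpr is the response.
f3 <- askpr ~ ffarea + age + mfee          # 3 explanatory variables (beds excluded)
f4 <- askpr ~ ffarea + age + mfee + beds   # 4 explanatory variables

# Hypothetical stand-in data with the same column names as richmondtownh.
set.seed(1); n <- 50
dat <- data.frame(ffarea = runif(n, 6, 23), age = sample(0:50, n, TRUE),
                  mfee = runif(n, 16, 57), beds = sample(1:5, n, TRUE))
dat$askpr <- 10 + 3 * dat$ffarea - 0.3 * dat$age + rnorm(n, sd = 5)

fit3 <- lm(f3, data = dat)
fit4 <- lm(f4, data = dat)
length(coef(fit3))  # 4: intercept plus 3 slopes
length(coef(fit4))  # 5: intercept plus 4 slopes
```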

Please give answers that are not integer-valued to 3 decimal places.

Part a)
The values of residual SD or residual SE for the regressions with 3 and 4 explanatory variables are respectively:
3 explanatory:
4 explanatory:
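For reference, the residual SE of a least-squares fit with p explanatory variables is s = sqrt(sum(e_i^2) / (n - p - 1)), which is what summary() of an lm() fit reports as sigma. The sketch below computes it directly and is checked against summary(). It assumes that ls.cvrmse() reports the same quantity. The data are simulated stand-ins (hypothetical); with the real data, fit the models on richmondtownh instead.

```r
# Residual SE: sqrt(RSS / (n - p - 1)), where p + 1 counts the intercept.
resid_se <- function(fit) {
  e <- residuals(fit)
  k <- length(coef(fit))          # p + 1 estimated coefficients
  sqrt(sum(e^2) / (length(e) - k))
}

# Hypothetical stand-in data with the same column names as richmondtownh.
set.seed(2); n <- 50
dat <- data.frame(ffarea = runif(n, 6, 23), age = sample(0:50, n, TRUE),
                  mfee = runif(n, 16, 57), beds = sample(1:5, n, TRUE))
dat$askpr <- 10 + 3 * dat$ffarea - 0.3 * dat$age + rnorm(n, sd = 5)

fit3 <- lm(askpr ~ ffarea + age + mfee, data = dat)
fit4 <- lm(askpr ~ ffarea + age + mfee + beds, data = dat)
round(c(three = resid_se(fit3), four = resid_se(fit4)), 3)
```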

Part b)
The values of the leave-one-out cross-validation RMS prediction error for the regressions with 3 and 4 explanatory variables are respectively:
3 explanatory:
4 explanatory:
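Leave-one-out cross-validation predicts each case from a fit that excludes it. For least squares, the leave-one-out prediction error for case i equals e_i / (1 - h_ii), where h_ii is the leverage. This means no refitting is needed. The sketch below assumes the RMS is sqrt(mean(squared LOO errors)). Check this against ls.cvrmse() output, since its exact convention is not reproduced here. The brute-force version refits n times to confirm the shortcut. The data are simulated stand-ins (hypothetical).

```r
# LOO RMS prediction error via the leverage shortcut (exact for OLS).
loo_rmse <- function(fit) {
  e <- residuals(fit)
  h <- hatvalues(fit)             # diagonal of the hat matrix
  sqrt(mean((e / (1 - h))^2))
}

# Brute force: drop row i, refit, predict row i, repeat for every row.
loo_rmse_brute <- function(fml, data) {
  response <- all.vars(fml)[1]
  err <- vapply(seq_len(nrow(data)), function(i) {
    fit_i <- lm(fml, data = data[-i, ])
    unname(data[[response]][i] - predict(fit_i, newdata = data[i, , drop = FALSE]))
  }, numeric(1))
  sqrt(mean(err^2))
}

# Hypothetical stand-in data with the same column names as richmondtownh.
set.seed(3); n <- 50
dat <- data.frame(ffarea = runif(n, 6, 23), age = sample(0:50, n, TRUE),
                  mfee = runif(n, 16, 57), beds = sample(1:5, n, TRUE))
dat$askpr <- 10 + 3 * dat$ffarea - 0.3 * dat$age + rnorm(n, sd = 5)

f3 <- askpr ~ ffarea + age + mfee
fit3 <- lm(f3, data = dat)
round(c(shortcut = loo_rmse(fit3), brute = loo_rmse_brute(f3, dat)), 3)
```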

Part c)
The values of the holdout RMS prediction error for the regressions with 3 and 4 explanatory variables are respectively:
3 explanatory:
4 explanatory:
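The holdout RMS prediction error fits the model on the training data only, predicts the holdout cases, and takes the root mean square of the prediction errors. Matching column names (the names(holdout) step above) is what lets predict() find the variables in the holdout data frame. A minimal sketch on hypothetical simulated data follows. With the real data, it would be called as holdout_rmse(fit3, holdout).

```r
# Holdout RMS prediction error: model fitted on training data, scored on new data.
holdout_rmse <- function(fit, newdata, response = "askpr") {
  pred <- predict(fit, newdata = newdata)
  sqrt(mean((newdata[[response]] - pred)^2))
}

# Hypothetical stand-in data: 50 training rows and 9 holdout rows.
sim <- function(n) {
  d <- data.frame(ffarea = runif(n, 6, 23), age = sample(0:50, n, TRUE),
                  mfee = runif(n, 16, 57), beds = sample(1:5, n, TRUE))
  d$askpr <- 10 + 3 * d$ffarea - 0.3 * d$age + rnorm(n, sd = 5)
  d
}
set.seed(4)
dat <- sim(50)
ho <- sim(9)

fit3 <- lm(askpr ~ ffarea + age + mfee, data = dat)
fit4 <- lm(askpr ~ ffarea + age + mfee + beds, data = dat)
round(c(three = holdout_rmse(fit3, ho), four = holdout_rmse(fit4, ho)), 3)
```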

Part d)
Do the summaries in (a), (b), and (c) lead to the same conclusion about which model is better, the 3-explanatory or the 4-explanatory model?
(Enter either Yes or No.)

Part e)
For the model with 3 explanatory variables (and, separately, the model with 4), compare the three values: the residual SE, the leave-one-out root mean square prediction error, and the holdout root mean square prediction error.
What is the best explanation for why the root mean square prediction errors in parts (b) and (c) are larger than the residual SE in part (a)?

You can earn partial credit on this problem.