This problem concerns the interpretation of regression equations that have categorical explanatory variables, where the slopes on the non-categorical variables do not depend on the category. The model assumes that the hyperplanes for different categories are parallel, so the regression coefficients of the binary dummy variables give the signed distances between the hyperplanes for different categories.
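
As a sketch of the model being described (the symbols below are notation introduced here for illustration, not taken from the problem statement): with numeric predictors x_1, ..., x_p and a categorical variable with K categories, taking category 1 as the baseline,

```latex
\[
  E[\,y \mid x,\ \text{category } k\,] \;=\; \beta_0 + \sum_{j=1}^{p} \beta_j x_j + \gamma_k,
  \qquad \gamma_1 = 0, \quad k = 1, \dots, K .
\]
```

The slopes \beta_j are shared by all categories, so the K hyperplanes are parallel. \gamma_k is the signed distance, measured in the direction of the response, of the hyperplane for category k from the baseline hyperplane, and \gamma_k - \gamma_m is the signed distance between categories k and m.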

Context of the data set: American post-secondary schools in 2014-2015 whose annual in-state tuition is less than $20,000. The response variable is the total number of applicants (in thousands). The complete data set has many explanatory variables, but only a few are included here. Region is a categorical variable for different parts of the United States; some categories have been merged relative to the original data set. For your multiple regression output, check that your estimates are interpretable before you submit answers.

For your subset of the university applicant data set, the response variable is: applicants, in thousands (for the homework questions below, transform the response with the natural logarithm).

applicants=c(2.953, 20.677, 10.111, 60.543, 26.496, 25.438, 7.408, 31.941, 16.958, 31.28, 9.679, 86.537, 66.515, 3.542, 32.19, 38.785, 10.991, 33.736, 20.756, 18.42, 21.873, 25.884, 73.782, 7.075, 31.021, 14.887, 5.111, 5.002, 12.92, 14.116, 49.776, 31.611, 28.518, 15.61, 19.814, 20.918, 16.689, 4.582, 14.944, 24.988, 30.629, 10.332, 44.76, 20.934, 18.107, 8.754, 36.788, 31.332, 16.125, 66.813, 10.039, 5.345, 5.465, 20.175, 11.552, 14.223, 20.923, 13.758, 14.933, 36.362, 36.101, 33.211, 21.359, 5.713, 35.822, 11.258, 50.299, 7.101, 20.443, 5.017, 10.217, 40.727, 73.448, 10.245, 25.194)

The explanatory variables are:
(i) per.admit (percentage admitted)
per.admit=c(55, 72, 52, 40, 62, 57, 87, 60, 83, 50, 63, 19, 37, 89, 71, 40, 81, 51, 52, 80, 66, 33, 16, 57, 29, 82, 76, 84, 70, 78, 32, 55, 44, 64, 51, 56, 76, 85, 66, 53, 53, 63, 45, 74, 60, 95, 53, 28, 84, 36, 42, 93, 83, 53, 83, 77, 77, 75, 76, 76, 58, 66, 75, 83, 59, 83, 50, 70, 52, 49, 60, 56, 33, 61, 68)

(ii) num.enroll (enrollment, in thousands)
num.enroll=c(5.695, 29.203, 9.233, 34.508, 37.485, 42.598, 28.515, 48.378, 25.912, 26.541, 20.517, 41.845, 30.051, 10.061, 61.642, 51.313, 31.515, 36.047, 21.857, 15.117, 35.158, 23.109, 37.565, 10.725, 23.732, 28.686, 7.099, 16.936, 30.69, 28.886, 43.625, 44.784, 16.695, 27.238, 20.611, 35.197, 35.313, 15.071, 30.297, 41.938, 28.617, 11.314, 51.147, 29.217, 20.655, 14.534, 58.322, 29.135, 29.477, 23.051, 12.602, 13.952, 15.805, 49.61, 11.286, 27.511, 16.571, 30.848, 28.628, 46.416, 21.498, 50.081, 24.096, 14.747, 45.14, 13.183, 47.04, 15.829, 33.989, 10.241, 13.979, 17.866, 30.709, 39.74, 22.68)

(iii) stfacratio (student/faculty ratio)
stfacratio=c(16, 18, 18, 18, 14, 18, 22, 16, 17, 16, 21, 16, 19, 17, 20, 17, 17, 21, 16, 19, 22, 18, 17, 18, 15, 15, 13, 17, 19, 19, 12, 18, 20, 19, 15, 18, 18, 19, 24, 25, 14, 11, 17, 18, 15, 19, 18, 13, 21, 17, 23, 18, 20, 26, 16, 18, 17, 17, 23, 17, 19, 17, 19, 19, 19, 20, 17, 17, 16, 15, 19, 18, 19, 26, 15)

(iv) avg.grant (average grant for financial aid, in thousands)
avg.grant=c(7.173, 11.848, 7.215, 15.528, 7.327, 8.036, 4.965, 10.726, 8.901, 10.265, 5.421, 17.423, 17.09, 4.412, 8.731, 8.727, 6.726, 10.736, 8.684, 11.579, 6.414, 8.798, 16.425, 7.745, 16.449, 10.526, 10.316, 5.896, 7.959, 7.035, 14.671, 13.821, 6.838, 6.372, 8.076, 8.227, 6.18, 6.003, 5.343, 5.976, 10.461, 9.924, 7.511, 5.715, 9.097, 5.969, 8.435, 13.447, 6.229, 16.958, 5.678, 5.036, 5.826, 5.055, 7.719, 7.609, 9.507, 7.736, 7.118, 8.834, 16.227, 9.747, 6.451, 4.89, 11.818, 7.746, 6.541, 8.591, 9.322, 6.974, 7.879, 17.287, 16.638, 6.234, 8.788)

(v) grad.rate (graduation rate; possibly the rate within 4 or 5 years)
grad.rate=c(35, 58, 12, 81, 66, 82, 41, 79, 68, 82, 50, 92, 86, 45, 80, 79, 59, 67, 82, 78, 62, 79, 91, 43, 93, 67, 66, 34, 38, 61, 91, 81, 79, 53, 54, 79, 62, 47, 41, 57, 79, 69, 73, 63, 57, 50, 82, 89, 52, 80, 45, 49, 37, 49, 59, 58, 63, 56, 28, 75, 66, 79, 67, 53, 84, 43, 86, 44, 71, 40, 61, 74, 86, 40, 80)

(vi) region (5 categories: FarWest, GLNE for Great Lakes/New England, Mid for Middle/Central longitude, Southeast, West)
region=c('Mid', 'Southeast', 'West', 'FarWest', 'Mid', 'GLNE', 'FarWest', 'Mid', 'Southeast', 'GLNE', 'Southeast', 'FarWest', 'FarWest', 'Mid', 'West', 'West', 'West', 'Southeast', 'Southeast', 'GLNE', 'West', 'Southeast', 'FarWest', 'Southeast', 'Southeast', 'FarWest', 'GLNE', 'GLNE', 'GLNE', 'FarWest', 'GLNE', 'FarWest', 'Mid', 'Southeast', 'GLNE', 'Southeast', 'GLNE', 'West', 'Southeast', 'Southeast', 'Mid', 'Mid', 'Mid', 'GLNE', 'GLNE', 'Mid', 'GLNE', 'Southeast', 'GLNE', 'FarWest', 'Southeast', 'West', 'Southeast', 'Southeast', 'GLNE', 'Southeast', 'GLNE', 'Southeast', 'West', 'GLNE', 'FarWest', 'GLNE', 'FarWest', 'Mid', 'GLNE', 'GLNE', 'Mid', 'West', 'Southeast', 'Southeast', 'Mid', 'FarWest', 'FarWest', 'West', 'Mid')

You are to fit a multiple regression model with response variable log(applicants), the natural logarithm of "applicants", and the 6 explanatory variables given above.
After you have copied the above R vectors into your R session, you can get a dataframe with

univ = data.frame(applicants, per.admit, num.enroll, stfacratio, avg.grant, grad.rate, region)
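
A minimal sketch of the requested fit, under stated assumptions. The helper name fit_log_model and the simulated data frame univ_demo are hypothetical and exist only so that the block runs on its own; the simulated values are not the homework data. With the real data, call fit_log_model(univ).

```r
# Hypothetical helper: regress log(applicants) on the five numeric
# predictors plus region (parallel-hyperplanes model, no interactions).
fit_log_model <- function(df) {
  lm(log(applicants) ~ per.admit + num.enroll + stfacratio +
       avg.grant + grad.rate + region, data = df)
}

# Simulated stand-in data (NOT the homework data), 12 schools per region.
set.seed(1)
n <- 60
univ_demo <- data.frame(
  per.admit  = runif(n, 15, 95),
  num.enroll = runif(n, 5, 60),
  stfacratio = sample(11:26, n, replace = TRUE),
  avg.grant  = runif(n, 4, 18),
  grad.rate  = runif(n, 10, 95),
  region     = rep(c("FarWest", "GLNE", "Mid", "Southeast", "West"), each = 12)
)
univ_demo$applicants <- exp(1 + 0.04 * univ_demo$num.enroll -
                              0.01 * univ_demo$per.admit +
                              rnorm(n, sd = 0.3))

fit_demo <- fit_log_model(univ_demo)
coef(summary(fit_demo))   # estimates, standard errors, t values, p-values
```

Because region is a character vector, lm() treats it as a factor whose first level in alphabetical order (FarWest) is the baseline, so each region coefficient is a shift relative to FarWest.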

Please use 3 decimal places for the answers below which are not integer-valued.

For the regression being requested, you should find most or all of the coefficients for per.admit, num.enroll, stfacratio, avg.grant, grad.rate to be statistically significant. Some of the regions might be significantly different from others, but not all pairs of regions are significantly different from each other.

To answer parts (a) and (b) below, you could fit two separate regressions (with 2 different regions as the baseline category). If you want to challenge yourself to answer both from a single application of lm(), you need to use the cov.unscaled component of the summary of an lm object.
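
The two routes can be sketched on a small simulated data set (not the homework data). The names toy, shift, cvec, est, se and chk are hypothetical. The sketch relies on the fact that the estimated covariance matrix of the coefficients is sigma^2 times cov.unscaled, so the SE of a difference of two dummy coefficients can be computed from one fit.

```r
# Simulated stand-in data (NOT the homework data): one numeric predictor, 3 regions.
set.seed(2)
toy <- data.frame(
  x      = runif(45, 0, 10),
  region = rep(c("FarWest", "GLNE", "Southeast"), each = 15)
)
shift <- c(FarWest = 0, GLNE = 0.5, Southeast = -0.3)   # made-up offsets
toy$y <- 1 + 0.2 * toy$x + unname(shift[toy$region]) + rnorm(45, sd = 0.4)

fit <- lm(y ~ x + region, data = toy)   # baseline is FarWest (alphabetical)
s   <- summary(fit)

# Southeast relative to the baseline: read directly from the coefficient table.
s$coefficients["regionSoutheast", c("Estimate", "Std. Error")]

# Southeast relative to GLNE: a contrast of two dummy coefficients.
cvec <- setNames(rep(0, length(coef(fit))), names(coef(fit)))
cvec["regionSoutheast"] <-  1
cvec["regionGLNE"]      <- -1
est <- sum(cvec * coef(fit))
se  <- s$sigma * sqrt(drop(t(cvec) %*% s$cov.unscaled %*% cvec))

# Cross-check: refit with GLNE as the baseline category.
toy$region2 <- relevel(factor(toy$region), ref = "GLNE")
fit2 <- lm(y ~ x + region2, data = toy)
chk  <- summary(fit2)$coefficients["region2Southeast", c("Estimate", "Std. Error")]
round(rbind(one_fit = c(est, se), refit = chk), 3)
```

The two rows of the final table should agree, since changing the baseline only reparameterizes the same model.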

Part a)
The estimate of the signed distance of the hyperplane for region Southeast relative to FarWest is ____ and its SE is ____

Part b)
The estimate of the signed distance of the hyperplane for region Southeast relative to GLNE is ____ and its SE is ____

Part c)
What is the adjusted R^2?

Part d)
What is the residual SD (residual SE in R)?
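
For parts (c) and (d), both quantities are components of the summary of an lm object. The sketch below uses simulated data (not the homework data) and hypothetical variable names, and recomputes each quantity from its definition as a sanity check.

```r
# Simulated stand-in data (NOT the homework data).
set.seed(3)
toy <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
toy$y <- 2 + 0.5 * toy$x1 - 0.3 * toy$x2 + rnorm(40, sd = 0.5)

fit <- lm(y ~ x1 + x2, data = toy)
s   <- summary(fit)

adj_r2   <- s$adj.r.squared   # adjusted R^2
resid_sd <- s$sigma           # printed as "Residual standard error"

# The same quantities from their definitions.
n   <- nrow(toy)
p   <- length(coef(fit))                  # number of betas, including the intercept
rss <- sum(residuals(fit)^2)              # residual sum of squares
tss <- sum((toy$y - mean(toy$y))^2)       # total sum of squares
resid_sd_manual <- sqrt(rss / (n - p))
adj_r2_manual   <- 1 - (rss / (n - p)) / (tss / (n - 1))

round(c(adj_r2 = adj_r2, resid_sd = resid_sd), 3)
```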

Part e)
If interaction of region and num.enroll (i.e., the term region:num.enroll) were added to the lm statement, how many betas would be in the regression equation?
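
One way to check a count of this kind is to look at the columns of the design matrix, since each column corresponds to one beta. The sketch below uses a deliberately small toy data set with a 3-level factor g (hypothetical names, not the homework data), so the counts differ from those in the homework model.

```r
# Toy data (NOT the homework data): one numeric predictor x, a 3-level factor g.
toy <- data.frame(
  y = c(1.2, 0.7, 2.1, 1.9, 3.0, 2.4, 1.1, 2.8, 3.3),
  x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  g = rep(c("A", "B", "C"), times = 3)
)

X_main <- model.matrix(y ~ x + g, data = toy)         # parallel lines
X_int  <- model.matrix(y ~ x + g + g:x, data = toy)   # slopes may differ by level

colnames(X_main)             # "(Intercept)" "x" "gB" "gC"
colnames(X_int)              # adds one slope adjustment per non-baseline level
ncol(X_int) - ncol(X_main)   # 2, i.e. (number of levels - 1)
```

The interaction adds one beta per non-baseline category for each numeric variable it involves; the baseline category's slope is the main-effect slope.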

You can earn partial credit on this problem.