WeBWorK Standalone Renderer

This question is about interpreting regression equations that include a categorical explanatory variable, where the slopes on the non-categorical variables do not depend on the category. Such a model assumes that the fitted hyperplanes for the different categories are parallel, so the regression coefficients of the binary dummy variables can be used to determine the signed distances (differences in intercept) between the hyperplanes for different categories.

For your subset of the cereal data set, the response variable is: calories
calories=c(110, 90, 90, 150, 100, 100, 80, 110, 130, 120, 90, 100, 130, 110, 100, 110, 110, 110, 110, 110, 100, 100, 100, 110, 90, 150, 120, 110, 120, 110, 110, 110, 90, 110, 110, 100, 100, 110, 70, 110, 110, 120)

The explanatory variables are:
(i) protein
protein=c(3, 3, 3, 4, 2, 3, 2, 2, 3, 3, 2, 2, 3, 2, 3, 2, 1, 1, 2, 2, 3, 3, 3, 1, 3, 4, 3, 1, 1, 1, 3, 2, 2, 2, 3, 2, 2, 1, 4, 6, 1, 3)
(ii) fat
fat=c(1, 0, 0, 3, 0, 2, 0, 1, 2, 3, 0, 0, 2, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 3, 2, 1, 3, 1, 2, 1, 1, 2, 0, 1, 1, 1, 1, 2, 1, 1)
(iii) fiber
fiber=c(1.5, 5, 3, 3, 0, 2.5, 3, 0, 1.5, 3, 3, 1, 2, 0, 3, 0, 0, 0, 0, 0, 3, 3, 3, 0, 4, 3, 5, 0, 0, 0, 2, 1, 4, 1.5, 3, 2, 2, 0, 10, 2, 0, 6)
(iv) carbo
carbo=c(11.5, 13, 20, 16, 11, 10.5, 16, 21, 13.5, 13, 15, 18, 18, 21, 17, 21, 13, 23, 22, 12, 15, 16, 17, 14, 19, 16, 12, 12, 13, 13, 13, 16, 15, 10.5, 17, 11, 15, 15, 5, 17, 12, 11)
(v) sugars
sugars=c(10, 5, 0, 11, 15, 8, 0, 3, 10, 4, 5, 5, 8, 3, 3, 3, 12, 2, 3, 12, 5, 3, 3, 11, 0, 11, 10, 13, 9, 12, 7, 8, 6, 10, 3, 10, 6, 9, 6, 1, 13, 14)
(vi) mfr
mfr=c('G', 'P', 'N', 'R', 'P', 'G', 'N', 'G', 'G', 'P', 'N', 'R', 'G', 'G', 'G', 'G', 'G', 'R', 'R', 'G', 'P', 'G', 'R', 'P', 'N', 'R', 'P', 'G', 'G', 'P', 'G', 'G', 'R', 'G', 'P', 'G', 'G', 'G', 'N', 'G', 'G', 'P')

You are to fit a multiple regression model with the response variable 'calories' and the 6 explanatory variables protein, fat, fiber, carbo, sugars, and mfr.
After you have copied the above R vectors into your R session, you can get a dataframe with

cereal = data.frame(cbind(calories, protein, fat, fiber, carbo, sugars))
cereal$mfr = mfr
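
As a minimal sketch of how a model of this kind is fitted and how its baseline category is changed, the example below uses R's built-in iris data as a stand-in for the cereal data (an assumption made only so that the code runs on its own); Species plays the role of mfr. With the cereal data frame, the analogous call would be lm(calories ~ protein + fat + fiber + carbo + sugars + mfr, data = cereal). R treats the first factor level (alphabetical by default) as the baseline; relevel() changes it. Because mfr is a character vector, it would need factor() applied before relevel().

```r
# Stand-in data: built-in iris, with Species as the categorical variable.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
levels(iris$Species)   # the first level listed is the baseline category
coef(fit)              # intercept, 2 slopes, 2 dummy coefficients

# Refit with a different baseline category.
iris2 <- iris
iris2$Species <- relevel(iris2$Species, ref = "versicolor")
fit2 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris2)
coef(fit2)             # same slopes; dummy coefficients are now relative to versicolor
```

Changing the baseline leaves the slopes and the fitted values unchanged; only the intercept and the dummy coefficients are re-expressed.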

Please give answers below that are not integer-valued to 3 decimal places.

For the requested regression, you should find most or all of the coefficients for protein, fat, fiber, carbo, and sugars to be statistically significant. Some manufacturers might differ significantly from others, but not all pairs of manufacturers are significantly different from each other.

To answer parts (a) and (b) below, two separate regressions could be done (with 2 different manufacturers as the baseline categories). If you want to challenge yourself to answer both from a single application of lm(), you need to use the cov.unscaled component of the summary of an lm object.
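
The single-fit approach can be sketched as follows, again on the built-in iris data as a stand-in (an assumption; the names a and b are illustrative only). The estimated covariance matrix of the coefficients is the squared residual SD times cov.unscaled, and the variance of a difference of two coefficients is Var(a) + Var(b) - 2 Cov(a, b). Which sign convention the problem intends for "relative to" is left for the reader to confirm.

```r
# Stand-in data: built-in iris; Species plays the role of mfr.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
s <- summary(fit)

# Estimated covariance matrix of the coefficients: sigma^2 * (X'X)^{-1}.
V <- s$sigma^2 * s$cov.unscaled

# Difference between two non-baseline categories (level a minus level b).
a <- "Speciesvirginica"
b <- "Speciesversicolor"
est <- unname(coef(fit)[a] - coef(fit)[b])
se  <- sqrt(V[a, a] + V[b, b] - 2 * V[a, b])
c(estimate = est, se = se)
```

When one of the two categories is the baseline, no contrast is needed: the difference is the dummy coefficient itself (or its negative), and the SE can be read directly from the coefficient table.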

Part a)
The estimate of the signed distance of the hyperplane for manufacturer G relative to P is ____ and its SE is ____

Part b)
The estimate of the signed distance of the hyperplane for manufacturer N relative to R is ____ and its SE is ____

Part c)
What is the adjusted $R^2$?

Part d)
What is the residual SD (residual SE in R)?
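
Both quantities in parts (c) and (d) are components of the summary of an lm object. A minimal sketch, using the built-in iris data as a stand-in for the cereal data (an assumption so the code runs on its own):

```r
# Stand-in data: built-in iris; Species plays the role of mfr.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
s <- summary(fit)

round(s$adj.r.squared, 3)  # adjusted R^2
round(s$sigma, 3)          # residual SD, printed by R as "Residual standard error"
```

The residual SD is the square root of the residual sum of squares divided by the residual degrees of freedom (number of observations minus number of coefficients).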

Part e)
If interaction of mfr and sugars (mfr:sugars) were added to the lm statement, how many betas would be in the regression equation?
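
One way to check such a count is to fit the model with and without the interaction and compare the number of coefficients. The sketch below does this on the built-in iris data as a stand-in (an assumption), with Species:Petal.Length playing the role of mfr:sugars. With both main effects already in the model, an interaction between a factor with k levels and a numeric variable adds k - 1 coefficients.

```r
# Stand-in data: built-in iris; Species has k = 3 levels.
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species, data = iris)
fit_int <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species +
                Species:Petal.Length, data = iris)

length(coef(fit))      # 5: intercept, 2 slopes, 2 dummy coefficients
length(coef(fit_int))  # 7: the interaction adds k - 1 = 2 slope adjustments
names(coef(fit_int))   # lists each term, including the interaction terms
```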

You can earn partial credit on this problem.