This question is about interpreting regression equations that include
a categorical explanatory variable, where the slopes on the non-categorical
variables do not depend on the category. Such a model assumes that the
hyperplanes for the different categories are parallel, so the regression
coefficients of the binary dummy variables give the signed distances
(measured along the response axis) between the hyperplanes for different
categories.
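As a small illustration of the dummy coding behind this model, the sketch below builds the design matrix for a toy data frame. The data frame `toy` and its columns `x`, `g`, `g2` are made up for illustration and are not part of the cereal data.

```r
# Toy data, invented for illustration only (not the cereal data).
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  g = factor(c("G", "G", "N", "N", "P", "P"))
)

# Design matrix for a model of the form y ~ x + g:
# one shared slope column for x, plus a 0/1 dummy column for every
# level of g except the baseline (the first level, here "G").
X <- model.matrix(~ x + g, data = toy)
print(colnames(X))

# relevel() changes the baseline category; the dummy columns change with it.
toy$g2 <- relevel(toy$g, ref = "P")
X2 <- model.matrix(~ x + g2, data = toy)
print(colnames(X2))
```

Each dummy coefficient is then the shift of that category's hyperplane relative to the baseline category, with the slope on `x` common to all categories.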
For your subset of the cereal data set, the response variable is:
calories
calories=c(110, 90, 90, 150, 100, 100, 80, 110, 130, 120, 90, 100, 130, 110, 100, 110, 110, 110, 110, 110, 100, 100, 100, 110, 90, 150, 120, 110, 120, 110, 110, 110, 90, 110, 110, 100, 100, 110, 70, 110, 110, 120)
The explanatory variables are:
(i) protein
protein=c(3, 3, 3, 4, 2, 3, 2, 2, 3, 3, 2, 2, 3, 2, 3, 2, 1, 1, 2, 2, 3, 3, 3, 1, 3, 4, 3, 1, 1, 1, 3, 2, 2, 2, 3, 2, 2, 1, 4, 6, 1, 3)
(ii) fat
fat=c(1, 0, 0, 3, 0, 2, 0, 1, 2, 3, 0, 0, 2, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 3, 2, 1, 3, 1, 2, 1, 1, 2, 0, 1, 1, 1, 1, 2, 1, 1)
(iii) fiber
fiber=c(1.5, 5, 3, 3, 0, 2.5, 3, 0, 1.5, 3, 3, 1, 2, 0, 3, 0, 0, 0, 0, 0, 3, 3, 3, 0, 4, 3, 5, 0, 0, 0, 2, 1, 4, 1.5, 3, 2, 2, 0, 10, 2, 0, 6)
(iv) carbo
carbo=c(11.5, 13, 20, 16, 11, 10.5, 16, 21, 13.5, 13, 15, 18, 18, 21, 17, 21, 13, 23, 22, 12, 15, 16, 17, 14, 19, 16, 12, 12, 13, 13, 13, 16, 15, 10.5, 17, 11, 15, 15, 5, 17, 12, 11)
(v) sugars
sugars=c(10, 5, 0, 11, 15, 8, 0, 3, 10, 4, 5, 5, 8, 3, 3, 3, 12, 2, 3, 12, 5, 3, 3, 11, 0, 11, 10, 13, 9, 12, 7, 8, 6, 10, 3, 10, 6, 9, 6, 1, 13, 14)
(vi) mfr
mfr=c('G', 'P', 'N', 'R', 'P', 'G', 'N', 'G', 'G', 'P', 'N', 'R', 'G', 'G', 'G', 'G', 'G', 'R', 'R', 'G', 'P', 'G', 'R', 'P', 'N', 'R', 'P', 'G', 'G', 'P', 'G', 'G', 'R', 'G', 'P', 'G', 'G', 'G', 'N', 'G', 'G', 'P')
You are to fit a multiple regression model with the response variable
'calories' and
6 explanatory variables protein, fat, fiber, carbo, sugars, mfr.
After you have copied the above R vectors into your R session,
you can get a dataframe with
cereal = data.frame(cbind(calories,protein,fat,fiber,carbo,sugars))
cereal$mfr = mfr
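A minimal sketch of the requested fit, assuming the vectors exactly as listed above. The object name `fit` and the explicit conversion of mfr to a factor are choices made here, not part of the problem statement; with the default alphabetical level order the baseline manufacturer is G.

```r
calories <- c(110, 90, 90, 150, 100, 100, 80, 110, 130, 120, 90, 100, 130, 110, 100, 110, 110, 110, 110, 110, 100, 100, 100, 110, 90, 150, 120, 110, 120, 110, 110, 110, 90, 110, 110, 100, 100, 110, 70, 110, 110, 120)
protein <- c(3, 3, 3, 4, 2, 3, 2, 2, 3, 3, 2, 2, 3, 2, 3, 2, 1, 1, 2, 2, 3, 3, 3, 1, 3, 4, 3, 1, 1, 1, 3, 2, 2, 2, 3, 2, 2, 1, 4, 6, 1, 3)
fat <- c(1, 0, 0, 3, 0, 2, 0, 1, 2, 3, 0, 0, 2, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 3, 2, 1, 3, 1, 2, 1, 1, 2, 0, 1, 1, 1, 1, 2, 1, 1)
fiber <- c(1.5, 5, 3, 3, 0, 2.5, 3, 0, 1.5, 3, 3, 1, 2, 0, 3, 0, 0, 0, 0, 0, 3, 3, 3, 0, 4, 3, 5, 0, 0, 0, 2, 1, 4, 1.5, 3, 2, 2, 0, 10, 2, 0, 6)
carbo <- c(11.5, 13, 20, 16, 11, 10.5, 16, 21, 13.5, 13, 15, 18, 18, 21, 17, 21, 13, 23, 22, 12, 15, 16, 17, 14, 19, 16, 12, 12, 13, 13, 13, 16, 15, 10.5, 17, 11, 15, 15, 5, 17, 12, 11)
sugars <- c(10, 5, 0, 11, 15, 8, 0, 3, 10, 4, 5, 5, 8, 3, 3, 3, 12, 2, 3, 12, 5, 3, 3, 11, 0, 11, 10, 13, 9, 12, 7, 8, 6, 10, 3, 10, 6, 9, 6, 1, 13, 14)
mfr <- c('G', 'P', 'N', 'R', 'P', 'G', 'N', 'G', 'G', 'P', 'N', 'R', 'G', 'G', 'G', 'G', 'G', 'R', 'R', 'G', 'P', 'G', 'R', 'P', 'N', 'R', 'P', 'G', 'G', 'P', 'G', 'G', 'R', 'G', 'P', 'G', 'G', 'G', 'N', 'G', 'G', 'P')

# Numeric columns plus mfr stored as a factor (levels sorted: G, N, P, R).
cereal <- data.frame(calories, protein, fat, fiber, carbo, sugars,
                     mfr = factor(mfr))

# Additive model: common slopes, manufacturer-specific intercept shifts.
fit <- lm(calories ~ protein + fat + fiber + carbo + sugars + mfr,
          data = cereal)

# Coefficient table (estimate, SE, t value, p-value) to 3 decimal places.
print(round(coef(summary(fit)), 3))
```

The rows named mfrN, mfrP and mfrR are shifts relative to the baseline G; using `relevel(cereal$mfr, ref = "P")` before refitting would make P the baseline instead.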
Please use 3 decimal places for the answers below which are not
integer-valued.
For the regression being requested, you should find most or all of the
coefficients for protein, fat, fiber, carbo, sugars to be statistically
significant. Some of the manufacturers might be significantly different from
others, but not all pairs of manufacturers are significantly different from
each other.
To answer parts (a) and (b) below, two separate regressions could
be done (with 2 different manufacturers as the baseline categories). If
you want to challenge yourself to answer them both based on one application
of lm(), you need to use the cov.unscaled component of the summary of an
lm object.
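The single-fit approach can be sketched as follows. The helper `level_contrast` is hypothetical (written for this illustration), it assumes a full-rank fit, and it assumes "A relative to B" means the coefficient for A minus the coefficient for B, with the baseline level counted as 0. The demonstration uses R's built-in iris data rather than the cereal data.

```r
# Estimate and SE of (shift for level a) - (shift for level b) from one fit.
# Hypothetical helper; assumes no NA coefficients in the fit.
level_contrast <- function(fit, factor_name, a, b) {
  stopifnot(all(c(a, b) %in% fit$xlevels[[factor_name]]))
  beta <- coef(fit)
  s <- summary(fit)
  # Covariance of the estimates: sigma^2 * (X'X)^{-1}
  V <- s$sigma^2 * s$cov.unscaled
  # Contrast vector: +1 on the dummy for a, -1 on the dummy for b.
  # A level with no dummy column is the baseline and contributes 0.
  w <- setNames(numeric(length(beta)), names(beta))
  na <- paste0(factor_name, a)
  nb <- paste0(factor_name, b)
  if (na %in% names(w)) w[na] <- 1
  if (nb %in% names(w)) w[nb] <- -1
  c(estimate = sum(w * beta),
    se = sqrt(as.numeric(t(w) %*% V %*% w)))
}

# Demonstration on built-in data: virginica relative to versicolor,
# neither of which is the baseline (setosa).
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
res <- level_contrast(fit, "Species", "virginica", "versicolor")
print(round(res, 3))

# Cross-check by refitting with versicolor as the baseline.
iris2 <- transform(iris, Species = relevel(Species, ref = "versicolor"))
fit2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris2)
check <- coef(summary(fit2))["Speciesvirginica", c("Estimate", "Std. Error")]
print(round(check, 3))
```

The SE uses Var(b_a - b_b) = Var(b_a) + Var(b_b) - 2 Cov(b_a, b_b), which is what the quadratic form with the contrast vector computes.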
Part a)
The estimate of the signed distance of the hyperplane for
manufacturer G relative to P is
and its SE is
Part b)
The estimate of the signed distance of the hyperplane for
manufacturer N relative to R is
and its SE is
Part c)
What is the adjusted R^2?
Part d)
What is the residual SD (residual SE in R)?
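Both quantities are stored in the summary of an lm object. The sketch below shows where to find them and how they relate to the residual sum of squares; it uses the built-in iris data as a stand-in, and the variable names are illustrative.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
s <- summary(fit)

adj_r2 <- s$adj.r.squared   # adjusted R^2
resid_sd <- s$sigma         # "Residual standard error" in the printed summary

# The same values computed by hand from the residuals.
rss <- sum(residuals(fit)^2)                               # residual SS
tss <- sum((iris$Sepal.Length - mean(iris$Sepal.Length))^2) # total SS
df_res <- df.residual(fit)                                 # n - number of betas
n <- nobs(fit)

resid_sd_manual <- sqrt(rss / df_res)
adj_r2_manual <- 1 - (rss / df_res) / (tss / (n - 1))

print(round(c(adj_r2 = adj_r2, resid_sd = resid_sd), 3))
```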
Part e)
If interaction of mfr and sugars (mfr:sugars) were added to the lm statement,
how many betas would be in the regression equation?
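One way to check a count like this is to fit both models and count the coefficients. The sketch below does so on the built-in iris data (a 3-level factor, not the cereal data); the object names are illustrative.

```r
# Main-effects model: intercept + 2 numeric slopes + (3 - 1) dummies.
fit_main <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Species,
               data = iris)

# Add the factor-by-numeric interaction: one extra slope adjustment
# for each non-baseline level of the factor.
fit_int <- update(fit_main, . ~ . + Species:Petal.Length)

n_main <- length(coef(fit_main))
n_int <- length(coef(fit_int))
print(c(main = n_main, with_interaction = n_int))
print(names(coef(fit_int)))
```

In general, interacting a k-level factor with one numeric variable adds k - 1 coefficients to the additive model.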
You can earn partial credit on this problem.