A web-based business carried out a survey of its past customers in the following manner. Its database includes the age group and email address of everyone in this large population. The business selected 300 past customers at random and emailed them a questionnaire, with the incentive that a gift card would be provided to anyone who responded. (The cost of providing this incentive was the limiting factor that prevented selecting more than 300 past customers.) One of the questionnaire items asked for annual income. The file linked below gives the age group (5 possible categories) for all contacted customers, and the incomes of those who responded. Income is recorded as 'NA' for those who did not respond.

Data file: EmailSurvey.csv (the 'Download the csv file here' link is broken)

(a) Note that the average income of the people who responded is . Using your knowledge and skills in the area of non-response, give a better estimate than this, for the average income of all past customers. Report your answer to the nearest dollar.

(b) Give a standard error for your estimate in (a) (also to the nearest dollar).
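One common way to approach (a) and (b) is a weighting-class adjustment: average the respondents' incomes within each age group, then weight each group mean by that group's share of the 300 contacted customers. The sketch below is only an illustration under stated assumptions, not necessarily the method the problem intends. The function name `weighting_class_estimate`, the record layout, and the toy numbers are hypothetical; the standard error uses the usual two-phase (double sampling for stratification) approximation and ignores the finite population correction because the population is described as large.

```python
import math
from collections import defaultdict


def weighting_class_estimate(records):
    """Weighting-class estimate of the mean income and its standard error.

    records: iterable of (age_group, income) pairs, where income is None
    for a contacted customer who did not respond (hypothetical layout).
    """
    sampled = defaultdict(int)    # n_h: contacted customers per age group
    incomes = defaultdict(list)   # respondent incomes per age group
    for group, income in records:
        sampled[group] += 1
        if income is not None:
            incomes[group].append(income)

    n = sum(sampled.values())
    cells = []
    estimate = 0.0
    for group, n_h in sampled.items():
        y = incomes[group]
        r_h = len(y)
        if r_h < 2:
            raise ValueError(f"age group {group!r} needs at least 2 respondents")
        weight = n_h / n                                   # estimated group share
        mean = sum(y) / r_h                                # respondent mean in group
        var = sum((v - mean) ** 2 for v in y) / (r_h - 1)  # sample variance in group
        cells.append((weight, mean, var, r_h))
        estimate += weight * mean

    # Within-group term: sampling variability of each respondent mean.
    within = sum(w * w * var / r for w, _, var, r in cells)
    # Between-group term: the weights n_h / n are themselves estimated
    # from the sample of contacted customers.
    between = sum(w * (m - estimate) ** 2 for w, m, _, _ in cells) / n
    return estimate, math.sqrt(within + between)


# Toy data, made up purely to exercise the function (not from EmailSurvey.csv).
toy = [("A", 10), ("A", 20), ("A", None), ("A", None),
       ("B", 30), ("B", 40), ("B", 50), ("B", None), ("B", None), ("B", None)]
toy_estimate, toy_se = weighting_class_estimate(toy)
print(round(toy_estimate), round(toy_se))
```

To apply this to the survey file, the rows would first be read (for example with Python's `csv` module) and 'NA' incomes mapped to `None`; the column names are not given in the problem, so that step is left out here. The adjustment is only as good as the assumption that, within an age group, respondents and non-respondents have similar incomes.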

(c) Which of the following three statements would be the best assumption to justify the analysis you carried out above (and if it happens to be that two of the assumptions make the analysis valid, then regard the weaker of these two as better)?





You can earn partial credit on this problem.