If presented with a dataset, it can be dangerous to try to analyze it with little or no idea of how the data were obtained. To help reflect upon this, we consider how answers change for the same dataset, under different assumptions about how that dataset was collected. In the file ExamScores.csv linked below, each row corresponds to a student. The first column gives that student's grade on a standardized test (scored on a scale from 0 points to 100 points). The second column gives the geographic area in which the student lives. In particular, it indicates in which of six forward sortation areas (FSAs) the student lives, simply coded as in the file. (In actuality, each FSA is distinguished by the first 3 characters of the 6-character postal code.) In all parts that follow, answers are to be given on the scale of exam points, and should be reported to two decimal places.

ExamScores.csv (the original download link is broken)

(a) Presume that the (large) population of interest corresponds to all students in these six FSAs, and that the data arose from a simple random sample of size n = 200 from the population. Give an estimate of the population average test score.

(b) Give an appropriate standard error to accompany the estimate in (a).
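
For parts (a) and (b), one standard approach under simple random sampling from a large population is to use the sample mean as the estimate and s/sqrt(n) as its standard error, ignoring the finite population correction. The sketch below assumes the scores have already been read into a plain list; the function name `srs_mean_and_se` and the toy scores are illustrative only and are not taken from ExamScores.csv.

```python
import math
import statistics


def srs_mean_and_se(scores):
    """Sample mean and its standard error under simple random sampling.

    Assumes a large population, so no finite population correction
    is applied.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    s = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
    return mean, s / math.sqrt(n)


# Made-up toy scores for illustration; NOT the contents of ExamScores.csv.
toy_scores = [60.0, 70.0, 80.0, 90.0]
est, se = srs_mean_and_se(toy_scores)
print(f"estimate = {est:.2f}, standard error = {se:.2f}")
```

With the real data, `scores` would be the first column of the file, and both numbers would be rounded to two decimal places as the problem requests.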

(c) Presume that the (large) population of interest corresponds to all students in these six FSAs, and that the data arose from stratified random sampling using proportional allocation, with the FSAs as strata. Give an estimate of the population average test score.

(d) Give an appropriate standard error to accompany the estimate in (c).
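
For parts (c) and (d), a common textbook treatment of stratified sampling with proportional allocation takes the stratum weights W_h to be n_h/n, so the estimate is the weighted sum of the stratum means and the variance is the sum of W_h^2 s_h^2 / n_h. The sketch below follows that treatment, assuming the FSAs are the strata, the population is large (no finite population correction), and every stratum has at least two observations. The function name `stratified_mean_and_se` and the toy data are hypothetical, not from ExamScores.csv.

```python
import math
import statistics
from collections import defaultdict


def stratified_mean_and_se(scores, areas):
    """Stratified estimate of the mean and its standard error.

    Assumes proportional allocation, so the weight of stratum h is
    taken to be n_h / n, and ignores finite population corrections.
    """
    strata = defaultdict(list)
    for score, area in zip(scores, areas):
        strata[area].append(score)

    n = len(scores)
    estimate = 0.0
    variance = 0.0
    for values in strata.values():
        n_h = len(values)
        w_h = n_h / n  # proportional allocation: W_h = N_h / N = n_h / n
        estimate += w_h * statistics.fmean(values)
        # statistics.variance uses the n_h - 1 divisor (needs n_h >= 2)
        variance += w_h ** 2 * statistics.variance(values) / n_h
    return estimate, math.sqrt(variance)


# Made-up toy data for illustration; NOT the contents of ExamScores.csv.
toy_scores = [60.0, 70.0, 80.0, 90.0]
toy_areas = ["A", "A", "B", "B"]
strat_est, strat_se = stratified_mean_and_se(toy_scores, toy_areas)
print(f"estimate = {strat_est:.2f}, standard error = {strat_se:.2f}")
```

Under proportional allocation the point estimate coincides with the overall sample mean, while the standard error uses only within-stratum variability, so it is typically smaller than the simple random sampling value when the strata differ in their average scores.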

(e) Presume that the (large) population of interest comprises all students from a large number of small FSAs, and that the data were obtained from a one-stage cluster sample of FSAs. Give an estimate of the population average test score.

(f) Give an appropriate standard error to accompany the estimate in (e).
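
For parts (e) and (f), one widely used option is the ratio estimator for one-stage cluster sampling: the sum of the cluster totals divided by the sum of the cluster sizes, with variance estimated from the residuals t_i - (estimate)(M_i) across the m sampled clusters. The sketch below assumes this estimator, treats the average cluster size as unknown and replaces it with the sample average, and ignores the finite population correction because the number of FSAs in the population is large. The course may intend a different estimator; the function name `cluster_ratio_mean_and_se` and the toy data are hypothetical, not from ExamScores.csv.

```python
import math
from collections import defaultdict


def cluster_ratio_mean_and_se(scores, areas):
    """Ratio estimate of the mean per student from a one-stage cluster sample.

    Each distinct area is one sampled cluster. Assumes many clusters in
    the population, so no finite population correction is applied.
    """
    totals = defaultdict(float)  # t_i: sum of scores in cluster i
    sizes = defaultdict(int)     # M_i: number of students in cluster i
    for score, area in zip(scores, areas):
        totals[area] += score
        sizes[area] += 1

    m = len(totals)  # number of sampled clusters (needs m >= 2)
    estimate = sum(totals.values()) / sum(sizes.values())
    mean_size = sum(sizes.values()) / m  # sample average cluster size

    # Between-cluster variability of the residuals t_i - estimate * M_i
    resid_ss = sum((totals[a] - estimate * sizes[a]) ** 2 for a in totals)
    variance = resid_ss / (m - 1) / (m * mean_size ** 2)
    return estimate, math.sqrt(variance)


# Made-up toy data for illustration; NOT the contents of ExamScores.csv.
toy_scores = [60.0, 70.0, 80.0, 90.0, 50.0]
toy_areas = ["A", "A", "B", "B", "C"]
clus_est, clus_se = cluster_ratio_mean_and_se(toy_scores, toy_areas)
print(f"estimate = {clus_est:.2f}, standard error = {clus_se:.2f}")
```

Here the effective sample size is the number of clusters (six FSAs in the exam data), not the number of students, which is why the cluster standard error is usually much larger than the values from the other two designs.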
