This question concerns the interpretation of principal component
analysis when the input is either a correlation matrix (standardized
variables) or a covariance matrix (no scaling, in which case pay
attention to the units and range of the variables).
A data set consists
of women's national track records in the year 1984 for over 50 countries.
The variables are record times
for 100m, 200m, 400m in seconds, and for 800m, 1500m, 3000m,
marathon (42195m) in minutes.
Your data variables are:
m100=c(12.9, 11.96, 11.61, 11.81, 10.79, 11.2, 11.13, 11.76, 11.45, 11.76, 11.29, 11.13, 12.3, 11.25, 11.84, 11.89, 11.95, 11.79, 11.22, 11.75, 11.01, 12.03, 11.96, 11.8, 11, 11.41, 11.98, 11.55, 11.31, 11.6, 11.46, 10.81, 11.43, 11.06, 12, 11.09, 12.14, 12.25, 11.43, 11.73, 12.74, 11.45, 12.25, 11.15, 11.85, 11.73, 11.45, 11.16, 11.58, 11.42, 11, 12.23, 11.95, 11.79)
m200=c(27.1, 24.49, 22.94, 24.22, 21.83, 22.35, 22.39, 23.54, 23.57, 25.08, 23, 22.21, 25, 22.81, 24.54, 23.62, 24.28, 24.05, 22.62, 24.46, 22.39, 24.96, 24.6, 23.98, 22.25, 23.04, 24.44, 23.13, 23.17, 24, 23.05, 21.71, 23.51, 22.19, 24.52, 21.97, 24.47, 25.07, 23.09, 24, 25.85, 23.31, 25.78, 22.59, 24.24, 23.88, 23.06, 22.82, 23.31, 23.52, 22.13, 24.21, 24.41, 24.08)
m400=c(60.4, 55.7, 54.5, 54.3, 50.62, 51.08, 50.14, 54.6, 54.9, 58.1, 52.01, 49.29, 55.08, 52.38, 56.09, 53.76, 53.6, 56.05, 52.5, 55.8, 49.75, 56.1, 58.25, 53.59, 50.06, 52, 56.45, 51.6, 52.8, 53.26, 53.3, 48.16, 53.24, 49.19, 54.9, 47.99, 55, 56.96, 50.62, 53.73, 58.73, 53.11, 51.2, 51.73, 55.34, 52.7, 51.5, 51.79, 53.12, 53.6, 50.46, 55.09, 54.97, 54.93)
m800=c(2.3, 2.15, 2.15, 2.09, 1.96, 1.98, 2.03, 2.19, 2.1, 2.27, 1.96, 1.95, 2.12, 1.99, 2.28, 2.04, 2.1, 2.24, 2.1, 2.2, 1.95, 2.07, 2.21, 2.05, 2, 2, 2.15, 2.02, 2.1, 2.11, 2.16, 1.93, 2.05, 1.89, 2.05, 1.89, 2.18, 2.24, 1.99, 2.09, 2.33, 2.02, 1.97, 2, 2.22, 2, 2.01, 2.02, 2.03, 2.03, 1.98, 2.19, 2.08, 2.07)
m1500=c(4.84, 4.42, 4.43, 4.16, 3.95, 4.13, 4.1, 4.6, 4.25, 4.79, 3.98, 3.99, 4.52, 4.06, 4.86, 4.25, 4.32, 4.74, 4.38, 4.72, 4.03, 4.38, 4.68, 4.14, 4.06, 4.14, 4.37, 4.18, 4.49, 4.35, 4.58, 3.96, 4.11, 3.87, 4.23, 4.14, 4.45, 4.84, 4.22, 4.35, 5.81, 4.07, 4.25, 4.14, 4.61, 4.15, 4.14, 4.12, 4.01, 4.18, 4.03, 4.69, 4.33, 4.35)
m3000=c(11.1, 9.62, 9.79, 8.84, 8.5, 9.08, 8.92, 10.16, 9.37, 10.9, 8.63, 8.97, 9.94, 9.01, 10.54, 9.59, 9.98, 9.89, 9.63, 10.28, 8.59, 9.64, 10.43, 9.02, 8.81, 8.88, 9.38, 8.76, 9.77, 9.46, 9.81, 8.75, 8.89, 8.45, 9.37, 8.92, 9.51, 10.69, 9.34, 9.2, 13.04, 8.77, 9.35, 8.98, 10.02, 9.2, 8.98, 8.84, 8.53, 8.71, 8.62, 10.46, 9.31, 9.87)
marathon=c(233.22, 164.65, 178.52, 151.2, 142.72, 152.37, 154.23, 200.37, 160.48, 261.13, 151.82, 160.82, 182.77, 152.48, 215.08, 158.53, 188.03, 203.88, 177.87, 168.45, 148.53, 174.68, 171.8, 162.6, 149.45, 157.85, 201.08, 145.48, 168.75, 165.42, 169.98, 157.68, 149.38, 151.22, 171.38, 158.85, 191.02, 233, 159.37, 150.5, 306, 153.42, 179.17, 155.27, 201.28, 181.05, 156.37, 154.48, 145.48, 151.75, 149.72, 182.17, 168.48, 182.2)
country=c('cookis', 'korea', 'argentin', 'portugal', 'usa', 'australi', 'finland', 'philippi', 'israel', 'mauritiu', 'italy', 'poland', 'singapor', 'netherla', 'guatemal', 'mexico', 'india', 'domrep', 'taipei', 'thailand', 'frg', 'luxembou', 'costa', 'spain', 'canada', 'belgium', 'turkey', 'nz', 'brazil', 'columbia', 'bermuda', 'gdr', 'ireland', 'ussr', 'chile', 'czech', 'burma', 'png', 'austria', 'japan', 'wsamoa', 'switzerl', 'dprkorea', 'france', 'indonesi', 'kenya', 'hungary', 'sweden', 'norway', 'denmark', 'gbni', 'malaysia', 'china', 'greece')
Make a data.frame with
track=data.frame(m100,m200,m400,m800,m1500,m3000,marathon)
Principal component analysis is to be applied
using both the sample covariance matrix and the sample correlation
matrix of the data as given.
Principal component analysis is also to be applied to the velocity
(speed) data, in units of m/s.
To convert a record time in seconds to m/s, take the distance in metres
and divide by the record time.
To convert a record time in minutes to m/s, take the distance in metres
and divide by the record time and then divide by 60.
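As a rough sketch of the two conversions above, the R code below divides
each distance by the record time expressed in seconds. The names dist_m,
sec_per_unit, to_velocity and demo are illustrative and not part of the
problem; demo holds only the first three countries and three of the seven
events from the data above.

```r
# Distances in metres for each event (names match the columns of `track`).
dist_m <- c(m100 = 100, m200 = 200, m400 = 400, m800 = 800,
            m1500 = 1500, m3000 = 3000, marathon = 42195)
# Seconds per recorded time unit: 1 for seconds, 60 for minutes.
sec_per_unit <- c(m100 = 1, m200 = 1, m400 = 1, m800 = 60,
                  m1500 = 60, m3000 = 60, marathon = 60)

# Hypothetical helper: convert a data frame of record times to m/s.
to_velocity <- function(times) {
  cols <- names(times)
  stopifnot(all(cols %in% names(dist_m)))
  # Express every time in seconds, then compute distance / seconds.
  secs <- sweep(as.matrix(times), 2, sec_per_unit[cols], `*`)
  as.data.frame(sweep(1 / secs, 2, dist_m[cols], `*`))
}

# Illustrative subset: cookis, korea, argentin.
demo <- data.frame(
  m100     = c(12.9, 11.96, 11.61),
  m1500    = c(4.84, 4.42, 4.43),
  marathon = c(233.22, 164.65, 178.52)
)
vel <- to_velocity(demo)
```

With the full data, to_velocity(track) would return the seven velocity
columns in m/s.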
For the interpretation questions, you may want to plot the scores
of the first two principal components with an abbreviated country name
as a plotting symbol.
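One possible way to draw such a plot in base R is sketched below. It
assumes prcomp for the decomposition and uses scale. = TRUE (correlation
matrix) purely as an example; demo and labels are an illustrative
six-country, three-event subset of the data above, not the full data set.

```r
# Illustrative subset: first six countries, three of the seven events.
demo <- data.frame(
  m100     = c(12.9, 11.96, 11.61, 11.81, 10.79, 11.2),
  m1500    = c(4.84, 4.42, 4.43, 4.16, 3.95, 4.13),
  marathon = c(233.22, 164.65, 178.52, 151.2, 142.72, 152.37)
)
labels <- c("cookis", "korea", "argentin", "portugal", "usa", "australi")

# scale. = TRUE standardizes the columns (correlation-matrix PCA);
# dropping it gives covariance-matrix PCA instead.
fit <- prcomp(demo, center = TRUE, scale. = TRUE)
scores <- fit$x[, 1:2]  # scores on the first two components

# type = "n" sets up the axes without drawing points; text() then
# writes each country label at its (PC1, PC2) position.
plot(scores, type = "n", xlab = "PC1", ylab = "PC2")
text(scores, labels = labels, cex = 0.8)
```

For the full data, the same two plotting calls would apply to
prcomp(track)$x[, 1:2] with labels = country.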
Part a)
The number of principal components needed to explain at least 90% of
the total variation is:
for the covariance matrix of record times
for the correlation matrix of record times
for the covariance matrix of velocities
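A minimal sketch of how such a count could be obtained with prcomp,
taking "variation" to mean the sum of the component variances. The helper
n_components and the subset demo are hypothetical names introduced here;
counts on this small subset need not match those for the full data.

```r
# Illustrative subset: first six countries, three of the seven events.
demo <- data.frame(
  m100     = c(12.9, 11.96, 11.61, 11.81, 10.79, 11.2),
  m1500    = c(4.84, 4.42, 4.43, 4.16, 3.95, 4.13),
  marathon = c(233.22, 164.65, 178.52, 151.2, 142.72, 152.37)
)

# Smallest number of components whose cumulative share of the total
# variance reaches `target`. scale. = FALSE uses the covariance matrix,
# scale. = TRUE the correlation matrix.
n_components <- function(x, scale. = FALSE, target = 0.90) {
  fit <- prcomp(x, center = TRUE, scale. = scale.)
  cum_prop <- cumsum(fit$sdev^2) / sum(fit$sdev^2)
  which(cum_prop >= target)[1]
}

n_cov <- n_components(demo)                 # covariance matrix
n_cor <- n_components(demo, scale. = TRUE)  # correlation matrix
```

On the full data the corresponding calls would be n_components(track) and
n_components(track, scale. = TRUE), plus the covariance version applied
to the velocity data frame.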
Part b)
The absolute value of the coefficient of marathon in the
first principal component is:
for the covariance matrix of record times
for the correlation matrix of record times
for the covariance matrix of velocities
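The coefficients (loadings) can be read from the rotation matrix returned
by prcomp, as in the sketch below. The sign of a principal component is
arbitrary, which is presumably why the absolute value is requested. The
subset demo is illustrative only, so the values it produces are not the
answers for the full data.

```r
# Illustrative subset: first six countries, three of the seven events.
demo <- data.frame(
  m100     = c(12.9, 11.96, 11.61, 11.81, 10.79, 11.2),
  m1500    = c(4.84, 4.42, 4.43, 4.16, 3.95, 4.13),
  marathon = c(233.22, 164.65, 178.52, 151.2, 142.72, 152.37)
)

fit_cov <- prcomp(demo)                 # covariance matrix of the times
fit_cor <- prcomp(demo, scale. = TRUE)  # correlation matrix of the times

# Rows of `rotation` are variables, columns are components.
abs_cov <- abs(fit_cov$rotation["marathon", "PC1"])
abs_cor <- abs(fit_cor$rotation["marathon", "PC1"])
```

Each column of rotation has unit length, so a coefficient near 1 means
the component is dominated by that single variable.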
Part c)
For the covariance matrix of record times, the interpretation of the
first principal component (linear combination with most variation) is:
(There can be more than one correct answer)
Part d)
For the correlation matrix of record times, the interpretation of the
first principal component is:
(There can be more than one correct answer)
Part e)
For the covariance matrix of velocities, the interpretation of the
second principal component is:
(There can be more than one correct answer)
Part f)
Why is it appropriate to do principal component analysis on the covariance
matrix of the
velocities but not on the covariance matrix of the actual record times?
For the velocity data, which of the following are relevant for
principal component analysis?
(There can be more than one correct answer)
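One consideration that can be checked numerically is how different the
spreads of the variables are before and after conversion. The sketch
below compares column standard deviations on an illustrative subset
(demo, with hypothetical names vel, ratio_times and ratio_vel); it does
not reproduce the answer choices and is not a complete argument.

```r
# Illustrative subset: first six countries, three of the seven events.
demo <- data.frame(
  m100     = c(12.9, 11.96, 11.61, 11.81, 10.79, 11.2),
  m1500    = c(4.84, 4.42, 4.43, 4.16, 3.95, 4.13),
  marathon = c(233.22, 164.65, 178.52, 151.2, 142.72, 152.37)
)

# Velocities in m/s: distance / time, with minutes converted to seconds.
vel <- data.frame(
  m100     = 100 / demo$m100,
  m1500    = 1500 / (60 * demo$m1500),
  marathon = 42195 / (60 * demo$marathon)
)

sd_times <- sapply(demo, sd)  # mixed units: seconds and minutes
sd_vel   <- sapply(vel, sd)   # common unit: m/s

# Ratio of largest to smallest standard deviation in each version.
ratio_times <- max(sd_times) / min(sd_times)
ratio_vel   <- max(sd_vel) / min(sd_vel)
```

A large ratio means one variable contributes most of the total variance
of a covariance-based analysis; a ratio near 1 means the variables
contribute on a comparable scale.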
You can earn partial credit on this problem.