The data file
./Data/moult.csv contains monthly values
of hair samples collected from the drain of a captive sea otter's pool
over a year. What a lovely job that must have been. This file has been
created from the plot on p40 of Kenyon's The Sea Otter in The Eastern
The plan here is to read in and recreate that plot, something like this:
showing the summer peak of moulting and the relative rates of moult during summer and winter.
If you feel confident, try reading the data in and making a plot. There are a few pitfalls to getting this exact plot, but achieving something similar might be simple.
Otherwise read on and try the commands given, and also explore other ideas and suggestions.
First we take a look at the data file.
month, shed jan,23 feb,38 mar,32 apr,46 may,54 jun,57 jul,54 aug,58 sep,50 oct,52 nov,44 dec,34
This will read in using
read.csv as before. So we do that, and have a quick look:
moult = read.csv("./Data/moult.csv") summary(moult)
month shed apr :1 Min. :23.0 aug :1 1st Qu.:37.0 dec :1 Median :48.0 feb :1 Mean :45.2 jan :1 3rd Qu.:54.0 jul :1 Max. :58.0 (Other):6
month shed 1 jan 23 2 feb 38 3 mar 32 4 apr 46 5 may 54 6 jun 57
 jan feb mar apr may jun jul aug sep oct nov dec Levels: apr aug dec feb jan jul jun mar may nov oct sep
Our month variable is an unordered categorical factor with 12 levels. Although the months are in the right order the labels are alphabetical which can mess things up.
Explore: Try making some plots using the month and see what happens....
To work round this for now, just assign a numeric month to each record. Then we can plot it either with base or ggplot graphics
moult$m = 1:12 plot(moult$m, moult$shed)
require(ggplot2) qplot(m, shed, data = moult)
Read Up: Check the help for the base graphics plot function and discover how to change the labels on the x and y axes
Kenyon fits what looks like a quadratic model to the points. The data are certainly not linear. Let's duplicate that with ggplot graphics
g = ggplot(moult, aes(x = m, y = shed)) g + geom_point() + stat_smooth(method = "lm", formula = y ~ poly(x, 2))
We can fit a quadratic polynomial and look at the fit information
m1 = lm(shed ~ poly(m, 2), data = moult) summary(m1)
Call: lm(formula = shed ~ poly(m, 2), data = moult) Residuals: Min 1Q Median 3Q Max -8.666 -1.820 0.576 2.333 5.522 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 45.17 1.18 38.41 2.7e-11 *** poly(m, 2)1 14.38 4.07 3.53 0.0064 ** poly(m, 2)2 -31.90 4.07 -7.83 2.6e-05 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 4.07 on 9 degrees of freedom Multiple R-squared: 0.891, Adjusted R-squared: 0.867 F-statistic: 36.9 on 2 and 9 DF, p-value: 4.6e-05
By using the
predict function to evaluate the model at
new x-coordinates we can add a line to the scatterplot:
d = data.frame(m = seq(1, 12, len = 50)) plot(moult$m, moult$shed) lines(d$m, predict(m1, d))
The otter moult is expected to be a seasonal continuous process. So the quadratic model is clearly wrong. Let's fit a periodic function.
We do this by scaling our x-variable to be from 0 to 2*pi and specifying both sine and cosine terms in the formula:
mP = lm(shed ~ sin(2 * pi * (m - 1)/12) + cos(2 * pi * (m - 1)/12), moult) plot(moult$m, moult$shed) lines(d$m, predict(mP, d))
We can check our residuals for serial correlation and normality with some plots
qqnorm(scale(residuals(mP))) abline(0, 1)
Then we can make a pretty plot in ggplot for our paper!
g + geom_point() + stat_smooth(method = "lm", formula = y ~ sin(2 * pi * (x - 1)/12) + cos(2 * pi * (x - 1)/12))
Now let's really nail it and use dates. Add a new column to the data frame and make a simple plot.
moult$date = seq(as.Date("1963/1/1"), as.Date("1963/12/1"), "month") plot(moult$date, moult$shed)
Now we have dates, we can do the periodic fit but this time we have to scale by 365 since the underlying numbers for R's dates are days.
g = ggplot(moult, aes(x = date, y = shed)) g + geom_point() + stat_smooth(method = "lm", formula = y ~ sin(2 * pi * (x - 1)/365) + cos(2 * pi * (x - 1)/365))