Moulting Data

The data file ./Data/moult.csv contains monthly values of hair samples collected from the drain of a captive sea otter's pool over a year. What a lovely job that must have been. This file has been created from the plot on p40 of Kenyon's The Sea Otter in The Eastern Pacific Ocean.

The plan here is to read in and recreate that plot, something like this:

plot of chunk unnamed-chunk-1

showing the summer peak of moulting and the relative rates of moult during summer and winter.

Go it alone

If you feel confident, try reading the data in and making a plot. There are a few pitfalls to getting this exact plot, but achieving something similar might be simple.

Otherwise read on and try the commands given, and also explore other ideas and suggestions.

Walkthrough

First we take a look at the data file.

month, shed 
 jan,23 
 feb,38 
 mar,32 
 apr,46 
 may,54 
 jun,57 
 jul,54 
 aug,58 
 sep,50 
 oct,52 
 nov,44 
 dec,34 

This will read in using read.csv as before. So we do that, and have a quick look:

 moult = read.csv("./Data/moult.csv")
 summary(moult)
     month        shed     
 apr    :1   Min.   :23.0  
 aug    :1   1st Qu.:37.0  
 dec    :1   Median :48.0  
 feb    :1   Mean   :45.2  
 jan    :1   3rd Qu.:54.0  
 jul    :1   Max.   :58.0  
 (Other):6                 
 head(moult)
  month shed
1   jan   23
2   feb   38
3   mar   32
4   apr   46
5   may   54
6   jun   57
 moult$month
 [1] jan feb mar apr may jun jul aug sep oct nov dec
Levels: apr aug dec feb jan jul jun mar may nov oct sep

Our month variable is an unordered categorical factor with 12 levels. Although the months are in the right order the labels are alphabetical which can mess things up.

Explore: Try making some plots using the month and see what happens....

To work round this for now, just assign a numeric month to each record. Then we can plot it either with base or ggplot graphics

 moult$m = 1:12
 plot(moult$m, moult$shed)
plot of chunk unnamed-chunk-4
 require(ggplot2)
 qplot(m, shed, data = moult)
plot of chunk unnamed-chunk-4

Read Up: Check the help for the base graphics plot function and discover how to change the labels on the x and y axes

Model fitting

Kenyon fits what looks like a quadratic model to the points. The data are certainly not linear. Let's duplicate that with ggplot graphics

 g = ggplot(moult, aes(x = m, y = shed))
 g + geom_point() + stat_smooth(method = "lm", formula = y ~ poly(x, 
     2))
plot of chunk unnamed-chunk-5

Fitting with lm

We can fit a quadratic polynomial and look at the fit information

 m1 = lm(shed ~ poly(m, 2), data = moult)
 summary(m1)
Call:
lm(formula = shed ~ poly(m, 2), data = moult)

Residuals:
   Min     1Q Median     3Q    Max 
-8.666 -1.820  0.576  2.333  5.522 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    45.17       1.18   38.41  2.7e-11 ***
poly(m, 2)1    14.38       4.07    3.53   0.0064 ** 
poly(m, 2)2   -31.90       4.07   -7.83  2.6e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 4.07 on 9 degrees of freedom
Multiple R-squared: 0.891,	Adjusted R-squared: 0.867 
F-statistic: 36.9 on 2 and 9 DF,  p-value: 4.6e-05 

Plotting data and model

By using the predict function to evaluate the model at new x-coordinates we can add a line to the scatterplot:

 d = data.frame(m = seq(1, 12, len = 50))
 plot(moult$m, moult$shed)
 lines(d$m, predict(m1, d))
plot of chunk unnamed-chunk-7

Fitting a periodic function

The otter moult is expected to be a seasonal continuous process. So the quadratic model is clearly wrong. Let's fit a periodic function.

We do this by scaling our x-variable to be from 0 to 2*pi and specifying both sine and cosine terms in the formula:

 mP = lm(shed ~ sin(2 * pi * (m - 1)/12) + cos(2 * pi * (m - 1)/12), 
     moult)
 plot(moult$m, moult$shed)
 lines(d$m, predict(mP, d))
plot of chunk unnamed-chunk-8

Residuals

We can check our residuals for serial correlation and normality with some plots

 plot(1:12, scale(residuals(mP)))
plot of chunk unnamed-chunk-9
 qqnorm(scale(residuals(mP)))
 abline(0, 1)
plot of chunk unnamed-chunk-9

Pretty plot

Then we can make a pretty plot in ggplot for our paper!

 g + geom_point() + stat_smooth(method = "lm", formula = y ~ sin(2 * 
     pi * (x - 1)/12) + cos(2 * pi * (x - 1)/12))
plot of chunk unnamed-chunk-10

Using Dates

Now let's really nail it and use dates. Add a new column to the data frame and make a simple plot.

 moult$date = seq(as.Date("1963/1/1"), as.Date("1963/12/1"), "month")
 plot(moult$date, moult$shed)
plot of chunk dateit

Now we have dates, we can do the periodic fit but this time we have to scale by 365 since the underlying numbers for R's dates are days.

 g = ggplot(moult, aes(x = date, y = shed))
 g + geom_point() + stat_smooth(method = "lm", formula = y ~ sin(2 * 
     pi * (x - 1)/365) + cos(2 * pi * (x - 1)/365))
plot of chunk unnamed-chunk-11