# Moulting Data

The data file `./Data/moult.csv` contains monthly values of hair samples collected from the drain of a captive sea otter's pool over a year. What a lovely job that must have been. This file has been created from the plot on p40 of Kenyon's The Sea Otter in The Eastern Pacific Ocean.

The plan here is to read in and recreate that plot, something like this: showing the summer peak of moulting and the relative rates of moult during summer and winter.

# Go it alone

If you feel confident, try reading the data in and making a plot. There are a few pitfalls to getting this exact plot, but achieving something similar might be simple.

Otherwise read on and try the commands given, and also explore other ideas and suggestions.

# Walkthrough

First we take a look at the data file.

```month, shed
jan,23
feb,38
mar,32
apr,46
may,54
jun,57
jul,54
aug,58
sep,50
oct,52
nov,44
dec,34
```

This will read in using `read.csv` as before. So we do that, and have a quick look:

``` moult = read.csv("./Data/moult.csv")
summary(moult)
```
```     month        shed
apr    :1   Min.   :23.0
aug    :1   1st Qu.:37.0
dec    :1   Median :48.0
feb    :1   Mean   :45.2
jan    :1   3rd Qu.:54.0
jul    :1   Max.   :58.0
(Other):6
```
``` head(moult)
```
```  month shed
1   jan   23
2   feb   38
3   mar   32
4   apr   46
5   may   54
6   jun   57
```
``` moult\$month
```
```  jan feb mar apr may jun jul aug sep oct nov dec
Levels: apr aug dec feb jan jul jun mar may nov oct sep
```

Our month variable is an unordered categorical factor with 12 levels. Although the months are in the right order the labels are alphabetical which can mess things up.

Explore: Try making some plots using the month and see what happens....

To work round this for now, just assign a numeric month to each record. Then we can plot it either with base or ggplot graphics

``` moult\$m = 1:12
plot(moult\$m, moult\$shed)
``` ``` require(ggplot2)
qplot(m, shed, data = moult)
``` Read Up: Check the help for the base graphics plot function and discover how to change the labels on the x and y axes

## Model fitting

Kenyon fits what looks like a quadratic model to the points. The data are certainly not linear. Let's duplicate that with ggplot graphics

``` g = ggplot(moult, aes(x = m, y = shed))
g + geom_point() + stat_smooth(method = "lm", formula = y ~ poly(x,
2))
``` ## Fitting with `lm`

We can fit a quadratic polynomial and look at the fit information

``` m1 = lm(shed ~ poly(m, 2), data = moult)
summary(m1)
```
```Call:
lm(formula = shed ~ poly(m, 2), data = moult)

Residuals:
Min     1Q Median     3Q    Max
-8.666 -1.820  0.576  2.333  5.522

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)    45.17       1.18   38.41  2.7e-11 ***
poly(m, 2)1    14.38       4.07    3.53   0.0064 **
poly(m, 2)2   -31.90       4.07   -7.83  2.6e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.07 on 9 degrees of freedom
Multiple R-squared: 0.891,	Adjusted R-squared: 0.867
F-statistic: 36.9 on 2 and 9 DF,  p-value: 4.6e-05
```

## Plotting data and model

By using the `predict` function to evaluate the model at new x-coordinates we can add a line to the scatterplot:

``` d = data.frame(m = seq(1, 12, len = 50))
plot(moult\$m, moult\$shed)
lines(d\$m, predict(m1, d))
``` ## Fitting a periodic function

The otter moult is expected to be a seasonal continuous process. So the quadratic model is clearly wrong. Let's fit a periodic function.

We do this by scaling our x-variable to be from 0 to 2*pi and specifying both sine and cosine terms in the formula:

``` mP = lm(shed ~ sin(2 * pi * (m - 1)/12) + cos(2 * pi * (m - 1)/12),
moult)
plot(moult\$m, moult\$shed)
lines(d\$m, predict(mP, d))
``` ## Residuals

We can check our residuals for serial correlation and normality with some plots

``` plot(1:12, scale(residuals(mP)))
``` ``` qqnorm(scale(residuals(mP)))
abline(0, 1)
``` ## Pretty plot

Then we can make a pretty plot in ggplot for our paper!

``` g + geom_point() + stat_smooth(method = "lm", formula = y ~ sin(2 *
pi * (x - 1)/12) + cos(2 * pi * (x - 1)/12))
``` ## Using Dates

Now let's really nail it and use dates. Add a new column to the data frame and make a simple plot.

``` moult\$date = seq(as.Date("1963/1/1"), as.Date("1963/12/1"), "month")
plot(moult\$date, moult\$shed)
``` Now we have dates, we can do the periodic fit but this time we have to scale by 365 since the underlying numbers for R's dates are days.

``` g = ggplot(moult, aes(x = date, y = shed))
g + geom_point() + stat_smooth(method = "lm", formula = y ~ sin(2 *
pi * (x - 1)/365) + cos(2 * pi * (x - 1)/365))
``` 