# R - the basics

## Basics

• Starting
• Simple Arithmetic
• Save your work
• Simple Graphics
• Writing Functions

## Start

You can run R in various ways...

## The prompt

```
>
```

This is where you type stuff

## The other prompt

```
+
```

Keep typing stuff

## The Editor (in RStudio)

This is where you type stuff you only want to type once

## Type Some Stuff

```
> 2 + 2
```
```
[1] 4
```
```
> 1 + 2 * 3/4^5
```
```
[1] 1.006
```
```
> pi * 2
```
```
[1] 6.283
```
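
The second expression depends on R's operator precedence: `^` is evaluated first, then `*` and `/` from left to right, then `+`. A small sketch of how the grouping works (the extra expressions are illustrations, not part of the original session):

```r
# The expression from above, and the same thing fully parenthesised
1 + 2 * 3/4^5
1 + ((2 * 3) / (4^5))

# Different grouping gives a different answer
((1 + 2) * 3 / 4)^5

# ^ binds tighter than unary minus, so this is -(2^2)
-2^2
```

When in doubt, add parentheses; they cost nothing.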

## Scientific functions

```
> sqrt(2)
```
```
[1] 1.414
```
```
> exp(2)
```
```
[1] 7.389
```
```
> log(7.389)  # base e
```
```
[1] 2
```
```
> exp(log(2))
```
```
[1] 2
```
```
> log10(1000)
```
```
[1] 3
```
```
> sin(pi/3)
```
```
[1] 0.866
```
```
> cos(pi/3)
```
```
[1] 0.5
```
```
> tan(pi/3)
```
```
[1] 1.732
```

## Assigning Results

```
> a = sqrt(4)
> b = 3
> c = -2
> d = b^2 - 4 * a * c
> r1 = (-b + sqrt(d))/(2 * a)
> r2 = (-b - sqrt(d))/(2 * a)
```

## Showing Results

```
> ls()
```
```
[1] "a"  "b"  "c"  "d"  "r1" "r2"
```
```
> r1
```
```
[1] 0.5
```
```
> r2
```
```
[1] -2
```
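
The assignments above apply the quadratic formula to 2x² + 3x - 2 = 0. As a rough sketch of the "Writing Functions" item from the outline, the same steps can be wrapped in a function and the roots checked by substituting them back. The name `quadratic_roots` and its argument names are made up for this example.

```r
# Hypothetical helper, not from the original slides: real roots of
# a*x^2 + b*x + c0 = 0. The constant is called c0 to avoid masking c().
quadratic_roots = function(a, b, c0) {
  d = b^2 - 4 * a * c0                 # discriminant
  if (d < 0) stop("no real roots")
  c((-b + sqrt(d)) / (2 * a), (-b - sqrt(d)) / (2 * a))
}

roots = quadratic_roots(a = 2, b = 3, c0 = -2)
roots                                  # 0.5 and -2, matching r1 and r2

# Substituting the roots back into the quadratic should give zero
check = 2 * roots^2 + 3 * roots - 2
check
```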

## Help!

How do I get help on R things?

• `help(sqrt)` - gets the help page for a function - `?sqrt` for short.
• `help.search("model")` - searches the help pages
• `help.start()` - interactive help system
• RDocumentation.org - nice new R help site
• The R-Help mailing list
• Other R mailing lists (R-sig-Ecology, R-sig-Geo) for specialists
• StackOverflow for programming questions
• Cross-Validated for stats questions

## Saving Objects

You can save objects

```
> save(a, b, c, r1, r2, file = "output.rda")
> rm(a, b, c, r1, r2)
> r1
```
```
Error: object 'r1' not found
```
```
> load("output.rda")
> r1
```
```
[1] 0.5
```
```
> r2
```
```
[1] -2
```

R has `save.image()` which saves all your objects to a file called `.RData`

R will offer to do this when you quit: `q()`

R will reload `.RData` when you start in that folder.
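
A minimal sketch of the save/load round trip that can be run without touching your working directory. It writes to temporary files; the object names `x` and `y` are made up for the example. It also shows `saveRDS()`/`readRDS()`, a base R alternative for storing a single object.

```r
# Work in a temporary file so nothing in the working directory is touched
tmp = tempfile(fileext = ".rda")

x = 1:5
y = "otter"
save(x, y, file = tmp)      # stores both objects together with their names
rm(x, y)

restored = load(tmp)        # load() returns the names of what it restored
restored
x

# saveRDS()/readRDS() store one object without its name,
# so you choose the name when reading it back
tmp2 = tempfile(fileext = ".rds")
saveRDS(x, tmp2)
x_again = readRDS(tmp2)
identical(x, x_again)
```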

## Sea Otter Weight Data

Suppose we have sea otter weights in kg for males and females in two files:

`Data/males.dat`
```
27 28 39 32 26 28 25 42 28 38
```
`Data/females.dat`
```
22 25 24 31 26 30 14 17 21 30
```

We want to test if the sexes have different mean weights - a classic two-sample t-test. But first:

• Always have a look at your data!

## Read In Data

`scan` reads space-separated numbers from a file and returns...

```
> m = scan("./Data/males.dat")
> f = scan("./Data/females.dat")
> m
```
```
 [1] 27 28 39 32 26 28 25 42 28 38
```
```
> f
```
```
 [1] 22 25 24 31 26 30 14 17 21 30
```

...one-dimensional vectors.
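
If you do not have the data files to hand, here is a self-contained sketch of the same idea. It writes the males data to a temporary file first (the temporary file stands in for `Data/males.dat`), then reads it back with `scan`.

```r
# The males data from above, written to a temporary file
tmp = tempfile(fileext = ".dat")
writeLines("27 28 39 32 26 28 25 42 28 38", tmp)

m = scan(tmp)             # reads whitespace-separated numbers
is.vector(m)
class(m)                  # "numeric"
length(m)

# scan() can also read from a string, and other separators via sep
scan(text = "22,25,24", sep = ",")
```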

## Vectors

```
> length(m)
```
```
[1] 10
```
```
> length(f)
```
```
[1] 10
```
```
> summary(m)
```
```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   25.0    27.2    28.0    31.3    36.5    42.0
```
```
> summary(f)
```
```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   14.0    21.2    24.5    24.0    29.0    31.0
```
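
Most R functions and operators work on whole vectors at once. A short sketch using the males data, typed in directly so the example runs on its own:

```r
m = c(27, 28, 39, 32, 26, 28, 25, 42, 28, 38)   # males data from above

mean(m)              # 31.3, as in summary(m)
sd(m)                # sample standard deviation
m - mean(m)          # arithmetic applies to every element
heavy = m[m > 30]    # logical indexing: weights over 30 kg
heavy
sort(m)
quantile(m, 0.25)    # 27.25; the summary above shows it rounded to 27.2
```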

## Histograms, boxplots

Plotting functions generally have the side effect of making a graphic:

```
> hist(m)
```
```
> hist(f)
```
```
> boxplot(m, f)
```
```
> boxplot(m, f, names = c("males", "females"), main = "Sea Otters",
      ylab = "Weight/kg", xlab = "Sex")
```
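
By default the graphic goes to the screen. A sketch of sending the same boxplot to a file instead, using base R's `pdf()` device; the temporary file name is just for illustration, in practice you would give a path of your own.

```r
m = c(27, 28, 39, 32, 26, 28, 25, 42, 28, 38)
f = c(22, 25, 24, 31, 26, 30, 14, 17, 21, 30)

out = tempfile(fileext = ".pdf")     # illustrative output path
pdf(out, width = 6, height = 4)      # open a PDF device (size in inches)
boxplot(m, f, names = c("males", "females"), main = "Sea Otters",
        ylab = "Weight/kg", xlab = "Sex")
dev.off()                            # close the device to finish the file

file.exists(out)
```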

## A Tale of Two t-tests

That loads one set of data in

Now repeat for the second file. Or save the format. Or edit and run the 'syntax'

Now we have two datasets.

Can't see how to work across datasets, so let's combine them - cut 'n' paste:

Now, all I can do with this is a paired t-test

But that's wrong! These are not paired observations! Need to get my data in 'long' form.

Some more cut n paste action later...

And my data looks like this:

Then I can use the independent sample t-test with Sex as grouping:

And get some correct output!

## In R

Three lines

```
> m = scan("./Data/males.dat")
> f = scan("./Data/females.dat")
> t.test(m, f)
```
```
	Welch Two Sample t-test

data:  m and f
t = 2.768, df = 17.89, p-value = 0.01274
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  1.756 12.844
sample estimates:
mean of x mean of y
     31.3      24.0
```
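
The printed output is only one view of the result: `t.test` returns an object whose parts you can pull out by name. A sketch, with the data typed in so it runs on its own. Note that the default is Welch's test, which does not assume equal variances; `var.equal = TRUE` gives the classic pooled-variance Student test.

```r
m = c(27, 28, 39, 32, 26, 28, 25, 42, 28, 38)
f = c(22, 25, 24, 31, 26, 30, 14, 17, 21, 30)

res = t.test(m, f)          # Welch test (the default)
res$statistic               # the t value
res$p.value
res$conf.int                # 95% interval for the difference in means

# Classic Student's t-test with pooled variance
classic = t.test(m, f, var.equal = TRUE)
classic$parameter           # degrees of freedom: 10 + 10 - 2 = 18
```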

Note: if typing filenames is too much bother, you can pick the file interactively (you get a file dialog on Windows):

```
> m = scan(file.choose())
```

## Unfair!

What if the data is already in long form, ready for SPSS?

```
weight,sex
22,female
28,male
32,male
25,female
24,female
31,female
26,female
30,female
14,female
```

Use `read.csv` (or `read.table`) to input

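```
> # stringsAsFactors = TRUE is needed in R 4.0.0 or later to get the
> # factor summary of sex shown below (it was the default before that)
> otters = read.csv("./Data/otterweight.csv", stringsAsFactors = TRUE)
> summary(otters)
```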
```
     weight         sex
 Min.   :14.0   female:10
 1st Qu.:24.8   male  :10
 Median :27.5
 Mean   :27.6
 3rd Qu.:30.2
 Max.   :42.0
```
```
> otters
```
```
   weight    sex
1      22 female
2      28   male
3      32   male
4      25 female
5      24 female
6      31 female
7      26 female
8      30 female
9      14 female
10     17 female
11     21 female
12     30 female
13     27   male
14     39   male
15     26   male
16     28   male
17     25   male
18     42   male
19     28   male
20     38   male
```
```
> head(otters)
```
```
  weight    sex
1     22 female
2     28   male
3     32   male
4     25 female
5     24 female
6     31 female
```
```
> names(otters)
```
```
[1] "weight" "sex"
```
```
> dim(otters)
```
```
[1] 20  2
```

This is a data frame...

## Data Frames - use when:

• Regular, tabular data
• A row is a record
• A column is a measurement
• Much like a spreadsheet grid - but no formulae!
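
If your data starts as two separate vectors, you can build the 'long form' data frame in R rather than by cut 'n' paste. A minimal sketch, assuming the `m` and `f` vectors from earlier; the rows come out in a different order from the CSV file, which does not matter for the analysis.

```r
m = c(27, 28, 39, 32, 26, 28, 25, 42, 28, 38)
f = c(22, 25, 24, 31, 26, 30, 14, 17, 21, 30)

# One row per otter: the weight, and which sex it belongs to
otters = data.frame(
  weight = c(m, f),
  sex = rep(c("male", "female"), times = c(length(m), length(f)))
)

dim(otters)           # 20 rows, 2 columns
table(otters$sex)     # 10 of each
str(otters)
```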

## Slicing and dicing

You can extract columns:

```
> # by column number
> otters[, 1]
```
```
 [1] 22 28 32 25 24 31 26 30 14 17 21 30 27 39 26 28 25
[18] 42 28 38
```
```
> # by name
> otters$weight
```
```
 [1] 22 28 32 25 24 31 26 30 14 17 21 30 27 39 26 28 25
[18] 42 28 38
```
```
> # and it's just a vector
> otters$weight > 36
```
```
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [9] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
[17] FALSE  TRUE FALSE  TRUE
```

You can subset rows of data frames to get smaller data frames:

```
> # by row number
> otters[1, ]
```
```
  weight    sex
1     22 female
```
```
> otters[2:4, ]
```
```
  weight    sex
2     28   male
3     32   male
4     25 female
```
```
> # by true/false values:
> otters[otters$weight > 36, ]
```
```
   weight  sex
14     39 male
18     42 male
20     38 male
```
```
> # or use subset, which lets you name columns directly:
> subset(otters, weight > 36)
```
```
   weight  sex
14     39 male
18     42 male
20     38 male
```
```
> subset(otters, sex == "male")
```
```
   weight  sex
2      28 male
3      32 male
13     27 male
14     39 male
15     26 male
16     28 male
17     25 male
18     42 male
19     28 male
20     38 male
```
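
Conditions can be combined, and group summaries do not need hand-made subsets. A short sketch; the data frame is rebuilt here from the rows printed above so the example runs on its own.

```r
# The 20 rows printed above, in the same order
otters = data.frame(
  weight = c(22, 28, 32, 25, 24, 31, 26, 30, 14, 17,
             21, 30, 27, 39, 26, 28, 25, 42, 28, 38),
  sex = rep(c("female", "male", "female", "male"), times = c(1, 2, 9, 8))
)

# Combine conditions with & (and) or | (or)
heavy_females = otters[otters$sex == "female" & otters$weight >= 30, ]
heavy_females                    # rows 6, 8 and 12

# Keep only some columns while subsetting rows
subset(otters, weight > 36, select = weight)

# Group-wise summary: mean weight for each sex
means = tapply(otters$weight, otters$sex, mean)
means
```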

## t-test from a data frame

I could subset and do what I did before

```
> t.test(subset(otters, sex == "male")$weight,
      subset(otters, sex == "female")$weight)
```
```
	Welch Two Sample t-test

data:  subset(otters, sex == "male")$weight and subset(otters, sex == "female")$weight
t = 2.768, df = 17.89, p-value = 0.01274
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  1.756 12.844
sample estimates:
mean of x mean of y
     31.3      24.0
```

That works, but it's ugly - data frames give us a nicer way...

## t-test with a data frame

Use a formula notation:

```
> t.test(weight ~ sex, otters)
```
```
	Welch Two Sample t-test

data:  weight by sex
t = -2.768, df = 17.89, p-value = 0.01274
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.844  -1.756
sample estimates:
mean in group female   mean in group male
                24.0                 31.3
```
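
The t value is now negative because the formula version takes the groups in factor-level order, which is alphabetical by default: female first, so the difference is female minus male. A sketch of how to control that order by setting the levels yourself (the data frame is rebuilt from the rows printed earlier so the example is self-contained):

```r
otters = data.frame(
  weight = c(22, 28, 32, 25, 24, 31, 26, 30, 14, 17,
             21, 30, 27, 39, 26, 28, 25, 42, 28, 38),
  sex = rep(c("female", "male", "female", "male"), times = c(1, 2, 9, 8))
)

by_default = t.test(weight ~ sex, data = otters)
by_default$statistic        # negative: female minus male

# Put "male" first so the difference is male minus female
otters$sex = factor(otters$sex, levels = c("male", "female"))
male_first = t.test(weight ~ sex, data = otters)
male_first$statistic        # same size, opposite sign
male_first$p.value          # unchanged
```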

## Data frame boxplots

```
> boxplot(weight ~ sex, otters, col = c("pink", "cyan"),
      ylab = "Weight/kg")
```

Note how the axis labels come from the data

## Wrapping This Up

If we put these commands in a new file, called `ottertest.R`

```
data = read.csv(filename)
print(summary(data))
boxplot(weight ~ sex, data)
# source() does not auto-print results, so print the test explicitly
print(t.test(weight ~ sex, data))
```

Then we can repeat the analysis on a new file...

```
> filename = "newdata.csv"
> source("ottertest.R")
```

You could automate this to hundreds of data files with a loop...
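
A rough sketch of what such a loop might look like, with the analysis wrapped in a function instead of a sourced script. Everything named here is an assumption made for illustration: the function `otter_test`, the temporary folder, and the two files `site1.csv` and `site2.csv`. The second file is a made-up copy of the first with 1 kg added to every weight, there only so the loop has more than one file to process.

```r
# Hypothetical function wrapping the test from ottertest.R
otter_test = function(filename) {
  data = read.csv(filename)
  t.test(weight ~ sex, data)
}

# Create two example files in a temporary folder (illustrative data only)
data_dir = tempfile("otters")
dir.create(data_dir)
otters = data.frame(
  weight = c(22, 28, 32, 25, 24, 31, 26, 30, 14, 17,
             21, 30, 27, 39, 26, 28, 25, 42, 28, 38),
  sex = rep(c("female", "male", "female", "male"), times = c(1, 2, 9, 8))
)
write.csv(otters, file.path(data_dir, "site1.csv"), row.names = FALSE)
otters$weight = otters$weight + 1
write.csv(otters, file.path(data_dir, "site2.csv"), row.names = FALSE)

# Loop over every CSV file in the folder, keeping each p-value
files = list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
p_values = numeric(0)
for (file in files) {
  res = otter_test(file)
  p_values[basename(file)] = res$p.value
}
p_values
```

Returning the test object from the function, rather than printing it, is a choice made here so the loop can collect whichever pieces it needs.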

## Ultimate Report Writing

Using the `knitr` package:

• You can put chunks of plain R code into a document
• Process the document - run the code, create outputs
• Generate a new document with results and embedded plots
• The output may be text, Web page, LaTeX, PDF - maybe Word?!
• All these web pages have been made with it!

## Thoughts

• Don't fear the command line
• Use your environment's features to help
• The power of a programming language
• Repeatability
• Repeatability