Lancaster University
You can run R in various ways...
>   This is where you type stuff (the prompt)
+   Keep typing stuff (R is waiting for the rest of an unfinished command)
This is where you type stuff you only want to type once
> 2 + 2
[1] 4
> 1 + 2 * 3/4^5
[1] 1.006
> pi * 2
[1] 6.283
> sqrt(2)
[1] 1.414
> exp(2)
[1] 7.389
> log(7.389) # base e
[1] 2
> exp(log(2))
[1] 2
> log10(1000)
[1] 3
> sin(pi/3)
[1] 0.866
> cos(pi/3)
[1] 0.5
> tan(pi/3)
[1] 1.732
> a = sqrt(4)
> b = 3
> c = -2
> d = b^2 - 4 * a * c
> r1 = (-b + sqrt(d))/(2 * a)
> r2 = (-b - sqrt(d))/(2 * a)
> ls()
[1] "a" "b" "c" "d" "r1" "r2"
> r1
[1] 0.5
> r2
[1] -2
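One way to check the two roots is to substitute them back into a*x^2 + b*x + c, which should give zero. The sketch below does that and compares against base R's polyroot(), which takes the coefficients from the constant term upwards. The helper name check is made up for this illustration.

```r
a = sqrt(4); b = 3; c = -2
d = b^2 - 4 * a * c
r1 = (-b + sqrt(d))/(2 * a)
r2 = (-b - sqrt(d))/(2 * a)

# 'check' is a hypothetical helper: evaluates the quadratic at x
check = function(x) a * x^2 + b * x + c
check(c(r1, r2))   # both values should be zero

# polyroot() wants coefficients in increasing order of power and
# returns complex numbers, so keep only the real parts
sort(Re(polyroot(c(c, b, a))))
```

Note that calling c(...) still works even though c now also holds a number: when a name is used as a function, R skips over non-function objects of the same name.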
How do I get help on R things?
help(sqrt) - gets the help page for a function; ?sqrt for short.
help.search("model") - searches the help pages
help.start() - interactive help system
You can save objects
> save(a, b, c, r1, r2, file = "output.rda")
> rm(a, b, c, r1, r2)
> r1
Error: object 'r1' not found
> load("output.rda")
> r1
[1] 0.5
> r2
[1] -2
R has save.image(), which saves all your objects to a file called .RData
R will offer to do this when you quit: q()
R will reload .RData when you start in that folder.
Suppose we have sea otter weights in kg for males and females in two files:
Data/males.dat
27 28 39 32 26 28 25 42 28 38
Data/females.dat
22 25 24 31 26 30 14 17 21 30
We want to test if the sexes have different mean weights - a classic two-sample t-test. But first:
scan reads space-separated numbers from a file and returns...
> m = scan("./Data/males.dat")
> f = scan("./Data/females.dat")
> m
[1] 27 28 39 32 26 28 25 42 28 38
> f
[1] 22 25 24 31 26 30 14 17 21 30
...one-dimensional vectors.
> length(m)
[1] 10
> length(f)
[1] 10
> summary(m)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   25.0    27.2    28.0    31.3    36.5    42.0
> summary(f)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   14.0    21.2    24.5    24.0    29.0    31.0
Plotting functions generally have the side effect of making a graphic:
> hist(m)
> hist(f)
> boxplot(m, f)
> boxplot(m, f, names = c("males", "females"), main = "Sea Otters", ylab = "Weight/kg", xlab = "Sex")
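The graphic is the side effect; the plotting functions also return a value, usually invisibly. A small sketch of that, with the male weights typed in directly so it runs without the data files (the names h and b are just for illustration):

```r
m = c(27, 28, 39, 32, 26, 28, 25, 42, 28, 38)

# plot = FALSE skips the drawing and just returns the histogram object
h = hist(m, plot = FALSE)
h$breaks   # bin edges chosen by hist
h$counts   # number of weights in each bin

# boxplot() returns its statistics too; b$stats holds the five
# numbers used to draw the box and whiskers
b = boxplot(m, plot = FALSE)
b$stats
```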
That loads one set of data in
Now repeat for the second file. Or save the format. Or edit and run the 'syntax'
Now we have two datasets.
Can't see how to work across datasets, so let's combine - cut 'n' paste:
Now, all I can do with this is a paired t-test
But that's wrong! These are not paired observations! Need to get my data in 'long' form.
Some more cut n paste action later...
And my data looks like this:
Then I can use the independent sample t-test with Sex as grouping:
And get some correct output!
Three lines
> m = scan("./Data/males.dat")
> f = scan("./Data/females.dat")
> t.test(m, f)

	Welch Two Sample t-test

data:  m and f
t = 2.768, df = 17.89, p-value = 0.01274
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  1.756 12.844
sample estimates:
mean of x mean of y
     31.3      24.0
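To see where those numbers come from, here is a sketch of the Welch calculation done by hand, with the weights typed in directly so it runs without the data files. The variable names (vm, vf, t_stat, dof, p) are chosen here for illustration; this follows the standard Welch formulas rather than the internals of t.test.

```r
m = c(27, 28, 39, 32, 26, 28, 25, 42, 28, 38)
f = c(22, 25, 24, 31, 26, 30, 14, 17, 21, 30)

# squared standard error of each sample mean
vm = var(m)/length(m)
vf = var(f)/length(f)

# Welch t statistic: difference in means over its standard error
t_stat = (mean(m) - mean(f))/sqrt(vm + vf)

# Welch-Satterthwaite approximation to the degrees of freedom
dof = (vm + vf)^2/(vm^2/(length(m) - 1) + vf^2/(length(f) - 1))

# two-sided p-value from the t distribution
p = 2 * pt(-abs(t_stat), dof)

c(t = t_stat, df = dof, p.value = p)
```

The three values should agree with the t, df and p-value lines printed by t.test(m, f).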
Note: if typing filenames is too much bother, on Windows:
> m = scan(file.choose())
What if the data is already in long form, ready for SPSS?
weight,sex
22,female
28,male
32,male
25,female
24,female
31,female
26,female
30,female
14,female
Use read.csv (or read.table) to input
> otters = read.csv("./Data/otterweight.csv")
> summary(otters)
     weight         sex
 Min.   :14.0   female:10
 1st Qu.:24.8   male  :10
 Median :27.5
 Mean   :27.6
 3rd Qu.:30.2
 Max.   :42.0
> otters
   weight    sex
1      22 female
2      28   male
3      32   male
4      25 female
5      24 female
6      31 female
7      26 female
8      30 female
9      14 female
10     17 female
11     21 female
12     30 female
13     27   male
14     39   male
15     26   male
16     28   male
17     25   male
18     42   male
19     28   male
20     38   male
> head(otters)
  weight    sex
1     22 female
2     28   male
3     32   male
4     25 female
5     24 female
6     31 female
> names(otters)
[1] "weight" "sex"
> dim(otters)
[1] 20 2
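read.csv is essentially read.table with defaults suited to comma-separated files that have a header row. A sketch of the equivalence, using a few rows typed inline through the text argument instead of the real file (the names csv, x1 and x2 are just for this example):

```r
csv = "weight,sex
22,female
28,male
32,male
25,female"

# read.csv assumes a header row and comma separators
x1 = read.csv(text = csv)

# read.table needs to be told both
x2 = read.table(text = csv, header = TRUE, sep = ",")

all.equal(x1, x2)
```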
This is a data frame...
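If the data only exist as two separate vectors, a long-form data frame can be built in R without any cutting and pasting. A minimal sketch, with the weights typed in directly; otters2 is a made-up name, and its rows come out in a different order from the CSV file:

```r
m = c(27, 28, 39, 32, 26, 28, 25, 42, 28, 38)
f = c(22, 25, 24, 31, 26, 30, 14, 17, 21, 30)

# stack the weights and label each one with its group
otters2 = data.frame(
  weight = c(m, f),
  sex = rep(c("male", "female"), times = c(length(m), length(f)))
)

dim(otters2)
table(otters2$sex)
```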
You can extract columns:
> # by column number
> otters[, 1]
 [1] 22 28 32 25 24 31 26 30 14 17 21 30 27 39 26 28 25
[18] 42 28 38
> # by name
> otters$weight
 [1] 22 28 32 25 24 31 26 30 14 17 21 30 27 39 26 28 25
[18] 42 28 38
> # and it's just a vector
> otters$weight > 36
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [9] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
[17] FALSE  TRUE FALSE  TRUE
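A TRUE/FALSE vector like this is useful for counting and selecting. A short sketch, with the weight column typed in as a plain vector w (a name used only here) so it runs on its own:

```r
# the same values as otters$weight
w = c(22, 28, 32, 25, 24, 31, 26, 30, 14, 17,
      21, 30, 27, 39, 26, 28, 25, 42, 28, 38)

heavy = w > 36   # logical vector, one TRUE/FALSE per otter
sum(heavy)       # TRUE counts as 1, so this is how many are over 36
which(heavy)     # positions of the TRUE values
w[heavy]         # the weights themselves
mean(heavy)      # proportion of otters over 36
```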
You can subset rows of data frames to get smaller data frames:
> # by row number
> otters[1, ]
  weight    sex
1     22 female
> otters[2:4, ]
  weight    sex
2     28   male
3     32   male
4     25 female
> # by true/false values:
> otters[otters$weight > 36, ]
   weight  sex
14     39 male
18     42 male
20     38 male
> # or use subset:
> subset(otters, otters$weight > 36)
   weight  sex
14     39 male
18     42 male
20     38 male
> subset(otters, sex == "male")
   weight  sex
2      28 male
3      32 male
13     27 male
14     39 male
15     26 male
16     28 male
17     25 male
18     42 male
19     28 male
20     38 male
I could subset and do what I did before
> t.test(subset(otters, sex == "male")$weight, subset(otters, sex == "female")$weight)

	Welch Two Sample t-test

data:  subset(otters, sex == "male")$weight and subset(otters, sex == "female")$weight
t = 2.768, df = 17.89, p-value = 0.01274
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  1.756 12.844
sample estimates:
mean of x mean of y
     31.3      24.0

That works, but it's ugly - data frames give us a nicer way...
Use a formula notation:
> t.test(weight ~ sex, otters)

	Welch Two Sample t-test

data:  weight by sex
t = -2.768, df = 17.89, p-value = 0.01274
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.844  -1.756
sample estimates:
mean in group female   mean in group male
                24.0                 31.3

> boxplot(weight ~ sex, otters, col = c("pink", "cyan"), ylab = "Weight/kg")
Note how the axis labels come from the data
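The same formula notation works in other functions too. As a sketch, aggregate() can compute a summary for each group; the data frame is typed in here so the example runs without the CSV file, and ag is just a name for the result:

```r
otters = data.frame(
  weight = c(22, 28, 32, 25, 24, 31, 26, 30, 14, 17,
             21, 30, 27, 39, 26, 28, 25, 42, 28, 38),
  sex = c("female", "male", "male", "female", "female", "female", "female",
          "female", "female", "female", "female", "female", "male", "male",
          "male", "male", "male", "male", "male", "male")
)

# mean weight for each sex: read the formula as "weight by sex"
ag = aggregate(weight ~ sex, data = otters, FUN = mean)
ag
```

The two means should match the sample estimates reported by t.test.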
If we put these commands in a new file, called ottertest.R:
data = read.csv(filename)
# print() is needed: results are not shown automatically inside source()
print(summary(data))
boxplot(weight ~ sex, data)
print(t.test(weight ~ sex, data))
Then we can repeat the analysis on a new file...
> filename = "newdata.csv"
> source("ottertest.R")
You could automate this to hundreds of data files with a loop...
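A minimal sketch of such a loop follows. The data files here are made up on the spot (simulated weights written to a temporary folder, with hypothetical names site1.csv and site2.csv) purely so the example runs; with real data you would point list.files() at your own folder.

```r
# create two small example files in a temporary folder (illustration only,
# the weights are simulated and not real otter data)
folder = tempfile("otterdata")
dir.create(folder)
set.seed(1)
for (i in 1:2) {
  d = data.frame(weight = round(rnorm(20, mean = 28, sd = 6)),
                 sex = rep(c("female", "male"), each = 10))
  write.csv(d, file.path(folder, paste0("site", i, ".csv")), row.names = FALSE)
}

# find every csv file and run the same test on each one
files = list.files(folder, pattern = "\\.csv$", full.names = TRUE)
pvalues = numeric(length(files))
for (i in seq_along(files)) {
  data = read.csv(files[i])
  pvalues[i] = t.test(weight ~ sex, data = data)$p.value
}
names(pvalues) = basename(files)
pvalues
```

Storing just the p-value keeps the result compact; the full object returned by t.test could be kept in a list instead if the confidence intervals are needed as well.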
Using the knitr package: