Lancaster University
Scalars: simple (single) values. They can be numbers, character strings, TRUE/FALSE, dates... any "atomic" value.
Vectors: a sequence of values of one atomic type: numeric, or character strings, or TRUE/FALSE, or any other "atomic" type.
2-d matrix
3-d array
Data Frame
Scalars
> x = 1
> x = 1.2
> x = "Hello World"
> x = TRUE
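Each of these scalars has a type, which class() reports. A quick check in a standard R session (the date is only an example value). R has no separate scalar type, so a "scalar" is really a vector of length one:

```r
x = 1.2
class(x)                        # "numeric"
class("Hello World")            # "character"
class(TRUE)                     # "logical"
class(as.Date("2024-01-01"))    # "Date", for an example date value
length(x)                       # 1: a scalar is a vector of length one
```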
Vectors
> v = c(1, 4, 9, 16)
> v
[1] 1 4 9 16
> v[3]
[1] 9
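Vectors can also be indexed by several positions at once, by negative positions (which drop elements), or by a logical condition. A short sketch using the same v:

```r
v = c(1, 4, 9, 16)
v[c(1, 3)]    # several positions at once: 1 9
v[-1]         # a negative index drops that element: 4 9 16
v[v > 5]      # a logical condition keeps the matching elements: 9 16
length(v)     # 4
```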
Matrices and arrays
> m = matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> m[, 2]
[1] 4 5 6
> a = array(1:24, dim = c(2, 3, 4))
> a[1, , ]
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    3    9   15   21
[3,]    5   11   17   23
> a[1, 2, ]
[1] 3 9 15 21
> a[1, 2, 3]
[1] 15
> a
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
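matrix() and array() fill their values column by column, which is why m[, 2] is 4 5 6 and a[1, , ] holds the odd numbers. A short sketch; byrow = TRUE is a standard matrix() argument that fills row by row instead:

```r
m = matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
dim(m)          # 3 2: three rows, two columns
m[2, 1]         # 2: values go down the first column first
mr = matrix(c(1, 2, 3, 4, 5, 6), ncol = 2, byrow = TRUE)
mr[1, ]         # 1 2: filled along the rows instead
a = array(1:24, dim = c(2, 3, 4))
dim(a)          # 2 3 4
a[2, 3, 4]      # 24, the last value
```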
Data frames
> d = data.frame(x = 1:5, n = c("a", "b", "b", "c", "d"))
> d
  x n
1 1 a
2 2 b
3 3 b
4 4 c
5 5 d
> d$x
[1] 1 2 3 4 5
> d[, 1]
[1] 1 2 3 4 5
> d[2:3, ]
  x n
2 2 b
3 3 b
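Rows can also be picked with a logical condition on a column, and new columns can be added with $. A sketch on the same data; stringsAsFactors = FALSE is added only so that n stays a character column in older R versions too:

```r
d = data.frame(x = 1:5, n = c("a", "b", "b", "c", "d"), stringsAsFactors = FALSE)
d[d$n == "b", ]       # rows where n is "b"
d[d$x > 3, "n"]       # n where x > 3: "c" "d"
d$y = d$x^2           # add a new column computed from x
names(d)              # "x" "n" "y"
nrow(d)               # 5
```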
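Lists
Suppose each person can take a test any number of times, so each person has a different number of scores. A list can hold a name and a scores vector of any length together: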
> e1 = list(name = "Fred", scores = c(23, 74, 12))
> e1
$name
[1] "Fred"

$scores
[1] 23 74 12
> names(e1)
[1] "name" "scores"
> e1$name
[1] "Fred"
> mean(e1$scores)
[1] 36.33
> mean(e1[[2]])
[1] 36.33
> e2 = list(name = "Joe", scores = c(27, 65, 17, 19, 32))
> exams = list(e1, e2)
> exams[[1]]
$name
[1] "Fred"

$scores
[1] 23 74 12
> exams[[1]]$name
[1] "Fred"
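Because exams is an ordinary list, a function can be applied to every person's entry with sapply(), which simplifies the results to a vector. A sketch using the two entries above:

```r
e1 = list(name = "Fred", scores = c(23, 74, 12))
e2 = list(name = "Joe", scores = c(27, 65, 17, 19, 32))
exams = list(e1, e2)
sapply(exams, function(e) e$name)            # "Fred" "Joe"
sapply(exams, function(e) length(e$scores))  # 3 5: a different number of tests each
sapply(exams, function(e) mean(e$scores))    # about 36.33 and 32
```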
> for (i in c(1, 2, 3, 4, 5)) {
+     cat("square root of ", i, " is ", sqrt(i), "\n")
+ }
square root of 1 is 1
square root of 2 is 1.414
square root of 3 is 1.732
square root of 4 is 2
square root of 5 is 2.236
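The loop above only prints. To keep the results, a common pattern is to create a vector of the right length first and fill it inside the loop; the loop variable can also run over any vector, not just numbers. A sketch (the names roots and greetings are only illustrative):

```r
roots = numeric(5)              # a vector of five zeros to fill in
for (i in 1:5) {                # 1:5 is the sequence 1, 2, 3, 4, 5
    roots[i] = sqrt(i)
}
roots

greetings = character(0)        # an empty character vector
for (name in c("Fred", "Joe")) {
    greetings = c(greetings, paste("Hello,", name))
}
greetings                       # "Hello, Fred" "Hello, Joe"
```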
In some programming languages you loop a lot.
> # divide every element by two
> x = c(1, 2, 4, 8, 16)
> for (i in 1:5) {
+     x[i] = x[i]/2
+ }
> x
[1] 0.5 1.0 2.0 4.0 8.0
But in R, many basic operations work on a whole vector at once, so they don't need a loop.
> x = c(1, 2, 4, 8, 16)
> x = x/2
> x
[1] 0.5 1.0 2.0 4.0 8.0
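The same element-by-element behaviour applies to arithmetic between two vectors of the same length, to most maths functions, and to comparisons. A few more vectorised operations on the same x:

```r
x = c(1, 2, 4, 8, 16)
y = c(10, 20, 30, 40, 50)
x + y        # element by element: 11 22 34 48 66
x * 2 + 1    # 3 5 9 17 33
sqrt(x)      # maths functions work on every element too
sum(x)       # 31
x[x > 3]     # 4 8 16
```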
> for (i in 1:100) {
+     if (sqrt(i) == as.integer(sqrt(i))) {
+         cat("sqrt(", i, ") is integer\n")
+     }
+ }
sqrt( 1 ) is integer
sqrt( 4 ) is integer
sqrt( 9 ) is integer
sqrt( 16 ) is integer
sqrt( 25 ) is integer
sqrt( 36 ) is integer
sqrt( 49 ) is integer
sqrt( 64 ) is integer
sqrt( 81 ) is integer
sqrt( 100 ) is integer
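The same search can be written without a loop, in the vectorised style: compute all the square roots at once and keep the numbers whose root has no fractional part. floor() is used here instead of as.integer(); for positive numbers both drop the fractional part:

```r
i = 1:100
r = sqrt(i)              # square roots of all 100 numbers at once
i[r == floor(r)]         # 1 4 9 16 25 36 49 64 81 100
```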
> random = runif(1) # one random number
> if (random > 0.5) {
+     cat("Heads you win!\n")
+ } else {
+     cat("Tails you lose!\n")
+ }
Heads you win!
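if expects a single TRUE or FALSE. To make the same choice for every element of a vector, base R has ifelse(). A sketch that flips five coins at once; set.seed() is used only so the run repeats:

```r
set.seed(1)                      # fix the random numbers so the run repeats
flips = runif(5)                 # five random numbers between 0 and 1
ifelse(flips > 0.5, "Heads you win!", "Tails you lose!")
ifelse(c(0.2, 0.7) > 0.5, "Heads", "Tails")   # "Tails" "Heads"
```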
> count = 0
> while (runif(1) < 0.99) {
+     count = count + 1
+ }
> cat("Took ", count, " iterations\n")
Took 128 iterations
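The count changes from run to run because runif() draws new numbers each time. Setting the seed with set.seed() makes a run repeatable, which also shows that repeat with break is an equivalent way to write the same loop. A sketch (the seed 42 is arbitrary):

```r
set.seed(42)                     # same seed, same random numbers
count = 0
while (runif(1) < 0.99) {
    count = count + 1
}
first = count

set.seed(42)
count = 0
repeat {                         # loop until break
    if (runif(1) >= 0.99) break
    count = count + 1
}
count == first                   # TRUE: both loops saw the same numbers
```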
> quadrat = function(x, a, b, c) {
+     return(a * x^2 + b * x + c)
+ }
> quadrat(c(1, 2, 3, 4), 1, 0.5, -2)
[1] -0.5 3.0 8.5 16.0
> qsolve = function(a, b, c) {
+     det = b^2 - 4 * a * c   # the discriminant
+     if (det < 0) {
+         stop("Complex roots")
+     }
+     rplus = (-b + sqrt(det))/(2 * a)
+     rminus = (-b - sqrt(det))/(2 * a)
+     return(c(rplus, rminus))
+ }
> qsolve(1, 0.5, -2)
[1] 1.186 -1.686
> quadrat(qsolve(1, 0.5, -2), 1, 0.5, -2)
[1] 0 0
> qsolve(3, -2, 1)
Error: Complex roots
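Instead of stopping, the function could return complex roots: sqrt() of a negative complex number is defined, whereas the real version gives NaN. A sketch with a hypothetical name, qsolve_c; base R's polyroot(), which takes the coefficients in increasing order of power, gives a cross-check:

```r
# Hypothetical variant of qsolve that allows complex roots
qsolve_c = function(a, b, c) {
    disc = as.complex(b^2 - 4 * a * c)   # complex, so sqrt() of a negative is defined
    rplus = (-b + sqrt(disc))/(2 * a)
    rminus = (-b - sqrt(disc))/(2 * a)
    return(c(rplus, rminus))
}
quadrat = function(x, a, b, c) {
    return(a * x^2 + b * x + c)
}
r = qsolve_c(3, -2, 1)      # about 0.3333+0.4714i and 0.3333-0.4714i
Mod(quadrat(r, 3, -2, 1))   # both essentially zero
polyroot(c(1, -2, 3))       # same roots, for 1 - 2x + 3x^2
```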
Function arguments, using log as an example:
> args(log) # inspect arguments
function (x, base = exp(1)) NULL
> log(100) # default base-e natural log
[1] 4.605
> log(100, base = 10) # name match
[1] 2
> log(100, 10) # position match
[1] 2
> log(100, b = 10) # partial name
[1] 2
> log(100, z = 10) # wrong: log has no argument z
Error: unused argument(s) (z = 10)
> log(base = 10, 100) # args backwards
[1] 2
> help(log)
The help page for every function should explain its arguments.
Using functions as arguments...
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> apply(m, 1, sum)
[1] 5 7 9
> apply(m, 2, mean)
[1] 2 5
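The function passed to apply() does not have to be a named one: an anonymous function can be written inline. For the common cases, base R also has shortcuts such as rowSums() and colMeans(), and sapply() does the same job over the elements of a vector. A sketch:

```r
m = matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
# An anonymous function, written inline, as the argument
apply(m, 1, function(row) max(row) - min(row))   # 3 3 3
# Built-in shortcuts for the two calls above
rowSums(m)        # 5 7 9, same as apply(m, 1, sum)
colMeans(m)       # 2 5, same as apply(m, 2, mean)
# sapply() applies a function to each element of a vector
sapply(1:5, function(i) i^2)   # 1 4 9 16 25
```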
Object-oriented programming
A programming methodology. We've already seen it in action.
> x = 1:10
> y = rnorm(10)
> m = lm(y ~ x)
> class(m)
[1] "lm"
This is an object of the lm (linear model) class. If I do residuals(m), I'm calling the residuals method.
> x = 1:10
> p = rpois(10, 3)
> gm = glm(p ~ x, family = "poisson")
> class(gm)
[1] "glm" "lm"
This is an object of class glm and lm. A GLM is a generalisation of an LM, so it should have all the same methods, and possibly new ones.
Now when I do residuals(gm), I am calling the method for GLMs. If no such method exists, R will use the lm method.
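The class vector can be checked directly with inherits(), and the glm method's extra options show that it really is the one being called: residuals() on a GLM accepts a type argument. A sketch repeating the Poisson fit above, with set.seed() added only so the random data repeat:

```r
set.seed(1)                       # repeatable random counts
x = 1:10
p = rpois(10, 3)
gm = glm(p ~ x, family = "poisson")
inherits(gm, "glm")               # TRUE
inherits(gm, "lm")                # TRUE: a glm is also an lm
# residuals() dispatches to the glm method, which accepts a type argument
r_dev = residuals(gm)                       # deviance residuals (the default)
r_resp = residuals(gm, type = "response")   # observed minus fitted counts
```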
Once you start working seriously with R you will not only be using objects everywhere (data frames are a class), but you might start defining your own classes.
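In R's S3 system a class is just a label attached to an object, and a method is a function named generic.class. A minimal sketch, assuming a hypothetical exam class built on the exam lists from earlier; print() is an existing generic, while best() is a new, hypothetical one:

```r
# Constructor for a hypothetical "exam" class: a list with a class label
exam = function(name, scores) {
    structure(list(name = name, scores = scores), class = "exam")
}
# Method for the existing print() generic
print.exam = function(x, ...) {
    cat(x$name, "took", length(x$scores), "tests, mean score", mean(x$scores), "\n")
    invisible(x)
}
# A new generic function and its method for exam objects
best = function(x, ...) UseMethod("best")
best.exam = function(x, ...) max(x$scores)

fred = exam("Fred", c(23, 74, 12))
fred          # auto-printing calls print.exam
best(fred)    # 74
```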