# Programming in R

## R Is A Programming Language

• A dialect of `S`
• Data Structures
• Flow Control
• Conditional Execution
• Partly "Functional"
• Partly "Object-oriented"

## Data Structures

Simple scalar (single) values can be numbers, character strings, TRUE/FALSE values, dates, or any other 'atomic' value.
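Strictly speaking, R has no separate scalar type: a single value is just a vector of length one. You can see this by asking R for the `class()` and `length()` of a few values. This is a small sketch, and the date is an arbitrary example:

```r
# Strictly, each "scalar" is a vector of length one
x_num  = 1.2
x_chr  = "Hello World"
x_lgl  = TRUE
x_date = as.Date("2024-01-15")   # arbitrary example date

class(x_num)       # "numeric"
class(x_chr)       # "character"
class(x_lgl)       # "logical"
class(x_date)      # "Date"
length(x_num)      # 1
is.atomic(x_date)  # TRUE
```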

## Vectors

A vector holds several values of one type: numbers, character strings, TRUE/FALSE values, or any other 'atomic' type.
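Every element of a vector must share one type. So when `c()` is given mixed inputs, it silently converts (coerces) them all to the most general type. A small sketch:

```r
nums  = c(1, 4, 9, 16)     # numeric vector
mixed = c(1, TRUE, "a")    # everything coerced to character
flags = c(TRUE, FALSE, 2)  # logicals coerced to numeric

class(nums)    # "numeric"
class(mixed)   # "character"
mixed          # "1" "TRUE" "a"
flags          # 1 0 2
```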

## Next Dimensions

2-d matrix, 3-d array:

• All elements are the same type
• Can be any atomic type
• Can have row and column names
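You can set row and column names with `dimnames` when you create a matrix, or later with `rownames()` and `colnames()`. Once set, the names work for indexing. The names below are purely illustrative:

```r
# A 2 x 3 matrix with named rows and columns (filled column by column)
m2 = matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))
m2["b", "y"]                      # 4: row "b", column "y"
rownames(m2)                      # "a" "b"
colnames(m2) = c("p", "q", "r")   # rename the columns afterwards
m2[, "q"]                         # named vector: a = 3, b = 4
```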

Data frame:

• All rows have the same types
• Each column is one type
• A bit like a spreadsheet table
• But stricter: every column must have the same number of rows
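`str()` and `sapply(d, class)` are convenient ways to see that each column has a single type, while different columns can hold different types. This is a minimal sketch with made-up data. The last lines show the "stricter" part: a data frame whose columns have different lengths cannot be built.

```r
d2 = data.frame(id = 1:3,
                name = c("Ann", "Bob", "Cy"),
                passed = c(TRUE, FALSE, TRUE),
                stringsAsFactors = FALSE)
str(d2)
sapply(d2, class)   # id "integer", name "character", passed "logical"

# Columns must all have the same number of rows, so this fails
res = tryCatch(data.frame(a = 1:3, b = 1:2),
               error = function(e) "error: differing numbers of rows")
res
```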

## Irregular Data - lists

• A number of elements
• Each element can have a name
• Each element can be anything

## Nested Irregular Data

• Elements can be lists!
• Careful thought is needed when designing list structures
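`str()` is one way to see a nested structure. Elements at any depth can be reached by chaining `[[ ]]` and `$`. The `course` layout below is a hypothetical example, not a recommended design:

```r
course = list(
  title = "Statistics 101",          # hypothetical course title
  students = list(
    list(name = "Fred", scores = c(23, 74, 12)),
    list(name = "Joe",  scores = c(27, 65, 17, 19, 32))
  )
)
str(course, max.level = 2)           # show the top two levels only
course$students[[2]]$name            # "Joe"
length(course$students[[2]]$scores)  # 5
```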

## Scalars and Vectors

Scalars

```
> x = 1
> x = 1.2
> x = "Hello World"
> x = TRUE
```

Vectors

```
> v = c(1, 4, 9, 16)
> v
[1]  1  4  9 16
> v[3]
[1] 9
```

Matrices and arrays

```
> m = matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> m[, 2]
[1] 4 5 6
```
```
> a = array(1:24, dim = c(2, 3, 4))
> a[1, , ]
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    3    9   15   21
[3,]    5   11   17   23
> a[1, 2, ]
[1]  3  9 15 21
> a[1, 2, 3]
[1] 15
> a
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
```

## Data Frames

```
> d = data.frame(x = 1:5, n = c("a", "b", "b", "c", "d"))
> d
  x n
1 1 a
2 2 b
3 3 b
4 4 c
5 5 d
> d$x
[1] 1 2 3 4 5
> d[, 1]
[1] 1 2 3 4 5
> d[2:3, ]
  x n
2 2 b
3 3 b
```

## Lists - for irregular data

Suppose each person can take a test any number of times, so each person has a different number of scores:

```
> e1 = list(name = "Fred", scores = c(23, 74, 12))
> e1
$name
[1] "Fred"

$scores
[1] 23 74 12

> names(e1)
[1] "name"   "scores"
> e1$name
[1] "Fred"
> mean(e1$scores)
[1] 36.33
> mean(e1[[2]])
[1] 36.33
> e2 = list(name = "Joe", scores = c(27, 65, 17, 19, 32))
> exams = list(e1, e2)
> exams[[1]]
$name
[1] "Fred"

$scores
[1] 23 74 12

> exams[[1]]$name
[1] "Fred"
```
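With lists, single brackets `[ ]` and double brackets `[[ ]]` behave differently. `exams[1]` is a sub-list that *contains* Fred's record, while `exams[[1]]` *is* Fred's record. This sketch recreates `e1`, `e2` and `exams` from above so it runs on its own. The third person is hypothetical.

```r
e1 = list(name = "Fred", scores = c(23, 74, 12))
e2 = list(name = "Joe", scores = c(27, 65, 17, 19, 32))
exams = list(e1, e2)

length(exams[1])     # 1: a list holding one element
length(exams[[1]])   # 2: Fred's record has name and scores
exams[1]$name        # NULL: the sub-list itself has no "name"
exams[[1]]$name      # "Fred"

# Lists can grow: append a third (hypothetical) person
exams[[3]] = list(name = "Ann", scores = 50)
length(exams)        # 3
```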

## Program Flow - Loops

```
> for (i in c(1, 2, 3, 4, 5)) {
    cat("square root of ", i, " is ", sqrt(i), "\n")
}
square root of  1  is  1
square root of  2  is  1.414
square root of  3  is  1.732
square root of  4  is  2
square root of  5  is  2.236
```
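The loop above runs over a literal vector. To loop over the positions of an existing vector, `seq_along(x)` is safer than `1:length(x)`, because `1:length(x)` gives `c(1, 0)` when `x` is empty. A small sketch:

```r
x = c(4, 9, 16)
roots = numeric(length(x))     # pre-allocate the result vector
for (i in seq_along(x)) {
  roots[i] = sqrt(x[i])
}
roots                          # 2 3 4

empty = numeric(0)
for (i in seq_along(empty)) {
  cat("never printed\n")       # the body runs zero times
}
```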

## Don't loop...

In some programming languages, you write a loop whenever you want to change every element of an array.

```
> # divide every element by two
> x = c(1, 2, 4, 8, 16)
> for (i in 1:5) {
    x[i] = x[i]/2
}
> x
[1] 0.5 1.0 2.0 4.0 8.0
```

But in R, many basic operations are vectorised: they act on every element of a vector at once, so no loop is needed.

```
> x = c(1, 2, 4, 8, 16)
> x = x/2
> x
[1] 0.5 1.0 2.0 4.0 8.0
```
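Vectorisation also covers comparisons and choices. A comparison gives one TRUE/FALSE per element, and `ifelse()` picks a value element by element. A sketch:

```r
x = c(1, 2, 4, 8, 16)
x > 3                           # FALSE FALSE TRUE TRUE TRUE
ifelse(x > 3, "big", "small")   # chosen element by element
sum(x > 3)                      # 3, because TRUE counts as 1
x[x > 3]                        # 4 8 16: logical indexing
```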

## If-Then-Else

```
> for (i in 1:100) {
    if (sqrt(i) == as.integer(sqrt(i))) {
        cat("sqrt(", i, ") is integer\n")
    }
}
sqrt( 1 ) is integer
sqrt( 4 ) is integer
sqrt( 9 ) is integer
sqrt( 16 ) is integer
sqrt( 25 ) is integer
sqrt( 36 ) is integer
sqrt( 49 ) is integer
sqrt( 64 ) is integer
sqrt( 81 ) is integer
sqrt( 100 ) is integer
```
```
> random = runif(1)  # one random number between 0 and 1
> if (random > 0.5) {
    cat("Heads you win!\n")
} else {
    cat("Tails you lose!\n")
}
Heads you win!
```
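Following the "Don't loop" advice, you can also find the perfect squares without a loop or `if` by testing all 100 values at once. This sketch uses `%% 1 == 0` (no remainder after division by 1) to test for whole numbers, as an alternative to the `as.integer()` comparison used above:

```r
i = 1:100
squares = i[sqrt(i) %% 1 == 0]   # keep values whose square root is whole
squares                          # 1 4 9 16 25 36 49 64 81 100
```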

## While...

```
> count = 0
> while (runif(1) < 0.99) {
    count = count + 1
}
> cat("Took ", count, " iterations\n")  # varies from run to run
Took  128  iterations
```
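Because the loop condition uses `runif()`, the iteration count changes on every run. Calling `set.seed()` first makes a random experiment reproducible. In this sketch, `count_iterations` is a hypothetical wrapper and the seed value 42 is arbitrary. The count it produces is deliberately not shown, only that it repeats.

```r
count_iterations = function(seed) {
  set.seed(seed)               # fix the random number stream
  count = 0
  while (runif(1) < 0.99) {
    count = count + 1
  }
  count
}
a = count_iterations(42)
b = count_iterations(42)
identical(a, b)                # TRUE: same seed, same sequence, same count
```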

## Writing functions

```
> quadrat = function(x, a, b, c) {
    return(a * x^2 + b * x + c)
}
> quadrat(c(1, 2, 3, 4), 1, 0.5, -2)
[1] -0.5  3.0  8.5 16.0
```
```
> qsolve = function(a, b, c) {
    disc = b^2 - 4 * a * c  # the discriminant
    if (disc < 0) {
        stop("Complex roots")
    }
    rplus = (-b + sqrt(disc))/(2 * a)
    rminus = (-b - sqrt(disc))/(2 * a)
    return(c(rplus, rminus))
}
> qsolve(1, 0.5, -2)
[1]  1.186 -1.686
> quadrat(qsolve(1, 0.5, -2), 1, 0.5, -2)
[1] 0 0
> qsolve(3, -2, 1)
Error in qsolve(3, -2, 1) : Complex roots
```

## Function formality

A function:

• Has a name
• Has zero or more arguments
• Arguments can be named or positional
• Arguments can have default values
• Returns a single value
• May cause side-effects
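The points above all show up in one small function. `describe_scores` is a hypothetical example. It has default arguments and can be called by position or by name. It returns a single value, a list that bundles several results, and has a side effect (printing a message) only when asked.

```r
describe_scores = function(scores, digits = 2, verbose = FALSE) {
  result = list(n = length(scores),
                mean = round(mean(scores), digits))
  if (verbose) {
    cat("Summarised", result$n, "scores\n")   # optional side effect
  }
  result                                      # last value is returned
}

r1 = describe_scores(c(23, 74, 12))                        # defaults used
r2 = describe_scores(c(23, 74, 12), 1)                     # positional
r3 = describe_scores(digits = 0, scores = c(23, 74, 12))   # named, any order
r4 = describe_scores(c(1, 2), verbose = TRUE)              # prints a message
r1$mean   # 36.33
r3$mean   # 36
```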

## I'd like to have an argument...

```
> args(log)  # inspect arguments
function (x, base = exp(1)) 
NULL
> log(100)  # default: base e, the natural log
[1] 4.605
> log(100, base = 10)  # match by name
[1] 2
> log(100, 10)  # match by position
[1] 2
> log(100, b = 10)  # partial name match
[1] 2
> log(100, z = 10)  # wrong: no such argument
Error: unused argument(s) (z = 10)
> log(base = 10, 100)  # arguments in reverse order
[1] 2
```

• Don't change the argument order unless you have a very good reason
• Named arguments make code clearer
• Don't abbreviate argument names: partial matching works, but full names are clearer
• Every function's help page (e.g. `help(log)`) should explain its arguments
• When writing functions, think carefully about their arguments
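When you write your own functions, two base R tools help make arguments safe and self-documenting. `match.arg()` restricts a character argument to a fixed set of choices, and `stopifnot()` checks inputs early. This sketch uses a hypothetical function `centre`:

```r
centre = function(x, method = c("mean", "median")) {
  method = match.arg(method)   # default is the first choice, "mean"
  stopifnot(is.numeric(x), length(x) > 0)
  if (method == "mean") mean(x) else median(x)
}

centre(c(1, 2, 10))                     # 13/3, using the default "mean"
centre(c(1, 2, 10), method = "median")  # 2
# centre("a") stops with an error: x is not numeric
```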

## Functional Programming

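Functions can be passed as arguments to other functions. `apply()`, for example, calls a function on each row (`MARGIN = 1`) or each column (`MARGIN = 2`) of a matrix: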

```
> m
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> apply(m, 1, sum)
[1] 5 7 9
> apply(m, 2, mean)
[1] 2 5
```
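For vectors and lists, `sapply()` (simplifies the result) and `lapply()` (always returns a list) play the same role as `apply()`. The function can also be an anonymous one written inline, and since functions are ordinary values, you can store one in a variable and pass it on. A sketch:

```r
m = matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
apply(m, 1, function(row) max(row) - min(row))   # 3 3 3, one value per row

sapply(1:5, function(i) i^2)                     # 1 4 9 16 25
lapply(list(a = 1:3, b = 4:6), sum)              # list with a = 6, b = 15

f = mean                                         # a function stored in a variable
apply(m, 2, f)                                   # 2 5, same as apply(m, 2, mean)
```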

## Object-oriented what?

A programming methodology.

• Model things as 'objects' of a class
• Have a nested hierarchy of classes
• Specify attributes of objects
• Specify "methods" - what you can do with those objects

We've already seen this in action.

```
> x = 1:10
> y = rnorm(10)
> m = lm(y ~ x)
> class(m)
[1] "lm"
```

This is an object of class `lm` (linear model). `residuals(m)` calls the generic function `residuals()`, which R dispatches to the method for the `lm` class.

```
> x = 1:10
> p = rpois(10, 3)
> gm = glm(p ~ x, family = "poisson")
> class(gm)
[1] "glm" "lm"
```

This object has class `glm` and also inherits from `lm`. A GLM is a generalisation of an LM, so it should support all the same methods, and possibly new ones.

Now `residuals(gm)` calls the method for GLMs, because one exists. For any generic function that has no `glm` method, R falls back to the `lm` method.
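This fallback is S3 method dispatch. A generic function calls `UseMethod()`, and R tries each class in `class(x)` in order, then a `default` method. The sketch below uses a hypothetical generic `describe()` that has an `lm` method but no `glm` method. A plain list with a class attribute stands in for a real model object.

```r
describe = function(x, ...) UseMethod("describe")        # the generic
describe.lm = function(x, ...) "lm method"
describe.default = function(x, ...) "default method"

fake_glm = structure(list(), class = c("glm", "lm"))     # stand-in object
describe(fake_glm)         # "lm method": no describe.glm, so R falls back
describe(42)               # "default method"
inherits(fake_glm, "lm")   # TRUE
```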

## Objects Everywhere

Once you work seriously with R, you will use objects everywhere (a data frame, for example, is an object of class `data.frame`), and you may start defining classes of your own.
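In R's simple S3 system, defining your own class takes little more than setting the `class` attribute and writing methods for existing generics such as `print()` and `summary()`. The `exam` class below is a hypothetical sketch built on the exam records from the lists section. It is not a standard R class.

```r
# Constructor: check the inputs, then tag the list with a class
exam = function(name, scores) {
  stopifnot(is.character(name), is.numeric(scores))
  structure(list(name = name, scores = scores), class = "exam")
}

# Method for the existing print() generic
print.exam = function(x, ...) {
  cat("Exam record for", x$name, "-", length(x$scores), "test(s)\n")
  invisible(x)
}

# Method for the existing summary() generic
summary.exam = function(object, ...) {
  c(tests = length(object$scores), mean = mean(object$scores))
}

fred = exam("Fred", c(23, 74, 12))
fred            # auto-printing calls print.exam()
summary(fred)   # named vector: tests 3, mean 109/3
```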