# Dice Rolling

In this worksheet we will write some functions for simulating throws of dice and investigate random number generation.

## RNGs

R has plenty of random number generators.

• Uniform(0,1) with `runif(10)` : `0.6515, 0.2056, 0.7945, 0.7793, 0.7888, 0.152, 0.4109, 0.7085, 0.1717, 0.5241`
• Normal(0,1) with `rnorm(10)` : `0.4415, 0.6087, 0.1816, -0.0075, 0.08, -0.5867, 2.1508, -0.0653, -1.0329, 0.1495`
• Poisson(lambda) with `rpois(10,lambda=3)` : `3, 2, 3, 4, 1, 3, 5, 2, 1, 3`
• ...and many more.

Each of these has parameters to control the mean, standard deviation, and whatever other parameters the distribution has.

## One die

How do we simulate single die rolls?

``` d1 = as.integer(runif(1, 1, 6))
d1
```
```[1] 5
```

Well possibly. But let's roll a few more and tabulate them to check:

``` table(as.integer(runif(10000, 1, 6)))
```
```   1    2    3    4    5
2013 2010 1917 2046 2014
```

Oops, we seem to have rolled a d5. We didn't read the help for `runif` well enough. The upper limit is excluded and we round down. Next try:

``` table(as.integer(runif(10000, 1, 7)))
```
```   1    2    3    4    5    6
1701 1624 1672 1676 1645 1682
```

Much better.

Another method is to use the `sample` function:

``` sample(1:6, 1)
```
```[1] 6
```

This takes a random sample of 1 value from its first argument. With a slight tweak we can take multiple values:

``` table(sample(1:6, 10000))
```
```Error: cannot take a sample larger than the population when 'replace = FALSE'
```
``` table(sample(1:6, 10000, replace = TRUE))
```
```   1    2    3    4    5    6
1627 1678 1593 1661 1679 1762
```

We should probably wrap this into a function. The input is just the number of single die rolls to do, and the return value is a vector of that length.

``` die = function(n) {
return(sample(1:6, n, replace = TRUE))
}
```

check all the following work:

``` die(1)
die(3)
table(die(5))
table(die(1000))
```

Good functions should probably cope with bad input. What does yours do with these:

``` die(0)
die(1, 2, 3)
```
```Error: unused argument(s) (2, 3)
```
``` die(pi)
die(-1)
```
```Error: invalid 'size' argument
```

How much of this behaviour is expected?

Let's fix some things with a simple check:

``` die = function(n) {
if (n < 1) {
stop("n is less than one")
}
return(sample(1:6, n, replace = TRUE))
}
die(0)
```
```Error: n is less than one
```

What other conditions should you check for?

## Multiple dice

Now we can roll one die a thousand times. It's distribution looks flat:

``` plot(table(die(10000)))
```

What if we sum two dice? We can just call our function twice and add the resulting vectors - remember R adds each element, and as long as our vectors are the same length this will work as planned.

``` plot(table(die(10000) + die(10000)))
```

This gives us a neat triangular distribution peaking at seven. Let's extend this to more dice:

``` par(mfrow = c(2, 2))
plot(table(die(10000) + die(10000)))
plot(table(die(10000) + die(10000) + die(10000) + die(10000)))
plot(table(die(10000) + die(10000) + die(10000) + die(10000) + die(10000)))
plot(table(die(10000) + die(10000) + die(10000) + die(10000) + die(10000) + die(10000) + die(10000)))
```
``` par(mfrow = c(1, 1))
```

## For more dice

Repetition like that is a pain to write, and a pain to debug. Let the computer do the work. We'll now write a `dice` function to throw multiple dice and return the sum.

• Input:
1. T, number of throws
2. N, number of dice in each set
• Output: vector with length equal to number of throws containing sums of sets.

There's various ways to do this. The neatest may be to create a 2-d matrix of die scores and then add up the rows. R has some functions for this:

``` m = matrix(die(33), 3, 11)
m
```
```     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    6    3    3    3    6    4    6    3    6     5     4
[2,]    1    4    1    4    4    6    3    2    6     2     1
[3,]    4    5    1    6    3    6    5    5    1     2     1
```
``` apply(m, 2, sum)
```
``` [1] 11 12  5 13 13 16 14 10 13  9  6
```

This creates a 3x11 matrix of single die rolls and then uses `apply` to sum along dimension 2, which is columns. The `apply` function is a general purpose function for doing anything along rows or columns of a matrix, in this case the `sum` function.

We can actually use `colSums` here. Let's build our function. We set a default of ten throws of two dice.

``` dice = function(T = 10, N = 2) {
m = matrix(die(N * T), N, T)
return(colSums(m))
}
dice()
```
``` [1] 10  7  7 10  4  7  8  9  3  7
```
``` dice(5, 10)
```
```[1] 36 40 32 35 37
```

What happens if you give nonsensical inputs to this function?

Ever wanted to throw 100 dice a thousand times?

``` lots = dice(1000, 100)
hist(lots)
```

There's some statistical theory that says the more dice you roll, the more it looks like a Normal distribution. Let's do a qqplot.

``` summary(lots)
```
```   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
298     339     351     350     362     402
```
``` qqnorm(scale(lots))  # rescale our dice to mean 0, sd 1
```

## Yahtzee!

While we're rolling dice, let's look at Yahtzee, a dice game that is a trademark of Hasbro. In it the player rolls five dice and tries to get certain patterns. For example getting all five dice the same number is a "Yahtzee" and scores fifty points. A triple and a pair is a "Full House" and scores 25 points. Let's write some code to check for some of the Yahtzee patterns in a vector of dice rolls.

What are our functions going to look like? They should take a vector of length 5 of values from 1 to 6 and return TRUE or FALSE.

``` is.yahtzee = function(d6) {

}
is.fullhouse = function(d6) {

}
is.longstraight = function(d6) {

}
```

and so on. Let's take these in order.

## Yahtzee!

To score a yahtzee, all the dice are the same. Which means all the dice are the same as the first one. We use the `all` function, which returns a single TRUE if fed a vector of all TRUE values.

``` all(c(TRUE, TRUE, TRUE))
```
```[1] TRUE
```
``` all(c(TRUE, FALSE, FALSE))
```
```[1] FALSE
```
``` all(c(FALSE, FALSE, FALSE))
```
```[1] FALSE
```
```
is.yahtzee = function(d6) {
return(all(d6 == d6[1]))
}
is.yahtzee(c(1, 2, 3, 4, 5))  # no
```
```[1] FALSE
```
``` is.yahtzee(c(3, 3, 3, 3, 3))  # yes
```
```[1] TRUE
```
``` is.yahtzee(c(3, 3, 3, 3, 6))  # no
```
```[1] FALSE
```

## Full House

To test for this we can tabulate the dice rolls, sort the table, and see if we get (0,0,0,0,2,3). We have to convert the dice roll into a factor with six levels to get zeroes in the table.

``` is.fullhouse = function(d6) {
return(all(sort(table(factor(d6, levels = 1:6))) == c(0, 0, 0, 0, 2, 3)))
}
is.fullhouse(c(1, 2, 1, 2, 2))  # yes
```
```[1] TRUE
```
``` is.fullhouse(c(1, 2, 1, 2, 3))  # no
```
```[1] FALSE
```
``` is.fullhouse(c(6, 6, 5, 5, 6))  # yes
```
```[1] TRUE
```
``` is.fullhouse(c(3, 3, 3, 3, 3))  # no....
```
```[1] FALSE
```

Is that that last result correct? Is a yahtzee also a full house? It is a triple and a pair, just both of the same number. This is often a source of arguments among children, and without source code how can they be sure? Let's change our function to allow it.

``` is.fullhouse = function(d6) {
return(is.yahtzee(d6) | all(sort(table(factor(d6, levels = 1:6))) == c(0, 0, 0, 0, 2, 3)))
}
is.fullhouse(c(1, 1, 1, 1, 1))  # yes
```
```[1] TRUE
```
``` is.fullhouse(c(1, 2, 1, 2, 1))  # yes
```
```[1] TRUE
```
``` is.fullhouse(c(1, 1, 1, 1, 6))  # no
```
```[1] FALSE
```

## Aside: Automated Testing

This is a good point to mention the `testthat` package. It lets you write a set of tests which you can run to make sure that changing your function doesn't break any of the functionality you expect. The tests would be a set of expressions and expected values (or expected errors) which you can run with a single function. If anything returns a different value to that specified in the test, or if it returns an error when you didn't expect one, or if it doesn't return an error when you specified that it darn well should, this will get flagged up.

Here's a few tests that should run without errors. Note I'm testing for an expected error in the last line.

``` require(testthat)
```
```Loading required package: testthat
```
```Loading required package: methods
```
``` expect_true(is.yahtzee(c(1, 1, 1, 1, 1)))
expect_false(is.yahtzee(c(1, 1, 1, 1, 2)))
expect_error(die(0))
```

## Long Straight

A long straight is either 1,2,3,4,5 or 2,3,4,5,6. We could sort the dice and check for these two conditions, or we could check the sorted differences are all equal to one.

``` is.longstraight = function(d6) {
return(all(diff(sort(d6)) == 1))
}
is.longstraight(c(4, 3, 2, 3, 4))  # no
```
```[1] FALSE
```
``` is.longstraight(c(5, 4, 3, 2, 1))  # yes
```
```[1] TRUE
```
``` is.longstraight(c(1, 2, 3, 4, 5))  # yes
```
```[1] TRUE
```
``` is.longstraight(c(6, 2, 3, 4, 5))  # yes
```
```[1] TRUE
```

## What have we got?

Finally let's write some code to see how common various patterns are.

First let's write a function to return a matrix of yahtzee dice throws, default 10

``` throwfive = function(N = 10) {
return(matrix(die(N * 5), N, 5))
}
throws = throwfive(3)
throws
```
```     [,1] [,2] [,3] [,4] [,5]
[1,]    2    6    6    5    2
[2,]    5    5    6    2    6
[3,]    3    4    4    3    3
```

Each row here is a set of five dice. We can now apply our test functions to rows using the `apply` function. We're going across rows now, so we use 1 as the dimension index.

``` apply(throws, 1, is.yahtzee)
```
```[1] FALSE FALSE FALSE
```
``` apply(throws, 1, is.fullhouse)
```
```[1] FALSE FALSE  TRUE
```
``` apply(throws, 1, is.longstraight)
```
```[1] FALSE FALSE FALSE
```

Let's scale this up. A thousand throws... Then tabulate how many full houses and long straights we got:

``` throws = throwfive(1000)
table(apply(throws, 1, is.fullhouse))
```
```FALSE  TRUE
964    36
```
``` table(apply(throws, 1, is.longstraight))
```
```FALSE  TRUE
962    38
```

By simple division we can work out the probability.

Did we get any Yahtzees??

``` if (any(apply(throws, 1, is.yahtzee))) {
print("Yahtzee!!")
}
```

How many? Work out the probability of throwing a Yahtzee numerically. Note this is just from one throw of five dice.

## Challenges

Write some more functions for checking the other yahtzee score throws (see Wikipedia for details). Note that four-of-a-kind is also three-of-a-kind, and a short straight is also a long straight.

## Improving Things

The `throwfive` function looks a bit like the `dice` function. How could you rewrite the code to avoid duplication?

## Functions as arguments

Here's another interesting thing. Passing a function to a function:

``` applyDice = function(T, N, f) {
d = matrix(die(N * T), N, T)
return(apply(d, 2, f))
}
applyDice(20, 5, sum)
```
``` [1] 18 14 13 14 14 19 22 17 20 15 16 17 20 19 17 17 18 20 22 19
```
``` applyDice(20, 5, is.fullhouse)
```
``` [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
[20] FALSE
```

## Real Hard Challenge

When playing Yahtzee you throw five dice, then you can keep some and rethrow the others in a bid to improve your score, and then you can do that again. Simulate that in R, and work out the probability of getting yahtzee in real play! What's the best strategy for rethrows? My code for this has a 'verbose' argument that says what it is doing, and returns -1 if it fails or the number of throws it took to get a Yahtzee.

``` getYahtzee(verbose = TRUE)
```
```First throw...
Got  3  of a kind
Second throw...
Got  3  of a kind
Last chance...
No luck this time :(
```
```[1] -1
```
``` getYahtzee(verbose = TRUE)
```
```First throw...
Got  2  of a kind
Second throw...
Got  2  of a kind
Last chance...
No luck this time :(
```
```[1] -1
```
``` getYahtzee(verbose = TRUE)
```
```First throw...
Got  2  of a kind
Second throw...
Got  3  of a kind
Last chance...
Yahtzee! Last roll!
```
```[1] 3
```

Now repeat a thousand times:

``` y = rep(0, 1000)
for (i in 1:1000) {
y[i] = getYahtzee()
}
table(y > 0)
```
```FALSE  TRUE
957    43
```