Dice Rolling

In this worksheet we will write some functions for simulating throws of dice and investigate random number generation.

RNGs

R has plenty of random number generators.

  • Uniform(0,1) with runif(10) : 0.6515, 0.2056, 0.7945, 0.7793, 0.7888, 0.152, 0.4109, 0.7085, 0.1717, 0.5241
  • Normal(0,1) with rnorm(10) : 0.4415, 0.6087, 0.1816, -0.0075, 0.08, -0.5867, 2.1508, -0.0653, -1.0329, 0.1495
  • Poisson(lambda) with rpois(10,lambda=3) : 3, 2, 3, 4, 1, 3, 5, 2, 1, 3
  • ...and many more.

Each of these has parameters to control the mean, standard deviation, and whatever other parameters the distribution has.

One die

How do we simulate single die rolls?

 d1 = as.integer(runif(1, 1, 6))
 d1
[1] 5

Well possibly. But let's roll a few more and tabulate them to check:

 table(as.integer(runif(10000, 1, 6)))
   1    2    3    4    5 
2013 2010 1917 2046 2014 

Oops, we seem to have rolled a d5. We didn't read the help for runif well enough. The upper limit is excluded and we round down. Next try:

 table(as.integer(runif(10000, 1, 7)))
   1    2    3    4    5    6 
1701 1624 1672 1676 1645 1682 

Much better.

Another method is to use the sample function:

 sample(1:6, 1)
[1] 6

This takes a random sample of 1 value from its first argument. With a slight tweak we can take multiple values:

 table(sample(1:6, 10000))
Error: cannot take a sample larger than the population when 'replace = FALSE'
 table(sample(1:6, 10000, replace = TRUE))
   1    2    3    4    5    6 
1627 1678 1593 1661 1679 1762 

We should probably wrap this into a function. The input is just the number of single die rolls to do, and the return value is a vector of that length.

 die = function(n) {
     return(sample(1:6, n, replace = TRUE))
 }

check all the following work:

 die(1)
 die(3)
 table(die(5))
 table(die(1000))

Good functions should probably cope with bad input. What does yours do with these:

 die(0)
 die(1, 2, 3)
Error: unused argument(s) (2, 3)
 die(pi)
 die(-1)
Error: invalid 'size' argument

How much of this behaviour is expected?

Let's fix some things with a simple check:

 die = function(n) {
     if (n < 1) {
         stop("n is less than one")
     }
     return(sample(1:6, n, replace = TRUE))
 }
 die(0)
Error: n is less than one

What other conditions should you check for?

Multiple dice

Now we can roll one die a thousand times. It's distribution looks flat:

 plot(table(die(10000)))
plot of chunk unnamed-chunk-10

What if we sum two dice? We can just call our function twice and add the resulting vectors - remember R adds each element, and as long as our vectors are the same length this will work as planned.

 plot(table(die(10000) + die(10000)))
plot of chunk unnamed-chunk-11

This gives us a neat triangular distribution peaking at seven. Let's extend this to more dice:

 par(mfrow = c(2, 2))
 plot(table(die(10000) + die(10000)))
 plot(table(die(10000) + die(10000) + die(10000) + die(10000)))
 plot(table(die(10000) + die(10000) + die(10000) + die(10000) + die(10000)))
 plot(table(die(10000) + die(10000) + die(10000) + die(10000) + die(10000) + die(10000) + die(10000)))
plot of chunk unnamed-chunk-12
 par(mfrow = c(1, 1))

For more dice

Repetition like that is a pain to write, and a pain to debug. Let the computer do the work. We'll now write a dice function to throw multiple dice and return the sum.

  • Input:
    1. T, number of throws
    2. N, number of dice in each set
  • Output: vector with length equal to number of throws containing sums of sets.

There's various ways to do this. The neatest may be to create a 2-d matrix of die scores and then add up the rows. R has some functions for this:

 m = matrix(die(33), 3, 11)
 m
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    6    3    3    3    6    4    6    3    6     5     4
[2,]    1    4    1    4    4    6    3    2    6     2     1
[3,]    4    5    1    6    3    6    5    5    1     2     1
 apply(m, 2, sum)
 [1] 11 12  5 13 13 16 14 10 13  9  6

This creates a 3x11 matrix of single die rolls and then uses apply to sum along dimension 2, which is columns. The apply function is a general purpose function for doing anything along rows or columns of a matrix, in this case the sum function.

We can actually use colSums here. Let's build our function. We set a default of ten throws of two dice.

 dice = function(T = 10, N = 2) {
     m = matrix(die(N * T), N, T)
     return(colSums(m))
 }
 dice()
 [1] 10  7  7 10  4  7  8  9  3  7
 dice(5, 10)
[1] 36 40 32 35 37

What happens if you give nonsensical inputs to this function?

Ever wanted to throw 100 dice a thousand times?

 lots = dice(1000, 100)
 hist(lots)
plot of chunk unnamed-chunk-15

There's some statistical theory that says the more dice you roll, the more it looks like a Normal distribution. Let's do a qqplot.

 summary(lots)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    298     339     351     350     362     402 
 qqnorm(scale(lots))  # rescale our dice to mean 0, sd 1
plot of chunk unnamed-chunk-16

Yahtzee!

While we're rolling dice, let's look at Yahtzee, a dice game that is a trademark of Hasbro. In it the player rolls five dice and tries to get certain patterns. For example getting all five dice the same number is a "Yahtzee" and scores fifty points. A triple and a pair is a "Full House" and scores 25 points. Let's write some code to check for some of the Yahtzee patterns in a vector of dice rolls.

What are our functions going to look like? They should take a vector of length 5 of values from 1 to 6 and return TRUE or FALSE.

 is.yahtzee = function(d6) {
     
 }
 is.fullhouse = function(d6) {
     
 }
 is.longstraight = function(d6) {
     
 }

and so on. Let's take these in order.

Yahtzee!

To score a yahtzee, all the dice are the same. Which means all the dice are the same as the first one. We use the all function, which returns a single TRUE if fed a vector of all TRUE values.

 all(c(TRUE, TRUE, TRUE))
[1] TRUE
 all(c(TRUE, FALSE, FALSE))
[1] FALSE
 all(c(FALSE, FALSE, FALSE))
[1] FALSE
 
 is.yahtzee = function(d6) {
     return(all(d6 == d6[1]))
 }
 is.yahtzee(c(1, 2, 3, 4, 5))  # no
[1] FALSE
 is.yahtzee(c(3, 3, 3, 3, 3))  # yes
[1] TRUE
 is.yahtzee(c(3, 3, 3, 3, 6))  # no
[1] FALSE

Full House

To test for this we can tabulate the dice rolls, sort the table, and see if we get (0,0,0,0,2,3). We have to convert the dice roll into a factor with six levels to get zeroes in the table.

 is.fullhouse = function(d6) {
     return(all(sort(table(factor(d6, levels = 1:6))) == c(0, 0, 0, 0, 2, 3)))
 }
 is.fullhouse(c(1, 2, 1, 2, 2))  # yes
[1] TRUE
 is.fullhouse(c(1, 2, 1, 2, 3))  # no
[1] FALSE
 is.fullhouse(c(6, 6, 5, 5, 6))  # yes
[1] TRUE
 is.fullhouse(c(3, 3, 3, 3, 3))  # no....
[1] FALSE

Is that that last result correct? Is a yahtzee also a full house? It is a triple and a pair, just both of the same number. This is often a source of arguments among children, and without source code how can they be sure? Let's change our function to allow it.

 is.fullhouse = function(d6) {
     return(is.yahtzee(d6) | all(sort(table(factor(d6, levels = 1:6))) == c(0, 0, 0, 0, 2, 3)))
 }
 is.fullhouse(c(1, 1, 1, 1, 1))  # yes
[1] TRUE
 is.fullhouse(c(1, 2, 1, 2, 1))  # yes
[1] TRUE
 is.fullhouse(c(1, 1, 1, 1, 6))  # no
[1] FALSE

Aside: Automated Testing

This is a good point to mention the testthat package. It lets you write a set of tests which you can run to make sure that changing your function doesn't break any of the functionality you expect. The tests would be a set of expressions and expected values (or expected errors) which you can run with a single function. If anything returns a different value to that specified in the test, or if it returns an error when you didn't expect one, or if it doesn't return an error when you specified that it darn well should, this will get flagged up.

Here's a few tests that should run without errors. Note I'm testing for an expected error in the last line.

 require(testthat)
Loading required package: testthat
Loading required package: methods
 expect_true(is.yahtzee(c(1, 1, 1, 1, 1)))
 expect_false(is.yahtzee(c(1, 1, 1, 1, 2)))
 expect_error(die(0))

Long Straight

A long straight is either 1,2,3,4,5 or 2,3,4,5,6. We could sort the dice and check for these two conditions, or we could check the sorted differences are all equal to one.

 is.longstraight = function(d6) {
     return(all(diff(sort(d6)) == 1))
 }
 is.longstraight(c(4, 3, 2, 3, 4))  # no
[1] FALSE
 is.longstraight(c(5, 4, 3, 2, 1))  # yes
[1] TRUE
 is.longstraight(c(1, 2, 3, 4, 5))  # yes
[1] TRUE
 is.longstraight(c(6, 2, 3, 4, 5))  # yes
[1] TRUE

What have we got?

Finally let's write some code to see how common various patterns are.

First let's write a function to return a matrix of yahtzee dice throws, default 10

 throwfive = function(N = 10) {
     return(matrix(die(N * 5), N, 5))
 }
 throws = throwfive(3)
 throws
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    6    6    5    2
[2,]    5    5    6    2    6
[3,]    3    4    4    3    3

Each row here is a set of five dice. We can now apply our test functions to rows using the apply function. We're going across rows now, so we use 1 as the dimension index.

 apply(throws, 1, is.yahtzee)
[1] FALSE FALSE FALSE
 apply(throws, 1, is.fullhouse)
[1] FALSE FALSE  TRUE
 apply(throws, 1, is.longstraight)
[1] FALSE FALSE FALSE

Let's scale this up. A thousand throws... Then tabulate how many full houses and long straights we got:

 throws = throwfive(1000)
 table(apply(throws, 1, is.fullhouse))
FALSE  TRUE 
  964    36 
 table(apply(throws, 1, is.longstraight))
FALSE  TRUE 
  962    38 

By simple division we can work out the probability.

Did we get any Yahtzees??

 if (any(apply(throws, 1, is.yahtzee))) {
     print("Yahtzee!!")
 }

How many? Work out the probability of throwing a Yahtzee numerically. Note this is just from one throw of five dice.

Challenges

Write some more functions for checking the other yahtzee score throws (see Wikipedia for details). Note that four-of-a-kind is also three-of-a-kind, and a short straight is also a long straight.

Improving Things

The throwfive function looks a bit like the dice function. How could you rewrite the code to avoid duplication?

Functions as arguments

Here's another interesting thing. Passing a function to a function:

 applyDice = function(T, N, f) {
     d = matrix(die(N * T), N, T)
     return(apply(d, 2, f))
 }
 applyDice(20, 5, sum)
 [1] 18 14 13 14 14 19 22 17 20 15 16 17 20 19 17 17 18 20 22 19
 applyDice(20, 5, is.fullhouse)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
[20] FALSE

Real Hard Challenge

When playing Yahtzee you throw five dice, then you can keep some and rethrow the others in a bid to improve your score, and then you can do that again. Simulate that in R, and work out the probability of getting yahtzee in real play! What's the best strategy for rethrows? My code for this has a 'verbose' argument that says what it is doing, and returns -1 if it fails or the number of throws it took to get a Yahtzee.

 getYahtzee(verbose = TRUE)
First throw...
Got  3  of a kind
Second throw...
Got  3  of a kind
Last chance...
No luck this time :(
[1] -1
 getYahtzee(verbose = TRUE)
First throw...
Got  2  of a kind
Second throw...
Got  2  of a kind
Last chance...
No luck this time :(
[1] -1
 getYahtzee(verbose = TRUE)
First throw...
Got  2  of a kind
Second throw...
Got  3  of a kind
Last chance...
Yahtzee! Last roll!
[1] 3

Now repeat a thousand times:

 y = rep(0, 1000)
 for (i in 1:1000) {
     y[i] = getYahtzee()
 }
 table(y > 0)
FALSE  TRUE 
  957    43