Functions

Functions are important in R - its a functional language

Usage

A function has arguments, returns a result.

> foo(1, 2, 3)
[1] 5

Simple argument matching

> args(foo)
function (x, a, b = 1) 
NULL
> foo()
Error: 'a' is missing
> foo(1)
Error: 'a' is missing
> foo(1, 2)
[1] 3
> foo(1, 2, 3)
[1] 5
> foo(1, 2, 3, 4)
Error: unused argument(s) (4)

Names

> foo(1, 2, 3)
[1] 5
> foo(b = 3, a = 2, x = 1)
[1] 5

Seeing the source

> foo
function(x,a,b=1){a*x+b*x}
> sqrt
function (x)  .Primitive("sqrt")

Write Your Own

> foo = function(x, a, b = 1) {
+     a * x + b * x
+ }
> foo(1, 2, 3)
[1] 5
> zoo = foo
> zoo(1, 2, 3)
[1] 5
> rm(zoo)
> sqrt = foo
> sqrt(2)
Error: 'a' is missing
> rm(sqrt)

Vector arguments

> foo(1:10, b = 3, a = 2)
 [1]  5 10 15 20 25 30 35 40 45 50
> foo(1:10, b = c(0, 2), a = 2)
 [1]  2  8  6 16 10 24 14 32 18 40
x12345678910
a2222222222
b0202020202
> foo(1:9, b = c(2, 3), a = 2)
Warning: longer object length is not a multiple of
shorter object length
[1]  4 10 12 20 20 30 28 40 36
x123456789
a222222222
b232323232

Help!

help(sqrt) or ?sqrt
MathFun                  package:base                  R Documentation

Miscellaneous Mathematical Functions

Description:

     These functions compute miscellaneous mathematical functions.  The
     naming follows the standard for computer languages such as C or
     Fortran.

Usage:

     abs(x)
     sqrt(x)
     
Arguments:

       x: a numeric or ‘complex’ vector or array.

Details:
....
References:
....
See Also:
....
Examples:
     require(stats) # for spline
     require(graphics)
     xx <- -9:9
     plot(xx, sqrt(abs(xx)),  col = "red")
     lines(spline(xx, sqrt(abs(xx)), n=101), col = "pink")

> example(sqrt)
sqrt> require(stats) # for spline

sqrt> require(graphics)

sqrt> xx <- -9:9

sqrt> plot(xx, sqrt(abs(xx)),  col = "red")
plot of chunk unnamed-chunk-9
sqrt> lines(spline(xx, sqrt(abs(xx)), n=101), col = "pink")

More Help

From R

> help("thing")
> help.search("thing")
> RSiteSearch("thing")
> help.start()

Online

RDocumentation.org

Its all functions

> x=3;y=4
> "+"("^"(x,2),"^"(y,2))
[1] 25
> x^2 + y^2
[1] 25

Scope

> x = 2
> foo = function(z) {
+     z + x
+ }
> foo(10)
[1] 12
> x = 3
> foo(10)
[1] 13

generally a bad idea!

> x
[1] 3
> foo = function(z) {
+     x = 99
+     z + x
+ }
> foo(10)
[1] 109
> x
[1] 3

global x unaffected - this is good!

To Change Something, Return It

> d = data.frame(age = 1:10)
> classify = function(df) {
+     df$old = df$age > 5
+ }
> classify(d)  # doesn't change d

modify, return, assign

> classify = function(df) {
+     df$old = df$age > 5
+     return(df)
+ }
> d = classify(d)

'pass by value'

Packages (and libraries)

> require(ggplot2)
Loading required package: ggplot2
Loading required package: methods
> ggplot(quakes) + geom_point(aes(x = long, y = lat, color = mag))
plot of chunk unnamed-chunk-15
> search()
 [1] ".GlobalEnv"        "package:ggplot2"  
 [3] "package:methods"   "package:knitr"    
 [5] "package:stats"     "package:graphics" 
 [7] "package:grDevices" "package:utils"    
 [9] "package:datasets"  "Autoloads"        
[11] "package:base"     
> ls("package:datasets", pat = "^[pq].*")
[1] "precip"     "presidents" "pressure"   "quakes"    
> summary(quakes)
      lat             long         depth    
 Min.   :-38.6   Min.   :166   Min.   : 40  
 1st Qu.:-23.5   1st Qu.:180   1st Qu.: 99  
 Median :-20.3   Median :181   Median :247  
 Mean   :-20.6   Mean   :179   Mean   :311  
 3rd Qu.:-17.6   3rd Qu.:183   3rd Qu.:543  
 Max.   :-10.7   Max.   :188   Max.   :680  
      mag          stations    
 Min.   :4.00   Min.   : 10.0  
 1st Qu.:4.30   1st Qu.: 18.0  
 Median :4.60   Median : 27.0  
 Mean   :4.62   Mean   : 33.4  
 3rd Qu.:4.90   3rd Qu.: 42.0  
 Max.   :6.40   Max.   :132.0  
help(quakes)
Locations of Earthquakes off Fiji

Description:

     The data set give the locations of 1000 seismic events of MB >
     4.0.  The events occurred in a cube near Fiji since 1964.

Format:

     A data frame with 1000 observations on 5 variables.

       [,1]  lat       numeric  Latitude of event            
       [,2]  long      numeric  Longitude                    
       [,3]  depth     numeric  Depth (km)                   
       [,4]  mag       numeric  Richter Magnitude            
       [,5]  stations  numeric  Number of stations reporting 

Flow

If everything is a function, how do you control flow?

> mean(quakes$mag)  # no need to loop
[1] 4.62
> m = matrix(1:12, 3, 4)
> m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> apply(m, 1, sum)  # apply a function over rows...
[1] 22 26 30
> apply(m, 2, sum)  # ...or columns
[1]  6 15 24 33

For loops

> for (i in 1:5) {
+     print(i * 2)
+ }
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10

If...

Conditional execution

> testme = function(x) {
+     if (x > 0.5) {
+         return("yes")
+     } else {
+         return("no")
+     }
+ }
> testme(0)
[1] "no"
> testme(1)
[1] "yes"
> testme(c(0.2, 0.9))
Warning: the condition has length > 1 and only the
first element will be used
[1] "no"
> testme = function(x) {
+     ifelse(x > 0.5, "yes", "no")
+ }
> testme(c(0.2, 0.9, 0.05, 1.2))
[1] "no"  "yes" "no"  "yes"

If you must...

Compute the mean magnitude for quakes below 500km...

> magsum = 0
> ndeep = 0
> for (i in 1:nrow(quakes)) {
+     if (quakes[i, "depth"] > 500) {
+         magsum = magsum + quakes[i, "mag"]
+         ndeep = ndeep + 1
+     }
+ }
> magsum/ndeep
[1] 4.55

BUT much better to do

> mean(quakes$mag[quakes$depth > 500])
[1] 4.55

Break that down to see how it works:

> head(quakes$depth)
[1] 562 650  42 626 649 195
> head(quakes$mag)
[1] 4.8 4.2 5.4 4.1 4.0 4.0
> head(quakes$depth > 500)
[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE
> head(quakes$mag[quakes$depth > 500])
[1] 4.8 4.2 4.1 4.0 4.3 4.4

Plots

> par(mfrow = c(2, 2))  # 2x2 plots
> hist(quakes$depth)
> hist(quakes$mag)
> plot(quakes$mag, quakes$depth)
> plot(quakes$lat, quakes$long, pch = 19, col = "red")
plot of chunk firstplots
> par(mfrow = c(1, 1))
> plot(quakes)
plot of chunk firstplots

Objects

A way of creating complex objects with complex behaviour...

Let's fit a linear model

> lm.stack <- lm(stack.loss ~ stack.x)

What is lm.stack?

> class(lm.stack)
[1] "lm"

We say this is an object of class 'lm'. What can we do with it?

> lm.stack
Call:
lm(formula = stack.loss ~ stack.x)

Coefficients:
      (Intercept)    stack.xAir.Flow  
          -39.920              0.716  
stack.xWater.Temp  stack.xAcid.Conc.  
            1.295             -0.152  
> summary(lm.stack)
Call:
lm(formula = stack.loss ~ stack.x)

Residuals:
   Min     1Q Median     3Q    Max 
-7.238 -1.712 -0.455  2.361  5.698 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -39.920     11.896   -3.36   0.0038
stack.xAir.Flow      0.716      0.135    5.31  5.8e-05
stack.xWater.Temp    1.295      0.368    3.52   0.0026
stack.xAcid.Conc.   -0.152      0.156   -0.97   0.3440
                     
(Intercept)       ** 
stack.xAir.Flow   ***
stack.xWater.Temp ** 
stack.xAcid.Conc.    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 3.24 on 17 degrees of freedom
Multiple R-squared: 0.914,	Adjusted R-squared: 0.898 
F-statistic: 59.9 on 3 and 17 DF,  p-value: 3.02e-09 
> par(mfrow = c(2, 2))
> plot(lm.stack)
plot of chunk unnamed-chunk-22
> par(mfrow = c(1, 1))

Functions that are defined on objects are called 'methods'. They will usually operate differently on different classes of objects.

We've already seen examples of this

  • summary(quakes) vs summary(lm.stack)
  • plot(quakes) vs plot(quakes$mag,quakes$depth) vs plot(lm.stack)