Song of the Day

Main ideas

Coming up

Functions

library(tidyverse)

Functions allow you to automate tasks.

Let’s make an example dataset to work with.

Question: What does the function rnorm() do?

Generates random draws from a normal distribution.
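
As a small illustration of how rnorm() behaves, the sketch below draws values with and without explicit mean and sd arguments. The seed and the parameter values (50 and 10) are arbitrary choices for this example, not values used elsewhere in these notes.

```r
# rnorm(n, mean = 0, sd = 1): n random draws from a normal distribution
set.seed(1)               # arbitrary seed so the draws are reproducible
draws_a <- rnorm(5)       # standard normal: mean 0, sd 1

set.seed(1)               # same seed, so the same draws come back
draws_b <- rnorm(5)
identical(draws_a, draws_b)
## [1] TRUE

# Hypothetical parameters: 1000 draws centered at 50 with standard deviation 10
scores <- rnorm(1000, mean = 50, sd = 10)
length(scores)
## [1] 1000
```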

set.seed(2320)

ex_data <- tibble(
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5)
)
ex_data
## # A tibble: 5 x 4
##        a      b      c      d
##    <dbl>  <dbl>  <dbl>  <dbl>
## 1  0.356 -0.404 -1.21   1.28 
## 2  0.544 -0.628 -0.862 -1.77 
## 3 -0.936  1.36  -0.117 -0.491
## 4  0.915  0.759 -0.644 -0.644
## 5 -0.507  0.641 -0.272  0.664

Suppose we want to rescale each column so that its values lie between 0 and 1 (min-max normalization).

\[\dfrac{x_i - \text{min}(x)}{\text{max}(x) - \text{min}(x)}\]
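
As a quick check of the formula, here is a small worked example on a made-up vector (the values 2, 4, and 10 are chosen only to keep the arithmetic simple). The minimum maps to 0, the maximum maps to 1, and everything else falls in between.

```r
# Hypothetical toy vector, chosen only to make the arithmetic easy to follow
x <- c(2, 4, 10)

# min(x) = 2 and max(x) = 10, so the denominator is 10 - 2 = 8
scaled <- (x - min(x)) / (max(x) - min(x))
scaled
## [1] 0.00 0.25 1.00
```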

ex_data <- ex_data %>% mutate(
  a = (a - min(a))/(max(a) - min(a)),
  b = (b - min(b))/(max(b) - min(b)),
  c = (c - min(c))/(max(c) - min(c)),
  d = (d - min(d))/(max(d) - min(d)))

ex_data
## # A tibble: 5 x 4
##       a     b     c     d
##   <dbl> <dbl> <dbl> <dbl>
## 1 0.698 0.113 0     1    
## 2 0.800 0     0.320 0    
## 3 0     1     1     0.419
## 4 1     0.697 0.518 0.368
## 5 0.232 0.638 0.858 0.797

Don’t write code from scratch - start from working code.

(a - min(a))/(max(a) - min(a))

Question: How many inputs should this function have?

Just the one.

Choose an informative name.

rescale01 <- 

Use function to define a function.

rescale01 <- function

Specify the inputs (arguments) inside the parentheses after function. Multiple arguments are separated by commas, as in function(x, y, z).

rescale01 <- function(x)

Create the body of the function using a {} block immediately after the argument list.

rescale01 <- function(x){
  
}

Place your code in the body of the function.

rescale01 <- function(x){
  
  (x - min(x)) / (max(x) - min(x))

}

Now let’s test rescale01!

x1 <- 1:10
rescale01(x1)
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000
x2 <- c(1:10, NA)
rescale01(x2)
##  [1] NA NA NA NA NA NA NA NA NA NA NA

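Question: What’s going on here? Address this issue in the code chunk below.

min() and max() return NA when their input contains a missing value, so every rescaled value becomes NA. Setting na.rm = TRUE tells them to ignore missing values.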

rescale01a <- function(x) {
  rangex <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  (x - min(x, na.rm = TRUE)) / rangex
}
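
To see the effect of na.rm = TRUE directly, it may help to compare min() and max() with and without it on the same vector:

```r
x2 <- c(1:10, NA)

# Without na.rm, a single missing value makes the result missing
min(x2)
## [1] NA

# With na.rm = TRUE, missing values are dropped before computing
min(x2, na.rm = TRUE)
## [1] 1
max(x2, na.rm = TRUE)
## [1] 10
```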

x1 <- 1:10
rescale01a(x1)
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000
x2 <- c(1:10, NA)
rescale01a(x2)
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000        NA
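
With an NA-aware function in hand, the four near-identical lines in the earlier mutate() call can be collapsed into one. The sketch below assumes dplyr 1.0.0 or later for across(); the toy tibble and its values are made up for illustration and are not the ex_data tibble from above.

```r
library(dplyr)

# NA-aware rescaling function, repeated here so the block runs on its own
rescale01a <- function(x) {
  rangex <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  (x - min(x, na.rm = TRUE)) / rangex
}

# Hypothetical toy data with one missing value
toy <- tibble(
  a = c(1, 3, 5),
  b = c(10, NA, 20)
)

# Apply the same function to every column instead of copying the formula
toy_scaled <- toy %>% mutate(across(everything(), rescale01a))
toy_scaled
```

Column a becomes 0, 0.5, 1, and column b becomes 0, NA, 1: the missing value stays missing while the observed values are still rescaled.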
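
A function returns the value of the last expression it evaluates, so it can return any kind of object, such as a tibble.

do_something <- function(x, y, z){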
  # do bunch of stuff with the input...
  
  # return a tibble
  tibble(...)
}
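
As a concrete sketch of a function that returns a tibble, consider the hypothetical helper below. The name summarize_range() and its column names are invented for illustration; it takes a numeric vector and returns a one-row tibble as its last evaluated expression.

```r
library(tibble)

# Hypothetical helper: summarize a numeric vector as a one-row tibble
summarize_range <- function(x, na.rm = TRUE) {
  lo <- min(x, na.rm = na.rm)
  hi <- max(x, na.rm = na.rm)

  # The last expression evaluated is the return value
  tibble(min = lo, max = hi, range = hi - lo)
}

range_summary <- summarize_range(c(1:10, NA))
range_summary
```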

Question: Does the function defined below behave as you expect? Why or why not?

The add_2() function returns 1000 every time since 1000 is the last value computed in the function.

add_2 <- function(x){
  x + 2
  1000
}

add_2(998)
## [1] 1000
add_2(2)
## [1] 1000
add_2(100)
## [1] 1000
add_2(24)
## [1] 1000
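
One possible fix is to make x + 2 the last expression evaluated, or to wrap it in an explicit return(). The name add_2_explicit() is hypothetical and is used only to show the second option side by side.

```r
# x + 2 is now the last expression evaluated, so it is what the function returns
add_2 <- function(x) {
  x + 2
}

add_2(998)
## [1] 1000
add_2(2)
## [1] 4

# Equivalent version with an explicit return()
add_2_explicit <- function(x) {
  return(x + 2)
}

add_2_explicit(100)
## [1] 102
```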

Automation: Mapping

Mapping allows us to apply a function to each element of an object and return a specific type of value.

Suppose we have exam 1 and exam 2 scores of 4 students stored in a list.

exam_scores <- list(
  c(80, 90, 70, 50),  # exam 1
  c(85, 83, 45, 60)   # exam 2
)
exam_scores
## [[1]]
## [1] 80 90 70 50
## 
## [[2]]
## [1] 85 83 45 60

We can use map() to find the mean score for each exam.

map(exam_scores, mean)
## [[1]]
## [1] 72.5
## 
## [[2]]
## [1] 68.25

Suppose we want the results as a numeric (double) vector.

map_dbl(exam_scores, mean)
## [1] 72.50 68.25
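
The function passed to a map function does not have to be a built-in one. The sketch below passes anonymous functions to map_dbl() and map_int(); the list is rebuilt so the block runs on its own, and the passing threshold of 70 is an assumption made for this example.

```r
library(purrr)

exam_scores <- list(
  c(80, 90, 70, 50),  # exam 1
  c(85, 83, 45, 60)   # exam 2
)

# Spread of each exam: highest score minus lowest score
score_spread <- map_dbl(exam_scores, function(scores) max(scores) - min(scores))
score_spread
## [1] 40 40

# Number of students scoring at least 70 (hypothetical threshold) on each exam
n_passing <- map_int(exam_scores, function(scores) sum(scores >= 70))
n_passing
## [1] 3 2
```

map_int() works here because summing a logical vector yields an integer; a function returning a double would need map_dbl() instead.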

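What if we want the results as a character vector? (The output below comes from an older version of purrr; since purrr 1.0.0, relying on map_chr() to coerce doubles to character is deprecated.)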

map_chr(exam_scores, mean)
## [1] "72.500000" "68.250000"

A tibble is a list of columns, so the same functions can summarize each column of ex_data.

map_dbl(ex_data, mean)
##         a         b         c         d 
## 0.5460050 0.4894938 0.5392597 0.5168666
map_dbl(ex_data, median)
##         a         b         c         d 
## 0.6982264 0.6377430 0.5182620 0.4185009
map_dbl(ex_data, sd)
##         a         b         c         d 
## 0.4154975 0.4205473 0.4042266 0.3908414

Question: How many distinct values are there in each column of mtcars? Use an appropriate map_ function to answer.

mtcars %>% map_int(n_distinct)
##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
##   25    3   27   22   22   29   30    2    2    3    6
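
The other typed variants work the same way. As a sketch, map_lgl() returns a logical vector and map_chr() a character vector, used here to inspect the columns of the built-in mtcars data.

```r
library(purrr)

# Is each column numeric? Returns a named logical vector
all_numeric <- map_lgl(mtcars, is.numeric)

# Storage type of each column; every mtcars column is stored as a double
col_types <- map_chr(mtcars, typeof)

all(all_numeric)
## [1] TRUE
unique(col_types)
## [1] "double"
```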

Clean Coding

Code should express intent, use the correct parts of speech, have the length correspond to scope, and contain no disinformation (Martin).

For variables, what is it? For functions, what does it do? These should be expressed in the name of the variable or function.

Variables should be named with nouns and functions with verbs. Predicates (functions that return TRUE or FALSE) should read as yes/no questions, such as is.na().

Small-scope variables should have short names, and large-scope variables should have long names.

The opposite is true for functions: small-scope functions should have long names, and large-scope functions should have short names.

Question: Why are the definitions below bad?

mean <- function(x){
  sum(x)
}

T <- FALSE

c <- 25
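
To make the problems concrete, the sketch below shows what the first definition does and offers alternatives for all three. The replacement names (total_score, is_enrolled, class_size) are hypothetical, chosen only to illustrate naming by intent.

```r
# Problem 1: this masks base::mean(), and the name is disinformation
# because the body computes a sum
mean <- function(x) {
  sum(x)
}
mean(c(1, 2, 3))        # returns 6, not the average
base::mean(c(1, 2, 3))  # the real mean, 2
rm(mean)                # remove the masking definition

# Problems 2 and 3: T is shorthand for TRUE and c() combines values,
# so reusing those names invites confusion (not repeated here)

# Hypothetical clearer alternatives: a verb-like name for the function,
# a predicate-style name for the logical, a noun for the number
total_score <- function(scores) {
  sum(scores)
}
is_enrolled <- FALSE
class_size <- 25

total_score(c(1, 2, 3))
## [1] 6
```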

Sources and Additional Information