Main ideas
- Make functions to automate tasks
- Discuss automation using map_ functions
- Introduce clean coding
library(tidyverse)
Functions allow you to automate tasks.
Let’s make an example dataset to work with.
Question: What does the function rnorm() do?
It generates random draws from a normal distribution. By default the mean is 0 and the standard deviation is 1.
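A quick sketch of rnorm() in action: its arguments n, mean, and sd set the number of draws and the distribution's parameters, and set.seed() makes the draws reproducible. The seed, sample sizes, and the names draws_a, draws_b, and sim_scores are arbitrary choices for illustration.

```r
# Five draws from a standard normal (mean = 0, sd = 1 by default)
set.seed(1)
draws_a <- rnorm(5)

# Resetting the same seed reproduces exactly the same draws
set.seed(1)
draws_b <- rnorm(5)
identical(draws_a, draws_b)
# [1] TRUE

# 1000 draws from a normal with mean 100 and standard deviation 15
set.seed(1)
sim_scores <- rnorm(1000, mean = 100, sd = 15)
mean(sim_scores)  # close to 100, but not exactly 100
```

This is why the example below calls set.seed(2320) first: rerunning the chunk gives the same ex_data every time.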
set.seed(2320)
ex_data <- tibble(
a = rnorm(5),
b = rnorm(5),
c = rnorm(5),
d = rnorm(5)
)
ex_data
## # A tibble: 5 x 4
## a b c d
## <dbl> <dbl> <dbl> <dbl>
## 1 0.356 -0.404 -1.21 1.28
## 2 0.544 -0.628 -0.862 -1.77
## 3 -0.936 1.36 -0.117 -0.491
## 4 0.915 0.759 -0.644 -0.644
## 5 -0.507 0.641 -0.272 0.664
Suppose we want to rescale each column so its values lie between 0 and 1 (min-max normalization). Each value x_i of a column x is transformed as
\[\dfrac{x_i - \text{min}(x)}{\text{max}(x) - \text{min}(x)}\]
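A small worked example of the formula, using an arbitrary vector chosen for illustration: for x = (2, 4, 10), min(x) = 2 and max(x) = 10, so 4 maps to (4 - 2)/(10 - 2) = 0.25, the minimum maps to 0, and the maximum maps to 1.

```r
# Apply the min-max formula to a small hand-checkable vector
x <- c(2, 4, 10)
scaled <- (x - min(x)) / (max(x) - min(x))
scaled
# [1] 0.00 0.25 1.00
```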
ex_data <- ex_data %>% mutate(
a = (a - min(a))/(max(a) - min(a)),
b = (b - min(b))/(max(b) - min(b)),
c = (c - min(c))/(max(c) - min(c)),
d = (d - min(d))/(max(d) - min(d)))
ex_data
## # A tibble: 5 x 4
## a b c d
## <dbl> <dbl> <dbl> <dbl>
## 1 0.698 0.113 0 1
## 2 0.800 0 0.320 0
## 3 0 1 1 0.419
## 4 1 0.697 0.518 0.368
## 5 0.232 0.638 0.858 0.797
Don’t write a function from scratch; start from working code, such as the expression used above for a single column.
(a - min(a))/(max(a) - min(a))
Question: How many inputs should this function have?
Just one: the vector to rescale.
Choose an informative name.
rescale01 <-
Use the function keyword to define a function.
rescale01 <- function
Specify the inputs (arguments) inside the parentheses after function. Separate multiple arguments with commas, e.g., function(x, y, z).
rescale01 <- function(x)
Create the body of the function using a {} block immediately following function(x).
rescale01 <- function(x){
}
Place your code in the body of the function.
rescale01 <- function(x){
(x - min(x)) / (max(x) - min(x))
}
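The same steps extend to functions with several arguments. As a sketch, a hypothetical rescale_to() takes the target range as two extra arguments, lower and upper, and reuses the rescale01() formula. Like rescale01() at this stage, it does not handle missing values.

```r
# Hypothetical function: rescale x so its values lie between lower and upper
rescale_to <- function(x, lower, upper){
  scaled01 <- (x - min(x)) / (max(x) - min(x))  # same formula as rescale01()
  lower + scaled01 * (upper - lower)            # stretch and shift to the target range
}
rescale_to(c(2, 4, 10), lower = 0, upper = 100)
# [1]   0  25 100
```

Naming the arguments in the call (lower = 0, upper = 100) is optional, but it makes the intent clear to a reader.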
Now let’s test rescale01()!
x1 <- 1:10
rescale01(x1)
## [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
## [8] 0.7777778 0.8888889 1.0000000
x2 <- c(1:10, NA)
rescale01(x2)
## [1] NA NA NA NA NA NA NA NA NA NA NA
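Question: What’s going on here? Address this issue in the code chunk below.
Because x2 contains an NA, min(x) and max(x) both return NA, and any arithmetic involving NA gives NA. Passing na.rm = TRUE to min() and max() ignores the missing value when computing the range, so only the NA element itself stays NA.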
rescale01a <- function(x) {
rangex <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
(x - min(x, na.rm = TRUE)) / rangex
}
x1 <- 1:10
rescale01a(x1)
## [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
## [8] 0.7777778 0.8888889 1.0000000
x2 <- c(1:10, NA)
rescale01a(x2)
## [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
## [8] 0.7777778 0.8888889 1.0000000 NA
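A function returns the value of the last expression evaluated in its body, as in this template:
do_something <- function(x, y, z){
  # do a bunch of stuff with the inputs...
  # the last expression (here, a tibble) is the return value
  tibble(...)
}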
Question: Does the function defined below behave as you expect? Why or why not?
No. The add_2() function returns 1000 every time: x + 2 is computed and then discarded, and 1000 is the last value evaluated in the function body, so it becomes the return value.
add_2 <- function(x){
x + 2
1000
}
add_2(998)
## [1] 1000
add_2(2)
## [1] 1000
add_2(100)
## [1] 1000
add_2(24)
## [1] 1000
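One way to fix add_2() is to make x + 2 the last expression in its body. Writing return() explicitly is an alternative; add_2_explicit() is a hypothetical name used only for this comparison.

```r
# Fixed version: x + 2 is now the last expression, so it is returned
add_2 <- function(x){
  x + 2
}
add_2(998)
# [1] 1000
add_2(2)
# [1] 4

# Equivalent version with an explicit return()
add_2_explicit <- function(x){
  return(x + 2)
}
add_2_explicit(2)
# [1] 4
```

Relying on the last expression is the common style in tidyverse code; return() is most useful for exiting a function early.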
Mapping applies a function to each element of a list or vector and returns the results. The map_ functions from the purrr package (loaded with the tidyverse) let you choose the type of value returned.
Suppose we have exam 1 and exam 2 scores of 4 students stored in a list.
exam_scores <- list(
  exam1 = c(80, 90, 70, 50),
  exam2 = c(85, 83, 45, 60)
)
exam_scores
## $exam1
## [1] 80 90 70 50
##
## $exam2
## [1] 85 83 45 60
We can use map() to find the mean score for each exam.
map(exam_scores, mean)
## $exam1
## [1] 72.5
##
## $exam2
## [1] 68.25
Suppose we want the results as a numeric (double) vector.
map_dbl(exam_scores, mean)
## exam1 exam2
## 72.50 68.25
What if we want the results as character strings?
map_chr(exam_scores, mean)
## exam1 exam2
## "72.500000" "68.250000"
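Extra arguments given to a map_ function after the function name are passed on to that function. As a sketch with hypothetical data in which one student missed exam 2 (recorded as NA), na.rm = TRUE is forwarded to mean().

```r
library(tidyverse)

# Hypothetical scores: the second student has no exam 2 score
scores_with_na <- list(
  exam1 = c(80, 90, 70, 50),
  exam2 = c(85, NA, 45, 60)
)

# Without na.rm, the NA propagates and the exam2 mean is NA
map_dbl(scores_with_na, mean)

# na.rm = TRUE is forwarded to every call of mean()
means_no_na <- map_dbl(scores_with_na, mean, na.rm = TRUE)
means_no_na  # exam1 = 72.5, exam2 = 190 / 3
```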
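The map_ functions differ in the type of output they return:
- map(): returns a list
- map_lgl(): returns a logical vector
- map_int(): returns an integer vector
- map_dbl(): returns a double vector
- map_chr(): returns a character vector
- map_df() / map_dfr(): returns a data frame by row binding
- map_dfc(): returns a data frame by column binding
A data frame is a list of columns, so mapping over ex_data applies the function to each column.
map_dbl(ex_data, mean)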
## a b c d
## 0.5460050 0.4894938 0.5392597 0.5168666
map_dbl(ex_data, median)
## a b c d
## 0.6982264 0.6377430 0.5182620 0.4185009
map_dbl(ex_data, sd)
## a b c d
## 0.4154975 0.4205473 0.4042266 0.3908414
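Returning to the motivating example, map_dfc() could replace the four copy-pasted lines inside mutate(): it applies rescale01a() to every column and binds the results back into a tibble. This sketch regenerates the unscaled data with the same seed and redefines rescale01a() so that it runs on its own; raw_data and rescaled are names introduced here.

```r
library(tidyverse)

# Same NA-safe rescaling function as rescale01a() above
rescale01a <- function(x) {
  rangex <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  (x - min(x, na.rm = TRUE)) / rangex
}

# Regenerate the unscaled example data
set.seed(2320)
raw_data <- tibble(
  a = rnorm(5),
  b = rnorm(5),
  c = rnorm(5),
  d = rnorm(5)
)

# One call replaces the four mutate() lines
rescaled <- map_dfc(raw_data, rescale01a)
rescaled
```

With the same seed, the result should match the rescaled ex_data shown earlier, and adding a fifth column would need no new code.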
Question: How many distinct values are there in each column of mtcars? Use an appropriate map_ function to answer.
mtcars %>% map_int(n_distinct)
## mpg cyl disp hp drat wt qsec vs am gear carb
## 25 3 27 22 22 29 30 2 2 3 6
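Two more members of the family, sketched on mtcars: map_lgl() returns one TRUE/FALSE per column, and map_dfr() binds the one-row tibbles returned by each call into a single data frame. The summary columns (mean, sd) and the name col_summaries are choices made for this illustration.

```r
library(tidyverse)

# map_lgl(): one TRUE/FALSE per column of mtcars
all_numeric <- map_lgl(mtcars, is.numeric)
all_numeric

# map_dfr(): each call returns a one-row tibble, and the rows are
# bound together; .id adds a column holding the column names
col_summaries <- map_dfr(mtcars, function(col) {
  tibble(mean = mean(col), sd = sd(col))
}, .id = "variable")
col_summaries
```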
Names in code should express intent, use the correct part of speech, have a length that matches their scope, and contain no disinformation (Martin).
A variable's name should say what it is; a function's name should say what it does.
Variables should be nouns, functions should be verbs, and predicates (functions that return TRUE/FALSE) should read as true/false statements, such as is_empty().
Small scope variables should have short names and longer scope variables should have long names.
The opposite is true for functions: functions with a small scope should have long, descriptive names, and functions with a large scope (used widely) should have short names.
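A sketch of these naming rules in R. The data, the function names, and the passing threshold of 60 are hypothetical, chosen only to show a noun, a verb, a predicate, and a short loop variable.

```r
# Variable (noun): says what it is
exam_scores <- c(80, 90, 70, 50)

# Predicate: reads as a TRUE/FALSE statement
is_passing <- function(scores, threshold = 60) {
  scores >= threshold
}

# Function (verb): says what it does
count_passing <- function(scores, threshold = 60) {
  sum(is_passing(scores, threshold))
}

# Small-scope variable: a short name like i is fine inside a one-line loop
for (i in seq_along(exam_scores)) print(is_passing(exam_scores[i]))

count_passing(exam_scores)
# [1] 3
```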
Use snake_case (not CamelCase).
Use a common prefix for families of related functions (e.g., str_trim, str_sub, str_remove).
Question: Why are the functions below bad?
mean <- function(x){
sum(x)
}
T <- FALSE
c <- 25
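A sketch of why these definitions cause trouble: the new mean() silently returns a sum while masking base::mean(); T is an ordinary variable bound to TRUE, so reassigning it breaks code that relies on T meaning TRUE; and c reuses the name of the base function c(). The rm() calls remove the global definitions so the base versions are visible again; the result names (masked_result, true_mean, and so on) are introduced only for this illustration.

```r
# The custom mean() masks base::mean() and actually returns the sum
mean <- function(x){
  sum(x)
}
masked_result <- mean(c(1, 2, 3))    # 6, not 2
true_mean <- base::mean(c(1, 2, 3))  # 2: the base version is still reachable
rm(mean)                             # remove the masking definition

# T is just a variable, so it can be overwritten
T <- FALSE
t_overwritten <- T                   # FALSE
rm(T)                                # base T (TRUE) is visible again

# c <- 25 does not break c(): when a name is called as a function,
# R skips objects that are not functions. It is still confusing to read.
c <- 25
combined <- c(1, 2)                  # still works
rm(c)
```

Writing TRUE and FALSE in full, and avoiding names of existing functions, sidesteps all three problems.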