library(tidyverse)
library(palmerpenguins)
Functions and Iteration
RAdelaide 2025
Functions
Functions
- Now familiar with using functions
- Writing our own functions is an everyday skill in R
- Sometimes complex \(\implies\) often very simple
- Mostly "inline" functions for simple data manipulation
  - Very common for axis labels in ggplot()
  - Required for across() in dplyr
Using rename_with()
- dplyr allows you to rename columns of a data.frame using rename_with()
- Requires a function
penguins |> rename_with(str_to_title)
# A tibble: 344 × 8
Species Island Bill_length_mm Bill_depth_mm Flipper_length_mm Body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: Sex <fct>, Year <int>
- How could we replace the underscores with a space and return everything in Title Case?
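rename_with() also has a .cols argument, so the function can be applied to just some of the columns. A minimal sketch, where toupper() and starts_with("bill") are chosen purely for illustration:

```r
library(dplyr)
library(palmerpenguins)

## .fn is applied only to the columns selected by .cols
## Here just the two bill columns are converted to upper case
penguins_upper <- penguins |>
  rename_with(toupper, .cols = starts_with("bill"))

names(penguins_upper)
# [1] "species"           "island"            "BILL_LENGTH_MM"    "BILL_DEPTH_MM"
# [5] "flipper_length_mm" "body_mass_g"       "sex"               "year"
```

Every other column keeps its original name.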
Using across()
- Sometimes we wish to perform an identical operation across multiple columns
  - Find the max, min, mean, sd etc.
  - Format in a similar way
- The function across() is very powerful for this type of operation
- Demonstrate using RA Fisher's "iris" data
  - Four variables measured for each of 3 species of iris
## Preview the data.frame called 'iris'
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
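As a sketch of across() on this data, each numeric column can be summarised per species. across() also accepts a named list of functions; the pairing of mean and sd below is only an illustrative choice:

```r
library(dplyr)

## Apply both mean() and sd() to every numeric column, separately for each Species
## With a named list, new columns are named "{column}_{function}" by default
iris_summary <- iris |>
  summarise(
    across(where(is.numeric), list(mean = mean, sd = sd)),
    .by = Species
  )

iris_summary
```

This gives one row per species and two summary columns (for example Sepal.Length_mean and Sepal.Length_sd) for each of the four measurements.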
Checking Missing Values
- if_any() and if_all() are similar to across(), but apply logical tests
- Can also take a list of functions
## Find all rows containing at least one missing value
penguins |>
  as_tibble() |>
  dplyr::filter(
    ## if_any() is like a version of across(), but performing logical tests
    if_any(.cols = everything(), .fns = is.na)
  )
# A tibble: 11 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen NA NA NA NA
2 Adelie Torgersen 34.1 18.1 193 3475
3 Adelie Torgersen 42 20.2 190 4250
4 Adelie Torgersen 37.8 17.1 186 3300
5 Adelie Torgersen 37.8 17.3 180 3700
6 Adelie Dream 37.5 18.9 179 2975
7 Gentoo Biscoe 44.5 14.3 216 4100
8 Gentoo Biscoe 46.2 14.4 214 4650
9 Gentoo Biscoe 47.3 13.8 216 4725
10 Gentoo Biscoe 44.5 15.7 217 4875
11 Gentoo Biscoe NA NA NA NA
# ℹ 2 more variables: sex <fct>, year <int>
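if_all() is the counterpart: a row is kept only when the test is TRUE for every selected column. A minimal sketch that keeps the complete rows, using an anonymous function written with R's \(x) shorthand to negate is.na():

```r
library(dplyr)
library(palmerpenguins)

## Keep a row only if no column in that row is NA
complete_rows <- penguins |>
  filter(if_all(.cols = everything(), .fns = \(x) !is.na(x)))

nrow(complete_rows)
# [1] 333
```

The 11 rows found above plus these 333 complete rows account for all 344 penguins.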
Trying To Use across() With penguins
- There are no missing values for Chinstrap \(\implies\) the mean is returned
- The other species contain missing values \(\implies\) NA is returned
penguins |>
  as_tibble() |>
  summarise(
    ## Select all numeric columns using `where()`
    ## This applies a logical test to each column & selects it if TRUE
    across(where(is.numeric), mean), .by = species
  )
# A tibble: 3 × 6
species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Adelie NA NA NA NA 2008.
2 Gentoo NA NA NA NA 2008.
3 Chinstrap 48.8 18.4 196. 3733. 2008.
Inline Functions
- This is an everyday process in R
- Similar uses to those above
  - Modifying labels in plots
  - Modifying factor levels
- We need to first learn about functions a bit more
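As a preview, an inline function is one way to fix the NA problem in the previous across() example. A sketch, assuming we simply want to ignore missing values when averaging:

```r
library(dplyr)
library(palmerpenguins)

## \(x) mean(x, na.rm = TRUE) is an inline (anonymous) function
## across() calls it once for each selected column, passing that column in as x
penguin_means <- penguins |>
  summarise(
    across(where(is.numeric), \(x) mean(x, na.rm = TRUE)),
    .by = species
  )

penguin_means
```

All three species now return a mean for every numeric column instead of NA.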
How Functions Are Defined
Functions have three key components
- The arguments, also known as the formals()
- The code that is executed, known as the body()
- Their own environment
  - When we pass data to a function it is referred to internally by the argument name (e.g. x)
  - Everything is executed in a separate environment to the Global Environment
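These three components can be inspected directly. A minimal sketch using a made-up function, add_one(), which is hypothetical and not from any package:

```r
## A hypothetical function, used only to inspect its components
add_one <- function(x, by = 1) {
  shifted <- x + by  # created inside the environment of each individual call
  shifted
}

formals(add_one)      # the arguments: x (no default) and by (default 1)
body(add_one)         # the code that is executed
environment(add_one)  # where the function was created (R_GlobalEnv at the console)

## Whatever our data is called outside, inside the function it is known as x
my_values <- c(2, 4, 6)
add_one(my_values)
# [1] 3 5 7

## shifted only existed while the function was running
exists("shifted")
# [1] FALSE
```

Each call gets a fresh environment whose parent is the one returned by environment(add_one), which is why shifted never appears in our workspace.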
Function Arguments
- The function sd() is a beautifully simple one
- Check the help page: ?sd
- The arguments are:
  - x: a numeric vector
  - na.rm: a logical value
formals(sd)
$x
$na.rm
[1] FALSE
The Function Body
- We can look at the code executed by a function by calling body()
body(sd)
sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
na.rm = na.rm))
Writing Our Own Functions
Writing Our Own Function
- Take as many questions from the floor as needed
- Before we write a brief inline function \(\rightarrow\) let's write a more formal one
- We'll take a vector and transform every value to a \(Z\)-score
- First we decide on the function name: z_score
  - Just like a standard R object
- The contents of this object are some R code
z_score <- function(x, na.rm = FALSE) {
  ## The key elements we need for a Z-score are the mean & SD of a vector
  mn <- mean(x, na.rm = na.rm)
  sd <- sd(x, na.rm = na.rm)
  ## To calculate the z-score we subtract the mean, then divide by the SD
  ## The last line executed is what the function returns
  (x - mn) / sd # No need to assign this internally to an object
}
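A quick check of z_score() on a small vector whose mean (4) and SD (2) are easy to verify by hand. The definition is repeated so the example runs on its own, and base R's scale() is used only as a point of comparison:

```r
## Same definition as above, repeated so this block is self-contained
z_score <- function(x, na.rm = FALSE) {
  mn <- mean(x, na.rm = na.rm)
  sd <- sd(x, na.rm = na.rm)
  (x - mn) / sd
}

z <- z_score(c(2, 4, 6))
z
# [1] -1  0  1

## scale() centres and scales in the same way, but returns a matrix
all.equal(z, as.vector(scale(c(2, 4, 6))))
# [1] TRUE

## na.rm is passed through to mean() and sd(), so NA values are skipped
z_score(c(2, 4, 6, NA), na.rm = TRUE)
# [1] -1  0  1 NA
```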
The Ellipsis (...)
- R has a distinctive feature using the syntax ...
  - You may have seen this in multiple help pages
- Allows arguments to be passed on to internal functions without being defined as formal arguments
- Makes it very powerful but a little dangerous
- Check the help page for mean()
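A minimal sketch of both sides of the ellipsis, using a hypothetical wrapper, mean_of_squares(), that forwards ... to mean():

```r
## A hypothetical wrapper: anything supplied in ... is handed on to mean() untouched
mean_of_squares <- function(x, ...) {
  mean(x^2, ...)
}

values <- c(1, 2, 3, NA)
mean_of_squares(values)
# [1] NA

## na.rm is not a formal argument of mean_of_squares(), but reaches mean() via ...
mean_of_squares(values, na.rm = TRUE)
# [1] 4.666667

## The danger: a misspelled argument is silently absorbed by ...
## mean() ignores the unknown na.mr, so NA comes back with no error or warning
mean(values, na.mr = TRUE)
# [1] NA
```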
Closing Comments
S3 Method Dispatch
- The most common class system in R is the S3 class system
- Can make looking inside functions frustrating
- Look inside the function mean using body(mean)
body(mean)
UseMethod("mean")
- This relies on the idea that multiple versions of mean exist
  - They have been defined for objects of different classes
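The same mechanism can be sketched with a hypothetical generic, describe(), which is not part of base R. UseMethod() looks for a method named describe.<class> matching the object, and falls back to describe.default:

```r
## A hypothetical generic: UseMethod() chooses a method based on the class of x
describe <- function(x, ...) {
  UseMethod("describe")
}

## Used for numeric vectors
describe.numeric <- function(x, ...) {
  paste("A numeric vector of length", length(x))
}

## Used when no more specific method exists
describe.default <- function(x, ...) {
  paste("An object of class", class(x)[1])
}

describe(c(1.5, 2.5))
# [1] "A numeric vector of length 2"
describe(TRUE)
# [1] "An object of class logical"

## The methods already defined for mean() can be listed in a similar way
methods("mean")
```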
Challenge
- Try creating an inline function to rename the penguins columns in Title Case
- You'll need to:
  - remove underscores & replace them with spaces
  - convert to title case
  - decide what to do with mm (or Mm)
## As a starting hint
penguins |>
  rename_with(
    \(x) {
      str_to_title(x)
    }
  )