Functions and Iteration

RAdelaide 2024

Author

Affiliation

Dr Stevie Pederson

Black Ochre Data Labs
Telethon Kids Institute

Published

July 10, 2024

Functions

Now familiar with using functions
Writing our own functions is an everyday skill in R
Sometimes complex \(\implies\) usually very simple
Mostly “inline” functions for simple data manipulation
- Very common for axis labels in ggplot()

library(tidyverse)

A Quick Example

Orange %>% 
  ggplot(
    aes(
      age, circumference, 
      colour = Tree
    )
  ) +
  geom_point() +
  scale_colour_brewer(
    palette = "Set1"
  )

. . .

Let’s say that we wish to add the prefix ‘Tree’ to the legend

A Quick Example

Orange %>% 
  ggplot(
    aes(
      age, circumference,
      colour = Tree
    )
  ) +
  geom_point() +
  scale_colour_brewer(
    palette = "Set1",
    labels = \(x) paste("Tree", x)
  )

. . .

\(x) is shorthand for function(x) (since R v4.1)
All labels are passed to the function as x

New function notation was introduced in R v4.1 (2021)

Inline Functions

This is often referred to as an inline function
Usually very simple, single line functions
- Often \(x) using x as the underlying value
We could’ve modified the underlying data (but didn’t)
Also very useful when using mutate() to modify columns

Inline Functions

A common step I use when modifying labels might be

flagstats <- c("properly_paired_reads", "unique_alignments")
flagstats %>% 
  str_replace_all("_", " ") %>% # Add spaces
  str_to_title() %>% # Capitalise the 1st letter
  str_wrap(12) # Add line breaks (\n) to a given width

[1] "Properly\nPaired Reads" "Unique\nAlignments"

. . .

When modifying x-axis labels this might lead to:

scale_x_discrete(
  labels = \(x) x %>% str_replace_all("_", " ") %>% str_to_title() %>% str_wrap(12)
)

Piping like this is OK but up to you

Understanding Functions

A function really has multiple aspects

The formals() \(\implies\) the arguments we pass
The body() \(\implies\) the code that does stuff
The environment() where calculations take place

Let’s look through sd() starting at the help page ?sd

Understanding Functions

formals(sd)

$x


$na.rm
[1] FALSE

. . .

These are the arguments (or formals) required by the function
na.rm has a default value (FALSE)

Understanding Functions

body(sd)

sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
    na.rm = na.rm))

. . .

Re-formatted that might be

sqrt(
  var(
    if (is.vector(x) || is.factor(x)) x else as.double(x), 
    na.rm = na.rm
  )
)

Understanding Functions

So the full function might be written as:

sd <- function(x, na.rm = FALSE) {
  sqrt(
    var(
      if (is.vector(x) || is.factor(x)) x else as.double(x), 
      na.rm = na.rm
    )
  )
}

No need to write this function

Understanding Functions

To make this more understandable

sd <- function(x, na.rm = FALSE) {
  ## Coerce to a double, only if needed
  if (!is.vector(x) || is.factor(x)) x <- as.double(x)
  ## Calculate the variance
  var_x <- var(x, na.rm = na.rm)
  ## Return the square root of the variance
  sqrt(var_x)
}

Writing Our Function

Let’s write that function for modifying labels
Start by deciding what the function might be called
- Also what arguments we need

modify_labels <- function(x) {
  
}

Writing Our Function

The first step is to change "_" to spaces
- The last line will be returned by default

modify_labels <- function(x) {
  str_replace_all(x, "_", " ")
}
modify_labels(flagstats)

[1] "properly paired reads" "unique alignments"

Writing Our Function

We’re going to modify that again \(\implies\) let’s form an object
- Then return the new object

modify_labels <- function(x) {
  new_x <- str_replace_all(x, "_", " ")
  new_x
}
modify_labels(flagstats)

[1] "properly paired reads" "unique alignments"

Looking Inside the Function

Why are we referring to flagstats as x?

. . .

When we pass it to the function is temporarily renamed x
\(\implies\) But where is it called x?

. . .

Each function has it’s own environment
- Within the GlobalEnvironment but separate

Looking Inside the Function

If we add a browser() command, we can enter the function at that exact point

modify_labels <- function(x) {
  browser()
  new_x <- str_replace_all(x, "_", " ")
  new_x
}
modify_labels(flagstats)

You should have been jumped to the Environnment Tab
The Console should also say Browse[1]>

This played up when testing both locally & on posit cloud

Looking Inside the Function

We’re now looking inside the environment within the function
By passing flagstats as x \(\implies\) this is it’s name within this environment
Modifying within the function will not change the object in the Global Environment

. . .

Within the browser execute
new_x <- str_replace_all(x, "_", " ")

. . .

What happened?

Should’ve seen the new object created in the function environment

Writing Our Function

Exit the browser() by typing Q
Remove the browser() line by adding a comment (#) to the start

. . .

Now we can finish the function

modify_labels <- function(x) {
  # browser()
  new_x <- str_replace_all(x, "_", " ")
  new_x <- str_to_title(new_x)
  new_x <- str_wrap(new_x, 12)
  new_x
}
modify_labels(flagstats)

[1] "Properly\nPaired Reads" "Unique\nAlignments"

. . .

Why didn’t I use a pipe?

Extending Our Function

Can we also control the width at which the text wraps
- Hard-wired to 12 internally

. . .

Add an extra argument called width with default value of 12
- Now this can be changed any time we call the function

modify_labels <- function(x, width = 12) {
  # browser()
  new_x <- str_replace_all(x, "_", " ")
  new_x <- str_to_title(new_x)
  new_x <- str_wrap(new_x, width)
  new_x
}
modify_labels(flagstats)

[1] "Properly\nPaired Reads" "Unique\nAlignments"

modify_labels(flagstats, 80)

[1] "Properly Paired Reads" "Unique Alignments"

Extending Our Function

In many help pages you may see ... as a function argument
This allows for passing arguments to internal functions
- Are not required to be set specifically
Check the help page ?str_wrap

. . .

Notice there are four additional arguments:
- width, indent, exdent and whitespace_only

Extending Our Function

Let’s remove width from our list of formal arguments
Replace with ...
Pass ... inside str_wrap

modify_labels <- function(x, ...) {
  # browser()
  new_x <- str_replace_all(x, "_", " ")
  new_x <- str_to_title(new_x)
  new_x <- str_wrap(new_x, ...)
  new_x
}
modify_labels(flagstats)

[1] "Properly Paired Reads" "Unique Alignments"

modify_labels(flagstats, width = 12)

[1] "Properly\nPaired Reads" "Unique\nAlignments"

Extending Our Function

The default values will now be applied by str_wrap() unless we change them
The following would indent the first line by 2 spaces
- May be a little pointless …

modify_labels(flagstats, width = 12, indent = 2)

[1] "  Properly\nPaired Reads" "  Unique\nAlignments"

. . .

When writing functions need to choose target function carefully

Finishing Our Function

This is a more formal implementation of the original process
These two are functionally identical
- One more suited for inline use as a one-off
- The other may be more useful if using repeatedly

one_liner <- \(x, ...) x %>% str_replace_all("_", " ") %>% str_to_title() %>% str_wrap(...)
one_liner(flagstats, width = 12)

[1] "Properly\nPaired Reads" "Unique\nAlignments"

modify_labels(flagstats, width = 12)

[1] "Properly\nPaired Reads" "Unique\nAlignments"

. . .

What else could we have done?

Added comments to explain each line
Added checks like stopifnot(is.character(x))

Iteration

R sees everything as vectors
We didn’t need to modify each value of flagstats
- Not the case for most languages
- python, C, C++. perl etc step through vectors
  \(\implies\)process one value at a time

Iteration

A python-like process might’ve been

new_x <- c()
for (x in flagstats) {
  new_x <- c(new_x, modify_labels(x, width = 12))
}
new_x

[1] "Properly\nPaired Reads" "Unique\nAlignments"

. . .

We formed an empty object
Stepped through every value in flagstats
Modified values one at a time & added to new_x

. . .

Stupidly slow in R

Iteration

Each value was called x as we stepped through it
x is just a convention \(\implies\) can be anything (i, bob etc)

for (x in flagstats) {
  print(x)
}

[1] "properly_paired_reads"
[1] "unique_alignments"

for (i in flagstats) {
  print(i)
}

[1] "properly_paired_reads"
[1] "unique_alignments"

for (bob in flagstats) {
  print(bob)
}

[1] "properly_paired_reads"
[1] "unique_alignments"

Iteration

Because R works on vectors \(\implies\) rarely need to iterate on vectors

. . .

Lists however…

. . .

How would we get the length for each list element?

vals <- list(letters = letters, num = rnorm(1000))

. . .

Iteration is probably our first, best guess…

Iteration

len <- list()
for (x in vals) {
  len <- c(len, length(x))
}
names(len) <- names(vals)
len

$letters
[1] 26

$num
[1] 1000

The above

Initialises an empty list len
Steps through each element calling it x
Finds the length of x and extends len
Adds the names so we know which value goes with which element

The `R` Way to Iterate

Conventional iteration is very slow in R
Provides the function lapply
- Stands for list apply
- Applies a function to each element of a list

. . .

Syntax is lapply(list, function, ...)

. . .

lapply(vals, length)

$letters
[1] 26

$num
[1] 1000

No ... arguments needed here

Much simpler way to work & is faster

The `R` Way to Iterate

A quick example using the ...

lapply(vals, head, n = 4)

$letters
[1] "a" "b" "c" "d"

$num
[1] -0.06136509  1.17332403 -1.37271761 -0.97836301

The `R` Way to Iterate

lapply() will always return a list
Safest option
- Calling head gave two elements of different types
- Calling length gave two integer elements

. . .

We could’ve coerced the lengths into a vector

lapply(vals, length) %>% unlist()

letters     num 
     26    1000

. . .

Only useful if returning a common type

The `R` Way to Iterate

If we know what we’ll get\(\implies\) map_*() functions
Part of purrr \(\implies\) core tidyverse package

map_int(vals, length)

letters     num 
     26    1000

. . .

Alternatives are map_chr(), map_lgl(), map_dbl()
Will error if setting the wrong type
Only used when single values are returned

Getting Real

SNP Data

For the rest of this session we’ll look at some genotype data
Will put all the day’s material into practice
Everyone will have very different applications
Hopefully will help you figure out best approach for your data

. . .

Simulated data
Based on surviving moths after exposure to freezing temperature

SNP Data

snps <- read_csv("data/snps.csv")
dim(snps)

[1]  104 2001

snps[1:6, 1:10]

# A tibble: 6 × 10
  Population SNP1  SNP2  SNP3  SNP4  SNP5  SNP6  SNP7  SNP8  SNP9 
  <chr>      <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Control    AB    BB    AB    BB    BB    BB    BB    AB    AB   
2 Control    AB    AB    AB    AB    BB    <NA>  AA    AA    AB   
3 Control    BB    AB    AA    AA    AA    AB    AB    BB    AB   
4 Control    AB    AB    BB    AB    AB    AB    AA    AA    BB   
5 Control    BB    BB    AB    BB    AB    AA    AA    AB    AB   
6 Control    AB    AB    AA    AA    AB    <NA>  AB    BB    AB

. . .

Each row represents a surviving moth
We have 104 moths with 2000 SNP genotypes

SNP Data

Our task is to:

Perform Fisher’s Exact Test on each SNP locus
- Tests for association between genotype and survival
- Could be allele count or genotypes
- Dominant or recessive model
Decide which values to return
- Probably a p-value
- Do we want genotype counts? Odds Ratios?

. . .

lapply() and functions will be our friends

SNP Data

First let’s check our population sizes

snps %>% summarise(n = dplyr::n(), .by = Population)

# A tibble: 2 × 2
  Population     n
  <chr>      <int>
1 Control       53
2 Treat         51

Missing Genotypes

Check the missing genotypes

. . .

Know we know functions \(\implies\) across()
Is a dplyr function
- Enables us to apply a function across zero or more columns
- Uses tidyselect helpers
The help page has lots of information
- Lets use it first

Missing Genotypes

Apply the function is.na() across all columns that start_with “SNP”

snps %>% 
  mutate(across(starts_with("SNP"), is.na)) %>% 
  dplyr::select(1:10)

# A tibble: 104 × 10
   Population SNP1  SNP2  SNP3  SNP4  SNP5  SNP6  SNP7  SNP8  SNP9 
   <chr>      <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
 1 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 2 Control    FALSE FALSE FALSE FALSE FALSE TRUE  FALSE FALSE FALSE
 3 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 4 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 5 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 6 Control    FALSE FALSE FALSE FALSE FALSE TRUE  FALSE FALSE FALSE
 7 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 8 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 9 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
10 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# ℹ 94 more rows

Missing Genotypes

If we pass to summarise() we can count these accross all SNPs
- Perfect opportunity for an inline function

snps %>% 
  summarise(
    across(starts_with("SNP"), \(x) sum(is.na(x)))
  ) %>% 
  dplyr::select(1:10)

# A tibble: 1 × 10
   SNP1  SNP2  SNP3  SNP4  SNP5  SNP6  SNP7  SNP8  SNP9 SNP10
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     3     0     1     0     2     3     1     1     1     1

Missing Genotypes

This gives the missing count for all 2000 loci
Maybe pivot_longer() might help

snps %>% 
  summarise(
    across(starts_with("SNP"), \(x) sum(is.na(x)))
  ) %>% 
  pivot_longer(everything(), names_to = "locus", values_to = "missing")

# A tibble: 2,000 × 2
   locus missing
   <chr>   <int>
 1 SNP1        3
 2 SNP2        0
 3 SNP3        1
 4 SNP4        0
 5 SNP5        2
 6 SNP6        3
 7 SNP7        1
 8 SNP8        1
 9 SNP9        1
10 SNP10       1
# ℹ 1,990 more rows

Missing Genotypes

Now we can summarise again
Will make a nice descriptive table in our rmarkdown report

snps %>% 
  summarise(
    across(starts_with("SNP"), \(x) sum(is.na(x)))
  ) %>% 
  pivot_longer(everything(), names_to = "locus", values_to = "missing") %>% 
  summarise(n = dplyr::n(), .by = missing) %>% 
  arrange(desc(n))

# A tibble: 7 × 2
  missing     n
    <int> <int>
1       1   721
2       0   706
3       2   361
4       3   164
5       4    39
6       5     8
7       6     1

Performing an Analysis

Let’s see if the A allele acts in a dominant manner
Compare the numbers with A alleles across populations

. . .

Classic Fisher’s Exact Test using a 2x2 table

	A_TRUE	A_FALSE
Control	a	b
Treat	c	d

. . .

No right or wrong strategy

Performing an Analysis

Start by converting to long form

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype")

. . .

Can easily remove the missing genotypes

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype))

Performing an Analysis

Check for the presence of an A allele

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A"))

Performing an Analysis

Now count by presence of A
- Set the grouping to be by Population, locus & A status

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A))

Performing an Analysis

Move the counts into TRUE/FALSE columns
- The 2x2 tables now start to appear

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  arrange(locus)

Performing an Analysis

We can form a nested tibble for each locus

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_")))

Nesting Columns

This is a new idea \(\implies\) we now have a list column
Look at the first one to see what the elements look like
- Not part of the analysis

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  slice(1) %>% pull(df)

Using `lapply()` On Nested Columns

We can use lapply() to perform an analysis on every nested df

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")]))
  )

Using `lapply()` On Nested Columns

We now have a new list column with a list of results from each test
Objects of class htest
Will have an element called p.value
This is a double (i.e. numeric)

. . .

We can use map_dbl() to grab these values

Using `lapply()` On Nested Columns

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")])),
    p = map_dbl(ft, \(x) x$p.value),
  )

Using `lapply()` On Nested Columns

How about an Odds Ratio?
- The OR is in an element called estimate

. . .

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")])),
    OR = map_dbl(ft, \(x) x$estimate),
    p = map_dbl(ft, \(x) x$p.value),
  )

Using `lapply()` On Nested Columns

Getting counts will require using df again

. . .

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")])),
    Control = map_int(df, \(x) dplyr::filter(x, Population == "Control")[["A_TRUE"]]),
    Treat = map_int(df, \(x) dplyr::filter(x, Population == "Treat")[["A_TRUE"]]),
    OR = map_dbl(ft, \(x) x$estimate),
    p = map_dbl(ft, \(x) x$p.value),
  )

The Final Analysis

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")])),
    Control = map_int(df, \(x) dplyr::filter(x, Population == "Control")[["A_TRUE"]]),
    Treat = map_int(df, \(x) dplyr::filter(x, Population == "Treat")[["A_TRUE"]]),
    OR = map_dbl(ft, \(x) x$estimate),
    p = map_dbl(ft, \(x) x$p.value),
    adj_p = p.adjust(p, "bonferroni")
  ) %>% 
  arrange(p)

# A tibble: 2,000 × 8
   locus   df               ft      Control Treat    OR        p    adj_p
   <chr>   <list>           <list>    <int> <int> <dbl>    <dbl>    <dbl>
 1 SNP1716 <tibble [2 × 3]> <htest>      47     8  38.7 2.17e-14 4.35e-11
 2 SNP1236 <tibble [2 × 3]> <htest>      46    11  22.9 1.29e-11 2.57e- 8
 3 SNP1618 <tibble [2 × 3]> <htest>      45    12  22.7 3.00e-11 6.00e- 8
 4 SNP248  <tibble [2 × 3]> <htest>      44    10  18.8 7.91e-11 1.58e- 7
 5 SNP1730 <tibble [2 × 3]> <htest>      43    10  17.0 3.27e-10 6.54e- 7
 6 SNP311  <tibble [2 × 3]> <htest>      41    10  13.5 2.52e- 9 5.04e- 6
 7 SNP1385 <tibble [2 × 3]> <htest>      45    14  14.0 3.70e- 9 7.40e- 6
 8 SNP1647 <tibble [2 × 3]> <htest>      43    13  13.5 4.28e- 9 8.57e- 6
 9 SNP8    <tibble [2 × 3]> <htest>      46    16  13.5 9.41e- 9 1.88e- 5
10 SNP1993 <tibble [2 × 3]> <htest>      40    11  11.7 1.74e- 8 3.47e- 5
# ℹ 1,990 more rows

For those with an OR > 1 the A allele might be a dominant susceptibility locus

Summary

We could save this as a final object
Select our important columns and prepare a table

Summary

For this we needed to understand

When to use pivot_longer() and pivot_wider()
What is a list, vector and data.frame?
Difference between integer and double values
tidyselect helper functions + dplyr
How to use lapply() with inline functions
Extending lapply() using map_*() to produce vector output

Summary

Alternatives to map_*() are sapply() and vapply()
- sapply() is slightly unpredictable
- vapply() is a bit more clunky but powerful

. . .

Could’ve use unlist(lapply(...))

Why didn’t we?

Specifying the exact type will error if we’re wrong. This is good

Functions

Functions

A Quick Example

A Quick Example

Inline Functions

Inline Functions

Understanding Functions

Understanding Functions

Understanding Functions

Understanding Functions

Understanding Functions

Writing Our Function

Writing Our Function

Writing Our Function

Looking Inside the Function

Looking Inside the Function

Looking Inside the Function

Writing Our Function

Extending Our Function

Extending Our Function

Extending Our Function

Extending Our Function

Finishing Our Function

Iteration

Iteration

Iteration

Iteration

Iteration

Iteration

The R Way to Iterate

The R Way to Iterate

The R Way to Iterate

The R Way to Iterate

Getting Real

SNP Data

SNP Data

SNP Data

SNP Data

Missing Genotypes

Missing Genotypes

Missing Genotypes

Missing Genotypes

Missing Genotypes

Performing an Analysis

Performing an Analysis

Performing an Analysis

Performing an Analysis

Performing an Analysis

Performing an Analysis

Nesting Columns

Using lapply() On Nested Columns

Using lapply() On Nested Columns

Using lapply() On Nested Columns

Using lapply() On Nested Columns

Using lapply() On Nested Columns

The Final Analysis

Summary

Summary

Summary

The `R` Way to Iterate

The `R` Way to Iterate

The `R` Way to Iterate

The `R` Way to Iterate

Using `lapply()` On Nested Columns

Using `lapply()` On Nested Columns

Using `lapply()` On Nested Columns

Using `lapply()` On Nested Columns

Using `lapply()` On Nested Columns