Functions and Iteration

RAdelaide 2024

Author
Affiliation

Dr Stevie Pederson

Black Ochre Data Labs
Telethon Kids Institute

Published

July 10, 2024

Functions

Functions

  • Now familiar with using functions
  • Writing our own functions is an everyday skill in R
  • Sometimes complex \(\implies\) usually very simple
  • Mostly “inline” functions for simple data manipulation
    • Very common for axis labels in ggplot()
library(tidyverse)

A Quick Example

Orange %>% 
  ggplot(
    aes(
      age, circumference, 
      colour = Tree
    )
  ) +
  geom_point() +
  scale_colour_brewer(
    palette = "Set1"
  ) 

. . .

  • Let’s say that we wish to add the prefix ‘Tree’ to the legend

A Quick Example

Orange %>% 
  ggplot(
    aes(
      age, circumference,
      colour = Tree
    )
  ) +
  geom_point() +
  scale_colour_brewer(
    palette = "Set1",
    labels = \(x) paste("Tree", x)
  )

. . .

  • \(x) is shorthand for function(x) (since R v4.1)
  • All labels are passed to the function as x
  • New function notation was introduced in R v4.1 (2021)

Inline Functions

  • This is often referred to as an inline function
  • Usually very simple, single line functions
    • Often \(x) using x as the underlying value
  • We could’ve modified the underlying data (but didn’t)
  • Also very useful when using mutate() to modify columns

Inline Functions

  • A common step I use when modifying labels might be
flagstats <- c("properly_paired_reads", "unique_alignments")
flagstats %>% 
  str_replace_all("_", " ") %>% # Add spaces
  str_to_title() %>% # Capitalise the 1st letter
  str_wrap(12) # Add line breaks (\n) to a given width
[1] "Properly\nPaired Reads" "Unique\nAlignments"    

. . .


  • When modifying x-axis labels this might lead to:
scale_x_discrete(
  labels = \(x) x %>% str_replace_all("_", " ") %>% str_to_title() %>% str_wrap(12)
)

Piping like this is OK but up to you

Understanding Functions

A function really has multiple aspects

  1. The formals() \(\implies\) the arguments we pass
  2. The body() \(\implies\) the code that does stuff
  3. The environment() where calculations take place

Let’s look through sd() starting at the help page ?sd

Understanding Functions

formals(sd)
$x


$na.rm
[1] FALSE

. . .

  • These are the arguments (or formals) required by the function
  • na.rm has a default value (FALSE)

Understanding Functions

body(sd)
sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
    na.rm = na.rm))

. . .

Re-formatted that might be

sqrt(
  var(
    if (is.vector(x) || is.factor(x)) x else as.double(x), 
    na.rm = na.rm
  )
)

Understanding Functions

So the full function might be written as:

sd <- function(x, na.rm = FALSE) {
  sqrt(
    var(
      if (is.vector(x) || is.factor(x)) x else as.double(x), 
      na.rm = na.rm
    )
  )
}

No need to write this function

Understanding Functions

To make this more understandable

sd <- function(x, na.rm = FALSE) {
  ## Coerce to a double, only if needed
  if (!is.vector(x) || is.factor(x)) x <- as.double(x)
  ## Calculate the variance
  var_x <- var(x, na.rm = na.rm)
  ## Return the square root of the variance
  sqrt(var_x)
}

Writing Our Function

  • Let’s write that function for modifying labels
  • Start by deciding what the function might be called
    • Also what arguments we need
modify_labels <- function(x) {
  
}

Writing Our Function

  • The first step is to change "_" to spaces
    • The last line will be returned by default
modify_labels <- function(x) {
  str_replace_all(x, "_", " ")
}
modify_labels(flagstats)
[1] "properly paired reads" "unique alignments"    

Writing Our Function

  • We’re going to modify that again \(\implies\) let’s form an object
    • Then return the new object
modify_labels <- function(x) {
  new_x <- str_replace_all(x, "_", " ")
  new_x
}
modify_labels(flagstats)
[1] "properly paired reads" "unique alignments"    

Looking Inside the Function

  • Why are we referring to flagstats as x?

. . .

  • When we pass it to the function is temporarily renamed x
    \(\implies\) But where is it called x?

. . .

  • Each function has it’s own environment
    • Within the GlobalEnvironment but separate

Looking Inside the Function

  • If we add a browser() command, we can enter the function at that exact point
modify_labels <- function(x) {
  browser()
  new_x <- str_replace_all(x, "_", " ")
  new_x
}
modify_labels(flagstats)
  • You should have been jumped to the Environnment Tab
  • The Console should also say Browse[1]>

This played up when testing both locally & on posit cloud

Looking Inside the Function

  • We’re now looking inside the environment within the function
  • By passing flagstats as x \(\implies\) this is it’s name within this environment
  • Modifying within the function will not change the object in the Global Environment

. . .

  • Within the browser execute
    new_x <- str_replace_all(x, "_", " ")

. . .

  • What happened?
  • Should’ve seen the new object created in the function environment

Writing Our Function

  • Exit the browser() by typing Q
  • Remove the browser() line by adding a comment (#) to the start

. . .

  • Now we can finish the function
modify_labels <- function(x) {
  # browser()
  new_x <- str_replace_all(x, "_", " ")
  new_x <- str_to_title(new_x)
  new_x <- str_wrap(new_x, 12)
  new_x
}
modify_labels(flagstats)
[1] "Properly\nPaired Reads" "Unique\nAlignments"    

. . .

Why didn’t I use a pipe?

Extending Our Function

  • Can we also control the width at which the text wraps
    • Hard-wired to 12 internally

. . .

  • Add an extra argument called width with default value of 12
    • Now this can be changed any time we call the function
modify_labels <- function(x, width = 12) {
  # browser()
  new_x <- str_replace_all(x, "_", " ")
  new_x <- str_to_title(new_x)
  new_x <- str_wrap(new_x, width)
  new_x
}
modify_labels(flagstats)
[1] "Properly\nPaired Reads" "Unique\nAlignments"    
modify_labels(flagstats, 80)
[1] "Properly Paired Reads" "Unique Alignments"    

Extending Our Function

  • In many help pages you may see ... as a function argument
  • This allows for passing arguments to internal functions
    • Are not required to be set specifically
  • Check the help page ?str_wrap

. . .

  • Notice there are four additional arguments:
    • width, indent, exdent and whitespace_only

Extending Our Function

  • Let’s remove width from our list of formal arguments
  • Replace with ...
  • Pass ... inside str_wrap
modify_labels <- function(x, ...) {
  # browser()
  new_x <- str_replace_all(x, "_", " ")
  new_x <- str_to_title(new_x)
  new_x <- str_wrap(new_x, ...)
  new_x
}
modify_labels(flagstats)
[1] "Properly Paired Reads" "Unique Alignments"    
modify_labels(flagstats, width = 12)
[1] "Properly\nPaired Reads" "Unique\nAlignments"    

Extending Our Function

  • The default values will now be applied by str_wrap() unless we change them
  • The following would indent the first line by 2 spaces
    • May be a little pointless …
modify_labels(flagstats, width = 12, indent = 2)
[1] "  Properly\nPaired Reads" "  Unique\nAlignments"    

. . .

  • When writing functions need to choose target function carefully

Finishing Our Function

  • This is a more formal implementation of the original process
  • These two are functionally identical
    • One more suited for inline use as a one-off
    • The other may be more useful if using repeatedly
one_liner <- \(x, ...) x %>% str_replace_all("_", " ") %>% str_to_title() %>% str_wrap(...)
one_liner(flagstats, width = 12)
[1] "Properly\nPaired Reads" "Unique\nAlignments"    
modify_labels(flagstats, width = 12)
[1] "Properly\nPaired Reads" "Unique\nAlignments"    

. . .

What else could we have done?

  • Added comments to explain each line
  • Added checks like stopifnot(is.character(x))

Iteration

Iteration

  • R sees everything as vectors
  • We didn’t need to modify each value of flagstats
    • Not the case for most languages
    • python, C, C++. perl etc step through vectors
      \(\implies\)process one value at a time

Iteration

  • A python-like process might’ve been
new_x <- c()
for (x in flagstats) {
  new_x <- c(new_x, modify_labels(x, width = 12))
}
new_x
[1] "Properly\nPaired Reads" "Unique\nAlignments"    

. . .

  1. We formed an empty object
  2. Stepped through every value in flagstats
  3. Modified values one at a time & added to new_x

. . .

Stupidly slow in R

Iteration

  • Each value was called x as we stepped through it
  • x is just a convention \(\implies\) can be anything (i, bob etc)
for (x in flagstats) {
  print(x)
}
[1] "properly_paired_reads"
[1] "unique_alignments"
for (i in flagstats) {
  print(i)
}
[1] "properly_paired_reads"
[1] "unique_alignments"
for (bob in flagstats) {
  print(bob)
}
[1] "properly_paired_reads"
[1] "unique_alignments"

Iteration

  • Because R works on vectors \(\implies\) rarely need to iterate on vectors

. . .

  • Lists however…

. . .

  • How would we get the length for each list element?
vals <- list(letters = letters, num = rnorm(1000))

. . .


Iteration is probably our first, best guess…

Iteration

len <- list()
for (x in vals) {
  len <- c(len, length(x))
}
names(len) <- names(vals)
len
$letters
[1] 26

$num
[1] 1000

The above

  1. Initialises an empty list len
  2. Steps through each element calling it x
  3. Finds the length of x and extends len
  4. Adds the names so we know which value goes with which element

The R Way to Iterate

  • Conventional iteration is very slow in R
  • Provides the function lapply
    • Stands for list apply
    • Applies a function to each element of a list

. . .

  • Syntax is lapply(list, function, ...)

. . .

lapply(vals, length)
$letters
[1] 26

$num
[1] 1000
  • No ... arguments needed here

Much simpler way to work & is faster

The R Way to Iterate

A quick example using the ...

lapply(vals, head, n = 4)
$letters
[1] "a" "b" "c" "d"

$num
[1] -0.06136509  1.17332403 -1.37271761 -0.97836301

The R Way to Iterate

  • lapply() will always return a list
  • Safest option
    • Calling head gave two elements of different types
    • Calling length gave two integer elements

. . .

  • We could’ve coerced the lengths into a vector
lapply(vals, length) %>% unlist()
letters     num 
     26    1000 

. . .

  • Only useful if returning a common type

The R Way to Iterate

  • If we know what we’ll get\(\implies\) map_*() functions
  • Part of purrr \(\implies\) core tidyverse package
map_int(vals, length)
letters     num 
     26    1000 

. . .

  • Alternatives are map_chr(), map_lgl(), map_dbl()
  • Will error if setting the wrong type
  • Only used when single values are returned

Getting Real

SNP Data

  • For the rest of this session we’ll look at some genotype data
  • Will put all the day’s material into practice
  • Everyone will have very different applications
  • Hopefully will help you figure out best approach for your data

. . .

  • Simulated data
  • Based on surviving moths after exposure to freezing temperature

SNP Data

snps <- read_csv("data/snps.csv")
dim(snps)
[1]  104 2001
snps[1:6, 1:10]
# A tibble: 6 × 10
  Population SNP1  SNP2  SNP3  SNP4  SNP5  SNP6  SNP7  SNP8  SNP9 
  <chr>      <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Control    AB    BB    AB    BB    BB    BB    BB    AB    AB   
2 Control    AB    AB    AB    AB    BB    <NA>  AA    AA    AB   
3 Control    BB    AB    AA    AA    AA    AB    AB    BB    AB   
4 Control    AB    AB    BB    AB    AB    AB    AA    AA    BB   
5 Control    BB    BB    AB    BB    AB    AA    AA    AB    AB   
6 Control    AB    AB    AA    AA    AB    <NA>  AB    BB    AB   

. . .

  • Each row represents a surviving moth
  • We have 104 moths with 2000 SNP genotypes

SNP Data

Our task is to:

  • Perform Fisher’s Exact Test on each SNP locus
    • Tests for association between genotype and survival
    • Could be allele count or genotypes
    • Dominant or recessive model
  • Decide which values to return
    • Probably a p-value
    • Do we want genotype counts? Odds Ratios?

. . .

  • lapply() and functions will be our friends

SNP Data

  • First let’s check our population sizes
snps %>% summarise(n = dplyr::n(), .by = Population)
# A tibble: 2 × 2
  Population     n
  <chr>      <int>
1 Control       53
2 Treat         51

Missing Genotypes

  • Check the missing genotypes

. . .

  • Know we know functions \(\implies\) across()
  • Is a dplyr function
    • Enables us to apply a function across zero or more columns
    • Uses tidyselect helpers
  • The help page has lots of information
    • Lets use it first

Missing Genotypes

  • Apply the function is.na() across all columns that start_with “SNP”
snps %>% 
  mutate(across(starts_with("SNP"), is.na)) %>% 
  dplyr::select(1:10)
# A tibble: 104 × 10
   Population SNP1  SNP2  SNP3  SNP4  SNP5  SNP6  SNP7  SNP8  SNP9 
   <chr>      <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
 1 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 2 Control    FALSE FALSE FALSE FALSE FALSE TRUE  FALSE FALSE FALSE
 3 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 4 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 5 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 6 Control    FALSE FALSE FALSE FALSE FALSE TRUE  FALSE FALSE FALSE
 7 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 8 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 9 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
10 Control    FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# ℹ 94 more rows

Missing Genotypes

  • If we pass to summarise() we can count these accross all SNPs
    • Perfect opportunity for an inline function
snps %>% 
  summarise(
    across(starts_with("SNP"), \(x) sum(is.na(x)))
  ) %>% 
  dplyr::select(1:10)
# A tibble: 1 × 10
   SNP1  SNP2  SNP3  SNP4  SNP5  SNP6  SNP7  SNP8  SNP9 SNP10
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     3     0     1     0     2     3     1     1     1     1

Missing Genotypes

  • This gives the missing count for all 2000 loci
  • Maybe pivot_longer() might help
snps %>% 
  summarise(
    across(starts_with("SNP"), \(x) sum(is.na(x)))
  ) %>% 
  pivot_longer(everything(), names_to = "locus", values_to = "missing")
# A tibble: 2,000 × 2
   locus missing
   <chr>   <int>
 1 SNP1        3
 2 SNP2        0
 3 SNP3        1
 4 SNP4        0
 5 SNP5        2
 6 SNP6        3
 7 SNP7        1
 8 SNP8        1
 9 SNP9        1
10 SNP10       1
# ℹ 1,990 more rows

Missing Genotypes

  • Now we can summarise again
  • Will make a nice descriptive table in our rmarkdown report
snps %>% 
  summarise(
    across(starts_with("SNP"), \(x) sum(is.na(x)))
  ) %>% 
  pivot_longer(everything(), names_to = "locus", values_to = "missing") %>% 
  summarise(n = dplyr::n(), .by = missing) %>% 
  arrange(desc(n))
# A tibble: 7 × 2
  missing     n
    <int> <int>
1       1   721
2       0   706
3       2   361
4       3   164
5       4    39
6       5     8
7       6     1

Performing an Analysis

  • Let’s see if the A allele acts in a dominant manner
  • Compare the numbers with A alleles across populations

. . .

  • Classic Fisher’s Exact Test using a 2x2 table
A_TRUE A_FALSE
Control a b
Treat c d

. . .

  • No right or wrong strategy

Performing an Analysis

  • Start by converting to long form
snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") 

. . .

  • Can easily remove the missing genotypes
snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype))

Performing an Analysis

  • Check for the presence of an A allele
snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) 

Performing an Analysis

  • Now count by presence of A
    • Set the grouping to be by Population, locus & A status
snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A))

Performing an Analysis

  • Move the counts into TRUE/FALSE columns
    • The 2x2 tables now start to appear
snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  arrange(locus)

Performing an Analysis

  • We can form a nested tibble for each locus
snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_")))

Nesting Columns

  • This is a new idea \(\implies\) we now have a list column
  • Look at the first one to see what the elements look like
    • Not part of the analysis
snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  slice(1) %>% pull(df)

Using lapply() On Nested Columns

  • We can use lapply() to perform an analysis on every nested df
snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")]))
  )

Using lapply() On Nested Columns

  • We now have a new list column with a list of results from each test
  • Objects of class htest
  • Will have an element called p.value
  • This is a double (i.e. numeric)

. . .

  • We can use map_dbl() to grab these values

Using lapply() On Nested Columns

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")])),
    p = map_dbl(ft, \(x) x$p.value),
  )

Using lapply() On Nested Columns

  • How about an Odds Ratio?
    • The OR is in an element called estimate

. . .

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")])),
    OR = map_dbl(ft, \(x) x$estimate),
    p = map_dbl(ft, \(x) x$p.value),
  )

Using lapply() On Nested Columns

  • Getting counts will require using df again

. . .

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")])),
    Control = map_int(df, \(x) dplyr::filter(x, Population == "Control")[["A_TRUE"]]),
    Treat = map_int(df, \(x) dplyr::filter(x, Population == "Treat")[["A_TRUE"]]),
    OR = map_dbl(ft, \(x) x$estimate),
    p = map_dbl(ft, \(x) x$p.value),
  )

The Final Analysis

snps %>% 
  pivot_longer(starts_with("SNP"), names_to = "locus", values_to = "genotype") %>% 
  dplyr::filter(!is.na(genotype)) %>% 
  mutate(A = str_detect(genotype, "A")) %>% 
  summarise(n = dplyr::n(), .by = c(Population, locus, A)) %>% 
  pivot_wider(
    names_from = "A", values_from = "n", values_fill = 0, names_prefix = "A_"
  ) %>% 
  nest(df = c(Population, starts_with("A_"))) %>% 
  mutate(
    ft = lapply(df, \(x) fisher.test(x[, c("A_TRUE", "A_FALSE")])),
    Control = map_int(df, \(x) dplyr::filter(x, Population == "Control")[["A_TRUE"]]),
    Treat = map_int(df, \(x) dplyr::filter(x, Population == "Treat")[["A_TRUE"]]),
    OR = map_dbl(ft, \(x) x$estimate),
    p = map_dbl(ft, \(x) x$p.value),
    adj_p = p.adjust(p, "bonferroni")
  ) %>% 
  arrange(p)
# A tibble: 2,000 × 8
   locus   df               ft      Control Treat    OR        p    adj_p
   <chr>   <list>           <list>    <int> <int> <dbl>    <dbl>    <dbl>
 1 SNP1716 <tibble [2 × 3]> <htest>      47     8  38.7 2.17e-14 4.35e-11
 2 SNP1236 <tibble [2 × 3]> <htest>      46    11  22.9 1.29e-11 2.57e- 8
 3 SNP1618 <tibble [2 × 3]> <htest>      45    12  22.7 3.00e-11 6.00e- 8
 4 SNP248  <tibble [2 × 3]> <htest>      44    10  18.8 7.91e-11 1.58e- 7
 5 SNP1730 <tibble [2 × 3]> <htest>      43    10  17.0 3.27e-10 6.54e- 7
 6 SNP311  <tibble [2 × 3]> <htest>      41    10  13.5 2.52e- 9 5.04e- 6
 7 SNP1385 <tibble [2 × 3]> <htest>      45    14  14.0 3.70e- 9 7.40e- 6
 8 SNP1647 <tibble [2 × 3]> <htest>      43    13  13.5 4.28e- 9 8.57e- 6
 9 SNP8    <tibble [2 × 3]> <htest>      46    16  13.5 9.41e- 9 1.88e- 5
10 SNP1993 <tibble [2 × 3]> <htest>      40    11  11.7 1.74e- 8 3.47e- 5
# ℹ 1,990 more rows

For those with an OR > 1 the A allele might be a dominant susceptibility locus

Summary

  • We could save this as a final object
  • Select our important columns and prepare a table

Summary

For this we needed to understand

  • When to use pivot_longer() and pivot_wider()
  • What is a list, vector and data.frame?
  • Difference between integer and double values
  • tidyselect helper functions + dplyr
  • How to use lapply() with inline functions
  • Extending lapply() using map_*() to produce vector output

Summary

  • Alternatives to map_*() are sapply() and vapply()
    • sapply() is slightly unpredictable
    • vapply() is a bit more clunky but powerful

. . .

  • Could’ve use unlist(lapply(...))

Why didn’t we?

Specifying the exact type will error if we’re wrong. This is good