Spreadsheet-Like Data

ASI: Introduction to R

Author

Affiliation

Dr Stevie Pederson

Black Ochre Data Labs
The Kids Research Institute Australia

Published

September 3, 2025

Working With
Spreadsheet-Like Data

Section Setup

Clear your R Environment
Create a new script called ‘SpreadsheetLikeData.R’
Load the tidyverse

library(tidyverse)

We’ll cover:

SQL- and Excel-like functions in dplyr
Pivot Tables and Manipulation using tidyr
Chaining function together using the pipe operator

The Pipe Operator

In real-world analyses we often perform multiple steps
The classic R approach was to create a new object at each step
- Leads to congested environments with multiple similar objects
- e.g. data_raw, data_sorted, data_subset

In R v4.1.0 the pipe operator was introduced: |>
Takes the output of one function and connects it to the input of another
- An assembly line \(\implies\) like taking the output of one process and sending straight to the input of another process

The Pipe Operator

If R ran an assembly line making toy cars

hot_plastic <- heat(plastic)
step1 <- mould(hot_plastic)
step2 <- paint(step1, "red")
step3 <- attach_wheels(step2)
final_car <- package(step3)

Using the |> operator this might become

final_car <- plastic |>
  heat() |>
  mould() |>
  paint("red") |>
  attach_wheels() |>
  package()

The Pipe Operator

head(cars)

  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

cars |> head()

  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10

We’ll use this approach for the remainder of the session

Returning To The Penguins

We’ll use my_penguins for this session

my_penguins <- read_csv("data/my_penguins.csv")

This is a tibble() (i.e. a conveniently wrapped data.frame)
We can easily get key attributes about these objects

dim(my_penguins)
nrow(my_penguins)
ncol(my_penguins)
colnames(my_penguins)

Using the Pipe Operator

Once we move beyond 2 piped functions, it’s very common to write each one on a separate line

We could also perform these operations the pipe (|>)

my_penguins |> colnames()

[1] "species"           "island"            "bill_length_mm"   
[4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
[7] "sex"               "year"

This approach combines nicely with stringr

my_penguins |>
  colnames() |>
  str_subset(pattern = "bill")

[1] "bill_length_mm" "bill_depth_mm"

Subsetting a Data Frame

We can subset a tibble using [,] just like a data.frame

my_penguins[1:3, c("species", "island", "body_mass_g")]

# A tibble: 3 × 3
  species island    body_mass_g
  <chr>   <chr>           <dbl>
1 Adelie  Torgersen        3750
2 Adelie  Torgersen        3800
3 Adelie  Torgersen        3250

Subsetting a Data Frame

dplyr provides two separate functions for this
Designed for integration with the pipe
select() allows for selecting columns by position or name
slice() allows for cutting out rows by position only
- filter() (coming soon) allows for returning rows by a key value

## Grab the first 3 rows from the species, island and body_mass_g columns
my_penguins |>
  select(species, island, body_mass_g) |>
  slice(1:3)

# A tibble: 3 × 3
  species island    body_mass_g
  <chr>   <chr>           <dbl>
1 Adelie  Torgersen        3750
2 Adelie  Torgersen        3800
3 Adelie  Torgersen        3250

Subsetting a Data Frame

An advantage of this approach is the use of helpers
- starts_with(), ends_with()
- contains(), matches()
- all_of(), any_of()
- everything()
- where()

## Just return the columns species and island, along with thos measured in mm
my_penguins |>
  select(species, island, ends_with("mm"))

Subsetting a Data Frame

## Just return the columns species & island along with bill measurements
my_penguins |>
  select(species, island, contains("bill"))

## Return the columns which are character vectors then place all other columns 
## after them. `where()` applies the logical test `is.character()` to each column
my_penguins |>
  select(where(is.character), everything())

Subsetting a Data Frame

Column names above haven’t been quoted
any_of() and all_of() take character vectors

## `all_of` requires all columns to exist
my_penguins |>
  select(all_of(c("species", "island", "sex")))

## `any_of` will ignore any columns that don't exist
my_penguins |>
  select(any_of(c("species", "island", "sex", "nonsense")))

Excel-Like Functions in `dplyr`

Two common tasks in Excel are 1) sorting a column and 2) applying a filter
In dplyr the relevant functons are arrange() and filter()

## Show the penguins in order of weight. This will be ascending by default
my_penguins |>
  arrange(body_mass_g)

# A tibble: 333 × 8
   species   island   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>     <chr>             <dbl>         <dbl>             <dbl>       <dbl>
 1 Chinstrap Dream              46.9          16.6               192        2700
 2 Adelie    Biscoe             36.5          16.6               181        2850
 3 Adelie    Biscoe             36.4          17.1               184        2850
 4 Adelie    Biscoe             34.5          18.1               187        2900
 5 Adelie    Dream              33.1          16.1               178        2900
 6 Adelie    Torgers…           38.6          17                 188        2900
 7 Chinstrap Dream              43.2          16.6               187        2900
 8 Adelie    Biscoe             37.9          18.6               193        2925
 9 Adelie    Dream              37            16.9               185        3000
10 Adelie    Dream              37.3          16.8               192        3000
# ℹ 323 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>

Excel-Like Functions in `dplyr`

Notice we’re not saving any new objects here. We’re simply exploring our data

To sort in descending order, we wrap the column name inside desc()

## Show the penguins in descending order of weight
my_penguins |>
  arrange(desc(body_mass_g))

# A tibble: 333 × 8
   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>   <chr>           <dbl>         <dbl>             <dbl>       <dbl>
 1 Gentoo  Biscoe           49.2          15.2               221        6300
 2 Gentoo  Biscoe           59.6          17                 230        6050
 3 Gentoo  Biscoe           51.1          16.3               220        6000
 4 Gentoo  Biscoe           48.8          16.2               222        6000
 5 Gentoo  Biscoe           45.2          16.4               223        5950
 6 Gentoo  Biscoe           49.8          15.9               229        5950
 7 Gentoo  Biscoe           48.4          14.6               213        5850
 8 Gentoo  Biscoe           49.3          15.7               217        5850
 9 Gentoo  Biscoe           55.1          16                 230        5850
10 Gentoo  Biscoe           49.5          16.2               229        5800
# ℹ 323 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>

Applying a Filter

We can subset using filter()

## Show only the penguins heavier than 4kg
my_penguins |>
  filter(body_mass_g > 4000)

# A tibble: 167 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
 1 Adelie  Torgersen           39.2          19.6               195        4675
 2 Adelie  Torgersen           34.6          21.1               198        4400
 3 Adelie  Torgersen           42.5          20.7               197        4500
 4 Adelie  Torgersen           46            21.5               194        4200
 5 Adelie  Dream               39.2          21.1               196        4150
 6 Adelie  Dream               39.8          19.1               184        4650
 7 Adelie  Dream               44.1          19.7               196        4400
 8 Adelie  Dream               39.6          18.8               190        4600
 9 Adelie  Dream               42.3          21.2               191        4150
10 Adelie  Biscoe              40.1          18.9               188        4300
# ℹ 157 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>

Applying a Filter

filter() relies on logical tests being performed on the column of interest

Symbol	Meaning
`==`	Is equal to
`!=`	Is NOT equal to
`>=` / `<=`	Is greater/less than or equal to
`>` / `<`	Is greater/less than
`%in%`	Is IN a set of values
`is.na()`	Is a missing value (i.e. `NA`)

In most languages, ! means NOT

Applying a Filter

## Show only the Chinstrap penguins heavier than 4kg
my_penguins |>
  filter(species == "Chinstrap", body_mass_g > 4000)

# A tibble: 15 × 8
   species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>     <chr>           <dbl>         <dbl>             <dbl>       <dbl>
 1 Chinstrap Dream            46            18.9               195        4150
 2 Chinstrap Dream            52            18.1               201        4050
 3 Chinstrap Dream            50.5          19.6               201        4050
 4 Chinstrap Dream            49.2          18.2               195        4400
 5 Chinstrap Dream            52            19                 197        4150
 6 Chinstrap Dream            52.8          20                 205        4550
 7 Chinstrap Dream            54.2          20.8               201        4300
 8 Chinstrap Dream            51            18.8               203        4100
 9 Chinstrap Dream            52            20.7               210        4800
10 Chinstrap Dream            53.5          19.9               205        4500
11 Chinstrap Dream            50.8          18.5               201        4450
12 Chinstrap Dream            49            19.6               212        4300
13 Chinstrap Dream            50.7          19.7               203        4050
14 Chinstrap Dream            49.3          19.9               203        4050
15 Chinstrap Dream            50.8          19                 210        4100
# ℹ 2 more variables: sex <chr>, year <dbl>

Applying a Filter

## Show the Adelie or Chinstrap penguins
my_penguins |>
  filter(species %in% c("Adelie", "Chinstrap"))

# A tibble: 214 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           36.7          19.3               193        3450
 5 Adelie  Torgersen           39.3          20.6               190        3650
 6 Adelie  Torgersen           38.9          17.8               181        3625
 7 Adelie  Torgersen           39.2          19.6               195        4675
 8 Adelie  Torgersen           41.1          17.6               182        3200
 9 Adelie  Torgersen           38.6          21.2               191        3800
10 Adelie  Torgersen           34.6          21.1               198        4400
# ℹ 204 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>

Applying a Filter

## Alternatively, exclude the Gentoo penguins. This does the same as previously
my_penguins |>
  filter(species != "Gentoo")

# A tibble: 214 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           36.7          19.3               193        3450
 5 Adelie  Torgersen           39.3          20.6               190        3650
 6 Adelie  Torgersen           38.9          17.8               181        3625
 7 Adelie  Torgersen           39.2          19.6               195        4675
 8 Adelie  Torgersen           41.1          17.6               182        3200
 9 Adelie  Torgersen           38.6          21.2               191        3800
10 Adelie  Torgersen           34.6          21.1               198        4400
# ℹ 204 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>

Applying a Filter

str_detect() can also play a role here

## Match the patterns Ade or Chin to show only the Adelie or Chinstrap penguins
my_penguins |>
  filter(str_detect(species, "Ade|Chin"))

# A tibble: 214 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           36.7          19.3               193        3450
 5 Adelie  Torgersen           39.3          20.6               190        3650
 6 Adelie  Torgersen           38.9          17.8               181        3625
 7 Adelie  Torgersen           39.2          19.6               195        4675
 8 Adelie  Torgersen           41.1          17.6               182        3200
 9 Adelie  Torgersen           38.6          21.2               191        3800
10 Adelie  Torgersen           34.6          21.1               198        4400
# ℹ 204 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>

Combining Functions

## Match the patterns Ade or Chin to show only the Adelie or Chinstrap penguins
## Show the results in decreasing order of weight
my_penguins |>
  filter(str_detect(species, "Ade|Chin")) |>
  arrange(desc(body_mass_g))

# A tibble: 214 × 8
   species   island   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>     <chr>             <dbl>         <dbl>             <dbl>       <dbl>
 1 Chinstrap Dream              52            20.7               210        4800
 2 Adelie    Biscoe             43.2          19                 197        4775
 3 Adelie    Biscoe             41            20                 203        4725
 4 Adelie    Torgers…           42.9          17.6               196        4700
 5 Adelie    Torgers…           39.2          19.6               195        4675
 6 Adelie    Dream              39.8          19.1               184        4650
 7 Adelie    Dream              39.6          18.8               190        4600
 8 Adelie    Biscoe             45.6          20.3               191        4600
 9 Chinstrap Dream              52.8          20                 205        4550
10 Adelie    Torgers…           42.5          20.7               197        4500
# ℹ 204 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>

Summarising Data

Creating tables of counts is simple \(\implies\) just pass the column names to count()

my_penguins |>
  count(species, island)

# A tibble: 5 × 3
  species   island        n
  <chr>     <chr>     <int>
1 Adelie    Biscoe       44
2 Adelie    Dream        55
3 Adelie    Torgersen    47
4 Chinstrap Dream        68
5 Gentoo    Biscoe      119

Creating Summary Plots

This is a great example of how powerful the piping approach is

## Create a summary on the fly, followed 
## by a very barplot showing how penguins
## are distributed on the three islands
my_penguins |>
  count(species, island) |>
  ggplot(aes(species, n, fill = island)) +
  geom_col() +
  scale_fill_brewer(palette = "Set1") +
  theme_bw()

Summarising Data

Sometimes we need more sophisticated summaries or summary values

my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    sd_weight = sd(body_mass_g)
  )

To show these by any category we can use the .by = argument

## Find the mean & sd of body mass by species
my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    sd_weight = sd(body_mass_g),
    .by = species
  )

Summarising Data

## Create a barplot with error bars
my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    sd_weight = sd(body_mass_g),
    .by = species
  ) |>
  ggplot(
    aes(species, mn_weight, fill = species)
  ) +
  geom_col() +
  geom_errorbar(
    aes(
      ymin = mn_weight - sd_weight,
      ymax = mn_weight + sd_weight
    ),
    width = 0.3
  ) +
  scale_fill_brewer(palette = "Set1") +
  theme_bw()

Modifying Columns

So far we’ve only selected, arranged, filtered and summarised existing columns
mutate() can be used to:
1. Modify existing columns, or
2. Add a new column

Do we think any of these columns are categorical
- We could turn these into factors!!!

Modifying Columns

my_penguins |>
  mutate(
    species = fct(species),
    island = fct(island),
    ## The default order for `fct()` will be in order of appearance
    ## Set levels manually to place female first
    sex = fct(sex, levels = c("female", "male")),
    ## `fct()` requires a character vector
    ## We can also pipe internally!!!
    year = year |> as.character() |> fct()
  )

Have we changed the original data?

No, we’ve just changed it on the fly as we inspect it

Modifying Columns

Leaving the comments here will be like a gift to future you when you can’t remember what you were thinking

I will usually modify columns like this when I’m loading the data
This ensures the entire workflow uses the identical dataset from the beginning

my_penguins <- read_csv("data/my_penguins.csv") |>
  mutate(
    species = fct(species),
    island = fct(island),
    ## The default order for `fct()` will be in order of appearance
    ## Set levels manually to place female first
    sex = fct(sex, levels = c("female", "male")),
    ## `fct()` requires a character vector
    ## We can also pipe internally!!!
    year = year |> as.character() |> fct()
  )

Reshaping Data with `tidyr`

Using `tidyr`

The data structures so far have been great for plotting
Generally we’ve seen data in long form
- Each value type has it’s own column
Summary tables can be a little different \(\implies\) tidyr
- Pivoting Data with pivot_longer() and pivot_wider()
- Merging/Separating Columns with unite() and separate()

Earlier we found the mean & sd of the weight by species and island
- Let’s include sex then show results in seprate columns

Changing From Long To Wide Form

## Calculate the mean weight by species, island and sex
my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    .by = c(species, island, sex)
  )

## Calculate the mean weight by species, island and sex
## Then place values for male & female in separate columns
my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    .by = c(species, island, sex)
  ) |>
  pivot_wider(names_from = sex, values_from = mn_weight)

This is a great summary table already \(\implies\) how do we include SD?

Changing From Long To Wide Form

Pivot wider can move multiple columns into wide format
The tidyselect helper ends_with() we saw earlier will work here too

my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    ## This is a fitted value so standard error may be more appropriate
    se_weight = sd(body_mass_g) / sqrt(n()),
    .by = c(species, island, sex)
  ) |>
  ## Use `ends_with("weight")` to pivot the appropriate columns
  pivot_wider(names_from = sex, values_from = ends_with("weight"))

Combining Columns

Now we have the mean & standard error
How do we combine these for a nice printed table \(\implies\) unite()

The first argument is the name of the new column
Then use tidyselect helpers to select the columns to “unite”

my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    se_weight = sd(body_mass_g) / sqrt(n()),
    .by = c(species, island, sex)
  ) |>
  pivot_wider(names_from = sex, values_from = ends_with("weight")) |>
  ## Include the underscore for the male columns, but not for the female columns
  unite(male, ends_with("_male"), sep = " +/- ") |>
  unite(female, ends_with("female"), sep = " +/- ")

Are we happy with this? No. We need to round the values

Applying Functions To Multiple Columns

Sometimes we need to apply the same function to multiple columns
The function across() is perfect for this situation
- Can be applied inside mutate() or summarise()

First we specify the columns using the tidyselect helpers
Then we provide the function to apply

across(.cols, .fns)

Applying Functions To Multiple Columns

Here we’ll call across(ends_with("male"), round) within a mutate()

my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    ## This is a fitted value so standard error may be more appropriate
    se_weight = sd(body_mass_g) / sqrt(n()),
    .by = c(species, island, sex)
  ) |>
  ## Use `ends_with("weight")` to pivot the appropriate columns
  pivot_wider(names_from = sex, values_from = ends_with("weight")) |>
  ## Apply the function `round()` to all four columns ending with male
  ## This will default to zero digits, which is fine for this table
  mutate(across(ends_with("male"), round)) |>
  unite(male, ends_with("_male"), sep = " +/- ") |>
  unite(female, ends_with("female"), sep = " +/- ")

Applying Functions To Multiple Columns

Note

A bonus code chunk not included in the slides showing how to round to 1 digit

my_penguins |>
  summarise(
    mn_weight = mean(body_mass_g),
    ## This is a fitted value so standard error may be more appropriate
    se_weight = sd(body_mass_g) / sqrt(n()),
    .by = c(species, island, sex)
  ) |>
  ## Use `ends_with("weight")` to pivot the appropriate columns
  pivot_wider(names_from = sex, values_from = ends_with("weight")) |>
  ## If we want 1 digit, we use slightly different syntax
  ## This relies on the value from each column being passed to `round()`
  ## as the value `.x`. The details are beyond the scope of this course
  mutate(across(ends_with("male"), ~round(.x, digits = 1))) |>
  unite(male, ends_with("_male"), sep = " +/- ") |>
  unite(female, ends_with("female"), sep = " +/- ")

Shifting From Wide to Long Form

The same principles apply when using pivot_longer()
- Won’t explore in as much detail

## Place all measurements taken in 'mm' into the same column
my_penguins |>
  pivot_longer(ends_with("mm"), names_to = "measurement", values_to = "mm")

# A tibble: 999 × 7
   species island    body_mass_g sex    year  measurement          mm
   <fct>   <fct>           <dbl> <fct>  <fct> <chr>             <dbl>
 1 Adelie  Torgersen        3750 male   2007  bill_length_mm     39.1
 2 Adelie  Torgersen        3750 male   2007  bill_depth_mm      18.7
 3 Adelie  Torgersen        3750 male   2007  flipper_length_mm 181  
 4 Adelie  Torgersen        3800 female 2007  bill_length_mm     39.5
 5 Adelie  Torgersen        3800 female 2007  bill_depth_mm      17.4
 6 Adelie  Torgersen        3800 female 2007  flipper_length_mm 186  
 7 Adelie  Torgersen        3250 female 2007  bill_length_mm     40.3
 8 Adelie  Torgersen        3250 female 2007  bill_depth_mm      18  
 9 Adelie  Torgersen        3250 female 2007  flipper_length_mm 195  
10 Adelie  Torgersen        3450 female 2007  bill_length_mm     36.7
# ℹ 989 more rows

Separating Values Into Multiple Columns

## Place all measurements taken in 'mm' into the same column
my_penguins |>
  pivot_longer(ends_with("mm"), names_to = "measurement", values_to = "mm") |>
  ## Now separate the values into body part & measurement type
  separate(measurement, c("body_part", "measurement_type", "junk"), sep = "_") |>
  ## We can get rid of the junk column by using the `-` symbol before the column name
  dplyr::select(-junk)

# A tibble: 999 × 8
   species island    body_mass_g sex    year  body_part measurement_type    mm
   <fct>   <fct>           <dbl> <fct>  <fct> <chr>     <chr>            <dbl>
 1 Adelie  Torgersen        3750 male   2007  bill      length            39.1
 2 Adelie  Torgersen        3750 male   2007  bill      depth             18.7
 3 Adelie  Torgersen        3750 male   2007  flipper   length           181  
 4 Adelie  Torgersen        3800 female 2007  bill      length            39.5
 5 Adelie  Torgersen        3800 female 2007  bill      depth             17.4
 6 Adelie  Torgersen        3800 female 2007  flipper   length           186  
 7 Adelie  Torgersen        3250 female 2007  bill      length            40.3
 8 Adelie  Torgersen        3250 female 2007  bill      depth             18  
 9 Adelie  Torgersen        3250 female 2007  flipper   length           195  
10 Adelie  Torgersen        3450 female 2007  bill      length            36.7
# ℹ 989 more rows

Separating Values Into Multiple Columns

Note

A bonus code chunk showing all other measurements compared to Body Mass

my_penguins |>
  pivot_longer(ends_with("mm"), names_to = "measurement", values_to = "mm") |>
  mutate(measurement = str_remove_all(measurement, "_mm")) |>
  ggplot(aes(body_mass_g, mm, colour = species)) +
  geom_point(aes(shape = sex)) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~measurement, scales = "free") +
  scale_colour_brewer(palette = "Set1") +
  scale_shape_manual(values = c(1, 19)) +
  theme_bw()

Closing Comments

The Palmer Penguins

The data we’ve been working with comes from the package palmerpenguins
- https://allisonhorst.github.io/palmerpenguins/index.html
my_penguins had all NA values removed for convenience

library(palmerpenguins)
penguins

# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>

Function Conflicts

Functions in dplyr are often based on SQL functions
- dplyr was first released in 2014 ¹
Some other packages already contained a select(), filter() etc
- Especially those with a SQL back-end

Function Conflicts

If a dplyr function gives a strange error
\(\implies\) probably loaded another package with a function of the same name
Common cases are select(), filter(), slice(), count(), n()

The solution is to call explicitly from dplyr using the namespace
- e.g. dplyr::select(), dplyr::filter(), dplyr::count() etc
- The namespace is the formal title for all function names within a package

The package conflicted also enables setting of which function to use

Footnotes

https://hadley.github.io/25-tidyverse-history/index-preview.html#key-innovations↩︎

Working WithSpreadsheet-Like Data

Section Setup

The Pipe Operator

The Pipe Operator

The Pipe Operator

Returning To The Penguins

Using the Pipe Operator

Subsetting a Data Frame

Subsetting a Data Frame

Subsetting a Data Frame

Subsetting a Data Frame

Subsetting a Data Frame

Excel-Like Functions in dplyr

Excel-Like Functions in dplyr

Applying a Filter

Applying a Filter

Applying a Filter

Applying a Filter

Applying a Filter

Applying a Filter

Combining Functions

Summarising Data

Creating Summary Plots

Summarising Data

Summarising Data

Modifying Columns

Modifying Columns

Modifying Columns

Reshaping Data with tidyr

Using tidyr

Changing From Long To Wide Form

Changing From Long To Wide Form

Combining Columns

Applying Functions To Multiple Columns

Applying Functions To Multiple Columns

Applying Functions To Multiple Columns

Shifting From Wide to Long Form

Separating Values Into Multiple Columns

Separating Values Into Multiple Columns

Closing Comments

The Palmer Penguins

Function Conflicts

Function Conflicts

Footnotes

Working With
Spreadsheet-Like Data

Excel-Like Functions in `dplyr`

Excel-Like Functions in `dplyr`

Reshaping Data with `tidyr`

Using `tidyr`