Reshaping And Tidying Data

RAdelaide 2025

Author: Dr Stevie Pederson

Affiliation: Black Ochre Data Labs, The Kids Research Institute Australia

Published: July 8, 2025

Challenges

Initial Challenges

  1. Count the number of penguins of each species on each island in each year. Try to make it look like the output below
    • Hint: use values_fill = 0 to get rid of missing values in the output
# A tibble: 9 × 5
  island     year Adelie Chinstrap Gentoo
  <fct>     <int>  <int>     <int>  <int>
1 Biscoe     2007     10         0     34
2 Biscoe     2008     18         0     46
3 Biscoe     2009     16         0     44
4 Dream      2007     20        26      0
5 Dream      2008     16        18      0
6 Dream      2009     20        24      0
7 Torgersen  2007     20         0      0
8 Torgersen  2008     16         0      0
9 Torgersen  2009     16         0      0
  2. Add a column called total which sums the values across all species columns
  3. Subset the table to only show values from 2009
  4. Combine island and year into a single column in the form island:year
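
One possible way to work through the steps above is sketched here. It is not the only solution. It assumes the data is the penguins object from the palmerpenguins package and that dplyr and tidyr are installed. The object names (penguin_counts, penguin_totals, penguins_2009, penguins_united) and the new column name island_year are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins) # assumed source of the penguins data

# Count penguins per species, island and year, then move species into columns.
# values_fill = 0 replaces the NA produced where a species is absent from an island.
penguin_counts <- penguins |>
  count(species, island, year) |>
  pivot_wider(names_from = species, values_from = n, values_fill = 0)

# Add the three species columns together
penguin_totals <- penguin_counts |>
  mutate(total = Adelie + Chinstrap + Gentoo)

# Keep only the rows from 2009
penguins_2009 <- penguin_totals |>
  filter(year == 2009)

# Paste island and year together, separated by a colon
# (island_year is a hypothetical column name)
penguins_united <- penguins_2009 |>
  unite("island_year", island, year, sep = ":")
```

Writing out Adelie + Chinstrap + Gentoo keeps the sum easy to read for three columns. With many columns, a row-wise helper such as rowSums() over the selected columns may be more convenient.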

More Challenges

  • Create the following table showing the mean bill_length_mm
    • Decide how to handle missing values
# A tibble: 5 × 5
  species   island    `2007` `2008` `2009`
  <fct>     <fct>      <dbl>  <dbl>  <dbl>
1 Adelie    Torgersen   38.8   38.8   39.3
2 Adelie    Biscoe      38.3   38.7   39.7
3 Adelie    Dream       39.1   38.2   38.2
4 Gentoo    Biscoe      47.0   46.9   48.5
5 Chinstrap Dream       48.7   48.7   49.1
  • Add a column called overall_mean which averages the values across all years
    • Hint: The mean is the sum divided by the number of values
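
A minimal sketch of one approach to these two steps follows, again assuming the penguins data from the palmerpenguins package. It handles missing values by dropping them with na.rm = TRUE, which is only one of several reasonable choices. The .by argument needs dplyr 1.1.0 or later, and the names mean_bills, mean_bill and bills_overall are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins) # assumed source of the penguins data

# Mean bill length per species, island and year, ignoring missing values,
# then spread the years across columns
mean_bills <- penguins |>
  summarise(
    mean_bill = mean(bill_length_mm, na.rm = TRUE),
    .by = c(species, island, year)
  ) |>
  pivot_wider(names_from = year, values_from = mean_bill)

# Column names that start with a digit need backticks.
# The mean is the sum of the three yearly values divided by three.
bills_overall <- mean_bills |>
  mutate(overall_mean = (`2007` + `2008` + `2009`) / 3)
```

Note that overall_mean here is the average of the three yearly means, which gives each year equal weight. It will generally differ slightly from the mean taken over all individual penguins, because the number of penguins measured differs between years.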