|>
penguins filter(
!is.na(sex) # Exclude those where sex is missing
|>
) count(species, sex)
Reshaping And Tidying Data
RAdelaide 2025
Manipulating Tibbles with tidyr
Motivation
- In Excel many people use pivot tables
- Sometimes we need similar operations in R
- Data may need to be in different ‘shapes’ for plotting/tables/analysis
- Often we refer to data as being in long form or in wide form
- In wide form our response variable may be spread across multiple columns
- Often multiple measures for one individual
- In long form, the response variable is in a single column
The package tidyr
- We’ll look at some key functions from
tidyr
pivot_longer()
andpivot_wider()
separate()
andunite()
- Many more beyond the scope of today
- Add
library(tidyr)
to the beginning of the script penguins
has each measurement in long form
\(\implies\) has multiple columns of measurement
From Long To Wide
- We’ll first create some summaries by sex
From Wide To Long
- The reverse operation is
pivot_longer()
- This time we move values from multiple columns into a single column
- Notice that the new column names need to be quoted here
|>
penguins # Place all the length and depth measurements into a single column
pivot_longer(
cols = ends_with("mm"),
names_to = "measurement",
values_to = "mm"
)
Using Both Together
|>
penguins pivot_longer(
cols = ends_with("mm"),
names_to = "measurement",
values_to = "mm"
|>
) filter(!is.na(mm)) |>
summarise(
mean_mm = mean(mm), .by = c(year, measurement)
|>
) # Now form separate columns for each year
pivot_wider(names_from = year, values_from = mean_mm)
Combining Columns
## We can combine multiple columns using unite
## By default, the separator will be '_'
|>
penguins unite(new_column, species, sex)
## Original columns can be left if preferred
## An alternative separator can also be provided
|>
penguins unite(new_column, species, sex, remove = FALSE, sep = ":")
Splitting Columns
- The reverse operation is
separate()
- We’ll create our own tibble on the fly for this example
tibble(merged = c("A.1", "B.1", "C.2"))
# A tibble: 3 × 1
merged
<chr>
1 A.1
2 B.1
3 C.2
Summary
- Covered quite some turf
- Switch from wide to long
- Splitting/Merging Columns
- Selecting/Renaming Columns
- Adding Columns
- Summarising Columns
- Filtering/Slicing Rows