## First we can subset the dataset
```r
filter(penguins, species == "Adelie")
```
RAdelaide 2025
July 8, 2025
How do we then pass this to the `arrange()` function?

- Save the filtered data as a new object, then call `arrange()` on that object
- Place the call to `filter()` inside `arrange()`
Is this any good?
A complete analysis would lead to a workspace with multiple, similar objects, e.g. `adelie_penguins`, `penguins_2007`, `torgerson_penguins`, etc.
This can become very messy and confusing
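
The approach of saving a new object and then calling `arrange()` on it might look like the sketch below. To keep the block self-contained, `penguins_small` is a hypothetical stand-in for `penguins`: the three Adelie rows reuse body masses from the output in these slides, and the Gentoo row is made up.

```r
library(dplyr)

# Hypothetical stand-in for `penguins` (illustrative values only)
penguins_small <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  island = c("Torgersen", "Torgersen", "Torgersen", "Biscoe"),
  body_mass_g = c(3750L, 3800L, 3250L, 5000L)
)

# Step 1: save the filtered data as a new object
adelie_penguins <- filter(penguins_small, species == "Adelie")

# Step 2: call arrange() on that object
adelie_sorted <- arrange(adelie_penguins, body_mass_g)
adelie_sorted
```

Note that two extra objects now sit in the workspace just to produce one sorted table.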
Alternatively, the call to `filter()` placed inside `arrange()` returns the `tibble` to be sorted
Is this any good?
Functions are executed in order from the innermost function to the outermost. First the filtering is done, and then the result is passed to `arrange()`
This can become very messy if calling 10 functions in a row
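
As a rough sketch of the nested form, the call to `filter()` sits where `arrange()` expects its data. Again, `penguins_small` is a hypothetical stand-in for `penguins` with illustrative values, used only so the block runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for `penguins` (illustrative values only)
penguins_small <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  island = c("Torgersen", "Torgersen", "Torgersen", "Biscoe"),
  body_mass_g = c(3750L, 3800L, 3250L, 5000L)
)

# filter() is evaluated first; its result becomes the first argument of arrange()
nested_result <- arrange(
  filter(penguins_small, species == "Adelie"),
  body_mass_g
)
nested_result
```

No intermediate objects are created, but the code has to be read from the inside out.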
A tidier alternative is the pipe, `|>`
```
# A tibble: 152 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 142 more rows
# ℹ 2 more variables: sex <fct>, year <int>
```
```r
# Filter the object, then pass the filtered object to arrange using the pipe
penguins |>
  filter(species == "Adelie") |>
  arrange(body_mass_g)
```

```
# A tibble: 152 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Biscoe              36.5          16.6               181        2850
 2 Adelie  Biscoe              36.4          17.1               184        2850
 3 Adelie  Biscoe              34.5          18.1               187        2900
 4 Adelie  Dream               33.1          16.1               178        2900
 5 Adelie  Torgersen           38.6          17                 188        2900
 6 Adelie  Biscoe              37.9          18.6               193        2925
 7 Adelie  Dream               37.5          18.9               179        2975
 8 Adelie  Dream               37            16.9               185        3000
 9 Adelie  Dream               37.3          16.8               192        3000
10 Adelie  Torgersen           35.9          16.6               190        3050
# ℹ 142 more rows
# ℹ 2 more variables: sex <fct>, year <int>
```
dplyr
`distinct()` will remove duplicate rows across the provided columns
The `penguins` data was derived from `penguins_raw`
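
To illustrate `distinct()`, here is a minimal sketch on a hypothetical stand-in table, `penguins_small`, with illustrative values. It also shows the `.keep_all` argument of `distinct()`, which controls whether the remaining columns are returned.

```r
library(dplyr)

# Hypothetical stand-in for `penguins` (illustrative values only)
penguins_small <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  island = c("Torgersen", "Torgersen", "Biscoe", "Biscoe"),
  body_mass_g = c(3750L, 3800L, 2850L, 5000L)
)

# One row per unique species/island combination; only those columns are returned
species_islands <- penguins_small |>
  distinct(species, island)

# .keep_all = TRUE returns every column, using the first row found for each combination
first_per_group <- penguins_small |>
  distinct(species, island, .keep_all = TRUE)

species_islands
first_per_group
```

Because `distinct()` keeps the first row it meets for each combination, sorting with `arrange()` beforehand decides which row survives.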
`case_when()` works like an `ifelse` statement with multiple conditions

```
if (condition is true)
  do something
else
  do something else
endif
```

`case_when()` is commonly used inside the `mutate` function
When the LHS is `TRUE` ~ assign the RHS value
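
A small sketch of `case_when()` inside `mutate()` follows. The table `penguins_small`, the 3500 g cut-off and the `size` labels are all assumptions made up for illustration, not values from the slides.

```r
library(dplyr)

# Hypothetical stand-in for `penguins` (illustrative values only)
penguins_small <- tibble(
  species = c("Adelie", "Adelie", "Gentoo"),
  body_mass_g = c(3750L, 2850L, NA)
)

penguins_sized <- penguins_small |>
  mutate(
    size = case_when(
      body_mass_g >= 3500 ~ "large",  # LHS is TRUE, so the RHS "large" is assigned
      body_mass_g < 3500 ~ "small",   # checked only where the line above did not match
      TRUE ~ "unknown"                # fallback, e.g. for missing body mass
    )
  )
penguins_sized
```

Conditions are checked from top to bottom and the first one that is `TRUE` wins, so the order of the lines matters.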
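left/right_join()

- Use `left_join()` to add `studies` to `penguins`
- `penguins` will be taken as a scaffold to add values to

After `left_join()`:

- `studies` was expanded
- `year` values matched \(\implies\) `studyName` was added

Other joins:

- `right_join()` takes the second object (here `studies`) as the scaffold instead
- The columns to match on can be set with the `by` argument
- `inner_join()` produces the subset where all complete matches are present
- `full_join()` produces the union of the two datasets
- Values with no match are filled with `NA`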
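
The four joins can be compared on a toy example. This is only a sketch: `penguins_small` and the contents of `studies` are assumptions, since the slides' own `studies` object is not shown, and the year 2010 is deliberately made up so that one row has no match.

```r
library(dplyr)

# Hypothetical stand-in for `penguins`; 2010 has no matching study on purpose
penguins_small <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  year = c(2007L, 2008L, 2010L)
)

# Assumed lookup table linking each year to a study name
studies <- tibble(
  year = c(2007L, 2008L, 2009L),
  studyName = c("PAL0708", "PAL0809", "PAL0910")
)

# penguins_small is the scaffold: all of its rows are kept
left <- left_join(penguins_small, studies, by = "year")

# studies is the scaffold: all of its rows are kept
right <- right_join(penguins_small, studies, by = "year")

# Only rows with a match in both tables
inner <- inner_join(penguins_small, studies, by = "year")

# Every row from both tables, with NA where there is no match
full <- full_join(penguins_small, studies, by = "year")
```

Here `left` has three rows with `NA` as the `studyName` for 2010, `inner` drops that row, and `full` has four rows because the unmatched 2009 study is added.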
- Setting `.keep_all = TRUE` will return all columns
- Give the body mass in `kg` instead of `g`
- Can you use `slice_max()` to return the same penguins as the final example, i.e. the heaviest from each species and island?
- Recreate `penguins` from `penguins_raw`, but retaining `studyName` and the `Individual ID` as additional columns

`>` then changed to `|`

- The base R pipe (`|>`) was introduced in R v4.1 (2021)
- The `magrittr` pipe (`%>%`) was introduced in the package `magrittr` (~2014)
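
For simple chains like the ones in these slides, the two pipes can be used interchangeably. The sketch below compares them on `penguins_small`, a hypothetical stand-in for `penguins` with illustrative values. It relies on `dplyr` re-exporting `%>%` from `magrittr`, and needs R 4.1 or later for `|>`.

```r
library(dplyr)  # also makes %>% available

# Hypothetical stand-in for `penguins` (illustrative values only)
penguins_small <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  island = c("Torgersen", "Torgersen", "Torgersen", "Biscoe"),
  body_mass_g = c(3750L, 3800L, 3250L, 5000L)
)

# Base R pipe
base_pipe <- penguins_small |>
  filter(species == "Adelie") |>
  arrange(body_mass_g)

# magrittr pipe
magrittr_pipe <- penguins_small %>%
  filter(species == "Adelie") %>%
  arrange(body_mass_g)

# Compare the two results
all.equal(base_pipe, magrittr_pipe)
```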