Visualising and Plotting Data
ASI: Introduction to R
Visualisation in R
Start a New R Script
- Call the new script: BasicVisualisation.R
- Load our favourite package at the top of the script
library(tidyverse)
- Load the my_penguins dataset
my_penguins <- read_csv("data/my_penguins.csv")
Introducing The Penguins
my_penguins
# A tibble: 333 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen 36.7 19.3 193 3450
5 Adelie Torgersen 39.3 20.6 190 3650
6 Adelie Torgersen 38.9 17.8 181 3625
7 Adelie Torgersen 39.2 19.6 195 4675
8 Adelie Torgersen 41.1 17.6 182 3200
9 Adelie Torgersen 38.6 21.2 191 3800
10 Adelie Torgersen 34.6 21.1 198 4400
# ℹ 323 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>
- Contains multiple measurements for penguins recorded around Palmer Station, Antarctica
- Slightly modified version from https://allisonhorst.github.io/palmerpenguins/
Base Plotting in R
- R comes with some very powerful plotting capabilities
  - Provided in the base package graphics
  - Always loaded with every session
- Examples are often extremely helpful
- People used it happily for decades
  - The release of ggplot2 changed everything
- Let’s quickly explore base plotting before moving to the good stuff
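As a quick taste of base plotting, here is a minimal sketch using plot() from the graphics package. To run without the CSV, it retypes body mass and flipper length from the ten rows printed earlier. The name first_ten and the axis labels are illustrative choices, not part of the original script.

```r
# Stand-in data: body mass and flipper length from the ten rows printed earlier
first_ten <- data.frame(
  body_mass_g = c(3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800, 4400),
  flipper_length_mm = c(181, 186, 195, 193, 190, 181, 195, 182, 191, 198)
)

# plot() lives in the graphics package, so no library() call is needed.
# The formula reads as "y ~ x".
plot(
  flipper_length_mm ~ body_mass_g,
  data = first_ten,
  xlab = "Body mass (g)",
  ylab = "Flipper length (mm)",
  pch = 19 # solid circles
)
```

In the real script, data = my_penguins would take the place of the stand-in.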
Visualisation With ggplot2
The Grammar of Graphics
- ggplot2 has become the industry standard for visualisation (Wickham 2016)
  - Core & essential part of the tidyverse
  - Developed by Hadley Wickham as part of his PhD thesis
  - An implementation of The Grammar of Graphics (Wilkinson 2005)
- Breaks visualisation into layers
  - Each layer added as a new line of code
Beginning Our Plot
## The initial call to ggplot defines
## the data layer only
ggplot(my_penguins)
Adding Layers
- I’m spreading my code to fit the narrow column in the slide
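A sketch of this step, assuming the layer added after the data is the aesthetic mapping of body mass and flipper length. The first_ten data frame is a stand-in retyped from the rows printed earlier so the code runs without the CSV; in the real script it would be my_penguins.

```r
library(ggplot2)

# Stand-in data: body mass and flipper length from the ten rows printed earlier
first_ten <- data.frame(
  body_mass_g = c(3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800, 4400),
  flipper_length_mm = c(181, 186, 195, 193, 190, 181, 195, 182, 191, 198)
)

p <- ggplot(first_ten) +  # the data layer
  aes(
    x = body_mass_g,      # body mass goes on the x-axis
    y = flipper_length_mm # flipper length goes on the y-axis
  )

# Draws labelled axes but no points, because no geometry has been added yet
p
```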
Adding Points
Tidying The Code
- To write more clearly
ggplot(my_penguins) + # 1: Define the data layer
aes(x = body_mass_g, y = flipper_length_mm) + # 2: Aesthetic mappings
geom_point() # 3: Geometry
Adding Layers
- I’ll hide my figures from here so the code looks nicer
- Now we can easily add additional layers
- e.g. a smooth curve as a statistics layer
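One way the statistics layer might look, as a sketch rather than the exact code from the session. It assumes a straight-line fit via geom_smooth(method = "lm"); the default smoother may differ. The first_ten data frame is a stand-in retyped from the rows printed earlier.

```r
library(ggplot2)

# Stand-in data: body mass and flipper length from the ten rows printed earlier
first_ten <- data.frame(
  body_mass_g = c(3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800, 4400),
  flipper_length_mm = c(181, 186, 195, 193, 190, 181, 195, 182, 191, 198)
)

p <- ggplot(first_ten) +
  aes(x = body_mass_g, y = flipper_length_mm) +
  geom_point() +
  # ASSUMPTION: linear fit; geom_smooth() alone chooses a smoother automatically
  geom_smooth(method = "lm", formula = y ~ x)

p
```

Both layers inherit the same x and y mappings from the top-level aes().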
Additional Aesthetics
- We might like to colour points by species
- Adding this to the main aes() will send this mapping to all layers
  - Will produce a regression line for each species
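A minimal sketch of this behaviour. Because the printed rows only show one species, it uses a simulated stand-in called toy_penguins with made-up values; the column names follow my_penguins, and method = "lm" is an assumption.

```r
library(ggplot2)

# ASSUMPTION: simulated stand-in for my_penguins (values are made up)
set.seed(1)
toy_penguins <- data.frame(
  species = rep(c("Adelie", "Gentoo"), each = 25),
  body_mass_g = c(rnorm(25, mean = 3700, sd = 400), rnorm(25, mean = 5000, sd = 450))
)
toy_penguins$flipper_length_mm <-
  135 + 0.015 * toy_penguins$body_mass_g + rnorm(50, sd = 4)

p <- ggplot(toy_penguins) +
  # colour sits in the top-level aes(), so every layer below inherits it
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) # one fitted line per species

p
```

Moving colour = species into geom_point(aes(...)) instead would colour the points only and give a single fitted line.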
Facets
- Separate panels are known as facets in ggplot2
- Plot male & female penguins in separate panels
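A sketch of faceting by sex, assuming facet_wrap() is used (facet_grid() is the other common choice). The toy_penguins data frame is a simulated stand-in with made-up values, since sex is not visible in the printed rows.

```r
library(ggplot2)

# ASSUMPTION: simulated stand-in for my_penguins (values are made up)
set.seed(2)
toy_penguins <- data.frame(
  sex = rep(c("female", "male"), each = 25),
  body_mass_g = c(rnorm(25, mean = 3900, sd = 500), rnorm(25, mean = 4500, sd = 600))
)
toy_penguins$flipper_length_mm <-
  135 + 0.015 * toy_penguins$body_mass_g + rnorm(50, sd = 4)

p <- ggplot(toy_penguins) +
  aes(x = body_mass_g, y = flipper_length_mm) +
  geom_point() +
  facet_wrap(~sex) # one panel per value of sex

p
```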
Co-ordinates
- An additional layer is the co-ordinate system
- Defaults are always chosen (unlike geometry)
- Can modify if we choose
- Quite obvious for points
- Can transform axes to log10 etc
- Applies to colours and shapes as well
- Set using scale_*() functions
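To make the scale_*() idea concrete, here is a small sketch that puts the x-axis on a log10 scale and swaps the colour scale. The first_ten data frame is a stand-in retyped from the rows printed earlier; colouring by bill length and the viridis palette are illustrative choices only.

```r
library(ggplot2)

# Stand-in data: three columns from the ten rows printed earlier
first_ten <- data.frame(
  bill_length_mm = c(39.1, 39.5, 40.3, 36.7, 39.3, 38.9, 39.2, 41.1, 38.6, 34.6),
  body_mass_g = c(3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800, 4400),
  flipper_length_mm = c(181, 186, 195, 193, 190, 181, 195, 182, 191, 198)
)

p <- ggplot(first_ten) +
  aes(x = body_mass_g, y = flipper_length_mm, colour = bill_length_mm) +
  geom_point() +
  scale_x_log10() +         # transform the x-axis to log10
  scale_colour_viridis_c()  # replace the default continuous colour scale

p
```

The function names follow the pattern scale_<aesthetic>_<type>(), so the same approach applies to fill, shape and size.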
Themes
- The final grammar of graphics layer \(\implies\) themes
- Controls overall appearance:
  - font-sizes, background style, title/legend placement etc
- Will explore in detail later if we have time
- We do need to get rid of the horrible grey background though
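One common way to drop the grey background is theme_bw(), which is also used in the later examples. The sketch below assumes that choice; base_size = 14 is an illustrative value, and first_ten is a stand-in retyped from the rows printed earlier.

```r
library(ggplot2)

# Stand-in data: body mass and flipper length from the ten rows printed earlier
first_ten <- data.frame(
  body_mass_g = c(3750, 3800, 3250, 3450, 3650, 3625, 4675, 3200, 3800, 4400),
  flipper_length_mm = c(181, 186, 195, 193, 190, 181, 195, 182, 191, 198)
)

p <- ggplot(first_ten) +
  aes(x = body_mass_g, y = flipper_length_mm) +
  geom_point() +
  theme_bw(base_size = 14) # white background, larger base font size

p
```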
Additional Geometries
- The choice of appropriate geometry is usually data-driven
  - Same principle as aes()/geom_point()
  - May have different mappings
- Lines: geom_line(), geom_abline(), geom_hline(), geom_vline()
- Distributions: geom_boxplot(), geom_violin()
- Histograms: geom_histogram(), geom_density()
- Bar Plots: geom_bar(), geom_col() + geom_errorbar()
- Heatmaps: geom_tile(), geom_raster(), geom_rect()
Making Boxplots
ggplot(
  my_penguins, aes(species, body_mass_g)
) +
  geom_boxplot() +
  theme_bw()
Histograms
- The default histogram usually looks terrible \(\implies\) easy to fix
## Map sex to the fill aesthetic
ggplot(my_penguins) +
aes(x = body_mass_g, fill = sex) +
geom_histogram() +
theme_bw()
Closing Comments
- The top-level aes() can also be set in the first call to ggplot()
- This is personal preference \(\implies\) today the layers were shown separately for clarity
## My code usually looks more like this
ggplot(
  my_penguins, aes(x = body_mass_g, fill = sex)
) +
  geom_histogram(colour = "black", binwidth = 100) +
  facet_grid(sex ~ species) +
  theme_bw()
Challenge
- Load the pigs dataset and create a boxplot
  - Show dose across the x-axis
  - Fill by supplement type
- Experiment with geom_violin() as an alternative
References
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilkinson, Leland. 2005. The Grammar of Graphics. Springer New York, NY. https://doi.org/10.1007/0-387-28695-0.