Introductory Visualisation
RAdelaide 2025
Visualisation in R
Start a New R Script
- Call the new script: IntroVisualisation.R
- Load our favourite packages at the top of the script
library(palmerpenguins)
library(tidyverse)
Base Plotting in R
- R comes with some very powerful plotting capabilities
  - Provided in the base package graphics
  - Always loaded with every session
- Examples are often extremely helpful
- People used these happily for decades
  - The release of ggplot2 changed everything
- Let’s quickly explore base plotting before moving to the good stuff
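As a quick illustration of base plotting, the sketch below uses the built-in cars dataset and the penguins data. The specific plots and labels are chosen here for illustration and are not taken from the session itself.

```r
library(palmerpenguins)

# Scatter plot using the formula interface: response ~ predictor
plot(dist ~ speed, data = cars, main = "Stopping distance by speed")

# Boxplot of body mass for each species; the summary statistics are returned invisibly
bp <- boxplot(body_mass_g ~ species, data = penguins, ylab = "Body mass (g)")

# Histogram of a single numeric variable
hist(penguins$flipper_length_mm, xlab = "Flipper length (mm)", main = "")
```

Each function draws straight to the active graphics device, so there is no plot object to modify afterwards.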
Visualisation With ggplot2
The Grammar of Graphics
- ggplot2 has become the industry standard for visualisation (Wickham 2016)
  - Core & essential part of the tidyverse
- Developed by Hadley Wickham as part of his PhD thesis
- An implementation of The Grammar of Graphics (Wilkinson 2005)
- Breaks visualisation into layers
An Initial Example
- Using the example dataset cars
- Two columns:
  - speed (mph)
  - dist: the distance (ft) each car takes to stop
- We can make a classic x vs y plot using points
  - The predictor (x) would be speed
  - The response (y) would be dist
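A minimal sketch of this plot, assuming only ggplot2 and the built-in cars dataset. The object name p_cars is made up for this example.

```r
library(ggplot2)

# cars is built in; the stopping distance column is named dist
p_cars <- ggplot(cars, aes(x = speed, y = dist)) + # Data & aesthetics
  geom_point()                                      # Geometry
p_cars
```

Nothing is drawn until the object is printed, which is why p_cars appears on its own line at the end.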
Visualising Our Penguins
What visualisations could we produce to inspect penguins?
- Boxplots & Points
Creating Our First Plot
## Compare the two bill measurements
penguins |> # Layer 1: Data
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) + # Layer 2: Aesthetics
  geom_point() # Layer 3: Geometry
Understanding Aesthetics
- Setting the colour in the call to ggplot() \(\implies\) all layers will use this
- If we shift colour = species to geom_point() \(\implies\) ???
## Only use colour for the points
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE)
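One way to check what each layer inherits is to count the groups that the regression layer was fitted to. This is only a sketch: the object names (bills, p_global, p_local) are made up here, and rows with missing bill measurements are dropped first to avoid warnings.

```r
library(palmerpenguins)
library(ggplot2)

# Keep complete bill measurements only
bills <- subset(penguins, !is.na(bill_depth_mm) & !is.na(bill_length_mm))

# colour set in ggplot(): inherited by the points AND the regression lines
p_global <- ggplot(bills, aes(bill_depth_mm, bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE, formula = y ~ x)

# colour set only in geom_point(): stat_smooth() no longer sees species
p_local <- ggplot(bills, aes(bill_depth_mm, bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE, formula = y ~ x)

# Count the groups used by the smoothing layer (layer 2) in each plot
n_global <- length(unique(layer_data(p_global, 2)$group))
n_local <- length(unique(layer_data(p_local, 2)$group))
c(global = n_global, local = n_local)
```

With the global aesthetic a separate line is fitted per species, while the layer-specific version fits a single line through all points.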
Using Facets
- Alternatively, we could plot each species in its own panel (or facet)
- Using ~ notation to say all facets depend on species
## Plot each species in a separate panel
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~species) # Layer 4: Facets
Scales
Setting Scales
- By default, ggplot2 will detect the most appropriate scale
  - Has applied scale_x_continuous() and scale_y_continuous()
# Explicitly set the scales. This will appear identical
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_x_continuous() +
  scale_y_continuous()
- Multiple presets are available: scale_x_log10(), scale_x_sqrt(), scale_x_reverse()
  - Also available for y
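As a sketch of swapping in a preset, the example below puts the x-axis on a log10 scale and reverses the y-axis. The choice of presets is arbitrary and purely for demonstration, and the names bills and p_scaled are made up here.

```r
library(palmerpenguins)
library(ggplot2)

# Keep complete bill measurements only
bills <- subset(penguins, !is.na(bill_depth_mm) & !is.na(bill_length_mm))

p_scaled <- bills |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  scale_x_log10() +  # x-axis drawn on a log10 scale
  scale_y_reverse()  # y-axis runs from high to low
p_scaled
```

The axis labels still show the original units, as only the positions are transformed.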
What Else Can We Do?
- What else might be informative?
- Can we separate by island or sex?
- sex will have missing values
  - Let’s set the shape of the points
## Try setting different point shapes based on the recorded sex
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex)) + # Now we have a layer-specific aesthetic
  stat_smooth(method = "lm", se = FALSE) +
  ggthemes::scale_colour_colorblind() # Provided by the ggthemes package
Modifying Points
- We can change the size of the points outside the aesthetic
- Fixed values only \(\implies\) will not respond to change in data
## Set a fixed point size outside the aesthetic
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + # Change the point size
  stat_smooth(method = "lm", se = FALSE) +
  ggthemes::scale_colour_colorblind() # Provided by the ggthemes package
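To see the difference between a fixed value and a mapped aesthetic, the sketch below builds the same points twice. Mapping size to body_mass_g is an illustrative choice, not something used in the figure above, and the object names are made up here.

```r
library(palmerpenguins)
library(ggplot2)

# Remove the penguins with unrecorded sex
known_sex <- subset(penguins, !is.na(sex))

# Fixed size: set outside aes(), so every point is drawn at size 3
p_fixed <- ggplot(known_sex, aes(bill_depth_mm, bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3)

# Mapped size: set inside aes(), so point size responds to the data
p_mapped <- ggplot(known_sex, aes(bill_depth_mm, bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex, size = body_mass_g))

# Inspect the sizes actually used when drawing each layer
fixed_sizes <- unique(layer_data(p_fixed)$size)
mapped_sizes <- unique(layer_data(p_mapped)$size)
```

Only the mapped version gains a size legend, as fixed values are never passed to a scale.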
Finishing Our Figure
- The next step in making our figure look brilliant
- Axis & Scale labels
## Manually set the point shapes, then add axis & legend labels
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) +
  stat_smooth(method = "lm", se = FALSE) +
  ggthemes::scale_colour_colorblind() + # Provided by the ggthemes package
  scale_shape_manual(values = c(19, 1)) +
  labs(
    # Manually add labels
    x = "Bill Depth (mm)", y = "Bill Length (mm)",
    colour = "Species", shape = "Sex"
  )
- Note that this fundamentally breaks the automatic association between our data & figure
- We can type literally anything for labels, which can leave the door open to errors
Themes
Using Themes
- A default theme is applied:
theme_grey()
- I prefer
theme_bw()
- Removes the grey background
- Gets most of the job done
p + theme_bw() # Where p is the previous figure, saved as an object
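A minimal sketch of how this might look in full, assuming the figure is first saved as an object named p. The simplified plot and the legend tweak via theme() are illustrative choices only.

```r
library(palmerpenguins)
library(ggplot2)

# Save a simplified version of the figure as an object
p <- subset(penguins, !is.na(sex)) |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) +
  labs(x = "Bill Depth (mm)", y = "Bill Length (mm)")

# Swap the default theme_grey() for theme_bw()
p_bw <- p + theme_bw()

# Individual elements can then be adjusted using theme()
p_bottom <- p_bw + theme(legend.position = "bottom")
p_bottom
```

Calls to theme() should come after the complete theme, as theme_bw() would otherwise overwrite them.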
Different Plot Types
Classic Barplots
- geom_bar() & geom_col()
- geom_errorbar() & geom_errorbarh()
Classic Density Plots
- geom_boxplot() & geom_violin()
- geom_density() & geom_histogram()
Line-based Geometry
- geom_line(), geom_segment()
- geom_abline(), geom_hline() & geom_vline()
Heatmaps and Grids
- geom_raster(), geom_tile() & geom_rect()
Creating A Boxplot
- A starting point might be to choose island as the predictor
- body_mass_g may be a response variable
penguins |>
  ggplot(aes(island, body_mass_g)) +
  geom_boxplot()
Overlaying Geoms
- geom_jitter() will draw points but with noise in either direction
- The alpha parameter will make the points partially transparent
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(sex, body_mass_g, fill = sex)) +
  geom_boxplot(outliers = FALSE) + # Hide any outliers from the boxes
  geom_jitter(width = 0.1, alpha = 0.5) + # Outliers will be shown here
  facet_wrap(~species + island, nrow = 1)
Trying a Violin Plot
- Violin plots are similar to boxplots
  - can estimate distributions beyond the points (trim = FALSE)
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(sex, body_mass_g, fill = sex)) +
  geom_violin(draw_quantiles = 0.5, trim = FALSE) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  facet_wrap(~species + island, nrow = 1)
Creating A Histogram
- The default histogram in ggplot2 is a bit ugly
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram()
- This can be easily improved by setting fill = "grey70" and colour = "black"
- binwidth is set automatically (from the default bins = 30) \(\implies\) try binwidth = 100
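Putting these suggestions together gives something like the sketch below. The object name p_hist is made up for this example.

```r
library(palmerpenguins)
library(tidyverse)

p_hist <- penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram(
    fill = "grey70",  # Fill colour for the bars
    colour = "black", # Outline colour for the bars
    binwidth = 100    # Each bin now covers 100g
  )
p_hist
```

Both colours are fixed values set outside aes(), so they apply to every bar and add no legend.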
Creating a Summary Barplot
- Bar plots are pretty common in many fields of research
- First we’ll create a summary table with mean and sd
- Use the mean for bars
- sd for error bars
penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  )
# A tibble: 6 × 4
  species   sex    weight_mn weight_sd
  <fct>     <fct>      <dbl>     <dbl>
1 Adelie    male       4043.      347.
2 Adelie    female     3369.      269.
3 Gentoo    female     4680.      282.
4 Gentoo    male       5485.      313.
5 Chinstrap female     3527.      285.
6 Chinstrap male       3939.      362.
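One possible way to turn this summary table into a barplot is sketched below, using geom_col() for the means and geom_errorbar() for mean ± sd. The dodge width, error bar width and the object names (weight_summary, p_bar) are choices made here for illustration.

```r
library(palmerpenguins)
library(tidyverse)

# Recreate the summary table and save it as an object
weight_summary <- penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  )

p_bar <- weight_summary |>
  ggplot(aes(species, weight_mn, fill = sex)) +
  geom_col(position = position_dodge(width = 0.9)) + # Means as side-by-side bars
  geom_errorbar(
    aes(ymin = weight_mn - weight_sd, ymax = weight_mn + weight_sd), # mean ± sd
    position = position_dodge(width = 0.9), # Match the dodge used for the bars
    width = 0.2
  ) +
  labs(x = "Species", y = "Mean Body Mass (g)", fill = "Sex")
p_bar
```

Using the same dodge width in both layers keeps each error bar centred on its own bar.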
Adding Labels To Points
- The penguins dataset isn’t well suited to adding labels
- Let’s make a toy dataset
tibble(
x = 1:4, y = x^2, label = c("a", "b", "c", "d")
)
# A tibble: 4 × 3
      x     y label
  <int> <dbl> <chr>
1     1     1 a
2     2     4 b
3     3     9 c
4     4    16 d
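A minimal sketch of adding labels to this toy dataset, using geom_text() with the label aesthetic. The object names (toy, p_labels) and the amount of nudging are assumptions made for this example.

```r
library(tidyverse)

# Save the toy dataset as an object
toy <- tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
)

p_labels <- toy |>
  ggplot(aes(x, y, label = label)) + # label is an aesthetic like any other
  geom_point() +
  geom_text(nudge_y = 1) # Shift the text up so it doesn't sit on the points
p_labels
```

Replacing geom_text() with geom_label() would draw each label inside a small box instead.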
Saving Images
- The simple way is to click Export in the Plots pane
- The way to save using code is
ggsave("myplot.png", width = 7, height = 7, units = "in")
- This will always save the most recent plot by default
- Output format is determined by the suffix
- Try saving as a pdf…
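Instead of relying on the most recent plot, a saved object can be passed to ggsave() directly. The sketch below writes a pdf to a temporary directory, and both the plot and the file location are placeholders chosen for this example.

```r
library(ggplot2)

# A simple plot saved as an object
p <- ggplot(cars, aes(speed, dist)) +
  geom_point()

# The .pdf suffix sets the output format
out_file <- file.path(tempdir(), "myplot.pdf")
ggsave(out_file, plot = p, width = 7, height = 7, units = "in")
```

Naming the plot explicitly makes a script safer to re-run, as the output no longer depends on which plot happened to be drawn last.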
Conclusion
A fabulous resource: https://r-graphics.org/
Challenges
- Create a barplot with error bars showing mean flipper length by species
- Colour (i.e. fill) however you choose
- Create a histogram of flipper length by species
- Facet by species
- Use boxplots to show the same (flipper length by species)
- Don’t facet, but fill the boxes by sex
- Using points, compare flipper length to body mass
- Colour by species including a regression line
- Try adding stat_ellipse() to your plot
References
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilkinson, Leland. 2005. The Grammar of Graphics. Springer New York, NY. https://doi.org/10.1007/0-387-28695-0.
Wong, Bang. 2011. “Color Blindness.” Nat. Methods 8 (6): 441.