Visualising and Plotting Data

ASI: Introduction to R

Dr Stevie Pederson

Black Ochre Data Labs
The Kids Research Institute Australia

September 2, 2025

Visualisation in R

Start a New R Script

  • Call the new script: BasicVisualisation.R
  • Load our favourite package at the top of the script
library(tidyverse)
  • Load the my_penguins dataset
my_penguins <- read_csv("data/my_penguins.csv")

Introducing The Penguins

my_penguins
# A tibble: 333 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           36.7          19.3               193        3450
 5 Adelie  Torgersen           39.3          20.6               190        3650
 6 Adelie  Torgersen           38.9          17.8               181        3625
 7 Adelie  Torgersen           39.2          19.6               195        4675
 8 Adelie  Torgersen           41.1          17.6               182        3200
 9 Adelie  Torgersen           38.6          21.2               191        3800
10 Adelie  Torgersen           34.6          21.1               198        4400
# ℹ 323 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>
  • Contains multiple measurements for penguins recorded around Palmer Station, Antarctica
  • Slightly modified version from https://allisonhorst.github.io/palmerpenguins/

Base Plotting In R

  • R comes with some very powerful plotting capabilities
    • Provided in the base package graphics
    • Always loaded with every session
  • Examples are often extremely helpful
  • People used it happily for decades
    • The release of ggplot2 changed everything
  • Let’s quickly explore base plotting before moving to the good stuff

Base Plotting In R

  • Simple plots are usually easy
    • Complex figures can get really messy
  • Using the cars dataset
    • speed (mph)
    • dist (ft) each car takes to stop
plot(cars)

Base Plotting In R

  • The first two columns were automatically placed on the x & y axes
  • We could set values for x & y manually
  • Automatically decided to plot using points
plot(x = cars$speed, y = cars$dist)
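
As a small extension of the call above, plot() also accepts labelling arguments. This is a minimal sketch using the standard base arguments xlab, ylab, main and pch; the label wording is only a suggestion and is not from the original slides.

```r
# Same scatterplot as above, with explicit axis labels and a title
# (the label text is illustrative only)
axis_labels <- c(x = "Speed (mph)", y = "Stopping distance (ft)")
plot(
  x = cars$speed, y = cars$dist,
  xlab = axis_labels[["x"]], ylab = axis_labels[["y"]],
  main = "Stopping distance by speed",
  pch = 19 # solid circles instead of the default open circles
)
```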

Base Plotting In R

  • Using the my_penguins dataset to compare flipper length and body mass
plot(x = my_penguins$body_mass_g, y = my_penguins$flipper_length_mm) 

Base Plotting In R

  • The function boxplot() can also create simple figures easily
  • For categorical variables (i.e. factors) we use R formula notation
    • y ~ x \(\implies\) y depends on x, or
    • y ~ x \(\implies\) y is a function of x
## Make a simple boxplot showing the weights by species
boxplot(body_mass_g ~ species, data = my_penguins)
  • The dependent variable appears on the y-axis by default
  • The predictor appears on the x-axis by default (horizontal = TRUE swaps them)
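
The formula notation works the same way with any data frame. The sketch below uses the built-in iris data purely as a stand-in for my_penguins (an assumption, so that it runs without the CSV file), and also shows that boxplot() invisibly returns the statistics it draws.

```r
# iris is a built-in dataset, used here only as a stand-in for my_penguins
# Sepal.Length (y) is summarised within each level of Species (x)
bp <- boxplot(Sepal.Length ~ Species, data = iris, ylab = "Sepal length (cm)")

# boxplot() invisibly returns what it drew: one column of stats per group
bp$names # group labels, in x-axis order
bp$stats # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```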

Base Plotting In R

  • We can also use combinations of predictor variables
## Separate by species and sex
boxplot(body_mass_g ~ sex + species, data = my_penguins)

Base Plotting In R

  • Histograms can be produced on an individual column
    • The number of breaks can be set manually (a single number is treated as a suggestion)
  • The default is pretty useful here
    • Generally simple figures without complexity
hist(my_penguins$body_mass_g, breaks = 20, xlab = "Body Mass (g)")
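
When breaks is a single number, hist() treats it as a suggestion and chooses "pretty" breakpoints close to it. The sketch below, which uses the built-in faithful data as a stand-in for my_penguins, shows how to inspect the bins that were actually used and how to take full control with a vector of breakpoints.

```r
# faithful is a built-in dataset, used as a stand-in for my_penguins
# plot = FALSE returns the binning without drawing anything
h <- hist(faithful$waiting, breaks = 20, plot = FALSE)

h$breaks         # the breakpoints R actually chose
length(h$counts) # the number of bins, which need not be exactly 20
sum(h$counts)    # every observation falls in exactly one bin

# Supplying a vector of breakpoints gives full control over the bins
h_manual <- hist(faithful$waiting, breaks = seq(40, 100, by = 5), plot = FALSE)
length(h_manual$counts) # 13 breakpoints define 12 bins
```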

Visualisation With ggplot2

The Grammar of Graphics

  • ggplot2 has become the industry standard for visualisation (Wickham 2016)
  • Core & essential part of the tidyverse
  • Developed by Hadley Wickham as part of his PhD thesis
  • An implementation of The Grammar of Graphics (Wilkinson 2005)
    • Breaks visualisation into layers
    • Each layer added as a new line of code

The Grammar of Graphics

Figure taken from https://r.qcbs.ca/workshop03/book-en/grammar-of-graphics-gg-basics.html

Beginning Our Plot

## The initial call to ggplot defines
## the data layer only
ggplot(my_penguins)

Adding Layers

## Adding aesthetic mappings sets the 
## data range (via the axes)
ggplot(my_penguins) +
  aes(
    x = body_mass_g, 
    y = flipper_length_mm
  )

Adding Layers

  • Notice how the call to ggplot() was followed by +
  • This tells R: “But wait, there’s more to come!”
  • This is how we add the layers described in The Grammar of Graphics
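
Because layers are added with +, a plot can be stored as an object and extended later. This is a minimal sketch using the built-in cars data; layer_data() is a ggplot2 helper that returns the data computed for a layer, used here only to confirm that the points were added.

```r
library(ggplot2)

# A plot can be stored and built up in steps; nothing is drawn until printed
p <- ggplot(cars) +          # data layer
  aes(x = speed, y = dist)   # aesthetic mappings

p_points <- p + geom_point() # add a geometry to the stored plot

# layer_data() returns the data ggplot2 computed for a given layer
head(layer_data(p_points, 1))

# The `+` must end a line. Starting a line with `+` makes R treat the
# previous line as complete, so the following would NOT add the points:
# ggplot(cars)
#   + geom_point()
```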

Adding Points

## Adding `geom_point()` now tells
## ggplot what geometry to use
ggplot(my_penguins) +
  aes(
    x = body_mass_g, 
    y = flipper_length_mm
  ) +
  geom_point()

Tidying The Code

  • The same plot, written more compactly with one layer per line
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm) + # 2: Aesthetic mappings
  geom_point() # 3: Geometry

Adding Layers

  • Now we can easily add additional layers
    • e.g. a smooth curve as a statistics layer
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm) + # 2: Aesthetic mappings
  geom_point() + # 3: Geometry
  stat_smooth() # 5: Statistics

Adding Layers

  • Change to a regression line without the confidence band:
    • method = "lm", se = FALSE
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm) + # 2: Aesthetic mappings
  geom_point() + # 3: Geometry
  stat_smooth(method = "lm", se = FALSE) # 5: Statistics
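
The line drawn by stat_smooth(method = "lm") should match an ordinary lm() fit of y on x. The sketch below checks this on the built-in cars data (a stand-in for my_penguins); formula = y ~ x is spelled out only to avoid the message ggplot2 prints when it picks the formula itself.

```r
library(ggplot2)

p <- ggplot(cars) +
  aes(x = speed, y = dist) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Extract the line drawn by the second layer (the smoother)
smooth_line <- layer_data(p, 2)

# Fit the same model directly and predict at the same x positions
fit <- lm(dist ~ speed, data = cars)
direct <- predict(fit, newdata = data.frame(speed = smooth_line$x))

# The two sets of fitted values agree up to numerical tolerance
all.equal(smooth_line$y, unname(direct))
```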

Additional Aesthetics

  • We might like to colour points by species
  • Adding this to the main aes() will send this mapping to all layers
    • Will produce a regression line for each species
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) + # 2: Mappings
  geom_point() +  # 3: Geometry
  stat_smooth(method = "lm", se = FALSE) # 5: Statistics

Additional Aesthetics

  • We can also include the sex of penguins on this figure
  • Can easily add as an aesthetic mapping
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species, shape = sex) + 
  geom_point() +  # 3: Geometry
  stat_smooth(method = "lm", se = FALSE) # 5: Statistics
  • This doesn’t work well for the regression lines
  • Lines can’t show a shape, but sex still splits the data into groups \(\implies\) two same-coloured lines per species

Additional Aesthetics

  • Aesthetic mappings set in the first call are passed to all layers
  • Aesthetics can also be set within a single layer
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) + # 2: Mappings
  geom_point(aes(shape = sex)) +  # 3: Geometry with layer-specific aes
  stat_smooth(method = "lm", se = FALSE) # 5: Statistics
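
One way to see the difference between the two approaches is to count the regression lines that ggplot2 fits. This is a minimal sketch using the built-in mtcars data rather than my_penguins, with factor(cyl) standing in for species and factor(am) for sex.

```r
library(ggplot2)

# mtcars is a built-in dataset: factor(cyl) stands in for species,
# factor(am) for sex
p_global <- ggplot(mtcars) +
  aes(x = wt, y = mpg, colour = factor(cyl), shape = factor(am)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_layer <- ggplot(mtcars) +
  aes(x = wt, y = mpg, colour = factor(cyl)) +
  geom_point(aes(shape = factor(am))) +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Count the separate regression lines in the smoothing layer (layer 2)
n_lines_global <- length(unique(layer_data(p_global, 2)$group))
n_lines_layer <- length(unique(layer_data(p_layer, 2)$group))
c(global = n_lines_global, layer_specific = n_lines_layer)
```

With shape mapped in the top-level aes(), the smoother is split by both variables; mapped inside geom_point(), only colour defines the groups.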

Facets

  • Separate panels are known as facets in ggplot2
    • Plot male & female penguins in separate panels
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) + # 2: Mappings
  geom_point(aes(shape = sex)) +  # 3: Geometry with layer-specific aes
  facet_wrap(~sex) + # 4: Facets
  stat_smooth(method = "lm", se = FALSE) # 5: Statistics

Facets

  • We can “free” each axis within each panel
    • Not always informative
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) + # 2: Mappings
  geom_point(aes(shape = sex)) +  # 3: Geometry with layer-specific aes
  facet_wrap(~sex, scales = "free") + # 4: Facets
  stat_smooth(method = "lm", se = FALSE) # 5: Statistics
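
The scales argument of facet_wrap() also accepts "free_x" and "free_y", which free only one axis and keep the other comparable across panels. A minimal sketch, using the built-in iris data as a stand-in for my_penguins:

```r
library(ggplot2)

p_base <- ggplot(iris) +
  aes(x = Sepal.Length, y = Petal.Length) +
  geom_point()

# Shared axes (the default, scales = "fixed"): panels are directly comparable
p_fixed <- p_base + facet_wrap(~Species)

# Free only the y-axis: x stays comparable, y adapts to each panel
p_free_y <- p_base + facet_wrap(~Species, scales = "free_y")

# One panel per level of the faceting variable
n_panels <- length(unique(layer_data(p_free_y, 1)$PANEL))
```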

Co-ordinates

  • An additional layer is the co-ordinate system
    • Defaults are always chosen (unlike geometry)
    • Can modify if we choose
  • Quite obvious for x & y positions
    • Can transform axes to log10 etc
  • Colours and shapes are controlled by scales as well
  • Set using scale_*() functions
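
As an illustration of transforming an axis, the sketch below applies scale_x_log10() to the built-in mtcars data (a stand-in for my_penguins) and checks that the positions handed to the geometry are on the log10 scale.

```r
library(ggplot2)

p_log <- ggplot(mtcars) +
  aes(x = disp, y = mpg) +
  geom_point() +
  scale_x_log10() # positions are log10-transformed; labels stay in original units

# The x positions passed to the geometry are on the log10 scale
head(sort(layer_data(p_log, 1)$x))
head(sort(log10(mtcars$disp)))
```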

Co-ordinates

  • Manually set the point shapes
  • The options can be seen using ?pch
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) + # 2: Mappings
  geom_point(aes(shape = sex)) +  # 3: Geometry with layer-specific aes
  facet_wrap(~sex, scales = "free") + # 4: Facets
  stat_smooth(method = "lm", se = FALSE) + # 5: Statistics
  scale_shape_manual(values = c(1, 19)) # 6: Co-ordinates (i.e. scales)

Co-ordinates

  • Colour scales are also set like this
    • scale_colour_brewer() is an excellent starting point
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) + # 2: Mappings
  geom_point(aes(shape = sex)) +  # 3: Geometry with layer-specific aes
  facet_wrap(~sex, scales = "free") + # 4: Facets
  stat_smooth(method = "lm", se = FALSE) + # 5: Statistics
  scale_shape_manual(values = c(1, 19)) + # 6: Co-ordinates (i.e. scales)
  scale_colour_brewer(palette = "Set1")
  • Check brewer palettes using RColorBrewer::display.brewer.all()

Co-ordinates

  • Multiple options for discrete colour palettes
    • scale_colour_manual(values = c("red", "blue", "green"))
    • scale_colour_viridis_d()
  • Colourblind palettes are provided in ggthemes
    • scale_colour_colorblind()
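
With scale_colour_manual(), naming the values ties each colour to a specific level instead of relying on level order. A minimal sketch on the built-in iris data, with arbitrary colour choices:

```r
library(ggplot2)

# Naming the colours ties each one to a level, whatever the level order
# (the colour choices here are arbitrary)
species_cols <- c(setosa = "red", versicolor = "blue", virginica = "green")

p_manual <- ggplot(iris) +
  aes(x = Sepal.Length, y = Petal.Length, colour = Species) +
  geom_point() +
  scale_colour_manual(values = species_cols)

# The colours actually assigned to the points
drawn <- as.vector(layer_data(p_manual, 1)$colour)
table(drawn)
```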

Co-ordinates

library(ggthemes)
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) + # 2: Mappings
  geom_point(aes(shape = sex)) +  # 3: Geometry with layer-specific aes
  facet_wrap(~sex, scales = "free") + # 4: Facets
  stat_smooth(method = "lm", se = FALSE) + # 5: Statistics
  scale_shape_manual(values = c(1, 19)) + # 6: Co-ordinates (i.e. scales)
  scale_colour_colorblind() # Stoopid American spelling
  • Scales often require an understanding of discrete vs continuous, e.g.
    • scale_x_continuous() for continuous data
    • scale_x_discrete() for categorical data
    • scale_colour_gradient() or scale_fill_gradient() for continuous data (e.g. heatmaps)
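
The discrete vs continuous distinction depends on how the column is stored. In the sketch below (built-in mtcars data, used as a stand-in), cyl is numeric, so it needs a continuous colour scale unless it is converted with factor(); pairing it with a discrete scale fails when the plot is built.

```r
library(ggplot2)

# cyl is stored as a number, so colour = cyl needs a continuous scale
p_cont <- ggplot(mtcars) +
  aes(x = wt, y = mpg, colour = cyl) +
  geom_point() +
  scale_colour_gradient(low = "grey80", high = "darkblue")

# Converting to a factor makes it categorical, so a discrete scale applies
p_disc <- ggplot(mtcars) +
  aes(x = wt, y = mpg, colour = factor(cyl)) +
  geom_point() +
  scale_colour_brewer(palette = "Set1")

# Mixing the two (numeric column, discrete scale) fails at build time
p_bad <- ggplot(mtcars) +
  aes(x = wt, y = mpg, colour = cyl) +
  geom_point() +
  scale_colour_brewer(palette = "Set1")
bad_build <- try(ggplot_build(p_bad), silent = TRUE)
inherits(bad_build, "try-error")
```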

Themes

  • The final grammar of graphics layer \(\implies\) themes
  • Controls overall appearance:
    • font-sizes, background style, title/legend placement etc
  • Will explore in detail later if we have time
  • We do need to get rid of the horrible grey background though

Themes

  • theme_bw() applies a generally usable set of defaults
library(ggthemes)
ggplot(my_penguins) + # 1: Define the data layer
  aes(x = body_mass_g, y = flipper_length_mm, colour = species) + # 2: Mappings
  geom_point(aes(shape = sex)) +  # 3: Geometry with layer-specific aes
  facet_wrap(~sex, scales = "free") + # 4: Facets
  stat_smooth(method = "lm", se = FALSE) + # 5: Statistics
  scale_shape_manual(values = c(1, 19)) + # 6: Co-ordinates (i.e. scales)
  scale_colour_colorblind() + # Stoopid American spelling
  theme_bw() # 7: The overall theme
  • I often call theme_set(theme_bw()) at the start of a session
    • Sets as the theme for all subsequent plots
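
theme_set() returns the previous default invisibly, so the change can be undone later. A minimal sketch with the built-in cars data; theme_minimal() is just one example of a per-plot override.

```r
library(ggplot2)

# theme_set() returns the previous default invisibly, so it can be restored
old_theme <- theme_set(theme_bw())

# Plots printed while this default is active use theme_bw()
p <- ggplot(cars) +
  aes(x = speed, y = dist) +
  geom_point()

# A single plot can still override the session default
p_minimal <- p + theme_minimal()

# Restore whatever was in place before
theme_set(old_theme)
```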

Additional Geometries

  • The choice of appropriate geometry is usually data-driven
    • Same principle as aes() / geom_point()
    • May have different mappings
  • Lines: geom_line(), geom_abline(), geom_hline(), geom_vline()
  • Distributions: geom_boxplot(), geom_violin()
  • Histograms: geom_histogram(), geom_density()
  • Bar Plots: geom_bar(), geom_col() + geom_errorbar()
  • Heatmaps: geom_tile(), geom_raster(), geom_rect()
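
The two bar geometries differ in where the bar heights come from: geom_bar() counts rows for us, while geom_col() draws values we have already calculated. A minimal sketch on the built-in mtcars data (a stand-in for my_penguins):

```r
library(ggplot2)

# geom_bar() counts the rows in each category for us
p_bar <- ggplot(mtcars) +
  aes(x = factor(cyl)) +
  geom_bar()

# geom_col() draws heights we have already calculated
cyl_counts <- as.data.frame(table(cyl = mtcars$cyl))
p_col <- ggplot(cyl_counts) +
  aes(x = cyl, y = Freq) +
  geom_col()

# Both plots end up with the same bar heights
layer_data(p_bar, 1)$count
cyl_counts$Freq
```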

Making Boxplots

ggplot(
  my_penguins,
  aes(species, body_mass_g)
) +
  geom_boxplot() +
  theme_bw()

Making Boxplots

## Map sex to the fill aesthetic
ggplot(my_penguins) +
  aes(
    species, body_mass_g, fill = sex
  ) +
  geom_boxplot() +
  theme_bw()

Histograms

  • The default histogram usually looks terrible \(\implies\) easy to fix
## Map sex to the fill aesthetic
ggplot(my_penguins) +
  aes(x = body_mass_g, fill = sex) +
  geom_histogram() +
  theme_bw()

Histograms

  • The default histogram usually looks terrible \(\implies\) easy to fix
    • Changing binwidth and setting colour (outline)
ggplot(my_penguins) +
  aes(x = body_mass_g, fill = sex) +
  geom_histogram(
    colour = "black", binwidth = 100
  ) +
  facet_grid(sex~species) +
  theme_bw()
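
binwidth is given in the units of the x variable, and the boundary argument controls where the bin edges fall. The sketch below uses the built-in faithful data as a stand-in for my_penguins; the fill colour and the choice of boundary = 0 are illustrative assumptions.

```r
library(ggplot2)

# faithful is a built-in dataset, used as a stand-in for my_penguins
p_hist <- ggplot(faithful) +
  aes(x = waiting) +
  geom_histogram(
    binwidth = 5,     # each bar covers 5 minutes
    boundary = 0,     # bin edges fall on multiples of 5
    colour = "black", # bar outline
    fill = "grey70"   # bar interior
  )

bins <- layer_data(p_hist, 1)
range(bins$xmax - bins$xmin) # every bar has the requested width
sum(bins$count)              # all observations are counted
```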

Closing Comments

  • The top-level aes() can also be set in the first call to ggplot()
    • This is personal preference \(\implies\) a separate aes() showed the layers clearly today
## My code usually looks more like this
ggplot(
  my_penguins, 
  aes(x = body_mass_g, fill = sex)
) +
  geom_histogram(colour = "black", binwidth = 100) +
  facet_grid(sex~species) +
  theme_bw()

Closing Comments

  • Will explore additional geometries through the course
  • Also discuss more detailed ggplot2 customisation
  • Non-ggplot2 options can also be effective:
    • corrplot() from the package corrplot
    • pheatmap() from the package pheatmap
    • Venn Diagrams from VennDiagram
    • UpSet plots from UpSetR (actually built on ggplot2)

Challenge

  1. Load the pigs dataset and create a boxplot
    • Show dose across the x-axis
    • Fill by supplement type
  2. Experiment with geom_violin() as an alternative

References

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilkinson, Leland. 2005. The Grammar of Graphics. Springer New York, NY. https://doi.org/10.1007/0-387-28695-0.