Introductory Visualisation

RAdelaide 2025

Author
Affiliation

Dr Stevie Pederson

Black Ochre Data Labs
The Kids Research Institute Australia

Published

July 8, 2025

Visualisation in R

Start a New R Script

  • Call the new script: IntroVisualisation.R
  • Load our favourite packages at the top of the script
library(palmerpenguins)
library(tidyverse)

Base Plotting in R

  • R comes with some very powerful plotting capabilities
    • Provided in the base package graphics
    • Always loaded with every session
  • Examples are often extremely helpful
  • People used it happily for decades
    • The release of ggplot2 changed everything
  • Let’s quickly explore base plotting before moving to the good stuff

Base Plotting In R

  • Simple plots are usually easy
    • Complex figures can get really messy
  • Using the cars dataset
    • speed (mph)
    • dist (ft) each car takes to stop
plot(cars)

Base Plotting In R

  • The first two columns were automatically placed on the x & y axes
  • We could set values for x & y manually
    • Switching back to the penguins here
  • Automatically decided to plot using points
## Plot calling individual columns from penguins using `$`
plot(x = penguins$bill_depth_mm, y = penguins$bill_length_mm)

Base Plotting In R

  • The function boxplot() can also create simple figures easily
  • For categorical variables (i.e. factors) we can use the formula notation
    • y ~ x \(\implies\) y depends on x
## Make a simple boxplot showing the weights by species
boxplot(body_mass_g ~ species, data = penguins)
  • The dependent variable will always appear on the y-axis
  • The predictor will always appear on the x-axis

Base Plotting In R

  • We can also use combinations of predictor variables
## Separate by species and sex
boxplot(body_mass_g ~ sex + species, data = penguins)

Base Plotting In R

  • Histograms can be produced on an individual column
    • The number of breaks can be set manually
  • The default is pretty useful here
    • Generally simple figures without complexity
hist(penguins$body_mass_g, breaks = 20, xlab = "Body Mass (g)")

Base Plotting In R

  • Large datasets can be quickly explored using pairs()
  • Shows all pairwise combinations of columns
    • Categorical columns can be less informative
pairs(penguins)
  • Calling plot() on a data frame with more than two columns calls pairs() under the hood
  • cars has only two columns, so plot(cars) drew a single scatterplot instead

Scales

Setting Scales

  • By default, ggplot2 will detect the most appropriate scale
    • Has applied scale_x_continuous() and scale_y_continuous()
# Explicitly set the scales. This will appear identical
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_x_continuous() +
  scale_y_continuous()
  • Multiple presets are available:
    • scale_x_log10(), scale_x_sqrt(), scale_x_reverse()
    • Also available for y
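  • A minimal sketch of one of these presets, swapping the default x-axis for a log10 scale
    • Plotting body_mass_g against flipper_length_mm is an illustrative choice, not part of the original figure

```r
library(palmerpenguins)
library(ggplot2)

# Scatter plot with body mass shown on a log10-transformed x-axis
p_log <- ggplot(
  penguins,
  aes(x = body_mass_g, y = flipper_length_mm, colour = species)
) +
  geom_point(na.rm = TRUE) + # na.rm = TRUE quietly drops penguins with missing measurements
  scale_x_log10() # Replaces the default scale_x_continuous()
p_log
```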

Setting Scales

  • For aesthetics like colour, we often want to tailor these
    • Default is scale_colour_discrete() (Meh…)
  • Many defaults exist
    • scale_colour_brewer(), scale_colour_viridis_d()
## Check the default palette for scale_colour_brewer
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_brewer()
  • Default palettes can be good sometimes
    • To show options for scale_colour_brewer() \(\implies\) RColorBrewer::display.brewer.all()
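  • A short sketch for browsing the available palettes, assuming the RColorBrewer package is installed
    • The object name set1_cols is just for illustration

```r
library(RColorBrewer)

# Draw every ColorBrewer palette in the Plots pane
display.brewer.all()

# Restrict the display to palettes flagged as colourblind-friendly
display.brewer.all(colorblindFriendly = TRUE)

# Extract the hex codes for the first three colours of "Set1"
set1_cols <- brewer.pal(n = 3, name = "Set1")
set1_cols
```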

Setting Scales

  • I often use Set1, but try a few others
  • scale_colour_viridis_d() will give a colourblind-friendly palette
    • Other palettes are provided by other packages
## Set the palette for scale_colour_brewer to be "Set1" or anything else
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_brewer(palette = "Set1")
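  • For comparison, a sketch of the same figure using scale_colour_viridis_d()
    • Setting end = 0.9 is an optional tweak assumed here to avoid the palest yellow

```r
library(palmerpenguins)
library(ggplot2)

## Swap the brewer palette for the discrete viridis palette
p_viridis <- penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_viridis_d(end = 0.9) # Use only the first 90% of the palette range
p_viridis
```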

Setting Scales

  • Standard palette adapted for colourblindness (seven colours plus black) is included in ggthemes (Wong 2011)
    • Many alternatives exist
    • This one is written by Americans \(\implies\) weird spelling of colourblind
library(ggthemes)
## Use the colourblind friendly palette provided by ggthemes
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()
  • The colourblind-friendly status can be checked using RGB Hex codes at https://davidmathlogic.com/colorblind/#%23D81B60-%231E88E5-%23FFC107-%23004D40
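  • One way to obtain the hex codes for checking is to ask the palette function directly
    • A sketch assuming ggthemes and scales are installed; the name cb_cols is illustrative
    • The palette has eight entries, the first being black

```r
library(ggthemes)

# colorblind_pal() returns a function; calling it with n gives the first n hex codes
cb_cols <- colorblind_pal()(8)
cb_cols

# Preview the colours as labelled swatches in the Plots pane
scales::show_col(cb_cols)
```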

What Else Can We Do?

  • What else might be informative?
  • Can we separate by island or sex?
    • sex will have missing values
    • Let’s set the shape of the points
## Try setting different point shapes based on the recorded sex
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex)) + # Now we have a layer-specific aesthetic 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()

Modifying Points

  • We can change the size of these outside the aesthetic
    • Fixed values only \(\implies\) will not respond to change in data
## Set a fixed point size outside of aes()
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + # Change the point size
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()

Modifying Points

  • Points can be set manually using scale_shape_manual()
    • Also scale_colour_manual()
## Choose the point shapes manually instead of using the defaults
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) ## Manually choose the point shapes
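  • scale_colour_manual() works the same way, as in this sketch
    • The three colours are arbitrary illustrative choices; any valid R colour names or hex codes can be used
    • Naming the vector ties each colour to a specific species

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

# Hypothetical colour choices, named by species
species_cols <- c(Adelie = "darkorange", Chinstrap = "purple", Gentoo = "cyan4")

p_manual <- penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_manual(values = species_cols) + # Manually choose the colours
  scale_shape_manual(values = c(19, 1)) # Manually choose the point shapes
p_manual
```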

Modifying Points

  • How did I know to choose those two values?
  • Why do numbers represent different shapes?
  • Enter ?pch and scroll down a little
    • 21-25 have both a colour (outline) and fill capability
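  • A sketch of a fillable shape in use: shape 21 is a circle with a separate outline and fill
    • Here fill is mapped to species while the outline is fixed to black, which is an illustrative choice

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

p_fill <- penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, fill = species)) +
  # shape = 21 uses 'fill' for the interior and 'colour' for the outline
  geom_point(shape = 21, colour = "black", size = 3)
p_fill
```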

Finishing Our Figure

  • The next step in making our figure look brilliant
    • Axis & Scale labels
## Add informative labels for the axes and legends
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) +
  labs(
    # Manually add labels
    x = "Bill Depth (mm)", y = "Bill Length (mm)", 
    colour = "Species", shape = "Sex"
  ) 
  • Note that this fundamentally breaks the automatic association between our data & figure
  • We can type literally anything for labels, which can leave the door open to errors

Finishing Our Figure

  • The final layer in the Grammar of Graphics is the Theme
  • Controls the overall appearance not controlled elsewhere
  • The code can get long so let’s save that figure as p
    • Then I can modify it on a single slide
## Save the figure for exploring theme attributes
p <- penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) +
  labs(
    x = "Bill Depth (mm)", y = "Bill Length (mm)", 
    colour = "Species", shape = "Sex"
  ) 

Themes

Using Themes

  • A default theme is applied: theme_grey()
  • I prefer theme_bw()
    • Removes the grey background
    • Gets most of the job done
p + theme_bw()

Using Themes

  • Additional modifications:
    • Setting the base font size for all annotations
    • Can also set colour, alignment, font face, font family etc
    • Vital for publishing figures
p + theme_bw() +
  theme(text = element_text(size = 14))
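  • The complete themes also accept a base_size argument, which is a possible shortcut for the same idea
    • The sketch below rebuilds a cut-down stand-in for p (called p_small here) so it runs on its own

```r
library(palmerpenguins)
library(ggplot2)

# A simplified stand-in for the saved figure p
p_small <- ggplot(
  penguins,
  aes(x = bill_depth_mm, y = bill_length_mm, colour = species)
) +
  geom_point(na.rm = TRUE)

# base_size sets the reference font size for all theme text (the default is 11)
p_big <- p_small + theme_bw(base_size = 14)
p_big
```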

Using Themes

  • A key application is the placement of legends
p + theme_bw() +
  theme(legend.position = "bottom")

Using Themes

  • Legend can be placed inside using three steps
    • Set legend.position = "inside"
    • Set the position you want the legend
    • Set which part of the legend aligns at those co-ordinates
  • Can be extremely finicky
p + theme_bw() +
  theme(
    legend.position = "inside", # Ensure the legend is inside the plotting region
    legend.position.inside = c(0, 0), # Anchor to the bottom left
    legend.justification.inside = c(0, 0) # Set the alignment to be bottom left
  )
  • By default the legend will align the centre at the given co-ordinates

Using Themes

  • Any individual text element of a figure can be modified using element_text()
p + theme_bw() +
  theme(
    ## A slightly exaggerated modification of axis titles
    axis.title = element_text(colour = "darkred", size = 16, face = "bold")
  )

Using Themes

  • Looking at the help page ?theme \(\implies\) 4 main types of ‘element’
  1. element_text()
    • Control all text elements (axes, titles etc)
    • Doesn’t impact labels within figures
  2. element_line()
    • Control all lines (axes, gridlines, borders etc)
    • Can set colour, size, linetype, linewidth
  3. element_rect()
    • Control all rectangles (background, legends, panels etc)
    • Can set colour, fill, size, linetype, linewidth
  4. element_blank()
    • Hides an element
    • Hides an element
  • Other element types are a bit more nuanced

Using Themes

# Make the most horrible figure possible
p + theme_bw() +
  theme(
    ## A slightly exaggerated modification of axis titles
    axis.title = element_text(colour = "darkred", size = 16, face = "bold"),
    ## Make axes thick, blue lines. Ewww
    axis.line = element_line(colour = "darkblue", linewidth = 2),
    ## Hide the underlying grid
    panel.grid = element_blank(),
    ## Make the area background a light grey
    plot.background = element_rect(fill = "grey70")
  )

Using Themes

  • Plot titles align left by default
p + theme_bw() + 
  ggtitle("Penguin Bill Measurements")

  • We can use theme() to align in the centre
    • hjust is the horizontal adjustment
p + theme_bw() + 
  ggtitle("Penguin Bill Measurements") +
  theme(plot.title = element_text(hjust = 0.5))

Different Plot Types

Classic Bar Plots

  • geom_bar() & geom_col()
  • geom_errorbar() & geom_errorbarh()

Classic Density Plots

  • geom_boxplot() & geom_violin()
  • geom_density() & geom_histogram()

Line-based Geometry

  • geom_line(), geom_segment()
  • geom_abline(), geom_hline() & geom_vline()

Heatmaps and Grids

  • geom_raster(), geom_tile() & geom_rect()
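  • As one example from the heatmap family, a sketch using geom_tile() to show how many penguins were recorded for each species on each island
    • The summary object tile_counts and the choice of scale_fill_viridis_c() are illustrative

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

# Count the penguins for each observed combination of species and island
tile_counts <- penguins |>
  count(species, island)

# Draw one tile per combination, filled by the count
p_tile <- tile_counts |>
  ggplot(aes(x = island, y = species, fill = n)) +
  geom_tile() +
  scale_fill_viridis_c() +
  theme_bw()
p_tile
```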

Creating A Boxplot

  • A starting point might be to choose island as the predictor
  • body_mass_g may be a response variable
penguins |> 
  ggplot(aes(island, body_mass_g)) +
  geom_boxplot()

Creating Our Boxplot

  • To incorporate the sex \(\implies\) add a fill aesthetic
    • colour is generally applied to shape outlines
## Fill the boxes by sex
penguins |> 
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot()
  • ggplot2 will always draw a separate box for each value of sex within a category (including the missing values)

Creating Our Boxplot

## Remove the penguins with no recorded sex
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot()

Creating Our Boxplot

  • We could also separate by species using facet_wrap()
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot() +
  facet_wrap(~species, scales = "free_x")

Creating Our Boxplot

  • A less-intuitive alternative (facet_grid()) will allow for unequal-sized facets
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot() +
  facet_grid(~species, scales = "free_x", space = "free_x")

Overlaying Geoms

  • geom_jitter() will draw points but with noise in either direction
  • The alpha parameter will make the points partially transparent
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(sex, body_mass_g, fill = sex)) +
  geom_boxplot(outliers = FALSE) + # Hide any outliers from the boxes
  geom_jitter(width = 0.1, alpha = 0.5) + # Outliers will be shown here
  facet_wrap(~species + island, nrow = 1)

Trying a Violin Plot

  • Violin plots are similar to boxplots
    • Can estimate distributions beyond the range of the points (trim = FALSE)
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(sex, body_mass_g, fill = sex)) +
  geom_violin(draw_quantiles = 0.5, trim = FALSE) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  facet_wrap(~species + island, nrow = 1)

Creating A Histogram

  • The default histogram in ggplot2 is a bit ugly
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram()
  • This can be easily improved by setting fill = "grey70" and colour = "black"
  • binwidth is automatically set \(\implies\) try binwidth = 100

Creating A Histogram

## Now apply facet_grid to show by sex & species
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram(fill = "grey70", colour = "black", binwidth = 100) +
  facet_grid(species ~ sex) +
  theme_bw()

Creating A Histogram

  • The above shows counts \(\implies\) can also show frequency (i.e. density)
  • Unfortunately, the code is ugly but does make sense
    • (The ‘stat’ is counting, so after counting convert to a frequency)
## Repeat but showing bars as frequencies
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram(
    aes(y = after_stat(density)),
    fill = "grey70", colour = "black", binwidth = 100
  ) +
  facet_grid(species ~ sex) +
  theme_bw()
  • The first plot is informative if we have roughly the same number in each panel
  • The second option can be more informative for unbalanced data

Creating a Summary Barplot

  • Bar plots are pretty common in many fields of research
  • First we’ll create a summary table with mean and sd
    • Use the mean for bars
    • sd for error bars
penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  ) 
# A tibble: 6 × 4
  species   sex    weight_mn weight_sd
  <fct>     <fct>      <dbl>     <dbl>
1 Adelie    male       4043.      347.
2 Adelie    female     3369.      269.
3 Gentoo    female     4680.      282.
4 Gentoo    male       5485.      313.
5 Chinstrap female     3527.      285.
6 Chinstrap male       3939.      362.

Creating a Summary Barplot

  • Now we can use geom_col()
## Begin creating a bar plot, with separate panels for each species
penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  ) |>
  ggplot(aes(sex, weight_mn, fill = sex)) +
  geom_col() +
  facet_wrap(~species, nrow = 1) 

Creating a Summary Barplot

penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  ) |>
  ggplot(aes(sex, weight_mn, fill = sex)) +
  geom_col() +
  ## Now add error bars adding/subtracting from the mean 'on the fly'
  geom_errorbar(
    aes(ymin = weight_mn - weight_sd, ymax = weight_mn + weight_sd),
    width = 0.2 # Set the width of tails on the error bars
  ) +
  facet_wrap(~species, nrow = 1) +
  ## Some extra code to make the plot look great
  scale_y_continuous(expand = expansion(c(0, 0.05))) +
  scale_fill_brewer(palette = "Set1") +
  theme_bw()

Adding Labels To Points

  • The penguins dataset isn’t well suited to adding labels
    • Let’s make a toy dataset
tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
)
# A tibble: 4 × 3
      x     y label
  <int> <dbl> <chr>
1     1     1 a    
2     2     4 b    
3     3     9 c    
4     4    16 d    

Adding Labels To Points

  • We can easily plot points, but what about labels?
tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
) |> 
  ggplot(aes(x, y)) +
  geom_point()

Adding Labels To Points

  • Using geom_text() will overlay labels exactly on the points
tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
) |> 
  ggplot(aes(x, y)) +
  geom_point() +
  geom_text(aes(label = label), size = 4)

Adding Labels To Points

  • Using geom_text_repel() will shift labels marginally away from the points
library(ggrepel)
tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
) |> 
  ggplot(aes(x, y)) +
  geom_point() +
  geom_text_repel(aes(label = label), size = 4)
  • geom_label() and geom_label_repel() will add borders and fill to labels
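  • A sketch of the bordered version using the same toy dataset
    • The fill colour is an arbitrary choice for illustration

```r
library(tibble)
library(ggplot2)
library(ggrepel)

p_label <- tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
) |>
  ggplot(aes(x, y)) +
  geom_point() +
  # Labels are drawn in filled boxes and nudged away from the points
  geom_label_repel(aes(label = label), size = 4, fill = "lightyellow")
p_label
```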

Saving Images

  • The simple way is to click Export in the Plots pane
  • The way to save using code is
ggsave("myplot.png", width = 7, height = 7, units = "in")
  • This will always save the most recent plot by default
  • Output format is determined by the suffix
  • Try saving as a pdf…
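  • A sketch of saving as a pdf, passing the plot explicitly instead of relying on the most recent one
    • The plot p_cars and the temporary output path are assumptions made so the example runs anywhere; in a real script a path inside your project would be more useful

```r
library(ggplot2)

# A small example figure to save
p_cars <- ggplot(cars, aes(speed, dist)) +
  geom_point()

# Write to a temporary folder; the .pdf suffix selects the output format
out_file <- file.path(tempdir(), "myplot.pdf")
ggsave(out_file, plot = p_cars, width = 7, height = 7, units = "in")

# Confirm the file was written
file.exists(out_file)
```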

Saving Images

  • I think saving using code is preferable
  • Modify an analysis or data \(\implies\) saved figures will also update
    • This saves time & ensures reproducibility
    • Modifying font sizes etc for publication can take a while

Conclusion

A fabulous resource: https://r-graphics.org/

Challenges

  1. Create a barplot with error bars showing mean flipper length by species
    • Colour (i.e. fill) however you choose
  2. Create a histogram of flipper length by species
    • Facet by species
  3. Use boxplots to show the same (flipper length by species)
    • Don’t facet, but fill the boxes by sex
  4. Using points, compare flipper length to body mass
    • Colour by species including a regression line
    • Try adding stat_ellipse() to your plot

References

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilkinson, Leland. 2005. The Grammar of Graphics. Springer New York, NY. https://doi.org/10.1007/0-387-28695-0.
Wong, Bang. 2011. “Color Blindness.” Nat. Methods 8 (6): 441.