Visualisation in R

Start a New R Script

  • Call the new script: IntroVisualisation.R
  • Load our favourite packages at the top of the script
library(palmerpenguins)
library(tidyverse)

Base Plotting in R

  • R comes with some very powerful plotting capabilities
    • Provided in the base package graphics
    • Always loaded with every session
  • Examples are often extremely helpful
  • People used these happily for decades
    • The release of ggplot2 changed everything
  • Let’s quickly explore base plotting before moving to the good stuff

Base Plotting In R

  • Simple plots are usually easy
    • Complex figures can get really messy
  • Using the cars dataset
    • speed (mph)
    • dist (ft) each car takes to stop
plot(cars)

Base Plotting In R

  • The first two columns were automatically placed on the x & y axes
  • We could set values for x & y manually
    • Switching back to the penguins here
  • Automatically decided to plot using points
## Plot calling individual columns from penguins using `$`
plot(x = penguins$bill_depth_mm, y = penguins$bill_length_mm)

Base Plotting In R

  • The function boxplot() can also create simple figures easily
  • For categorical variables (i.e. factors) we can use the formula notation
    • y ~ x \(\implies\) y depends on x
## Make a simple boxplot showing the weights by species
boxplot(body_mass_g ~ species, data = penguins)
  • The dependent variable will always appear on the y-axis
  • The predictor will always appear on the x-axis

Base Plotting In R

  • We can also use combinations of predictor variables
## Separate by species and sex
boxplot(body_mass_g ~ sex + species, data = penguins)

Base Plotting In R

  • Histograms can be produced on an individual column
    • The number of breaks can be set manually
  • The default is pretty useful here
    • Generally simple figures without complexity
hist(penguins$body_mass_g, breaks = 20, xlab = "Body Mass (g)")

Base Plotting In R

  • Large datasets can be quickly explored using pairs()
  • Shows all pairwise combinations of columns
    • Categorical columns can be less informative
pairs(penguins)

The Grammar of Graphics

  • ggplot2 has become the industry standard for visualisation (Wickham 2016)
  • Core & essential part of the tidyverse
  • Developed by Hadley Wickham during his PhD
  • An implementation of The Grammar of Graphics (Wilkinson 2005)
    • Breaks visualisation into layers

The Grammar of Graphics

Taken from https://r.qcbs.ca/workshop03/book-en/grammar-of-graphics-gg-basics.html

The Grammar of Graphics

Everything is added in layers

  1. Data
    • Usually a data.frame (or tibble)
    • Can be piped in \(\implies\) modify on the fly
  2. Aesthetics
    • x & y co-ordinates
    • colour, fill, shape, size, linetype
    • grouping & transparency (alpha)
  3. Geometric Objects
    • points, lines, boxplot, histogram, bars etc
  4. Facets: Panels within plots
  5. Statistics: Computed summaries
  6. Coordinates
    • polar, map, cartesian etc
    • defaults to cartesian
  7. Themes: overall layout
    • default themes automatically applied

An Initial Example

  • Using the example dataset cars
  • Two columns:
    • speed (mph)
    • distance each car takes to stop
  • We can make a classic x vs y plot using points
  • The predictor (x) would be speed
  • The response (y) would be distance

An Initial Example

  • We may as well start by piping our data in
cars |>
  ggplot(aes(x = speed, y = dist))
  • We have defined the plotting aesthetics
    • x & y
    • Don’t need to name if passing in order
  • Axis limits match the data
  • No geometry has been specified \(\implies\) nothing was drawn

An Initial Example

  • To add points, we add geom_point() after calling ggplot()
  • Adding + after ggplot() says “But wait! There’s more…”
cars |>
  ggplot(aes(x = speed, y = dist)) + 
  geom_point() 
  • ggplot2 layers are joined with + rather than a pipe because neither pipe (%>% or |>) existed when ggplot2 was created
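Piping is only a convenience here: the data frame is simply the first argument of ggplot(). As a minimal sketch, assuming only ggplot2 is loaded, the same plot can be written without the pipe and without naming x & y (p_piped and p_direct are illustrative names, not part of the workshop script):

```r
library(ggplot2)

## Piped version: cars becomes the first argument of ggplot()
p_piped <- cars |>
  ggplot(aes(x = speed, y = dist)) +
  geom_point()

## Equivalent call without the pipe, passing x & y by position
p_direct <- ggplot(cars, aes(speed, dist)) +
  geom_point()

p_direct
```

Both objects should draw the same figure, so which form to use is mostly a matter of taste.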

An Initial Example

  • To add points, we add geom_point() after calling ggplot()
    • Adding + after ggplot() says “But wait! There’s more…”
cars |> # Layer 1: Data
  ggplot(aes(x = speed, y = dist)) + # Layer 2: Aesthetics
  geom_point() # Layer 3: Geometry
  • By default:
    • Layer 4: No facets
    • Layer 5: No summary statistics
    • Layer 6: Cartesian co-ordinate system
    • Layer 7: Crappy theme with grey background 🤮

An Initial Example

  • A simple summary statistic to add might be stat_smooth()
  • Automatically chooses the smoother
    • A loess curve for fewer than 1,000 observations, otherwise a GAM
    • A 95% confidence band is shown by default
cars |> # Layer 1: Data
  ggplot(aes(x = speed, y = dist)) + # Layer 2: Aesthetics
  geom_point() + # Layer 3: Geometry
  stat_smooth() # Layer 5: Statistics

Visualising Our Penguins


What visualisations could we produce to inspect penguins?

Creating Our First Plot

## Compare the two bill measurements
penguins |> # Layer 1: Data
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) + # Layer 2: Aesthetics
  geom_point()  # Layer 3: Geometry

Creating Our First Plot

  • There seem to be groups. Are these based on species? \(\implies\) Add colour
## Compare the two bill measurements
penguins |> 
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point()

Creating Our First Plot

  • We can also add regression lines
    • We’ll add equations later
    • Try without the se = FALSE and see what happens
## Add regression lines as a new geom
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) # Layer 5: Statistics

Understanding Aesthetics

  • Setting the colour in the call to ggplot() \(\implies\) all layers will use this
  • If we shift colour = species to geom_point() \(\implies\) ???
## Only use colour for the points
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE)

Understanding Aesthetics

  • We could set this again if we choose
## Set colour for the points and regression lines separately
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(aes(colour = species), method = "lm", se = FALSE)
  • It’s clunky here, but can give fine control for complex plots

Using Facets

  • Alternatively, we could plot each species in its own panel (or facet)
  • Using ~ notation to say all facets depend on species
## Plot each species in a separate panel
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~species) # Layer 4: Facets

Using Facets

  • We can allow x and y axes to scale separately for each panel
    • Not always a helpful strategy
## Plot each species in a separate panel, allowing axes to be scaled freely
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(aes(colour = species)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~species, scales = "free") 

Scales

Setting Scales

  • By default, ggplot2 will detect the most appropriate scale
    • Has applied scale_x_continuous() and scale_y_continuous()
# Explicitly set the scales. This will appear identical
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_x_continuous() +
  scale_y_continuous()
  • Multiple presets are available:
    • scale_x_log10(), scale_x_sqrt(), scale_x_reverse()
    • Also available for y
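The presets listed above are used in the same way as the default scales. A minimal sketch, assuming ggplot2 and palmerpenguins are installed, which swaps the default x-axis scale for a log10 scale (p_log is an illustrative name, and a log scale is not necessarily a sensible choice for these data):

```r
library(ggplot2)
library(palmerpenguins)

## Same scatterplot, but with bill depth on a log10-transformed x-axis
p_log <- penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  scale_x_log10() # Replaces the default scale_x_continuous()

p_log
```

The axis labels stay on the original scale, but the spacing between them follows the transformation.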

Setting Scales

  • For aesthetics like colour, we often want to tailor these
    • Default is scale_colour_discrete() (Meh…)
  • Many presets exist
    • scale_colour_brewer(), scale_colour_viridis_d()
## Check the default palette for scale_colour_brewer
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_brewer()
  • Default palettes can be good sometimes
    • To show options for scale_colour_brewer() \(\implies\) RColorBrewer::display.brewer.all()
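As well as displaying the palettes, the underlying colours can be extracted as hex codes. A short sketch, assuming the RColorBrewer package is installed (set1_cols is an illustrative name):

```r
library(RColorBrewer)

## Show every available palette in the Plots pane
display.brewer.all()

## Extract the first three colours of the "Set1" palette as hex codes
set1_cols <- brewer.pal(n = 3, name = "Set1")
set1_cols
```

Hex codes like these can be passed to any function which accepts colours.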

Setting Scales

  • I often use Set1, but try a few others
  • scale_colour_viridis_d() will give a colourblind-friendly palette
    • Other palettes are provided by other packages
## Set the palette for scale_colour_brewer to be "Set1" or anything else
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_brewer(palette = "Set1")
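For comparison, the same figure using scale_colour_viridis_d() might look like the sketch below. The values chosen for option and end are illustrative only, and p_viridis is a hypothetical name:

```r
library(ggplot2)
library(palmerpenguins)

## Use the discrete viridis palette instead of a brewer palette
p_viridis <- penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_viridis_d(option = "D", end = 0.9) # end < 1 drops the palest yellow

p_viridis
```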

Setting Scales

  • Standard 8-colour palette (including black) adapted for colourblindness is included in ggthemes (Wong 2011)
    • Many alternatives exist
    • This one is written by Americans \(\implies\) weird spelling of colourblind
library(ggthemes)
## Use the colourblind friendly palette provided by ggthemes
penguins |>
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()

What Else Can We Do?

  • What else might be informative?
  • Can we separate by island or sex?
    • sex will have missing values
    • Let’s set the shape of the points
## Try setting different point shapes based on the recorded sex
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex)) + # Now we have a layer-specific aesthetic 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()

Modifying Points

  • We can change the size of the points outside the aesthetic
    • Fixed values only \(\implies\) will not respond to change in data
## Increase the point size using a fixed value outside aes()
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + # Change the point size
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind()
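By contrast, placing size inside aes() maps it to a column, so the point size does respond to the data. A minimal sketch, where mapping size to body_mass_g is purely an illustrative choice and p_size is a hypothetical name:

```r
library(tidyverse)
library(palmerpenguins)

## size is now mapped to a column, whilst alpha remains a fixed value
p_size <- penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex, size = body_mass_g), alpha = 0.6)

p_size
```

A legend for size should appear automatically, which never happens for fixed values.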

Modifying Points

  • Point shapes can be set manually using scale_shape_manual()
    • Also scale_colour_manual()
## Set the point shapes for each sex manually
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) ## Manually choose the point shapes
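The same approach works for colours using scale_colour_manual(). In the sketch below the three colour names are arbitrary choices for illustration, and naming the values ties each one to a specific species or sex (p_manual is a hypothetical name):

```r
library(tidyverse)
library(palmerpenguins)

p_manual <- penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) +
  ## Named vectors match each value to a level, regardless of order
  scale_colour_manual(
    values = c(Adelie = "darkorange", Chinstrap = "purple", Gentoo = "cyan4")
  ) +
  scale_shape_manual(values = c(female = 19, male = 1))

p_manual
```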

Modifying Points

  • How did I know to choose those two values?
  • Why do numbers represent different shapes?
  • Enter ?pch and scroll down a little
    • 21-25 have both a colour (outline) and fill capability
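To see how shapes 21-25 use both colour and fill, a small toy plot can help. This is a sketch only, with shape_df and p_shapes as made-up names and the colours chosen arbitrarily:

```r
library(ggplot2)

## One row for each of the fillable shapes
shape_df <- data.frame(x = 1:5, y = 1, pch = 21:25)

p_shapes <- shape_df |>
  ggplot(aes(x, y, shape = pch)) +
  ## colour sets the outline, fill sets the interior, stroke the outline width
  geom_point(size = 6, colour = "black", fill = "skyblue", stroke = 1.5) +
  scale_shape_identity() # Use the numbers in pch directly as shapes

p_shapes
```

For shapes 0-20, fill is ignored and only colour has any effect.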

Finishing Our Figure

  • The next step in making our figure look brilliant
    • Axis & Scale labels
## Add informative labels for the axes & legends
penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) +
  labs(
    # Manually add labels
    x = "Bill Depth (mm)", y = "Bill Length (mm)", 
    colour = "Species", shape = "Sex"
  ) 

Finishing Our Figure

  • The final layer in the Grammar of Graphics is the Theme
  • Controls the overall appearance not controlled elsewhere
  • The code can get long so let’s save that figure as p
    • Then I can modify on a single slide
## Save the figure for exploring theme attributes
p <- penguins |>
  filter(!is.na(sex)) |> # Remove the penguins with unrecorded sex
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, colour = species)) +
  geom_point(aes(shape = sex), size = 3) + 
  stat_smooth(method = "lm", se = FALSE) +
  scale_colour_colorblind() +
  scale_shape_manual(values = c(19, 1)) +
  labs(
    x = "Bill Depth (mm)", y = "Bill Length (mm)", 
    colour = "Species", shape = "Sex"
  ) 

Themes

Using Themes

  • A default theme is applied: theme_grey()
  • I prefer theme_bw()
    • Removes the grey background
    • Gets most of the job done
p + theme_bw()

Using Themes

  • Additional modifications:
    • Setting the base font size for all annotations
    • Can also set colour, alignment, font face, font family etc
    • Vital for publishing figures
p + theme_bw() +
  theme(text = element_text(size = 14))

Using Themes

  • A key application is the placement of legends
p + theme_bw() +
  theme(legend.position = "bottom")

Using Themes

  • Legend can be placed inside using three steps
    • Set legend.position = "inside"
    • Set the position you want the legend
    • Set which part of the legend aligns at those co-ordinates
  • Can be extremely finicky
p + theme_bw() +
  theme(
    legend.position = "inside", # Ensure the legend is inside the plotting region
    legend.position.inside = c(0, 0), # Anchor to the bottom left
    legend.justification.inside = c(0, 0) # Set the alignment to be bottom left
  )

Using Themes

  • Any individual text element of a figure can be modified using element_text()
p + theme_bw() +
  theme(
    ## A slightly exaggerated modification of axis titles
    axis.title = element_text(colour = "darkred", size = 16, face = "bold")
  )

Using Themes

  • Looking at the help page ?theme \(\implies\) 4 main types of ‘element’
  1. element_text()
    • Control all text elements (axes, titles etc)
    • Doesn’t impact labels within figures
  2. element_line()
    • Control all lines (axes, gridlines, borders etc)
    • Can set colour, linetype, linewidth (size is deprecated for lines since ggplot2 3.4.0)
  3. element_rect()
    • Control all rectangles (background, legends, panels etc)
    • Can set colour, fill, linetype, linewidth (size is deprecated for borders since ggplot2 3.4.0)
  4. element_blank()
    • Hides an element
  • Other element types are a bit more nuanced

Using Themes

# Make the most horrible figure possible
p + theme_bw() +
  theme(
    ## A slightly exaggerated modification of axis titles
    axis.title = element_text(colour = "darkred", size = 16, face = "bold"),
    ## Make axes thick, blue lines. Ewww
    axis.line = element_line(colour = "darkblue", linewidth = 2),
    ## Hide the underlying grid
    panel.grid = element_blank(),
    ## Make the area background a light grey
    plot.background = element_rect(fill = "grey70")
  )

Using Themes

  • Plot titles align left by default
p + theme_bw() + 
  ggtitle("Penguin Bill Measurements")

  • We can use theme() to align in the centre
    • hjust is the horizontal adjustment
p + theme_bw() + 
  ggtitle("Penguin Bill Measurements") +
  theme(plot.title = element_text(hjust = 0.5))

Different Plot Types

Classic Barplots

  • geom_bar() & geom_col()
  • geom_errorbar() & geom_errorbarh()

Classic Density plots

  • geom_boxplot() & geom_violin()
  • geom_density() & geom_histogram()

Line-based Geometry

  • geom_line(), geom_segment()
  • geom_abline(), geom_hline() & geom_vline()

Heatmaps and Grids

  • geom_raster(), geom_tile() & geom_rect()
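The difference between geom_bar() and geom_col() is a common source of confusion, so a quick sketch may help. Assuming the tidyverse and palmerpenguins are loaded, both plots below should show the same bar heights (p_bar and p_col are illustrative names):

```r
library(tidyverse)
library(palmerpenguins)

## geom_bar() counts the rows in each category itself
p_bar <- penguins |>
  ggplot(aes(species)) +
  geom_bar()

## geom_col() plots heights that have already been calculated
p_col <- penguins |>
  count(species) |> # Creates the column n
  ggplot(aes(species, n)) +
  geom_col()

p_col
```

geom_col() is the one to reach for when plotting summary values such as means.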

Creating A Boxplot

  • A starting point might be to choose island as the predictor
  • body_mass_g may be a response variable
penguins |> 
  ggplot(aes(island, body_mass_g)) +
  geom_boxplot()

Creating Our Boxplot

  • To incorporate the sex \(\implies\) add a fill aesthetic
    • colour is generally applied to shape outlines
## Fill the boxes by sex
penguins |> 
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot()
  • geom_boxplot() automatically places boxes side-by-side when a category contains multiple fill values

Creating Our Boxplot

## Remove the penguins with no recorded sex
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot()

Creating Our Boxplot

  • We could also separate by species using facet_wrap()
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot() +
  facet_wrap(~species, scales = "free_x")

Creating Our Boxplot

  • A less-intuitive alternative (facet_grid()) will allow for unequal-sized facets
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(island, body_mass_g, fill = sex)) +
  geom_boxplot() +
  facet_grid(~species, scales = "free_x", space = "free_x")

Overlaying Geoms

  • geom_jitter() will draw points but with noise in either direction
  • The alpha parameter will make the points partially transparent
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(sex, body_mass_g, fill = sex)) +
  geom_boxplot(outliers = FALSE) + # Hide any outliers from the boxes
  geom_jitter(width = 0.1, alpha = 0.5) + # Outliers will be shown here
  facet_wrap(~species + island, nrow = 1)

Trying a Violin Plot

  • Violin plots are similar to boxplots
    • can estimate distributions beyond the points (trim = FALSE)
penguins |> 
  filter(!is.na(sex)) |>
  ggplot(aes(sex, body_mass_g, fill = sex)) +
  geom_violin(draw_quantiles = 0.5, trim = FALSE) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  facet_wrap(~species + island, nrow = 1)

Creating A Histogram

  • The default histogram in ggplot2 is a bit ugly
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram()
  • This can be easily improved by setting fill = "grey70" and colour = "black"
  • binwidth is automatically set \(\implies\) try binwidth = 100

Creating A Histogram

## Now apply facet_grid to show by sex & species
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram(fill = "grey70", colour = "black", binwidth = 100) +
  facet_grid(species ~ sex) +
  theme_bw()

Creating A Histogram

  • The above shows counts \(\implies\) can also show frequency (i.e. density)
  • Unfortunately, the code is ugly but does make sense
    • (The ‘stat’ is counting, so after counting convert to a frequency)
## Repeat but showing bars as frequencies
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(body_mass_g)) +
  geom_histogram(
    aes(y = after_stat(density)),
    fill = "grey70", colour = "black", binwidth = 100
  ) +
  facet_grid(species ~ sex) +
  theme_bw()

Creating a Summary Barplot

  • Bar plots are pretty common in many fields of research
  • First we’ll create a summary table with mean and sd
    • Use the mean for bars
    • sd for error bars
penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  ) 
# A tibble: 6 × 4
  species   sex    weight_mn weight_sd
  <fct>     <fct>      <dbl>     <dbl>
1 Adelie    male       4043.      347.
2 Adelie    female     3369.      269.
3 Gentoo    female     4680.      282.
4 Gentoo    male       5485.      313.
5 Chinstrap female     3527.      285.
6 Chinstrap male       3939.      362.

Creating a Summary Barplot

  • Now we can use geom_col()
## Begin creating a bar plot, with separate panels for each species
penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  ) |>
  ggplot(aes(sex, weight_mn, fill = sex)) +
  geom_col() +
  facet_wrap(~species, nrow = 1) 

Creating a Summary Barplot

penguins |>
  filter(!is.na(sex)) |>
  summarise(
    weight_mn = mean(body_mass_g, na.rm = TRUE),
    weight_sd = sd(body_mass_g, na.rm = TRUE),
    .by = c(species, sex)
  ) |>
  ggplot(aes(sex, weight_mn, fill = sex)) +
  geom_col() +
  ## Now add error bars adding/subtracting from the mean 'on the fly'
  geom_errorbar(
    aes(ymin = weight_mn - weight_sd, ymax = weight_mn + weight_sd),
    width = 0.2 # Set the width of tails on the error bars
  ) +
  facet_wrap(~species, nrow = 1) +
  ## Some extra code to make the plot look great
  scale_y_continuous(expand = expansion(c(0, 0.05))) +
  scale_fill_brewer(palette = "Set1") +
  theme_bw()

Adding Labels To Points

  • The penguins dataset isn’t well suited to adding labels
    • Let’s make a toy dataset
tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
)
# A tibble: 4 × 3
      x     y label
  <int> <dbl> <chr>
1     1     1 a    
2     2     4 b    
3     3     9 c    
4     4    16 d    

Adding Labels To Points

  • We can easily plot points, but what about labels?
tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
) |> 
  ggplot(aes(x, y)) +
  geom_point()

Adding Labels To Points

  • Using geom_text() will overlay labels exactly on the points
tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
) |> 
  ggplot(aes(x, y)) +
  geom_point() +
  geom_text(aes(label = label), size = 4)

Adding Labels To Points

  • Using geom_text_repel() will shift labels marginally away from the points
library(ggrepel)
tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
) |> 
  ggplot(aes(x, y)) +
  geom_point() +
  geom_text_repel(aes(label = label), size = 4)
  • geom_label() and geom_label_repel() will add borders and fill to labels
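A sketch of the boxed version using geom_label_repel(), assuming ggrepel is installed. The fill colour is an arbitrary choice and p_label is a hypothetical name:

```r
library(tidyverse)
library(ggrepel)

p_label <- tibble(
  x = 1:4, y = x^2, label = c("a", "b", "c", "d")
) |>
  ggplot(aes(x, y)) +
  geom_point() +
  ## Labels are drawn inside a filled box, shifted away from the points
  geom_label_repel(aes(label = label), size = 4, fill = "lightyellow")

p_label
```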

Saving Images

  • The simple way is to click Export in the Plots pane
  • The way to save using code is
ggsave("myplot.png", width = 7, height = 7, units = "in")
  • By default, this will save the most recent plot
  • Output format is determined by the suffix
  • Try saving as a pdf…
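One possible way to save as a pdf is sketched below. Passing the plot explicitly avoids relying on whichever plot was drawn most recently. The object p_cars and the use of tempdir() are assumptions made only to keep the example self-contained:

```r
library(ggplot2)

## A simple figure to save
p_cars <- ggplot(cars, aes(speed, dist)) +
  geom_point()

## The .pdf suffix determines the output format
out_file <- file.path(tempdir(), "myplot.pdf")
ggsave(out_file, plot = p_cars, width = 7, height = 7, units = "in")

## Check the file was written
file.exists(out_file)
```

In a real script, the path would usually point to a folder within the project instead of tempdir().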

Saving Images

  • I think saving using code is preferable
  • Modify an analysis or data \(\implies\) saved figures will also update
    • This saves time & ensures reproducibility
    • Modifying font sizes etc for publication can take a while

Conclusion

A fabulous resource: https://r-graphics.org/

Challenges

  1. Create a barplot with error bars showing mean flipper length by species
    • Colour (i.e. fill) however you choose
  2. Create a histogram of flipper length by species
    • Facet by species
  3. Use boxplots to show the same (flipper length by species)
    • Don’t facet, but fill the boxes by sex
  4. Using points, compare flipper length to body mass
    • Colour by species including a regression line
    • Try adding stat_ellipse() to your plot

References

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilkinson, Leland. 2005. The Grammar of Graphics. Springer New York, NY. https://doi.org/10.1007/0-387-28695-0.
Wong, Bang. 2011. “Color Blindness.” Nat. Methods 8 (6): 441.