R Markdown

RAdelaide 2025

Author: Dr Stevie Pederson

Affiliation: Black Ochre Data Labs, Telethon Kids Institute

Published: July 10, 2025

Starting With Markdown

A Brief Primer on Markdown

  • Markdown is a simple and elegant way to create formatted HTML
    • Text is entered as plain text
    • Formatting usually doesn’t appear on screen (but can)
    • The parsing to HTML often occurs using pandoc
  • Often used for Project README files etc.
  • Not R-specific but heavily used across data science
  1. Go to the File drop-down menu in RStudio
  2. New File -> Markdown File
  3. Save As README.md

Editing Markdown

  • Section Headers are denoted by one or more # symbols
    • # is the highest level, ## is next highest etc.
  • Italic text is set by enclosing text between single asterisks (*) or underscores (_)
  • Bold text is set by using two asterisks (**) or two underscores (__)

Editing Markdown

  • Dot-point Lists are started by prefixing each line with -
    • Next level indents are formed by adding 2 or 4 spaces before the next -
  • Numeric Lists are formed by starting a line with 1.
    • Subsequent lines don’t need to be numbered in order

Editing Markdown

Let’s quickly edit our file so there’s something informative

Enter this on the top line

# RAdelaide 2025

Two lines down add this

## Day 1

Leave another blank line then add

1. Introduction to R and RStudio
2. Importing Data
3. Data Exploration
4. Data Visualisation

Editing Markdown

Underneath the list enter:

**All material** can be found at [the course homepage](http://blackochrelabs.au/RAdelaide25/)

  • Here we’ve set the first two words to appear in bold font
  • The section in the square brackets will appear as text with a hyperlink to the site in the round brackets
  • Click the Preview Button and an HTML document appears
  • Note that README.html has also been produced
    • Sites like github/gitlab render this automatically
    • Obsidian also renders interactively
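To see what a Markdown parser makes of the README built over the last few slides, the same lines can be converted to HTML from within R. This is only a minimal sketch and assumes the commonmark package is installed (it is not used in the original slides); other converters, such as pandoc, may produce slightly different HTML.

```r
# Assumes the commonmark package is installed (not part of the original slides)
library(commonmark)

# The README.md contents assembled on the previous slides
readme <- c(
  "# RAdelaide 2025",
  "",
  "## Day 1",
  "",
  "1. Introduction to R and RStudio",
  "2. Importing Data",
  "3. Data Exploration",
  "4. Data Visualisation",
  "",
  "**All material** can be found at [the course homepage](http://blackochrelabs.au/RAdelaide25/)"
)

# Convert the Markdown text to an HTML fragment and print it
html <- markdown_html(paste(readme, collapse = "\n"))
cat(html)
```

In the printed HTML the # line becomes an <h1> element, the numbered lines become an <ol> list and the double asterisks become <strong>.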

Creating Our Own Report

Making Our Own Reports

Now we can modify the code to create our own analysis.

  • Delete everything in your R Markdown file EXCEPT the header
  • We’ll analyse the pigs dataset
  • Edit the title to be something suitable

Making Our Own Reports

What do we need for our report?

  • Load and describe the data using clear text explanations
    • Maybe include the questions being asked by the study
  • Create figures which show any patterns, trends or issues
  • Perform an analysis
  • State conclusions
  • Send to collaborators

Making Our Own Reports

  • My “first” real chunk always loads the packages we need
  • We’ll also have to load our data \(\implies\) need to understand file paths

Creating a Code Chunk

  • Alt+Ctrl+I creates a new chunk on Windows/Linux
    • Cmd+Option+I on macOS
  • Type load-packages next to the ```{r
    • This is the chunk name
    • Really helpful habit to form
  • Enter library(tidyverse) in the chunk body
    • We’ll add other packages as we go

Executing a Code Chunk

  • Note that I prefer my Chunk Output in the Console
  • Some others prefer it inline. It’s a personal preference
  • We write code chunks to be executed sequentially
  • Can also execute interactively as we develop code
  • Click the Run Current Chunk button (or use Ctrl+Shift+Enter)
    • Clicking the arrow next to Run will show platform specific shortcuts
  • The output will appear in the Console
    • If not, set Chunk Output in Console (the “cog” next to Knit)

Dealing With Messages

Knit…

  • The tidyverse is a little too helpful sometimes
    • These messages look horrible in a final report
    • They tell us which packages (and versions) the tidyverse has loaded
    • They also inform us of conflicts (e.g. dplyr::filter vs. stats::filter)
  • Can be helpful when running an interactive session
  • We can hide these from our report

Dealing With Messages

  1. Go to the top of your file directly below the YAML
  2. Create a new chunk
  3. Name it setup
  4. Place a comma after setup and add include = FALSE
    • This will hide the chunk from the report
  5. In the chunk body add knitr::opts_chunk$set(message = FALSE)
    • This sets a global parameter for all chunks
    • i.e. Don’t print “helpful” messages
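As a rough sketch of what the setup chunk does, knitr::opts_chunk$set() can also be called in the Console and the stored value checked with knitr::opts_chunk$get(). This assumes knitr is installed, which it will be if the document knits.

```r
# The body of the setup chunk: switch off messages for every later chunk
knitr::opts_chunk$set(message = FALSE)

# Check the stored global default
message_default <- knitr::opts_chunk$get("message")
message_default

# Options not mentioned in the call keep their defaults, e.g. echo
echo_default <- knitr::opts_chunk$get("echo")
echo_default
```

An individual chunk can still override the global value in its own header, e.g. by adding message = TRUE.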

Knit…

Other Chunk Options

  • Here’s my setup chunk for this presentation
knitr::opts_chunk$set(
  echo = TRUE, include = TRUE, warning = FALSE, message = FALSE, 
  results = 'hide',
  fig.align = "center",  fig.show = "asis", fig.width = 6, fig.height = 8
)
  • When you’ve seen my results, I’ve set results = 'asis' in that chunk header

Structuring Our Own Reports

  • I like to load all data straight after loading packages
    • Keeps key setup steps cleanly organised in your file
  • We should describe our data after loading
    • Can include any modifications we make during parsing
  • R Markdown always compiles from the directory it is in
    • File paths should be relative to this
  • The function here() from the package here looks for an .Rproj file
    • Sets this directory as the root directory
    • Type here::here() in your Console
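A minimal sketch of building a file path with here, assuming the package is installed and that the data sit in a data folder below the project root. Nothing is read from disk, so the file itself does not need to exist for this to run.

```r
# Project root as detected by here (the folder holding the .Rproj file, if one is found)
project_root <- here::here()

# Path to the data file, built relative to that root
pigs_path <- here::here("data", "pigs.csv")

project_root
pigs_path

# TRUE only once the data file has been saved into data/
file.exists(pigs_path)
```

Because the path is anchored at the project root, the same code should work both in the Console and when the document is knitted.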

Structuring Our Own Reports

Below the load-packages chunk:

  1. Create a new chunk
  2. Name it load-data
  3. In the chunk body load pigs using read_csv() as below
pigs <- here::here("data/pigs.csv") |> # Define the file location
    read_csv() |> # Import the data
    mutate(
        ## Set the appropriate factor levels
        dose = factor(dose, levels = c("Low", "Med", "High")),
        supp = factor(supp, levels = c("VC", "OJ"))
    )

Chunks can be run interactively using Ctrl+Shift+Enter
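The reason for setting factor levels explicitly can be sketched in base R. The example below builds a stand-in from R's built-in ToothGrowth data; that this resembles data/pigs.csv is an assumption made purely for illustration, and pigs_standin is a hypothetical name.

```r
# Stand-in for data/pigs.csv, built from R's built-in ToothGrowth data.
# ASSUMPTION: the course file holds similar measurements with relabelled doses.
pigs_standin <- data.frame(
  len  = ToothGrowth$len,
  supp = as.character(ToothGrowth$supp),
  dose = c("Low", "Med", "High")[match(ToothGrowth$dose, c(0.5, 1, 2))]
)

# Without explicit levels, factor() sorts the labels alphabetically
default_levels <- levels(factor(pigs_standin$dose))
default_levels
# "High" "Low"  "Med"

# With explicit levels, the order follows the experimental design
pigs_standin$dose <- factor(pigs_standin$dose, levels = c("Low", "Med", "High"))
pigs_standin$supp <- factor(pigs_standin$supp, levels = c("VC", "OJ"))
levels(pigs_standin$dose)
# "Low"  "Med"  "High"
```

The first level becomes the reference group in plots and models, so Low and VC act as the baseline in everything that follows.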

Describing Data

Now let’s add a section header for our analysis to start the report

  1. Leave a blank line after the header, then type ## Data Description
  2. Use your own words to describe the data
    • Consider things like how many individuals, different methods, measures etc.

60 guinea pigs were given vitamin C, either in their drinking water or via orange juice. 3 dose levels were given representing low, medium and high doses. This experimental design gave 6 groups with 10 guinea pigs in each. Odontoblast length was measured in order to assess the impacts on tooth growth.

Describing Data

  • In my version, I mentioned the study size
    \(\implies\) we can take this directly from the data
    • Very useful as participants change
    • Can sometimes re-use code for similar experiments
  • nrow(pigs) would give us the number of pigs
  • Replace the number 60 in your description with `r nrow(pigs)`
  • Recompile (i.e. Knit)
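What knitting does to inline code can be sketched directly in the Console with knitr::knit(). The data frame below is a hypothetical stand-in with the same design as pigs (2 methods, 3 doses, 10 animals each) and carries no measurements.

```r
# Hypothetical stand-in with the same design as pigs: 2 methods x 3 doses x 10 animals
env <- new.env()
env$pigs <- expand.grid(
  supp = c("VC", "OJ"),
  dose = c("Low", "Med", "High"),
  animal = 1:10
)

# One line of R Markdown containing inline R code
rmd_line <- "`r nrow(pigs)` guinea pigs were given vitamin C."

# knit() evaluates the inline code and returns the finished text
knitted <- knitr::knit(text = rmd_line, envir = env, quiet = TRUE)
cat(knitted, "\n")
```

The inline code is replaced by the evaluated value, so the sentence updates itself whenever the number of rows changes.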

Visualising The Data

  • The next step might be to visualise the data using a boxplot
  • Start a new chunk with ```{r boxplot-data}
pigs |> 
    ggplot(aes(dose, len, fill = supp)) +
    geom_boxplot() +
    labs(
        x = "Dose",
        y = "Odontoblast Length (µm)", 
        fill = "Method"
    ) +
    scale_fill_brewer(palette = "Set2") +
    theme_bw()

Visualising the Data

  • Type a description of the figure in the fig.cap section of the chunk header
    • This will need to be placed inside quotation marks

My example text:

Odontoblast length shown by supplement method and dose level

  • If you’re unhappy with the dimensions
    \(\implies\) change fig.width or fig.height in the chunk header
    • Default values for html_document output are fig.width = 7 & fig.height = 5 (inches)

Summarising Data

  • Next we might like to summarise the data as a table
    • Show group means & standard deviations
  • Add the following to a new chunk called data-summary
    • I’ve used the HTML code for \(\pm\) (&#177;)
pigs |>
    summarise(
        n = n(),
        mn_len = mean(len), 
        sd_len = sd(len),
        .by = c(supp, dose)
    ) |>
    mutate(
        mn_len = round(mn_len, 2),
        sd_len = round(sd_len, 2),
        len = paste0(mn_len, " &#177;", sd_len)
    ) |>
    dplyr::select(supp, dose, n, len)
# A tibble: 6 × 4
  supp  dose      n len             
  <fct> <fct> <int> <chr>           
1 VC    Low      10 7.98 &#177;2.75 
2 VC    Med      10 16.77 &#177;2.52
3 VC    High     10 26.14 &#177;4.8 
4 OJ    Low      10 13.23 &#177;4.46
5 OJ    Med      10 22.7 &#177;3.91 
6 OJ    High     10 26.06 &#177;2.66
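Pasting rounded values drops trailing zeros, which is why row 3 shows 4.8 rather than 4.80. One possible alternative, not used in the original slides, is sprintf(), sketched here with the values from that row.

```r
# Values taken from the VC / High row of the summary table
mn_len <- 26.14
sd_len <- 4.8

# round() followed by paste0() drops the trailing zero
pasted <- paste0(round(mn_len, 2), " &#177;", round(sd_len, 2))
pasted
# "26.14 &#177;4.8"

# sprintf() always prints two decimal places
fixed_width <- sprintf("%.2f &#177; %.2f", mn_len, sd_len)
fixed_width
# "26.14 &#177; 4.80"
```

Inside mutate(), the sprintf() call could take the unrounded mn_len and sd_len columns directly, since the format string does the rounding.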

Summarising Data

  • This has given a tibble output
  • We can produce an HTML table using pander

Add the following to your load-packages chunk

library(pander)

Producing Tables

pigs |>
    summarise(
        n = n(), 
        mn_len = mean(len), 
        sd_len = sd(len),
        .by = c(supp, dose)
    ) |>
    mutate(
        mn_len = round(mn_len, 2),
        sd_len = round(sd_len, 2),
        len = paste0(mn_len, " &#177;", sd_len)
    ) |>
    dplyr::select(supp, dose, n, len) |>
    rename_with(str_to_title) |>
    pander(
        justify = "llrr",
        caption = "Odontoblast length for each group shown as mean&#177;SD"
    )
Odontoblast length for each group shown as mean±SD

Supp   Dose    N           Len
VC     Low    10    7.98 ±2.75
VC     Med    10   16.77 ±2.52
VC     High   10    26.14 ±4.8
OJ     Low    10   13.23 ±4.46
OJ     Med    10    22.7 ±3.91
OJ     High   10   26.06 ±2.66

Analysing Data

  • The default output from lm() doesn’t look great
lm(len ~ supp + dose + supp:dose, data = pigs) |>
    summary()

Call:
lm(formula = len ~ supp + dose + supp:dose, data = pigs)

Residuals:
   Min     1Q Median     3Q    Max 
 -8.20  -2.72  -0.27   2.65   8.27 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)        7.980      1.148   6.949 4.98e-09 ***
suppOJ             5.250      1.624   3.233  0.00209 ** 
doseMed            8.790      1.624   5.413 1.46e-06 ***
doseHigh          18.160      1.624  11.182 1.13e-15 ***
suppOJ:doseMed     0.680      2.297   0.296  0.76831    
suppOJ:doseHigh   -5.330      2.297  -2.321  0.02411 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.631 on 54 degrees of freedom
Multiple R-squared:  0.7937,    Adjusted R-squared:  0.7746 
F-statistic: 41.56 on 5 and 54 DF,  p-value: < 2.2e-16

Analysing Data

  • pander can again be used to ‘tidy up’ the output from lm
lm(len ~ supp + dose + supp:dose, data = pigs) |>
    summary() |>
    pander(add.significance.stars = TRUE)

Analysing Data

                  Estimate   Std. Error   t value    Pr(>|t|)
(Intercept)           7.98        1.148     6.949   4.984e-09   * * *
suppOJ                5.25        1.624     3.233    0.002092   * *
doseMed               8.79        1.624     5.413   1.463e-06   * * *
doseHigh             18.16        1.624     11.18   1.131e-15   * * *
suppOJ:doseMed        0.68        2.297    0.2961      0.7683
suppOJ:doseHigh      -5.33        2.297    -2.321     0.02411   *

Fitting linear model: len ~ supp + dose + supp:dose

Observations   Residual Std. Error   \(R^2\)   Adjusted \(R^2\)
          60                 3.631    0.7937             0.7746

Interpretation:

  • At Low Dose, OJ increases length by 5.25 above VC
  • Both Med & High increase length for VC
  • The gain from OJ at Med is much the same as at Low (interaction 0.68, p = 0.77)
  • The gain from OJ is completely lost at High Dose (5.25 - 5.33 ≈ 0)
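The interpretation above can be checked with a little arithmetic. The sketch below copies the estimates from the model summary and adds up the terms that apply to each group; the object names are hypothetical.

```r
# Coefficients copied from the model summary
b <- c(
  intercept = 7.98, suppOJ = 5.25, doseMed = 8.79, doseHigh = 18.16,
  suppOJ_doseMed = 0.68, suppOJ_doseHigh = -5.33
)

# Fitted mean for each group = baseline (VC, Low) + the terms that apply to it
fitted_means <- c(
  VC_Low  = b[["intercept"]],
  OJ_Low  = b[["intercept"]] + b[["suppOJ"]],
  VC_Med  = b[["intercept"]] + b[["doseMed"]],
  OJ_Med  = b[["intercept"]] + b[["suppOJ"]] + b[["doseMed"]] + b[["suppOJ_doseMed"]],
  VC_High = b[["intercept"]] + b[["doseHigh"]],
  OJ_High = b[["intercept"]] + b[["suppOJ"]] + b[["doseHigh"]] + b[["suppOJ_doseHigh"]]
)
round(fitted_means, 2)

# Difference between OJ and VC at each dose
oj_gain <- c(
  Low  = b[["suppOJ"]],
  Med  = b[["suppOJ"]] + b[["suppOJ_doseMed"]],
  High = b[["suppOJ"]] + b[["suppOJ_doseHigh"]]
)
round(oj_gain, 2)
```

The fitted means reproduce the group means in the summary table (7.98, 13.23, 16.77, 22.70, 26.14 and 26.06), and the OJ gain falls from 5.25 at Low to -0.08 at High.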

Creating Summary Tables

  • Multiple other packages exist for table creation
    • All do some things brilliantly, none does everything
  • pander is a good all-rounder
    • Tables are fairly plain
    • Also enables easy in-line results
  • knitr::kable() is another good all-rounder
    • Can be nicely tailored using kableExtra

Creating Summary Tables

  • To use other packages, first tidy the model \(\implies\) broom::tidy()
    • Will convert lm() output to a tibble
    • This can be passed to other packages which make HTML / \(\LaTeX\) tables
lm(len ~ supp + dose + supp:dose, data = pigs) |>
    broom::tidy()
# A tibble: 6 × 5
  term            estimate std.error statistic  p.value
  <chr>              <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)        7.98       1.15     6.95  4.98e- 9
2 suppOJ             5.25       1.62     3.23  2.09e- 3
3 doseMed            8.79       1.62     5.41  1.46e- 6
4 doseHigh          18.2        1.62    11.2   1.13e-15
5 suppOJ:doseMed     0.680      2.30     0.296 7.68e- 1
6 suppOJ:doseHigh   -5.33       2.30    -2.32  2.41e- 2
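As one illustration of passing the tidied tibble on, the sketch below sends it to knitr::kable(). It assumes broom and knitr are installed and uses R's built-in ToothGrowth data as a stand-in for pigs; both the stand-in and the name pigs_standin are assumptions made for illustration.

```r
# ASSUMPTION: ToothGrowth (built into R) stands in for the course's pigs data
pigs_standin <- transform(
  ToothGrowth,
  supp = factor(supp, levels = c("VC", "OJ")),
  dose = factor(dose, levels = c(0.5, 1, 2), labels = c("Low", "Med", "High"))
)

# Tidy the model coefficients into a tibble
coef_table <- lm(len ~ supp + dose + supp:dose, data = pigs_standin) |>
  broom::tidy()

# Hand the tibble to a table package; here a plain Markdown pipe table.
# Note: digits = 3 rounds very small p-values to 0, so they need separate formatting
coef_kable <- knitr::kable(coef_table, format = "pipe", digits = 3)
coef_kable
```

Any of the table packages on the following slides could take coef_table in the same way.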

Creating Summary Tables

  • reactable creates amazing looking HTML tables
    • Incredibly customisable
  • DT also creates fantastic HTML tables
    • Less flexible with formatting
    • Allows simple downloading to csv, xls etc.
  • gt is popular with some
  • xtable is excellent for \(\LaTeX\) output

Creating Summary Tables

  • Many of us really care how the output looks
  • I spend huge amounts of time getting this just right
library(reactable)
lm(len ~ supp + dose + supp:dose, data = pigs) |>
  broom::tidy() |>
  mutate(
    term = term |> 
      str_replace_all("supp", "Supp = ") |> 
      str_replace_all("dose", "Dose = ") |> 
      str_replace_all(":", " & ") 
  ) |>
  rename_with(str_to_title) |>
  reactable(
    sortable = TRUE, filterable = TRUE,
    defaultColDef = colDef(format = colFormat(digits = 3)),
    theme = reactableTheme(style = list(fontSize = 16)),
    columns = list(
      P.value = colDef(
        name = "P-value",
        cell = \(value) {
          fmt <- ifelse(value > 0.01, "%.3f", "%.2e")
          sprintf(fmt, value)
        }
      )
    )
  )
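The cell function for the P-value column can be tried on its own before embedding it in a table. This is a small base R sketch; format_p is a hypothetical helper name and the inputs are p-values from the model summary.

```r
# Hypothetical helper mirroring the cell function in the reactable call
format_p <- function(value) {
  # Fixed notation above 0.01, scientific notation otherwise
  fmt <- ifelse(value > 0.01, "%.3f", "%.2e")
  sprintf(fmt, value)
}

# p-values taken from the model summary
p_values <- c(0.76831, 0.02411, 0.00209, 4.98e-09)
formatted <- format_p(p_values)
formatted
# "0.768"    "0.024"    "2.09e-03" "4.98e-09"
```

Switching notation at 0.01 keeps larger p-values easy to read while stopping very small ones from being shown as 0.000.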

Complete the Analysis

  • After you’re happy with the way your analysis looks
    • A good habit is to finish with a section called Session Info
    • Add a code chunk which calls the R command sessionInfo()
    • Or even sessionInfo() |> pander()
  • So far we’ve been compiling everything as HTML, but let’s switch to an MS Word document. We could email this to our collaborators, or upload it to Google Docs
  • NB: HTML tables don’t work so well in MS Word \(\implies\) stick with pander?
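A short sketch of these closing steps. The sessionInfo() part runs anywhere; the render call is left as a comment because pigs_analysis.Rmd is a hypothetical file name and the call needs rmarkdown and pandoc. Changing output: html_document to output: word_document in the YAML header is the other common route.

```r
# Capture the session details that the final chunk would print
info <- sessionInfo()

# The R version used for the analysis
info$R.version$version.string

# Names of the attached packages (NULL if none are attached)
names(info$otherPkgs)

# Compiling to MS Word from the Console.
# "pigs_analysis.Rmd" is a hypothetical file name
# rmarkdown::render("pigs_analysis.Rmd", output_format = "word_document")
```

Recording the R and package versions at the end of the report lets collaborators see exactly which software produced the results.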

Closing Comments

Summary

This basic process is incredibly useful

  • We never need to cut & paste anything between R and other documents
  • Every piece of information comes directly from our R analysis
  • We can very easily incorporate new data as it arrives
  • Source data is never modified
  • Creates reproducible research
  • Highly compatible with collaborative analysis & version control (Git)

I learned using R scripts but now I only use these in formal packages, or if defining functions to use across multiple analyses

A Challenge

Return to the penguins dataset and perform a complete R Markdown analysis on any combination of variables you decide

  1. Choose how to describe the data
  2. Choose whether to include diagnostic plots
  3. Choose which figures to include