R Markdown

ASI: Introduction to R

Author: Dr Stevie Pederson
Affiliation: Black Ochre Data Labs, The Kids Research Institute Australia
Published: September 3, 2025

Making Our Own Report

Now we can modify the code to create our own analysis.

  • Delete everything in your R Markdown file EXCEPT the header
  • We’ll analyse the pigs dataset
  • Edit the title to be something suitable

Making Our Own Reports

What do we need for our report?

  • Load and describe the data using clear text explanations
    • Maybe include the questions being asked by the study
  • Create figures which show any patterns, trends or issues
  • Perform an analysis
  • State conclusions
  • Send to collaborators

Making Our Own Reports

  • First we’ll need to load the data
    • Then we can describe the data
  • R Markdown always compiles from the directory the file is saved in
    • File paths should be relative to this
  • My “first” real chunk always loads the packages we need
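A minimal sketch of the relative-path point, which could be run in the console or a chunk to check where R is looking. The path data/pigs.csv is the one used later in this session; whether that file exists on your machine is an assumption.

```r
# Path written relative to the folder containing the .Rmd file
pigs_path <- file.path("data", "pigs.csv")

# The working directory of the current session.
# When knitting, this defaults to the directory of the R Markdown file.
getwd()

# TRUE only if the file can be found from the working directory
file.exists(pigs_path)
```

If file.exists() returns FALSE in the console but the report knits fine, the console session is probably running from a different directory.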

Creating a Code Chunk

  • Alt+Ctrl+I creates a new chunk on Windows/Linux
    • Cmd+Option+I on macOS
  • Type load-packages next to the ```{r
    • This is the chunk name
    • Really helpful habit to form
  • Enter library(tidyverse) in the chunk body
    • We’ll add other packages as we go

Knit…

Dealing With Messages

  • The tidyverse is a little too helpful sometimes
    • These messages look horrible in a final report
    • Are telling us which packages/versions tidyverse has loaded
    • Also informing us of conflicts (e.g. dplyr::filter vs. stats::filter)
    • Can be helpful when running an interactive session
  • We can hide these from our report

Dealing With Messages

  1. Go to the top of your file (below the YAML)
  2. Create a new chunk
  3. Name it setup
  4. Place a comma after setup and add include = FALSE
    • This will hide the chunk from the report
  5. In the chunk body add knitr::opts_chunk$set(message = FALSE)
    • This sets a global parameter for all chunks
    • i.e. Don’t print “helpful” messages
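As a rough sketch, the body of the setup chunk described in the steps above might look like the following. The call to opts_chunk$get() is only added here to confirm the setting and is not needed in a real report.

```r
# Body of the setup chunk: applies to every chunk that follows
knitr::opts_chunk$set(
  message = FALSE  # hide messages such as the tidyverse startup banner
)

# Optional check of the current global default
knitr::opts_chunk$get("message")
```

Any single chunk can still override the global default by setting message = TRUE in its own header.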

Knit…

Making Our Own Reports

  • I like to load all data straight after loading packages
  • Gets the entire workflow sorted at the beginning
  • Alerts to any problems early

Below the load-packages chunk:

  • Create a new chunk
  • Name it load-data
  • In the chunk body load pigs using read_csv()
pigs <- read_csv("data/pigs.csv") |>
    mutate(
        dose = fct(dose, levels = c("Low", "Med", "High")),
        supp = fct(supp, levels = c("VC", "OJ"))
    )
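
The levels are set by hand because R would otherwise order them alphabetically. A small sketch using toy values (not the real pigs data) shows the difference:

```r
library(forcats)

# Toy values standing in for the dose column (not the real data)
dose_chr <- c("Low", "Med", "High", "Low")

# Base R sorts the levels alphabetically
levels(factor(dose_chr))
#> [1] "High" "Low"  "Med"

# fct() keeps exactly the order we supply
levels(fct(dose_chr, levels = c("Low", "Med", "High")))
#> [1] "Low"  "Med"  "High"
```

The level order controls the order of groups in plots and tables, and the first level becomes the reference group when fitting a model with lm().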

Chunks can be run interactively using Ctrl+Alt+Shift+P

Describing Data

Now let’s add a section header for our analysis to start the report

  1. Leave a blank line after the header, then type ## Data Description
  2. Use your own words to describe the data
    • Consider things like how many participants, different methods, measures we have etc.

60 guinea pigs were given vitamin C, either in their drinking water or via orange juice. 3 dose levels were given representing low, medium and high doses. Odontoblast length was measured in order to assess the impacts on tooth growth.

Describing Data

  • In my version, I mentioned the study size
  • We can take this directly from the data
    • Very useful as participants change
  • nrow(pigs) would give us the number of pigs

Replace the number 60 in your description with `r nrow(pigs)`
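
Any R expression which returns a single value can be used inline in the same way. A minimal sketch, assuming pigs resembles R's built-in ToothGrowth data (an assumption made only so the example runs on its own; pigs_demo is a hypothetical stand-in):

```r
# Stand-in for pigs: R's built-in ToothGrowth data (an assumption)
pigs_demo <- datasets::ToothGrowth

# Values that could be dropped into a sentence using inline code
nrow(pigs_demo)                 # number of animals
length(unique(pigs_demo$dose))  # number of dose levels
length(unique(pigs_demo$supp))  # number of delivery methods
```

In the report itself, the number of dose levels could then be written as `r length(unique(pigs$dose))` instead of typing 3.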

Knit…

Visualising The Data

The next step might be to visualise the data using a boxplot

  • Start a new chunk and name it boxplot-data
pigs |>
    ggplot(aes(dose, len, fill = supp)) +
    geom_boxplot() +
    labs(
        x = "Dose",
        y = "Odontoblast Length (μm)",
        fill = "Method"
    ) +
    scale_fill_brewer(palette = "Set2") +
    theme_bw()

Visualising the Data

  • We can control the figure size using fig.height or fig.width
  • Type a description of the figure in the fig.cap section of the chunk header
    • This will need to be placed inside quotation marks

My example text:

Odontoblast length shown by supplement method and dose level
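
Figure sizes can also be given a global default in the setup chunk, with individual chunks overriding it as needed. This is a sketch only; the sizes below are illustrative assumptions, not values from the session.

```r
# Global defaults for every figure (sizes in inches, chosen for illustration)
knitr::opts_chunk$set(fig.width = 8, fig.height = 5)

# A single chunk can still override these in its own header, e.g.
# {r boxplot-data, fig.height = 4, fig.cap = "Odontoblast length shown by supplement method and dose level"}

# Check the current default
knitr::opts_chunk$get("fig.width")
```

The caption is best left in the chunk header, since it is different for every figure.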

Summarising Data

  • Next we might like to summarise the data as a table
    • Show group means & standard deviations
  • Add the following to a new chunk called data-summary
    • I’ve used the HTML code for \(\pm\) (&#177;)
pigs |>
    summarise(
        n = n(),
        mn_len = mean(len), 
        sd_len = sd(len),
        .by = c(supp, dose)
    ) |>
    mutate(
        mn_len = round(mn_len, 2),
        sd_len = round(sd_len, 2),
        len = paste0(mn_len, " &#177;", sd_len)
    ) |>
    dplyr::select(supp, dose, n, len)
# A tibble: 6 × 4
  supp  dose      n len             
  <fct> <fct> <int> <chr>           
1 VC    Low      10 7.98 &#177;2.75 
2 VC    Med      10 16.77 &#177;2.52
3 VC    High     10 26.14 &#177;4.8 
4 OJ    Low      10 13.23 &#177;4.46
5 OJ    Med      10 22.7 &#177;3.91 
6 OJ    High     10 26.06 &#177;2.66

Knit…

Summarising Data

  • This has given a tibble output
  • We can produce an HTML table using pander

Add the following to your load-packages chunk

library(pander)

Producing Tables

pigs |>
    summarise(
        n = n(), 
        mn_len = mean(len), 
        sd_len = sd(len),
        .by = c(supp, dose)
    ) |>
    mutate(
        mn_len = round(mn_len, 2),
        sd_len = round(sd_len, 2),
        len = paste0(mn_len, " &#177;", sd_len)
    ) |>
    dplyr::select(supp, dose, n, len) |>
    rename_with(str_to_title) |>
    pander(
        caption = "Odontoblast length for each group shown as mean&#177;SD"
    )
Odontoblast length for each group shown as mean±SD

Supp  Dose  N   Len
VC    Low   10  7.98 ±2.75
VC    Med   10  16.77 ±2.52
VC    High  10  26.14 ±4.8
OJ    Low   10  13.23 ±4.46
OJ    Med   10  22.7 ±3.91
OJ    High  10  26.06 ±2.66

Analysing Data

  • Performing statistical analysis is beyond the scope of today BUT
    • The function lm() is used to perform linear regression
    • We’ll perform an analysis remembering that ~ means ‘depends on’
lm(len ~ supp + dose + supp:dose, data = pigs) |>
    summary()

Call:
lm(formula = len ~ supp + dose + supp:dose, data = pigs)

Residuals:
   Min     1Q Median     3Q    Max 
 -8.20  -2.72  -0.27   2.65   8.27 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)        7.980      1.148   6.949 4.98e-09 ***
suppOJ             5.250      1.624   3.233  0.00209 ** 
doseMed            8.790      1.624   5.413 1.46e-06 ***
doseHigh          18.160      1.624  11.182 1.13e-15 ***
suppOJ:doseMed     0.680      2.297   0.296  0.76831    
suppOJ:doseHigh   -5.330      2.297  -2.321  0.02411 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.631 on 54 degrees of freedom
Multiple R-squared:  0.7937,    Adjusted R-squared:  0.7746 
F-statistic: 41.56 on 5 and 54 DF,  p-value: < 2.2e-16

Analysing Data

  • pander can again be used to ‘tidy up’ the output from lm
lm(len ~ supp + dose + supp:dose, data = pigs) |>
    summary() |>
    pander(add.significance.stars = TRUE)
                 Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)      7.98      1.148       6.949    4.984e-09  * * *
suppOJ           5.25      1.624       3.233    0.002092   * *
doseMed          8.79      1.624       5.413    1.463e-06  * * *
doseHigh         18.16     1.624       11.18    1.131e-15  * * *
suppOJ:doseMed   0.68      2.297       0.2961   0.7683
suppOJ:doseHigh  -5.33     2.297       -2.321   0.02411    *

Fitting linear model: len ~ supp + dose + supp:dose

Observations  Residual Std. Error  \(R^2\)  Adjusted \(R^2\)
60            3.631                0.7937   0.7746

Interpretation:

  • At Low Dose, OJ increases length by 5.25 above VC
  • Both Med & High increase length for VC
  • The difference in length for OJ at Med is not significantly different from that at Low (interaction p = 0.77)
  • The gains for length by OJ are almost exactly cancelled at High Dose by the negative interaction term
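
The interpretation can be checked by adding up the coefficients by hand. This sketch simply copies the estimates from the model summary above; the names in the vector are my own labels.

```r
# Coefficients copied from the model summary
b <- c(
  intercept   = 7.98,   # VC at Low dose (the reference group)
  suppOJ      = 5.25,
  doseMed     = 8.79,
  doseHigh    = 18.16,
  suppOJ_Med  = 0.68,   # interaction terms
  suppOJ_High = -5.33
)

# Predicted group means at High dose
vc_high <- b[["intercept"]] + b[["doseHigh"]]
oj_high <- vc_high + b[["suppOJ"]] + b[["suppOJ_High"]]
c(VC_High = vc_high, OJ_High = oj_high)

# Difference between OJ and VC at each dose level
oj_effect <- c(
  Low  = b[["suppOJ"]],
  Med  = b[["suppOJ"]] + b[["suppOJ_Med"]],
  High = b[["suppOJ"]] + b[["suppOJ_High"]]
)
oj_effect
```

The two High dose values (26.14 and 26.06) match the group means in the summary table, and the OJ effect goes from 5.25 at Low to roughly -0.08 at High.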

Creating Summary Tables

  • Multiple other packages exist for table creation
    • All do some things brilliantly, none does everything
  • pander is a good all-rounder
    • Tables are very simplistic
    • Also enables easy in-line results

Creating Summary Tables

  • To use other packages, first pass the model to broom::tidy()
    • Will convert lm() output to a tibble
    • This can be passed to other packages which make HTML / \(\LaTeX\) tables
lm(len ~ supp + dose + supp:dose, data = pigs) |>
    broom::tidy()
# A tibble: 6 × 5
  term            estimate std.error statistic  p.value
  <chr>              <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)        7.98       1.15     6.95  4.98e- 9
2 suppOJ             5.25       1.62     3.23  2.09e- 3
3 doseMed            8.79       1.62     5.41  1.46e- 6
4 doseHigh          18.2        1.62    11.2   1.13e-15
5 suppOJ:doseMed     0.680      2.30     0.296 7.68e- 1
6 suppOJ:doseHigh   -5.33       2.30    -2.32  2.41e- 2
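
A sketch of how the tidied output might be passed on to a table function. Two assumptions here: pigs_demo is a hypothetical stand-in built from R's ToothGrowth data so the example runs on its own, and knitr::kable() is just one possible choice of table function.

```r
library(broom)

# Stand-in for pigs, built from R's ToothGrowth data (an assumption)
pigs_demo <- transform(
  datasets::ToothGrowth,
  dose = factor(dose, levels = c(0.5, 1, 2), labels = c("Low", "Med", "High")),
  supp = factor(supp, levels = c("VC", "OJ"))
)

# Fit the same model and convert the coefficients to a tibble
fit_tbl <- lm(len ~ supp + dose + supp:dose, data = pigs_demo) |>
  tidy(conf.int = TRUE)  # also add 95% confidence intervals as columns

# A tibble can be handed to any table function; kable() is one option
knitr::kable(fit_tbl, digits = 3)
```

Because fit_tbl is an ordinary tibble, it can also be filtered, renamed or rounded with dplyr before being turned into a table.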

Creating Summary Tables

  • reactable creates amazing looking tables
    • A bit difficult to download the table data
  • DT also creates fantastic tables
    • Less flexible with formatting
    • Allows simple downloading to csv, xls etc.
  • gt is popular with some
  • xtable is excellent for \(\LaTeX\) output

Complete the Analysis

After you’re happy with the way your analysis looks

  • A good habit is to finish with a section called Session Info
  • Add a code chunk which calls the R command sessionInfo()

So far we’ve been compiling everything as HTML, but let’s switch to an MS Word document. We could email this to our collaborators, or upload it to Google Docs.
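
The output format is normally changed in the YAML header (output: word_document), but it can also be requested from the console. A minimal sketch, using a throwaway file rather than your real report:

```r
library(rmarkdown)

# Write a tiny throwaway .Rmd so the example is self-contained
rmd <- tempfile(fileext = ".Rmd")
writeLines(c("---", 'title: "Demo report"', "---", "", "Some text."), rmd)

# Rendering needs pandoc, which comes bundled with RStudio
if (pandoc_available()) {
  out <- render(rmd, output_format = "word_document", quiet = TRUE)
  basename(out)  # a .docx file created next to the temporary .Rmd
}
```

For a real report, the first argument to render() would be the path to your own .Rmd file.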

Summary

This basic process is incredibly useful

  • We never need to cut & paste anything between R and other documents
  • Every piece of information comes directly from our R analysis
  • We can very easily incorporate new data as it arrives
  • Source data is never modified
  • Creates reproducible research
  • Highly compatible with collaborative analysis & version control (Git)

I learned using R scripts but now I only use these in formal packages, or if defining functions to use across multiple analyses

Advanced Options

  • The R package workflowr is very helpful for larger workflows
    • Can include multiple HTML pages
    • Strong integration with git
  • Highly customisable output
    • Code folding
    • Bootstrap themes etc.
    • Can use custom css files
    • Interactive plots using plotly

Challenge

  • Load my_penguins
  • Describe the data
  • Visualise the data
  • (If you’re keen) fit a linear model for one of the measurements & the predictor(s) of your choice