pigs <- here::here("data/pigs.csv") |> # Define the file location
  read_csv() |> # Import the data
  mutate(
    ## Set the appropriate factor levels
    dose = factor(dose, levels = c("Low", "Med", "High")),
    supp = factor(supp, levels = c("VC", "OJ"))
  )
R Markdown
RAdelaide 2025
R Markdown
Writing Reports Using rmarkdown
- rmarkdown is a cohesive way to:
  - Load & wrangle data
  - Analyse data, including figures & tables
  - Publish everything in a complete report/analysis
- The package knitr is the engine behind this
  - Replaced the Sweave package about 8-10 years ago
- Extends the markdown language to incorporate R code
Starting With Markdown
A Brief Primer on Markdown
- Markdown is a simple and elegant way to create formatted HTML
- Text is entered as plain text
- Formatting usually doesn’t appear on screen (but can)
- The parsing to HTML often occurs using
pandoc
- Often used for Project README files etc.
- Not R-specific but is heavily used across data science
- Go to the File drop-down menu in RStudio
- New File -> Markdown File
- Save As README.md
Editing Markdown
- Section Headers are denoted by one or more # symbols
  - # is the highest level, ## is next highest etc.
- Italic text is set by enclosing text between a single asterisk (*) or underscore (_)
- Bold text is set by using two asterisks (**) or two underscores (__)
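As a minimal sketch of the syntax above, a README.md could look something like the following. The project name and wording are placeholders, not taken from the course material.

```markdown
# My Analysis Project

## Background

This project analyses *tooth growth* in guinea pigs.
The **raw data** is never modified.
```

Here the first line becomes a top-level header, the third a second-level header, and the asterisks produce italic and bold text once the file is rendered.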
R Markdown
Writing Reports Using rmarkdown
- This course was prepared using the next generation known as quarto
- Extends RMarkdown across multiple languages
- People have written entire PhD theses using Rmarkdown
We can output our analysis directly as:
- HTML
- MS Word Documents
- PDF Documents (If you have \(\LaTeX\) installed)
- Slidy, ioslides or PowerPoint presentations
- Complete Books (using bookdown)
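Outside of the Knit button, the same choice of output format can be made from the Console. The sketch below assumes a file called report.Rmd exists in the working directory (a hypothetical name) and that the rmarkdown package and pandoc are installed.

```r
# Hypothetical file name: replace with the path to your own .Rmd file
rmd_file <- "report.Rmd"

# Only attempt to compile if the file is actually there
if (file.exists(rmd_file)) {
  # The same source file can be compiled to different formats
  rmarkdown::render(rmd_file, output_format = "html_document")
  rmarkdown::render(rmd_file, output_format = "word_document")
}
```

Each call writes an output file next to the source, so a single analysis can be shared as HTML or as a Word document without changing any code.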
We never need to use MS Word, Excel or PowerPoint again!
Creating an R Markdown document
Let’s create our first rmarkdown document
- Go to the File drop-down menu in RStudio
- New File -> R Markdown…
Compiling The Report
- The default format is an html_document & we can change this later.
- Generate the default document by clicking Knit
Looking At The File
A header section is enclosed between the --- lines at the top
- Nothing can be placed before this!
- Uses YAML (YAML Ain’t Markup Language)
- Editing is beyond the scope of this course
- Can set custom .css files, load LaTeX packages, set parameters etc.
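For orientation, a minimal header might look like the sketch below. The title and author values are placeholders; the output field matches the default html_document format mentioned earlier.

```yaml
---
title: "Tooth Growth in Guinea Pigs"
author: "Your Name"
output: html_document
---
```

Changing the output line to word_document is one way to switch the compiled format without touching the rest of the file.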
Getting Help
- Check the help for a guide to the syntax: Help > Markdown Quick Reference
- Well established syntax \(\implies\) ChatGPT or search engines can usually answer questions
- quarto allows chunk arguments to be set inside a chunk
  - e.g. #| echo: false
  - Not Rmarkdown syntax but should work if placed in the chunk header
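To make the difference concrete, the sketch below shows the same option written both ways. The chunk bodies use the built-in cars dataset purely as an illustration and are not part of the course material.

````markdown
```{r, echo=FALSE}
# R Markdown style: options in the chunk header
summary(cars)
```

```{r}
#| echo: false
# Quarto style: options as special comments inside the chunk
summary(cars)
```
````

In both cases the code is hidden from the compiled report while its output is still shown.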
Creating Our Own Report
Making Our Own Reports
Now we can modify the code to create our own analysis.
- Delete everything in your R Markdown file EXCEPT the header
- We’ll analyse the pigs dataset
- Edit the title to be something suitable
Creating a Code Chunk
- Alt+Ctrl+I creates a new chunk on Windows/Linux
  - Cmd+Option+I on OSX
- Type load-packages next to the ```{r
  - This is the chunk name
  - Really helpful habit to form
- Enter library(tidyverse) in the chunk body
  - We’ll add other packages as we go
Executing a Code Chunk
- Note that I prefer my Chunk Output in the Console
- Some others prefer it inline. It’s a personal preference
- We write code chunks to be executed sequentially
- Can also execute interactively as we develop code
- Click the Run Current Chunk button (or use Ctrl+Shift+Enter)
  - Clicking the arrow next to Run will show platform specific shortcuts
- The output will appear in the Console
  - If not, set Chunk Output in Console (the “cog” next to Render)
Dealing With Messages
Knit…
- The tidyverse is a little too helpful sometimes
  - These messages look horrible in a final report
  - Are telling us which packages/version tidyverse has loaded
  - Also informing us of conflicts (e.g. dplyr::filter Vs. stats::filter)
- Can be helpful when running an interactive session
- We can hide these from our report
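One way to hide them is the knitr chunk option message, sketched below for a chunk named load-packages. The chunk name follows the earlier example; treat the exact header as an illustration rather than the only approach.

````markdown
```{r load-packages, message=FALSE}
library(tidyverse)
```
````

The packages still load as normal; only the startup messages are dropped from the compiled report.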
Other Chunk Options
- Here’s my setup chunk for this presentation
knitr::opts_chunk$set(
  echo = TRUE, include = TRUE, warning = FALSE, message = FALSE,
  results = 'hide',
  fig.align = "center", fig.show = "asis", fig.width = 6, fig.height = 8
)
- When you’ve seen my results, I’ve set results = 'asis' in that chunk header
Structuring Our Own Reports
- I like to load all data straight after loading packages
- Keeps key setup steps cleanly organised in your file
- We should describe our data after loading
- Can include any modifications we make during parsing
- RMarkdown always compiles from the directory it is in
- File paths should be relative to this
- The function here() from the package here looks for an Rproj file
  - Sets this directory as the root directory
- Type here::here() in your Console
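A short sketch of how here() can be used to build file paths is shown below. It assumes the here package is installed; the path data/pigs.csv is the one used in this session.

```r
# Requires the here package: install.packages("here")
library(here)

# The directory here() has chosen as the project root
root <- here()

# Build a path relative to that root, whatever the current working directory is
pigs_path <- here("data", "pigs.csv")

print(root)
print(pigs_path)
```

Because the path is built from the project root, the same code should work both when running chunks interactively and when the document is compiled.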
Describing Data
Now let’s add a section header for our analysis to start the report
- Type ## Data Description after the header and after leaving a blank line
- Use your own words to describe the data
- Consider things like how many individuals, different methods, measures etc.
60 guinea pigs were given vitamin C, either in their drinking water or via orange juice. 3 dose levels were given representing low, medium and high doses. This experimental design gave 6 groups with 10 guinea pigs in each. Odontoblast length was measured in order to assess the impacts on tooth growth.
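Numbers quoted in a description like this are worth checking against the data itself. The sketch below uses the ToothGrowth dataset that ships with R as a stand-in, on the assumption that it records the same experiment as pigs.csv; note that its dose column is numeric rather than Low/Med/High.

```r
# ToothGrowth ships with R (datasets package); used here as a stand-in
# for pigs.csv, which is an assumption about where the course data came from
data("ToothGrowth", package = "datasets")

# Total number of guinea pigs
n_pigs <- nrow(ToothGrowth)

# Number of animals in each delivery method x dose group
group_sizes <- table(ToothGrowth$supp, ToothGrowth$dose)

print(n_pigs)
print(group_sizes)
```

With your own pigs object, the same checks can be run by swapping ToothGrowth for pigs.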
Visualising The Data
- The next step might be to visualise the data using a boxplot
- Start a new chunk with ```{r boxplot-data}
pigs |>
  ggplot(aes(dose, len, fill = supp)) +
  geom_boxplot() +
  labs(
    x = "Dose",
    y = "Odontoblast Length (pm)",
    fill = "Method"
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_bw()
Summarising Data
- Next we might like to summarise the data as a table
- Show group means & standard deviations
- Add the following to a new chunk called data-summary
- I’ve used the HTML code for \(\pm\) (±)
pigs |>
  summarise(
    n = n(),
    mn_len = mean(len),
    sd_len = sd(len),
    .by = c(supp, dose)
  ) |>
  mutate(
    mn_len = round(mn_len, 2),
    sd_len = round(sd_len, 2),
    len = paste0(mn_len, " ±", sd_len)
  ) |>
  dplyr::select(supp, dose, n, len)
# A tibble: 6 × 4
supp dose n len
<fct> <fct> <int> <chr>
1 VC Low 10 7.98 ±2.75
2 VC Med 10 16.77 ±2.52
3 VC High 10 26.14 ±4.8
4 OJ Low 10 13.23 ±4.46
5 OJ Med 10 22.7 ±3.91
6 OJ High 10 26.06 ±2.66
Producing Tables
## pander() comes from the pander package, so add library(pander) to load-packages
pigs |>
  summarise(
    n = n(),
    mn_len = mean(len),
    sd_len = sd(len),
    .by = c(supp, dose)
  ) |>
  mutate(
    mn_len = round(mn_len, 2),
    sd_len = round(sd_len, 2),
    len = paste0(mn_len, " ±", sd_len)
  ) |>
  dplyr::select(supp, dose, n, len) |>
  rename_with(str_to_title) |>
  pander(
    justify = "llrr",
    caption = "Odontoblast length for each group shown as mean±SD"
  )
| Supp | Dose | N | Len |
|---|---|---|---|
| VC | Low | 10 | 7.98 ±2.75 |
| VC | Med | 10 | 16.77 ±2.52 |
| VC | High | 10 | 26.14 ±4.8 |
| OJ | Low | 10 | 13.23 ±4.46 |
| OJ | Med | 10 | 22.7 ±3.91 |
| OJ | High | 10 | 26.06 ±2.66 |
Analysing Data
- The default output from lm() doesn’t look great
lm(len ~ supp + dose + supp:dose, data = pigs) |>
summary()
Call:
lm(formula = len ~ supp + dose + supp:dose, data = pigs)
Residuals:
Min 1Q Median 3Q Max
-8.20 -2.72 -0.27 2.65 8.27
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.980 1.148 6.949 4.98e-09 ***
suppOJ 5.250 1.624 3.233 0.00209 **
doseMed 8.790 1.624 5.413 1.46e-06 ***
doseHigh 18.160 1.624 11.182 1.13e-15 ***
suppOJ:doseMed 0.680 2.297 0.296 0.76831
suppOJ:doseHigh -5.330 2.297 -2.321 0.02411 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.631 on 54 degrees of freedom
Multiple R-squared: 0.7937, Adjusted R-squared: 0.7746
F-statistic: 41.56 on 5 and 54 DF, p-value: < 2.2e-16
Creating Summary Tables
- Multiple other packages exist for table creation
  - All do some things brilliantly, none does everything
- pander is a good all-rounder
  - Tables are very simplistic
  - Also enables easy in-line results
- knitr::kable() is another good all-rounder
  - Can be nicely tailored using kableExtra
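As a rough sketch of the knitr::kable() route, the model coefficients can be pulled out as a matrix and passed to kable(). The example uses the built-in ToothGrowth data as a stand-in for pigs (an assumption), recoding dose as a factor so the model has the same structure as the one above; the caption text is illustrative.

```r
# Requires the knitr package
library(knitr)

# Stand-in data: the built-in ToothGrowth dataset (an assumption), with dose
# converted to a factor and VC set as the reference level for supp
tg <- transform(ToothGrowth, dose = factor(dose, labels = c("Low", "Med", "High")))
tg$supp <- relevel(tg$supp, ref = "VC")

# Same model formula as used in the slides
fit <- lm(len ~ supp + dose + supp:dose, data = tg)

# Extract the coefficient matrix and print it as a formatted table
coef_table <- coef(summary(fit))
kable(coef_table, digits = 3, caption = "Model coefficients")
```

Inside an R Markdown chunk the final line produces a formatted table in the compiled report in place of the raw summary() printout.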
Complete the Analysis
- After you’re happy with the way your analysis looks
  - A good habit is to finish with a section called Session Info
  - Add a code chunk which calls the R command sessionInfo()
  - Or even sessionInfo() |> pander()
- So far we’ve been compiling everything as HTML, but let’s switch to an MS Word document. We could email this to our collaborators, or upload to Google docs
  - NB: HTML tables don’t work so well in MSWord \(\implies\) stick with pander?
Closing Comments
Summary
This basic process is incredibly useful
- We never need to cut & paste anything between R and other documents
- Every piece of information comes directly from our R analysis
- We can very easily incorporate new data as it arrives
- Source data is never modified
- Creates reproducible research
- Highly compatible with collaborative analysis & version control (Git)
I learned using R scripts but now I only use these in formal packages, or if defining functions to use across multiple analyses
A Challenge
Return to the penguins dataset and perform a complete RMarkdown analysis on any combination of variables you decide
- Choose how to describe the data
- Choose whether to include diagnostic plots
- Choose which figures to include