pigs <- read_csv("data/pigs.csv") |>
  mutate(
    dose = fct(dose, levels = c("Low", "Med", "High")),
    supp = fct(supp, levels = c("VC", "OJ"))
  )
R Markdown
ASI: Introduction to R
R Markdown
Writing Reports Using rmarkdown
`rmarkdown` is a cohesive way to
- Load & wrangle data
- Analyse data, including figures & tables
- Publish everything in a complete report/analysis
- The package `knitr` is the engine behind this
  - Replaced the `Sweave` package about 8-10 years ago
- Extends the `markdown` language to incorporate R code
  - Newest incarnation is `quarto` \(\implies\) better multi-language integration
A Brief Primer on Markdown
- Markdown is a very simple and elegant way to create formatted HTML
- Text is entered as plain text
- Formatting usually doesn’t appear on screen (but can)
- The parsing to HTML often occurs using `pandoc`
- Often used for Project README files etc.
- Not R-specific but is heavily used across data science
- Go to the File drop-down menu in RStudio
- New File -> Markdown File
- Save As `README.md`
Editing Markdown
- Section Headers are denoted by one or more `#` symbols
  - `#` is the highest level, `##` is next highest etc.
- Italic text is set by enclosing text between a single asterisk (`*`) or underscore (`_`)
- Bold text is set by using two asterisks (`**`) or two underscores (`__`)
- Dot-point Lists are started by prefixing each line with `-`
  - Next level indents are formed by adding 2 or 4 spaces before the next `-`
- Numeric Lists are formed by starting a line with `1.`
  - Subsequent lines don’t need to be numbered in order
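As a rough illustration of the syntax above, the sketch below writes a small Markdown file from R using base `writeLines()`. The file contents are made up for the example, and a temporary directory is used so nothing in your project is overwritten.

```r
# Hypothetical README contents showing the Markdown syntax described above
readme <- c(
  "# Project Title",
  "",
  "## Background",
  "",
  "Some *italic* text and some **bold** text.",
  "",
  "- First dot-point",
  "    - A nested dot-point (indented by 4 spaces)",
  "- Second dot-point",
  "",
  "1. First numbered item",
  "1. Second numbered item (renumbered when rendered)"
)

# Write to a temporary location; swap in "README.md" to save it in a project
readme_path <- file.path(tempdir(), "README.md")
writeLines(readme, readme_path)

# Print the raw Markdown back to the console
cat(readLines(readme_path), sep = "\n")
```

Opening the resulting file in RStudio and clicking Preview should show the headers, emphasis and lists rendered as HTML.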
R Markdown
Writing Reports Using rmarkdown
We can output our analysis directly as:
- HTML
- MS Word Documents
- PDF Documents (If you have \(\LaTeX\) installed)
- HTML or PowerPoint presentations
We never need to use MS Word, Excel or PowerPoint again!
Creating an R Markdown document
Let’s create our first `rmarkdown` document
- Go to the File drop-down menu in RStudio
- New File -> R Markdown…
Looking At The File
A header section is enclosed between the `---` lines at the top
- Nothing can be placed before this!
- Uses YAML (YAML Ain’t Markup Language)
- Editing is beyond the scope of this course
- Can set custom `.css` files, load LaTeX packages, set parameters etc.
Getting Help
Check the help for a guide to the syntax.
Help > Markdown Quick Reference
- Increasing numbers of `#` gives Section -> Subsection -> Subsubsection etc.
- Bold is set using a double asterisk/underscore: **Knit** or __Knit__
- Italics can be set using a single asterisk/underscore: *Italics* or _Italics_
- Typewriter font is set using a single back-tick: `Typewriter`
Compiling The Report
The default format is an `html_document` & we can change this later. Generate the default document by clicking Knit
Making Our Own Report
Now we can modify the code to create our own analysis.
- Delete everything in your R Markdown file EXCEPT the header
- We’ll analyse the `pigs` dataset
- Edit the title to be something suitable
Creating a Code Chunk
`Alt+Ctrl+I` creates a new chunk on Windows/Linux
`Cmd+Option+I` on OSX
- Type `load-packages` next to the `{r` in the chunk header
  - This is the chunk name
  - Really helpful habit to form
- Enter `library(tidyverse)` in the chunk body
  - We’ll add other packages as we go
Knit…
Dealing With Messages
- The `tidyverse` is a little too helpful sometimes
  - These messages look horrible in a final report
  - They tell us which packages/versions `tidyverse` has loaded
  - They also inform us of conflicts (e.g. `dplyr::filter` vs. `stats::filter`)
  - Can be helpful when running an interactive session
- We can hide these from our report
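One common way to hide these messages is the `knitr` chunk option `message`, which can be switched off for the whole document from a setup chunk. The sketch below assumes `knitr` and `tidyverse` are installed; it also shows a per-call alternative from base R.

```r
# Place in a chunk near the top of the document:
# switch off messages and warnings for every later chunk
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Alternative for a single call: silence only the startup banner
suppressPackageStartupMessages(library(tidyverse))
```

To silence a single chunk instead, the same option can be set in that chunk's header, e.g. `{r load-packages, message = FALSE}`.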
Making Our Own Reports
- I like to load all data straight after loading packages
- Gets the entire workflow sorted at the beginning
- Alerts to any problems early
Below the `load-packages` chunk:
- Create a new chunk
- Name it `load-data`
- In the chunk body load `pigs` using `read_csv()`
Chunks can be run interactively using Ctrl+Alt+Shift+P
Describing Data
Now let’s add a section header for our analysis to start the report
- Type `## Data Description` after the header and after leaving a blank line
- Use your own words to describe the data
- Consider things like how many participants, different methods, measures we have etc.
60 guinea pigs were given vitamin C, either in their drinking water or via orange juice. Three dose levels were given, representing low, medium and high doses. Odontoblast length was measured in order to assess the impacts on tooth growth.
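Before writing a description like this it can help to check the numbers directly. The sketch below assumes that `pigs.csv` mirrors R's built-in `ToothGrowth` data (an assumption based on the description above, not something the file confirms); in your own report you would run the equivalent checks on `pigs`.

```r
# ASSUMPTION: the built-in ToothGrowth data stands in for pigs.csv
data("ToothGrowth", package = "datasets")

n_pigs <- nrow(ToothGrowth)                               # total observations
group_sizes <- table(ToothGrowth$supp, ToothGrowth$dose)  # supp x dose counts
len_range <- range(ToothGrowth$len)                       # spread of the response

n_pigs
group_sizes
len_range
```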
Visualising The Data
The next step might be to visualise the data using a boxplot
- Start a new chunk and name it `boxplot-data`
pigs |>
  ggplot(aes(dose, len, fill = supp)) +
  geom_boxplot() +
  labs(
    x = "Dose",
    y = "Odontoblast Length (pm)",
    fill = "Method"
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_bw()
Summarising Data
- Next we might like to summarise the data as a table
- Show group means & standard deviations
- Add the following to a new chunk called `data-summary`
- I’ve used the HTML code for \(\pm\) (±)
pigs |>
  summarise(
    n = n(),
    mn_len = mean(len),
    sd_len = sd(len),
    .by = c(supp, dose)
  ) |>
  mutate(
    mn_len = round(mn_len, 2),
    sd_len = round(sd_len, 2),
    len = paste0(mn_len, " ±", sd_len)
  ) |>
  dplyr::select(supp, dose, n, len)
# A tibble: 6 × 4
  supp  dose      n len
  <fct> <fct> <int> <chr>
1 VC    Low      10 7.98 ±2.75
2 VC    Med      10 16.77 ±2.52
3 VC    High     10 26.14 ±4.8
4 OJ    Low      10 13.23 ±4.46
5 OJ    Med      10 22.7 ±3.91
6 OJ    High     10 26.06 ±2.66
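Notice that `round()` followed by `paste0()` drops trailing zeros, so some cells show `4.8` rather than `4.80`. One possible tweak, sketched here with a couple of values copied from the table above, is to build the label with base R's `sprintf()`, which pads to a fixed number of decimal places.

```r
mn_len <- c(26.14, 22.7)  # example means taken from the table above
sd_len <- c(4.8, 3.91)    # matching standard deviations

# "%.2f" always prints two decimal places, keeping trailing zeros
len <- sprintf("%.2f ± %.2f", mn_len, sd_len)
len
```

Inside the pipeline this would replace the `round()` and `paste0()` steps in `mutate()`.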
Knit…
Producing Tables
pigs |>
  summarise(
    n = n(),
    mn_len = mean(len),
    sd_len = sd(len),
    .by = c(supp, dose)
  ) |>
  mutate(
    mn_len = round(mn_len, 2),
    sd_len = round(sd_len, 2),
    len = paste0(mn_len, " ±", sd_len)
  ) |>
  dplyr::select(supp, dose, n, len) |>
  rename_with(str_to_title) |>
  pander(
    caption = "Odontoblast length for each group shown as mean±SD"
  )
Supp | Dose | N | Len |
---|---|---|---|
VC | Low | 10 | 7.98 ±2.75 |
VC | Med | 10 | 16.77 ±2.52 |
VC | High | 10 | 26.14 ±4.8 |
OJ | Low | 10 | 13.23 ±4.46 |
OJ | Med | 10 | 22.7 ±3.91 |
OJ | High | 10 | 26.06 ±2.66 |
Analysing Data
- Performing statistical analysis is beyond the scope of today BUT
- The function `lm()` is used to perform linear regression
- We’ll perform an analysis remembering that `~` means ‘depends on’
lm(len ~ supp + dose + supp:dose, data = pigs) |>
  summary()
Call:
lm(formula = len ~ supp + dose + supp:dose, data = pigs)

Residuals:
   Min     1Q Median     3Q    Max
 -8.20  -2.72  -0.27   2.65   8.27

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)        7.980      1.148   6.949 4.98e-09 ***
suppOJ             5.250      1.624   3.233  0.00209 **
doseMed            8.790      1.624   5.413 1.46e-06 ***
doseHigh          18.160      1.624  11.182 1.13e-15 ***
suppOJ:doseMed     0.680      2.297   0.296  0.76831
suppOJ:doseHigh   -5.330      2.297  -2.321  0.02411 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.631 on 54 degrees of freedom
Multiple R-squared:  0.7937, Adjusted R-squared:  0.7746
F-statistic: 41.56 on 5 and 54 DF,  p-value: < 2.2e-16
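In the formula, `supp:dose` is the interaction term, and R also accepts `supp * dose` as shorthand for `supp + dose + supp:dose`. The sketch below checks that the two spellings give the same coefficients. It assumes the built-in `ToothGrowth` data can stand in for `pigs`, and the name `pigs_demo` is made up for the example.

```r
# ASSUMPTION: built-in ToothGrowth stands in for the pigs data
pigs_demo <- transform(
  ToothGrowth,
  supp = factor(supp, levels = c("VC", "OJ")),
  dose = factor(dose, labels = c("Low", "Med", "High"))
)

# Main effects and interaction written out in full
fit_long <- lm(len ~ supp + dose + supp:dose, data = pigs_demo)

# The same model using the * shorthand
fit_short <- lm(len ~ supp * dose, data = pigs_demo)

all.equal(coef(fit_long), coef(fit_short))
```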
Creating Summary Tables
- Multiple other packages exist for table creation
  - All do some things brilliantly, none does everything
- `pander` is a good all-rounder
  - Tables are very simplistic
  - Also enables easy in-line results
Complete the Analysis
After you’re happy with the way your analysis looks
- A good habit is to finish with a section called `Session Info`
- Add a code chunk which calls the R command `sessionInfo()`
So far we’ve been compiling everything as HTML, but let’s switch to an MS Word document. We could email this to our collaborators, or upload it to Google Docs.
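The usual route is the drop-down arrow next to Knit, or changing `output:` in the YAML header to `word_document`. The same switch can also be made from the console with `rmarkdown::render()`, as in the minimal sketch below. The file name is hypothetical, so substitute your own.

```r
library(rmarkdown)

# ASSUMPTION: "pigs_report.Rmd" is a hypothetical file name; use your own
rmd_file <- "pigs_report.Rmd"

if (file.exists(rmd_file)) {
  # Equivalent to choosing "Knit to Word" in RStudio
  render(rmd_file, output_format = "word_document")

  # Several formats can be requested in a single call
  render(rmd_file, output_format = c("html_document", "word_document"))
}
```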
Summary
This basic process is incredibly useful
- We never need to cut & paste anything between R and other documents
- Every piece of information comes directly from our R analysis
- We can very easily incorporate new data as it arrives
- Source data is never modified
- Creates reproducible research
- Highly compatible with collaborative analysis & version control (Git)
I learned using R scripts but now I only use these in formal packages, or if defining functions to use across multiple analyses
Advanced Options
- The R package `workflowr` is very helpful for larger workflows
  - Can include multiple HTML pages
  - Strong integration with `git`
- Highly customisable output
  - Code folding
  - Bootstrap themes etc.
  - Can use custom `css` files
  - Interactive plots using `plotly`
Challenge
- Load `my_penguins`
- Describe the data
- Visualise the data
- (If you’re keen) fit a linear model for one of the measurements & the predictor(s) of your choice