http://blackochrelabs.au/20250902_ASI_RWorkshop/
Introduction
We would like to acknowledge that today we’re meeting on the lands of the Noongar-Whadjuk people. We acknowledge the deep feelings of attachment and relationship of the Noongar people to their Place.
We also pay respects to the cultural authority of Aboriginal and Torres Strait Islander peoples from other areas of Australia today, and pay our respects to Elders past, present and emerging.
As a scientist, I’m honoured to be working with many Indigenous Australians, who have a proud history as some of the first acknowledged scientists in the emerging historical record.
Who Am I?
Stephen (Stevie) Pederson (They/Them)
- Adelaide, Kaurna Country, SA
- Bioinformatician, Black Ochre Data Labs, The Kids Research Institute Australia
- Bioinformatician, Dame Roma Mitchell Cancer Research Laboratories (2020-2022)
- Co-ordinator, UofA Bioinformatics Hub (2014-2020)
Used to write Excel macros for fun whilst being a musician
Who Am I?
- R User for ~20 years \(\implies\) learnt when R was difficult!
- Senior Author of 7 Bioconductor Packages: ngsReports, extraChIPs, transmogR, motifTestR, strandCheckR, sSNAPPY, tadar
- Co-Chair, Bioconductor Community Advisory Board
Made countless typos, horrible decisions and catastrophic errors
Today’s Tutors
- Dr Sam Buckberry (Black Ochre Data Labs)
- Dr Jennifer Currenti, Eric Alves (Harry Perkins Institute for Medical Research)
Homepage and Material
- The workshop homepage is http://blackochrelabs.au/20250902_ASI_RWorkshop/
- Data and course material available here
- Will stay live in perpetuity
- Links to notes available
- Slides are directly re-formatted as a simple webpage
- Also in presentation style by clicking the RevealJS link below the TOC (Top RH Corner)
A Brief Introduction to R
Why use R?
- Can make amazing figures
- Easy to manipulate data interactively
- Heavily used for analysis of biological data (along with Python)
- Can handle extremely large datasets
- Packages explicitly designed for complex analysis
- Huge user base of biological researchers (www.bioconductor.org)
- (Can be) very fast
- Can also run as scripts on HPC clusters
I regularly work with data containing millions of lines
Why use R?
- Never use drop-down menus or buttons \(\implies\) type every command
- Complex processes are often implemented as ‘simple’ functions
- Reproducible Research!!!
- Transparent methods
- Integration with version control such as git
- Avoids common Excel pitfalls: we (almost) never modify files on disk!
Experience is the best teacher \(\implies\) please practice your skills
What is R?
- Derivative of S (John Chambers et al., Bell Labs 1976)
- R first appeared in 1993
- Ross Ihaka and Robert Gentleman (U of Auckland)
- Authors wrote for their own research and students
- First official release (v1.0.0) in Feb 2000
- Now estimated >2 million users (2012)
- Ross Ihaka is of NZ Māori descent
What is R?
- Open source language
- No corporate ownership \(\implies\) free software
- Code is managed by the community of users
- R is formally run by a volunteer committee (R Core)
- Mostly academics
- John Chambers is still a member
- Annual release schedule with patches as required
- Being open source creates headaches for University & Business IT departments
- No guarantees of being virus free
- The community self-regulates
R Packages
- Packages are the key to R’s flexibility and power
- Are collections (or libraries) of related functions
- ggplot2 \(\implies\) Create plots using the grammar of graphics
- readr \(\implies\) Read files into R
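As a minimal sketch of how packages are used, the code below attaches a package with library() and calls one of its functions with the `::` operator. It uses the base `stats` package, which ships with every R installation, as a stand-in for ggplot2 or readr so that nothing needs to be installed first.

```r
# Attach a package so its functions can be called by name.
# 'stats' comes with every R installation (a stand-in for ggplot2 or readr)
library(stats)

# A package is a collection of related functions: list the first few in 'stats'
head(ls("package:stats"))

# The `::` operator calls a package's function without attaching the package
med <- stats::median(c(3, 1, 2))
med
```

Once ggplot2 or readr is installed, the same pattern applies, e.g. library(ggplot2).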
R Packages
- Loading a package is like the scene in The Matrix where Trinity installs the set of skills to fly a helicopter
- Crap packages are generally identified by users and just not used
- Scientific rigour is usually checked during review of the accompanying publication
- Robert Gentleman was also a founder of Bioconductor before moving to 23andMe
- \(>\) 22,000 packages are stored on CRAN (https://cran.r-project.org)
- Not curated for statistical quality or documentation
- Automated testing for successful installation
- Packages updated as developers release updates
R Packages
- Bioconductor is a secondary repository (https://www.bioconductor.org)
- \(>\) 3000 packages with a more biological/genomics focus
- Curated for language consistency & documentation
- Packages updated in a twice-yearly release cycle (except bug-fixes)
- Some packages also live on GitHub or Bitbucket
- Will include latest (unstable) development versions
R Packages
- Installation of a package can be done through a variety of methods:
- install.packages("packageName") will only install from CRAN
- Bioconductor also provide a CRAN package, BiocManager: BiocManager::install("packageName") installs from CRAN, Bioconductor & GitHub
- The CRAN package pak also installs from multiple locations
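The installation routes above can be combined into a small helper that only installs what is missing. This is a sketch: the function name install_if_missing() is hypothetical (it is not part of any package), and an internet connection is assumed whenever a package actually needs installing.

```r
# Hypothetical helper (not from any package): install a package only if it
# isn't already available. CRAN packages use install.packages();
# Bioconductor packages go through BiocManager::install().
install_if_missing <- function(pkg, bioc = FALSE) {
  # requireNamespace() returns TRUE if the package is installed and loadable
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  if (bioc) {
    # BiocManager is itself a CRAN package, so install it first if needed
    if (!requireNamespace("BiocManager", quietly = TRUE)) {
      install.packages("BiocManager")
    }
    BiocManager::install(pkg)
  } else {
    install.packages(pkg)
  }
  invisible(TRUE)  # an installation was attempted
}
```

For example, install_if_missing("readr") for a CRAN package, or install_if_missing("ngsReports", bioc = TRUE) for a Bioconductor package. Checking first avoids re-downloading packages every time a script is run.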
Helpful Resources
Much of today is inspired by a two-day developers’ workshop I attended with Hadley Wickham, which also gave me an opportunity to have some great conversations with Winston Chang
Using R
The R Console
- This is the ‘ugly, old-school’ way of using R
- Still very useful, e.g. testing code on a HPC
- Let’s try using R as a standalone tool \(\implies\) open R, NOT RStudio
- On Linux: open a terminal then enter R
- OSX: click the R icon on your dock
- Windows: click the R icon in your Start Menu
- Do not open RStudio
- This is R at its ugliest (how I learned R)
The R Console
- This is often referred to as the R Console
- At its simplest, it is just a calculator
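A few examples of using the Console as a calculator: type each line at the > prompt and press Enter, and R prints the result below it, prefixed by [1]. The comments show the printed output.

```r
1 + 1    # addition: [1] 2
7 - 10   # subtraction: [1] -3
3 * 4    # multiplication: [1] 12
10 / 4   # division: [1] 2.5
2^10     # exponentiation: [1] 1024
```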
The R Console
R has many standard functions
- NB: We placed the value inside the brackets after the function name
I never use a calculator program on my laptop, always R
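For example, a few of the standard mathematical functions, each called by placing the value inside the brackets after the function name. Some functions take extra arguments, such as the number of decimal places for round(); the particular functions chosen here are just illustrations.

```r
sqrt(16)       # square root: [1] 4
abs(-3)        # absolute value: [1] 3
log10(1000)    # base-10 logarithm: [1] 3
exp(0)         # exponential: [1] 1
round(pi, 2)   # 'pi' is built in; round to 2 decimal places: [1] 3.14
```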
The R Console
We can create objects with names
- We have just created an object called x
- View the contents of the object by entering its name in the Console
- The <- symbol is like an arrow \(\implies\) “put the value 5 into x”
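Creating and viewing the object x looks like this in the Console, using the value 5 from the slide; the comments show what R prints.

```r
x <- 5   # put the value 5 into x (nothing is printed)
x        # entering the name prints the contents: [1] 5
```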
The R Console
- The object x only exists in the R Environment
- We can pass objects to functions and perform operations on them
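A short illustration of passing x to functions and using it in operations. x is redefined so the block runs on its own, and the name y is just an illustrative choice. exists() confirms that x lives in the current R Environment (it is not saved anywhere on disk).

```r
x <- 5
x + 1        # [1] 6
x * 2        # [1] 10
sqrt(x^2)    # [1] 5
y <- x * 3   # store the result of an operation as a new object
y            # [1] 15
exists("x")  # x is in the current R Environment: [1] TRUE
```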
The R Console
- Everything we’ve just done is trivial
- Real analysis isn’t
- If we perform a series of steps
- Should we keep a copy of what we’ve done?
- If so, how should we do that?
- A common strategy is to record our code as an R Script
RStudio makes that easy & convenient
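As a sketch, the Console steps could be saved in a plain-text R Script. The file name my_first_script.R is hypothetical; once saved, the whole script can be re-run with source("my_first_script.R") in the Console, or with Rscript my_first_script.R from a terminal (e.g. on an HPC).

```r
# my_first_script.R (hypothetical file name)
# Each step is recorded, so the analysis can be re-run exactly

x <- 5              # starting value
y <- x^2            # square it: 25
z <- sqrt(y) + 1    # square root, then add one: 6
print(z)            # print() is needed to show output when the script is sourced
```

Comments starting with # are ignored by R, so they are a convenient place to record why each step was taken.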