Welcome & Introduction

ASI: Introduction to R

Dr Stevie Pederson

Black Ochre Data Labs
Telethon Kids Institute

September 2, 2025




http://blackochrelabs.au/20250902_ASI_RWorkshop/

Introduction

We would like to acknowledge that today we’re meeting on the lands of the Noongar-Whadjuk people. We acknowledge the deep feelings of attachment and relationship of the Noongar people to their Place.

We also acknowledge the cultural authority of Aboriginal and Torres Strait Islander peoples from other areas of Australia, and pay our respects to Elders past, present and emerging.

As a scientist, I’m honoured to be working with many Indigenous Australians, who have a proud history of being some of the first acknowledged scientists from the emerging historical record.

Who Am I?

Stephen (Stevie) Pederson (They/Them)

  • Adelaide, Kaurna Country, SA
  • Bioinformatician, Black Ochre Data Labs, The Kids Research Institute Australia
  • Bioinformatician, Dame Roma Mitchell Cancer Research Laboratories (2020-2022)
  • Co-ordinator, UofA Bioinformatics Hub (2014-2020)

Who Am I?

  • R User for ~20 years \(\implies\) learnt when R was difficult!
  • Senior Author of 7 Bioconductor Packages
    • ngsReports, extraChIPs, transmogR, motifTestR
    • strandCheckR, sSNAPPY, tadar
  • Co-Chair, Bioconductor Community Advisory Board

Made countless typos, horrible decisions and catastrophic errors

Today’s Tutors

  • Dr Sam Buckberry (Black Ochre Data Labs)
  • Dr Jennifer Currenti, Eric Alves (Harry Perkins Institute for Medical Research)

Housekeeping

  • Toilets
  • Morning Tea

Homepage and Material

  • The workshop homepage is http://blackochrelabs.au/20250902_ASI_RWorkshop/
    • Data and course material available here
    • Will stay live in perpetuity
  • Links to the notes are available
    • Slides are re-formatted directly as a simple webpage
    • Also available in presentation style via the RevealJS link below the table of contents (top right-hand corner)

A Brief Introduction to R

Why use R?

  • Can make amazing figures
  • Easy to manipulate data interactively
  • Heavily used for analysis of biological data (along with Python)
    • Can handle extremely large datasets
    • Packages explicitly designed for complex analysis
    • Huge user base of biological researchers (www.bioconductor.org)
  • (Can be) very fast
  • Can also run as scripts on HPC clusters
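As a rough sketch of what running R as a script can look like, the code below reads an optional command-line argument. The file name `run_analysis.R`, the argument and the `rnorm()` calculation are hypothetical placeholders for a real analysis; on a cluster the script would typically be launched with `Rscript`, and the submission details depend on your scheduler.

```r
# Hypothetical script, e.g. saved as run_analysis.R and started with:
#   Rscript run_analysis.R 1000
# Anything after the script name arrives as a character vector
args <- commandArgs(trailingOnly = TRUE)

# Use the first argument as the sample size if it is a valid number;
# otherwise (e.g. when pasted into the Console) fall back to 100
n <- if (length(args) >= 1) suppressWarnings(as.integer(args[1])) else NA_integer_
if (is.na(n)) n <- 100L

set.seed(1)               # make the random numbers reproducible
result <- mean(rnorm(n))  # a stand-in for a real analysis step
cat("Mean of", n, "random values:", result, "\n")
```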

Why use R?

  • Never use drop-down menus or buttons \(\implies\) type every command
    • Complex processes are often implemented as ‘simple’ functions
  • Reproducible Research!!!
    • Transparent methods
    • Integration with version control such as git
  • Avoids common Excel pitfalls \(\implies\) we (almost) never modify files on disk!
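A minimal sketch of the "never modify files on disk" habit, using only base R. The file names and the tiny gene table are made up for illustration; the point is that data are read into memory, changed there, and written to a new file while the original stays untouched.

```r
# Create a small example file (made-up data, for illustration only)
raw_file <- file.path(tempdir(), "raw_counts.csv")
write.csv(data.frame(gene = c("A", "B"), count = c(10, 20)),
          raw_file, row.names = FALSE)

# Read the data into R; the change below happens only to this in-memory copy
counts <- read.csv(raw_file)
counts$log_count <- log2(counts$count)

# Save the result to a *new* file, leaving the original file unchanged
out_file <- file.path(tempdir(), "counts_with_log.csv")
write.csv(counts, out_file, row.names = FALSE)
```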

Experience is the best teacher \(\implies\) please practice your skills

What is R?

  • Derivative of S (John Chambers et al., Bell Labs 1976)
  • R first appeared in 1993
    • Ross Ihaka and Robert Gentleman (U of Auckland)
    • Written by the authors for their own research and teaching
    • First official release (v1.0.0) in Feb 2000
    • Estimated >2 million users (as of 2012)

What is R?

  • Open source language
    • No corporate ownership \(\implies\) free software
    • Code is managed by the community of users
  • R is formally run by a volunteer committee (R Core)
    • Mostly academics
    • John Chambers is still a member
  • Annual release schedule with patches as required
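To see where your own installation sits in the release schedule, base R can report its version. This is a small sketch; the 4.1.0 threshold is only an example (it is the release that introduced the native `|>` pipe).

```r
# Which version of R is running?
R.version.string

# getRversion() returns a version object that can be compared with a string,
# which is handy when code relies on a newer feature
getRversion() >= "4.1.0"
```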

Extending R, Chambers (2016)

R Packages

  • Packages are the key to R’s flexibility and power
    • Are collections (or libraries) of related functions
    • ggplot2 \(\implies\) Create plots using the grammar of graphics
    • readr \(\implies\) Read files into R
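A short sketch of how a package's functions become available. It uses `tools`, which ships with every R installation, so nothing needs downloading; the `readr` line is commented out and assumes readr has already been installed (the file name there is hypothetical).

```r
# library() attaches a package so its functions can be called directly
library(tools)
file_ext("counts.csv")             # returns "csv"

# Alternatively, use package::function without attaching the package
tools::file_path_sans_ext("counts.csv")   # returns "counts"

# Installed packages such as readr are used the same way, e.g.
# library(readr)
# read_csv("my_file.csv")   # hypothetical file name
```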

R Packages

  • \(>\) 22,000 packages are stored on CRAN (https://cran.r-project.org)
    • Not curated for statistical quality or documentation
    • Automated testing for successful installation
    • Packages updated as developers release updates

R Packages

  • Bioconductor is a secondary repository (https://www.bioconductor.org)
    • \(>\) 3000 packages with a more biological/genomics focus
    • Curated for language consistency & documentation
    • Packages updated in a twice-yearly release cycle (except bug-fixes)
  • Some packages also live on GitHub or Bitbucket
    • These may include the latest (unstable) development versions
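Because CRAN, Bioconductor and GitHub versions of a package can differ, it can help to check which version is actually installed. A minimal sketch with `packageVersion()`, using the base package `utils` as a stand-in for any installed package:

```r
# Which version of a package is installed?
packageVersion("utils")

# Version objects can be compared, e.g. before relying on a newer feature
packageVersion("utils") >= "4.0.0"

# sessionInfo()   # lists R and all attached package versions
```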

R Packages

  • Installation of a package can be done through a variety of methods:
    • install.packages("packageName") will only install from CRAN
  • Bioconductor also provides a CRAN package: BiocManager
    • BiocManager::install("packageName") installs from CRAN, Bioconductor & GitHub
  • The CRAN package pak also installs from multiple locations
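The installation functions above are usually only needed for packages that are missing. The sketch below checks what is already installed before anything is downloaded. The package names are just examples, and `is_installed()` is a hypothetical helper rather than part of any package; the install calls are left commented out so nothing is installed by accident.

```r
# Packages we would like to use (names are examples only)
wanted <- c("ggplot2", "readr", "BiocManager")

# system.file() returns "" when a package is not installed
is_installed <- function(pkg) nzchar(system.file(package = pkg))

# Keep only the packages that still need installing
to_install <- wanted[!vapply(wanted, is_installed, logical(1))]
to_install

# Uncomment to install anything missing from CRAN
# if (length(to_install) > 0) install.packages(to_install)

# Bioconductor packages go through BiocManager, e.g.
# BiocManager::install("ngsReports")
```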

Using R

The R Console

  • Let’s try using R as a standalone tool \(\implies\) open R NOT RStudio
    • On Linux: open a terminal, then enter R
    • OSX: click the R icon in your Dock
    • Windows: click the R icon in your Start Menu
  • Do not open RStudio
  • This is R at its ugliest (how I learned R)

The R Console

  • This is often referred to as the R Console
  • At its simplest, it is just a calculator
1 + 1
[1] 2
2 * 2
[1] 4
2 ^ 3
[1] 8
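A few more calculator-style operations, as a quick sketch. The usual rules of precedence apply, and brackets change the order of evaluation:

```r
(1 + 2) * 3   # brackets are evaluated first: 9
1 + 2 * 3     # multiplication before addition: 7
7 / 2         # division: 3.5
7 %/% 2       # integer division: 3
7 %% 2        # remainder (modulo): 1
```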

The R Console

  • R has many standard functions
sqrt(2)
[1] 1.414214
log10(1000)
[1] 3
  • NB: We placed the value inside the brackets after the function name
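Many functions accept more than one value (argument), and these can be given by name. Function calls can also be nested, so the result of one is passed straight into the next. A small illustration:

```r
# The base of the logarithm can be supplied as a named argument
log(1000, base = 10)

# sqrt(2) is calculated first, then rounded to 2 decimal places
round(sqrt(2), digits = 2)

# Every function has a help page; open it with ? followed by the name
# ?round
```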

The R Console

We can create objects with names

x <- 5
  • We have just created an object called x
  • View the contents of the object by entering its name in the Console
x
[1] 5
  • The <- symbol is like an arrow \(\implies\) “put the value 5 into x”
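A short sketch of how assignment behaves in practice. The names `y` and `X` are arbitrary examples:

```r
x <- 5
y <- x * 2   # objects can be created from other objects: y is 10
x <- 20      # re-assigning x replaces its old value...
y            # ...but y keeps the value it was given (still 10)
X <- 1       # names are case-sensitive, so X and x are different objects
ls()         # list the objects created so far
```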

An APL Keyboard from the 1970s

The R Console

  • The object x only exists in the R Environment
  • We can pass objects to functions and perform operations on them
x + 1
[1] 6
sqrt(x)
[1] 2.236068
x^2
[1] 25
x > 1
[1] TRUE
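Comparisons like `x > 1` return `TRUE` or `FALSE`, and several can be combined. A brief illustration, with `x` set to 5 as above:

```r
x <- 5
x == 5            # is x equal to 5? (two = signs, unlike assignment)
x != 5            # is x not equal to 5?
x > 1 & x < 10    # combine conditions with & (and) or | (or)
x <= 2 | x >= 5   # TRUE, because the second condition holds
```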

The R Console

  • Everything we’ve just done is trivial
  • Real analysis isn’t
  • If we perform a series of steps
    • Should we keep a copy of what we’ve done?
    • If so, how should we do that?
  • A common strategy is to record our code as an R Script
  • RStudio makes that easy & convenient
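As a minimal illustration of the script idea, the code below writes three steps to a file and then runs them all with `source()`. The file name `my_analysis.R` is hypothetical and the file is placed in a temporary folder; in RStudio you would normally create the script with File > New File > R Script and save it alongside your project instead. The same file could also be run from a terminal with `Rscript my_analysis.R`.

```r
# Each line is a step we might otherwise type into the Console
script <- c(
  "# my_analysis.R (hypothetical file name)",
  "x <- 5",
  "y <- sqrt(x)",
  "print(round(y, 3))"
)
path <- file.path(tempdir(), "my_analysis.R")
writeLines(script, path)

# source() runs every line of the file, in order
source(path)
```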