Introduction To R and RStudio

RAdelaide 2025

Dr Stevie Pederson

Black Ochre Data Labs
The Kids Research Institute Australia

July 8, 2025

A Brief Introduction to R

Why use R?

  • Heavily used for analysis of biological data (along with Python)
    • Can handle extremely large datasets
    • Packages explicitly designed for complex analysis
    • Huge user base \(\implies\) internet resources
    • (Can be) very fast
  • Very easy to dynamically interact with large datasets
    • Can also run as static scripts on HPC clusters
    • Low-level languages like C/C++ don’t allow for interactive analysis

How do we use R?

  • Never use drop-down menus or buttons \(\implies\) type every command
    • Complex processes are often implemented as ‘simple’ functions
  • Considered a high-level language
    • Usually easier to read and understand than low-level languages (e.g. C)
    • Trade-off in regard to speed & memory management

How do we use R?

  • Reproducible Research!!!
    • Keep records (i.e. scripts) of every step of every analysis
    • Transparent methods for reviewers
  • Avoids common Excel pitfalls: (almost) never modify files on disk!
    \(\implies\) load data from a file into R and work with it there
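
As a rough sketch of the "load, then work in R" idea: the file name and the tiny dataset below are made up for illustration, and a temporary file stands in for real data on disk.

```r
# Create a small example file (a stand-in for real data on disk)
csv_file <- tempfile(fileext = ".csv")
write.csv(
  data.frame(sample = c("a", "b"), value = c(1, 2)),
  csv_file,
  row.names = FALSE
)

# Load the data into R and modify the copy held in memory
dat <- read.csv(csv_file)
dat$value <- dat$value * 10
dat

# The file on disk is untouched: re-reading it gives the original values
read.csv(csv_file)
```

Only the object dat changes; the original file stays exactly as it was.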

Experience is the best teacher \(\implies\) please practice your skills

What is R?

What is R?

  • Derivative of S (Bell Labs, Chambers 1977)
  • R began to appear in 1996 (Ihaka and Gentleman 1996)
    • Ross Ihaka and Robert Gentleman (U of Auckland)
    • Disentangled some proprietary S code \(\implies\) open-source
  • First official release (v1.0.0) in Feb 2000
  • My first exposure to R was 2002 (R v1.5)

What is R?

  • Open source language
    • No corporate ownership \(\implies\) free software
    • Code is managed by the community of users
  • R is formally run by volunteers \(\implies\) R Core
    • Mostly academics
    • John Chambers is still a member
  • Annual release schedule + patches
    • Most recent is R 4.5.1 (Jun 14)

Extending R, Chambers (2016)

R Packages

  • Packages are the key to R’s flexibility and power
    • Collections (or libraries) of related functions
    • ggplot2 \(\implies\) Create plots using the grammar of graphics
    • readr \(\implies\) Read files into R

R Packages

  • \(>\) 22,000 packages are stored on CRAN (https://cran.r-project.org)
    • Not curated for statistical quality or documentation
    • Automated testing for successful installation
  • Bioconductor is a secondary repository (https://www.bioconductor.org)
    • \(>\) 3000 packages with a more biological/genomics focus
    • Curated for language consistency & documentation
  • Some packages also live on GitHub or Bitbucket
    • Will include latest (unstable) development versions

R Packages

  • Installation of a package can be done through a variety of methods:
    • install.packages("packageName") will only install from CRAN
  • Bioconductor also provides a CRAN package: BiocManager
    • BiocManager::install("packageName") installs from CRAN, Bioconductor & GitHub
  • The CRAN package pak also installs from multiple locations
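
The installation options above might look like this in practice. This is only a sketch: is_installed() is a hypothetical helper written for this example, "packageName" is a placeholder rather than a real package, and the install commands are commented out so that nothing is downloaded when the code is run.

```r
# Hypothetical helper (not part of R): is a package already installed?
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

is_installed("stats")  # stats ships with every R installation

# Installation commands, commented out so nothing is downloaded here
# install.packages("packageName")        # CRAN only
# install.packages("BiocManager")        # BiocManager itself lives on CRAN
# BiocManager::install("packageName")    # CRAN, Bioconductor & GitHub
# pak::pkg_install("packageName")        # pak also handles multiple sources
```

The pkg::function() form calls a function from an installed package without loading the whole package first.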

Using R

The R Console

  • Let’s try using R as a standalone tool \(\implies\) open R NOT RStudio
    • On linux:
      Open a terminal then enter R
    • OSX: Click the R icon on your dock
    • Windows: Click the R icon in your Start Menu
  • Do not open RStudio
  • This is R at its ugliest (how I learned R)

The R Console

  • This is often referred to as the R Console
  • At its simplest, R is just a calculator (press Enter after each line)
> 1 + 1
[1] 2
> 2 * 2
[1] 4
> 2 ^ 3
[1] 8

The R Console

  • R has many standard functions
    • Place the value inside the brackets after the function name
> sqrt(2)
[1] 1.414214
> log10(1000)
[1] 3
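
Two further examples, offered as a small sketch rather than anything from the original slides: functions can take more than one value inside the brackets, and the result of one function can be handed directly to another.

```r
# Arguments can be named, which makes their role explicit
log(8, base = 2)            # 3

# The output of one function can be passed straight into another
round(sqrt(2), digits = 2)  # 1.41
```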

The R Console

We can create objects with names

> x <- 5
  • We have just created an object called x
  • The <- symbol is like an arrow, i.e. “put the value 5 into x”

An APL Keyboard from the 1970s showing the arrow as a single key

The R Console

  • View the contents of the object by entering its name in the Console
> x
[1] 5
  • The object x only exists in the R Environment
  • Imagine your R Environment to be like an Excel Workbook
    • Each object is named just like each spreadsheet is named
      \(\implies\) objects don’t have to look like a spreadsheet

The R Console

Do R Objects Have Rules About Valid/Invalid Names?

  • The only “special characters” which can be used are . and _
    • This makes perfect sense when you realise +, /, *, - etc. are mathematical operators
    • Names cannot contain spaces \(\implies\) R will think there are two objects
  • Object names must start with a letter or a . (a leading . cannot be followed by a number)
    • Objects starting with a . are often hidden from view
  • Common conventions are snake_case (my_object) or camelCase (myObject)
  • Numbers can follow the initial character but cannot start a name
    • e.g. x1 is a valid name whilst 1x is not
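
One way to check names against these rules is sketched below. is_valid_name() is a hypothetical helper written for this example; it relies on the base function make.names(), which rewrites any invalid name into a valid one and leaves valid names unchanged.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(x) {
  x == make.names(x)
}

candidates <- c("x1", "1x", "my_object", "myObject", "my object", ".hidden")
is_valid_name(candidates)

# See how R would repair the invalid names
make.names(c("1x", "my object"))
```

Here x1, my_object, myObject and .hidden pass, whilst 1x and "my object" do not.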

The R Console

  • We can pass objects to functions and perform operations on them
> x + 1
[1] 6
> sqrt(x)
[1] 2.236068
> x^2
[1] 25
> x > 1
[1] TRUE

The R Console

  • Everything we’ve just done is trivial
  • Real analysis isn’t
  • If we perform a series of steps
    • Should we keep a copy of what we’ve done?
    • If so, how should we do that?

Recording Our R History

  • Try using your up and down arrow keys in the R Console
  • This will scroll through the history of commands you’ve entered
    • Can re-execute or modify any of these
  • Saved in a file called .Rhistory in your working directory
  • A common strategy is to record our code as an R Script as we write it
    • Just a plain text file with a record of our commands
  • RStudio makes that easy & convenient
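
A script might look something like the sketch below. The file name is hypothetical and the contents simply repeat commands used earlier, with comments (lines starting with #) explaining each step.

```r
# my_first_script.R (hypothetical file name)
# Each line is a command we would otherwise type into the Console

x <- 5             # create an object called x
x_sqrt <- sqrt(x)  # store the square root as a new object
x_sqrt             # print the result
```

Running the script from top to bottom recreates every object, which is why the script is worth keeping rather than the objects themselves.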

Exiting R

  • To exit R type q()
  • You will then be asked: Save workspace image? [y/n/c]:
    • Type n then hit Enter
  • We can save our R environment with all of the objects we’ve created
    • By default this will be .RData in the folder we’ve been working in
  • I personally never save .RData \(\implies\) can lead to dodgy analysis
    • The code to create objects & run analysis is the important part

RStudio

Introduction to RStudio

R and RStudio are two separate but connected things

  • R is like the engine of your car
  • RStudio is the ‘cabin’ we use to control the engine
    • Comes with extra features unrelated to R that improve our ‘journey’
    • Known as an IDE (Integrated Development Environment)
  • R does all the calculations, manages the data, generates plots
    • i.e. gets us to our destination
  • RStudio helps manage our code, display the plots etc
    • i.e. makes our journey easier to navigate

Some very helpful features of RStudio

  • We can write scripts and execute code interactively
  • We can see everything we need (directories, plots, code, history etc.)
  • Predictive auto-completion
  • Integration with GitHub Copilot
  • Integration with other languages
    • markdown, \(\LaTeX\), bash, python, C++, git etc.
  • Numerous add-ons to simplify larger tasks
  • Posit is now developing Positron to better enable a variety of languages

Create an R Project

I use R Projects to manage each analysis

  1. Create a directory on your computer for today’s material
    • We recommend RAdelaide25 in your home directory
  2. Now open RStudio
    • RStudio will always open in a directory somewhere
    • Look in the Files pane (bottom-right) to see where it’s looking
      (Or type getwd() in the Console pane)
    • This is the current working directory for R
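
A small sketch for checking where R is currently looking. The output will differ on every computer, so none is shown here.

```r
wd <- getwd()   # the current working directory, as a character string
wd

basename(wd)    # just the final folder name
list.files(wd)  # the files R can see there (matches the Files pane)
```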

Create an R Project

We want RStudio to be looking in our new directory (RAdelaide25)
\(\implies\) R Projects make this easy

  • File > New Project > Existing Directory
  • Browse to your RAdelaide25 directory \(\implies\) Create Project

Create an R Project

  • The R Project name is always the directory name
  • Not essential, but good practice and extremely useful
  • The Project Menu is in the top-right of RStudio
  • Enables us to work on multiple analyses/datasets
  • Just open the relevant project \(\implies\) you’re ready to go

Create An Empty R Script

  1. File > New File > R Script
  2. Save As DataImport.R

RStudio

This is the basic layout we often work with

The R Console

  • This is the R Console within the RStudio IDE
  • We’ve already explored this briefly
  • In the same pane we also have two other tabs
    • Terminal: An approximation of a bash terminal (or PowerShell for Windows)
    • Background Jobs: Shows progress when compiling R Markdown & Quarto \(\implies\) Not super relevant

The R Environment

Like we did earlier, in the R Console type:

> x <- 5

Where have we created the object x?

  • Is it on your hard drive somewhere?
  • Is it in a file somewhere?
  • We have placed x in our R Environment
  • Formally known as your Global Environment
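
The environment can also be inspected from the Console. The sketch below uses only base R functions and assumes the commands are typed at the top level of a session.

```r
x <- 5

ls()                          # names of the objects in the current environment
exists("x")                   # TRUE: x is now defined
environmentName(globalenv())  # "R_GlobalEnv"
```

The output of ls() should match what is listed in RStudio's Environment Tab.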

The R Environment

  • The R Environment is like your desktop
  • We keep all our relevant objects here
    • Multiple objects are usually created during an analysis
    • Can save all the objects in your environment as a single .RData file
    • R can be set to automatically save your environment on exit
    • Unlike Excel: We usually save our code not our environment

The History Tab

  • Next to the Environment Tab is the History Tab
  • Keeps a record of the last ~200 lines of code
    • Very useful for remembering steps during exploration
    • Best practice is to enter + execute code from the Script Window
  • We can generally ignore the Connections and any other tabs
    • A git tab will also appear for those who use git in their project

Accessing Help

> ?sqrt
  • This will take you to the Help Tab for the sqrt() function
    • Contents may look confusing at this point but will become clearer
  • Many inbuilt functions are organised into a package called base
    • Packages group similar/related functions together
    • base is always installed and loaded with R

Additional Sources For Help

  • Help pages in R can be hit & miss
    • Some are excellent and informative \(\implies\) some aren’t
    • I regularly read my own help pages
  • Bioconductor has a support forum for Bioconductor packages
    • https://support.bioconductor.org
    • All packages have a vignette (again varying quality)
  • Google is your friend \(\implies\) maybe ChatGPT?

The Plots Pane

  • We’ve already seen the Files Tab
  • Plots appear in the Plots Tab
> plot(cars)
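
cars is a small dataset that comes with R (50 rows, with columns speed and dist). As a sketch, it can be inspected first and then plotted with clearer labels; the title text is just an illustration.

```r
head(cars)  # first six rows
dim(cars)   # 50 rows, 2 columns

plot(
  cars,
  xlab = "Speed (mph)",
  ylab = "Stopping distance (ft)",
  main = "The built-in cars dataset"
)
```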

Other Panes

  • The Packages Tab is a bad idea
    • Can be disabled by popular request (I always do)
    • Temptation to click is strong
    • Very bad for reproducible research!!!
  • Viewer Tab is used when compiling HTML documents from RMarkdown
  • Every tab can be minimised/maximised using the buttons on the top right
  • Window separators can be moved to resize panes manually
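
Returning to the Packages Tab: the reproducible alternative to clicking is to load packages with code in the script. In the sketch below, tools is used only because it ships with every R installation, so the example runs without installing anything; in a real analysis this would be a package such as ggplot2.

```r
library(tools)               # load a package with code, not by clicking

"package:tools" %in% search()  # TRUE: the package is now attached

sessionInfo()                # records the R version and attached packages
```

Because library() sits in the script, anyone re-running the analysis loads exactly the same packages.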

Cheatsheet and Shortcuts

Help > Cheatsheets > RStudio IDE Cheat Sheet

Page 2 has lots of hints:

  • Ctrl + 1 places focus on the Script Window
  • Ctrl + 2 places focus on the Console
  • Ctrl + 3 places focus on the Help Tab

Conclusion

  • R is the engine, driving everything we do
  • RStudio brings together multiple features for easy coding
  • Can easily access Help, Plots, Scripts etc
  • Integrated Development Environment (IDE)
  • Other IDEs do exist (e.g. VSCode, Positron) \(\implies\) beyond scope of course

References

Chambers, John M. 1977. Computational Methods for Data Analysis. New York: Wiley.
Chambers, John M. 2020. “S, R, and Data Science.” Proc. ACM Program. Lang. 4 (HOPL): 1–17.
Ihaka, Ross, and Robert Gentleman. 1996. “R: A Language for Data Analysis and Graphics.” J. Comput. Graph. Stat. 5 (3): 299–314.