Loading Data Into R

ASI: Introduction to R

Author
Affiliation

Dr Stevie Pederson

Black Ochre Data Labs
The Kids Research Institute Australia

Published

September 2, 2025

Import Using the GUI

Importing Data

  1. Preview the file pigs.csv by clicking on it (View File)
    • Try in Excel if you prefer, but DO NOT save anything from Excel
  • The data measures tooth (i.e. odontoblast) length in guinea pigs
    • Using 3 dose levels of Vitamin C (“Low”, “Med”, “High”)
  • Vitamin C was given in orange juice or in drinking water
    • Recorded as “OJ” or “VC” respectively

Importing Data

  • This type of data is very easy to manage in R
    • Plain text with comma delimiters
    • Simple column structure with column names
    • No blank rows at the top or separating sub-tables
    • No blank columns
    • No rownames
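
To make these properties concrete, here is a minimal sketch that writes a small stand-in file and prints its raw text. The four data rows are the ones that appear in the head(pigs, 4) output in these notes; the temporary file `csv_file` is a hypothetical substitute, not the real pigs.csv.

```r
# Stand-in for pigs.csv (an assumption): header plus four rows from these notes
csv_lines <- c(
  "len,supp,dose",   # column names in the first row
  "4.2,VC,Low",
  "11.5,VC,Low",
  "7.3,VC,Low",
  "5.8,VC,Low"
)
csv_file <- tempfile(fileext = ".csv")  # hypothetical temporary file
writeLines(csv_lines, csv_file)

# Look at the raw text exactly as it is stored on disk
raw_text <- readLines(csv_file)
print(raw_text)

# Every line has the same number of comma-separated fields
n_fields <- lengths(strsplit(raw_text, ",", fixed = TRUE))
print(n_fields)
```

No blank rows, no blank columns and a constant field count per line are what make a file like this easy to import.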

Using the GUI To Load Data

Click on pigs.csv, choose Import Dataset, then stop!

(Click Update if you don’t see this)

The Preview Window


We have a preview of the data
  • This is another preview of the data before we import it
  • There are 3 columns: len, supp and dose
    • len is a double (numeric)
    • The other two are character columns
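
The same column types can be checked in code once data is loaded. This is a minimal sketch, assuming readr 2.0 or later (for literal data wrapped in I()); `pigs_demo` is a hypothetical stand-in built from four rows shown in these notes.

```r
library(readr)

# Literal stand-in data (an assumption), wrapped in I() so readr treats it as data
csv_text <- I("len,supp,dose\n4.2,VC,Low\n11.5,VC,Low\n7.3,VC,Low\n5.8,VC,Low\n")
pigs_demo <- read_csv(csv_text, show_col_types = FALSE)

# The storage type readr guessed for each column
col_kinds <- vapply(pigs_demo, typeof, character(1))
print(col_kinds)
```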

The Preview Window


We also have a preview of the code we’re about to execute

The Preview Window

  1. Select and copy all the code in the Code Preview Box
    • We’ll paste this somewhere in a minute…
  2. Click Import
  3. Magic happens!!!
    • Ignore the red/blue text. This is just ‘helpful’ information

  • Now paste the copied code at the top of your script

What just happened?

The code we copied has 3 lines:

library(readr)
pigs <- read_csv("data/pigs.csv")
View(pigs)
  • The 1st line loads the package readr using library(readr)
    • Packages are collections (i.e. libraries) of related functions
    • All readr functions are about importing data
  • readr contains the function read_csv()
  • read_csv() tells R what to do with a csv file

What just happened?

The code we copied has 3 lines:

library(readr)
pigs <- read_csv("data/pigs.csv")
View(pigs)
  • The 2nd line actually loads the data into your R Environment
  • The GUI named the new object pigs after the file name (pigs.csv)
  • We can change this name if we wish

What just happened?

The code we copied has 3 lines:

library(readr)
pigs <- read_csv("data/pigs.csv")
View(pigs)
  • The 3rd line, View(pigs), opens a preview in a familiar Excel-like format
    • I personally don’t use this

Close the preview by clicking the cross

What just happened?

  • We have just loaded data using the default settings of read_csv()
  • The object pigs is now in our R Environment
    • The original file remains on our HDD without modification!!!
  • The code is saved in our script
    ⇒ we don’t need the GUI for this operation again!
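
A quick way to convince yourself the file on disk is untouched is to compare checksums before and after changing the object in R. This is only a sketch: `csv_file` is a hypothetical temporary stand-in for data/pigs.csv, and `pigs_demo` is a hypothetical object name.

```r
library(readr)

# Hypothetical stand-in for data/pigs.csv
csv_file <- tempfile(fileext = ".csv")
writeLines(c("len,supp,dose", "4.2,VC,Low", "11.5,VC,Low"), csv_file)
checksum_before <- tools::md5sum(csv_file)

# Load the data, then change the object inside R only
pigs_demo <- read_csv(csv_file, show_col_types = FALSE)
pigs_demo$len <- pigs_demo$len * 10

# The file itself has not changed
checksum_after <- tools::md5sum(csv_file)
file_untouched <- unname(checksum_before == checksum_after)
print(file_untouched)
```

Changes only reach the disk if we explicitly write the object back out.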

Let’s Demonstrate

  1. In the Environment Tab click the broom icon
    • This will delete everything from your R Environment
    • It won’t unload the packages
  2. Select the code we’ve just pasted and send it to the console
    • Reloading the packages won’t hurt
  3. Check the Environment Tab again and pigs is back

  • You can delete the line View(pigs)

Realistically we only need to preview it the first time. Having that preview open every time ends up being really annoying.
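
The difference between removing objects and unloading packages can be sketched in code. To stay safe, this example removes a single hypothetical object with rm() rather than clearing the whole Environment as the broom does; `pigs_demo` is an assumption for illustration.

```r
library(readr)

pigs_demo <- data.frame(len = c(4.2, 11.5))  # hypothetical stand-in object

# Remove the object; the broom icon does this for everything at once
rm(pigs_demo)
object_gone <- !exists("pigs_demo", inherits = FALSE)

# The package is still attached, so read_csv() remains available
readr_attached <- "package:readr" %in% search()
print(c(object_gone = object_gone, readr_attached = readr_attached))
```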

Functions

Functions in R

library(dplyr)  # provides glimpse(); not needed if it is already attached
head(pigs)
glimpse(pigs)
  • Here we have called the functions 1) head() and 2) glimpse()
    • They were both executed on the object pigs
  • Call the help page for head()
?head

(if you get multiple options, choose the one from utils)

Functions in R

  • The key place to look at is
head(x, ...)
## Default S3 method:
head(x, n = 6L, ...)
  • There are two named arguments to head() ⇒ x and n
    • x has no default value ⇒ we need to provide something
    • n = 6L means n has a default value of 6 (L ⇒ integer)

Execute head() with no arguments to see the error!!!
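
If you would rather inspect the error than have it stop a script, it can be captured with tryCatch(). A minimal sketch using only base R; the built-in vector `letters` is used as a convenient object for x.

```r
# Calling head() without x: capture the error instead of halting
result <- tryCatch(
  head(),
  error = function(e) e
)
is_error <- inherits(result, "error")
print(is_error)
if (is_error) print(conditionMessage(result))

# Supplying only x is enough, because n falls back to its default of 6
first_items <- head(letters)
print(length(first_items))
```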

Functions in R

Lower down the page you’ll see

Arguments

x    an object
n    an integer vector of length up to dim(x) (or 1, for non-dimensioned objects). Blah, blah, blah…

  • Some of the rest is technical detail (sometimes very helpful)

Function Arguments

  • head() prints the first part of an object
  • Useful for very large objects (e.g. if we had 1000 pigs)
  • We can change the number of rows shown to us
head(pigs, 4)
# A tibble: 4 × 3
    len supp  dose 
  <dbl> <chr> <chr>
1   4.2 VC    Low  
2  11.5 VC    Low  
3   7.3 VC    Low  
4   5.8 VC    Low  

Function Arguments

  • Notice we didn’t provide these as named arguments
  • If we pass values in the order the function defines them ⇒ no need to name them
head(pigs, 4)
# A tibble: 4 × 3
    len supp  dose 
  <dbl> <chr> <chr>
1   4.2 VC    Low  
2  11.5 VC    Low  
3   7.3 VC    Low  
4   5.8 VC    Low  
head(x = pigs, n = 4)
# A tibble: 4 × 3
    len supp  dose 
  <dbl> <chr> <chr>
1   4.2 VC    Low  
2  11.5 VC    Low  
3   7.3 VC    Low  
4   5.8 VC    Low  

Function Arguments

  • If we name the arguments, we can pass in any order we choose
head(x = pigs, n = 4)
# A tibble: 4 × 3
    len supp  dose 
  <dbl> <chr> <chr>
1   4.2 VC    Low  
2  11.5 VC    Low  
3   7.3 VC    Low  
4   5.8 VC    Low  
head(n = 4, x = pigs)
# A tibble: 4 × 3
    len supp  dose 
  <dbl> <chr> <chr>
1   4.2 VC    Low  
2  11.5 VC    Low  
3   7.3 VC    Low  
4   5.8 VC    Low  
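
The three calling styles can be compared directly. This sketch uses ToothGrowth, a dataset that ships with R and has the same column names; treating it as a stand-in for pigs is an assumption made here for illustration.

```r
# ToothGrowth ships with R; used here as a stand-in for pigs (an assumption)
by_position    <- head(ToothGrowth, 4)
named_in_order <- head(x = ToothGrowth, n = 4)
named_reversed <- head(n = 4, x = ToothGrowth)

all_same <- identical(by_position, named_in_order) &&
  identical(by_position, named_reversed)
print(all_same)

# Unnamed values are matched by position, so swapping them is expected to fail
wrong_order <- tryCatch(head(4, ToothGrowth), error = function(e) e)
print(inherits(wrong_order, "error"))
```

Naming arguments costs a few keystrokes but removes any dependence on order.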

Understanding read_csv()

  • Earlier we called the R function read_csv()
  • Check the help page
?read_csv
  • We have four functions shown but stick to read_csv()

Understanding read_csv()

read_csv(
  file,
  col_names = TRUE, col_types = NULL, col_select = NULL,
  id = NULL, locale = default_locale(), 
  na = c("", "NA"), quoted_na = TRUE,
  quote = "\"", comment = "",
  trim_ws = TRUE,
  skip = 0, n_max = Inf,
  guess_max = min(1000, n_max),
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)
  • This function has numerous arguments (file, col_names etc.)
  • Most have default values given
    • All were defined somewhere in the GUI
    • The default assumes there are column names in the first row (col_names = TRUE)
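
Any default can be overridden by naming the argument in the call. A minimal sketch, assuming readr 2.0 or later; `csv_text` is hypothetical literal data built from rows shown in these notes.

```r
library(readr)

# Literal stand-in data (an assumption), wrapped in I()
csv_text <- I("len,supp,dose\n4.2,VC,Low\n11.5,VC,Low\n7.3,VC,Low\n5.8,VC,Low\n")

# Defaults everywhere except show_col_types, which silences the message
all_rows <- read_csv(csv_text, show_col_types = FALSE)

# Override two defaults: read only 2 rows, and state the column types
# ("dcc" = double, character, character) instead of letting readr guess
two_rows <- read_csv(csv_text, n_max = 2, col_types = "dcc")

print(nrow(all_rows))
print(nrow(two_rows))
```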

Understanding read_csv()

All arguments for the function were defined somewhere in the GUI.

  1. Open the GUI Preview by clicking on the file again
  2. Uncheck the First Row as Names check-box
    • What happened to the code?
    • How did the columns change?
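
The effect of that check-box can also be explored in code. This is a sketch under the assumption that unchecking it corresponds to col_names = FALSE; `csv_text` is hypothetical literal data.

```r
library(readr)

# Literal stand-in data (an assumption), wrapped in I()
csv_text <- I("len,supp,dose\n4.2,VC,Low\n11.5,VC,Low\n")

with_header <- read_csv(csv_text, col_names = TRUE, show_col_types = FALSE)
no_header   <- read_csv(csv_text, col_names = FALSE, show_col_types = FALSE)

print(names(with_header))
print(names(no_header))       # readr generates placeholder names
print(nrow(no_header))        # the header line is now counted as data
print(class(no_header[[1]]))  # the text "len" in row 1 changes the column type
```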

Try checking/unchecking a few more options and try to understand the consequences

Closing Comments

read_csv() Vs read.csv()

  • RStudio now uses read_csv() from readr by default
  • You will often see read.csv() in older scripts (from utils)
  • The newer (readr) version is:
    • slightly faster
    • more user-friendly
    • gives informative messages
    • always returns a tibble
  • Earlier functions in utils are read.*() (csv, delim etc.)
  • readr has the functions read_*() (csv, tsv, delim etc.)
  • I always use the newer ones
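
The difference in what the two functions return is easy to see side by side. A minimal sketch using hypothetical literal data; read.csv() takes it through its text argument, read_csv() through I().

```r
library(readr)

csv_string <- "len,supp,dose\n4.2,VC,Low\n11.5,VC,Low\n"  # stand-in data

base_version  <- read.csv(text = csv_string)                       # utils
readr_version <- read_csv(I(csv_string), show_col_types = FALSE)   # readr

print(class(base_version))
print(class(readr_version))

# A tibble is still a data frame, with extra classes on top
is_tibble <- inherits(readr_version, "tbl_df")
print(is_tibble)
```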

Reading Help Pages: Bonus Slide

  • The bottom three functions are simplified wrappers to read_delim()
  • read_csv() calls read_delim() using delim = ","
  • read_csv2() calls read_delim() using delim = ";"
  • read_tsv() calls read_delim() using delim = "\t"
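
The wrapper relationship can be checked by reading the same data both ways. This sketch uses hypothetical tab-delimited literal data and compares only the column contents, since the returned objects may carry different bookkeeping attributes.

```r
library(readr)

# Tab-delimited stand-in data (an assumption), wrapped in I()
tsv_text <- I("len\tsupp\tdose\n4.2\tVC\tLow\n11.5\tVC\tLow\n")

via_wrapper <- read_tsv(tsv_text, show_col_types = FALSE)
via_delim   <- read_delim(tsv_text, delim = "\t", show_col_types = FALSE)

# Compare the columns themselves, ignoring any extra attributes
same_columns <- identical(lapply(via_wrapper, identity),
                          lapply(via_delim, identity))
print(same_columns)
```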


What function would we call for space-delimited files?

Loading Excel Files

  • The package readxl is for loading .xls and .xlsx files
  • Not part of the core tidyverse but very compatible
library(readxl)
  • The main function is read_excel()
?read_excel

Loading Excel Files

  • This file contains multiple sheets
excel_sheets("data/RealTimeData.xlsx")
[1] "Sheet1" "Sheet2" "Sheet3"

I found this file after a random Google search for RT-PCR and Excel about 10 years ago. I didn’t keep track of who created it…

  • Once again we can click on the file ⇒ Import Dataset
    • Sheet1 looks pretty simple
    • First column has no name

Loading Excel Files

pcr <- read_excel("data/RealTimeData.xlsx")
colnames(pcr)[1] <- "Sample"
  • There are two pieces of data in the 1st column
    • We’ll learn how to manage this in the next session
  • Have a look at the previews of Sheet2 and Sheet3
    • Everything here is surprisingly easy to wrangle
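
To experiment with other sheets without the GUI, the sheet argument of read_excel() accepts a position or a name. The sketch below uses an example workbook bundled with readxl as a stand-in; it is not the RealTimeData.xlsx file, and its sheet names differ.

```r
library(readxl)

# Example workbook bundled with readxl, used as a stand-in (an assumption)
xlsx_file <- readxl_example("datasets.xlsx")

sheet_names <- excel_sheets(xlsx_file)
print(sheet_names)

# Sheets can be chosen by position or by name; the default is the first sheet
first_sheet  <- read_excel(xlsx_file)
second_sheet <- read_excel(xlsx_file, sheet = 2)
by_name      <- read_excel(xlsx_file, sheet = sheet_names[2])

# Renaming the first column, in the same way as for pcr
colnames(first_sheet)[1] <- "Sample"
print(colnames(first_sheet))
```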
