RAdelaide 2025
July 8, 2025
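Data can be held in several kinds of R object: vectors (e.g. x <- 1:5), data.frame, tibble (tbl_df) objects and objects referring to SQL tables
A tibble is a data.frame with some convenience features
Importing data from a file creates an R object
Excel files (xls/xlsx) have one or more spreadsheets and are loaded using library(readxl)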
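A quick way to see that one Excel file can hold several spreadsheets. This sketch assumes the readxl package is installed. It uses the example workbook that ships with readxl (found with readxl_example()) rather than a workshop file. excel_sheets() lists the sheet names, and the sheet argument of read_excel() picks one of them.

```r
library(readxl)

# Path to an example workbook that ships with readxl (not a workshop file)
path <- readxl_example("datasets.xlsx")

# One file, several spreadsheets: list their names
sheet_names <- excel_sheets(path)
sheet_names

# Choose which spreadsheet to import with the sheet argument
mtcars_sheet <- read_excel(path, sheet = "mtcars")
dim(mtcars_sheet)
```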
Comma-separated values (.csv)
Tab-separated values (.tsv): like .csv but with tabs separating columns
Plain text files (.txt)
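To show that .csv and .tsv differ only in the separator, this sketch writes the same tiny table in both formats to temporary files. It then reads each file back with base R's read.csv() and read.delim() from utils, which appear again later in this section. The two-row table is made up purely for illustration.

```r
# The same small table written twice: once comma-separated, once tab-separated
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("species,island", "Adelie,Torgersen", "Gentoo,Biscoe"), csv_file)
writeLines(c("species\tisland", "Adelie\tTorgersen", "Gentoo\tBiscoe"), tsv_file)

# read.csv() splits on commas; read.delim() splits on tabs by default
from_csv <- read.csv(csv_file)
from_tsv <- read.delim(tsv_file)

# Same data: only the separator in the file was different
identical(from_csv, from_tsv)
```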
When importing data into R: all values in each data.frame column must be the same type
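A small illustration of why a single type per column matters. This sketch assumes R 4.0 or later, where text columns stay as character rather than becoming factors. One stray word in an otherwise numeric column makes R read the whole column as text. Declaring that word as a missing-value marker keeps the column numeric. The values and the word "missing" are made up for the example.

```r
# A numeric column with one stray piece of text (made-up values)
txt <- "body_mass_g\n3750\n3800\nmissing\n3250"

as_read <- read.csv(text = txt)
class(as_read$body_mass_g)   # "character": the whole column is now text

# Telling R which value means "missing" keeps the column numeric
fixed <- read.csv(text = txt, na.strings = "missing")
class(fixed$body_mass_g)     # "integer", with NA in row 3
```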
If we’re not careful:
Create a new R script: File > New File > R Script (or Ctrl+Shift+N)
Save it as ImportPenguins.R
Download data.zip from the workshop homepage and put it in your RAdelaide25 folder
Right-click it and choose Extract to here, which should create a folder named data
Make sure your files are in data, not in data/data
Navigate to the data directory using the Files pane (you should see penguins.csv in data)
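The Files pane check can also be done from the Console. This is a minimal sketch that assumes your working directory is the RAdelaide25 folder, which getwd() will confirm. The variable names in_data and in_data_data are only for illustration.

```r
# Assumes the working directory is your RAdelaide25 folder
getwd()

# Where did penguins.csv end up after extracting data.zip?
in_data      <- file.exists("data/penguins.csv")       # want TRUE
in_data_data <- file.exists("data/data/penguins.csv")  # want FALSE

if (!in_data && in_data_data) {
  message("penguins.csv is in data/data: move the files up one level into data")
}
```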
Open penguins.csv by clicking on it (View File)

Importing a .csv file into R
Click on penguins.csv, choose Import Dataset, then stop! 🛑
(Click Update if you don't see this)
We have a preview of the data
We also have a preview of the code we’re about to execute
Copy the code from the Code Preview box, then click Import
Now paste the copied code at the top of your script
The code we copied has 3 lines:
1. library(readr) loads the package readr
   readr functions are about importing data, and readr contains the function read_csv()
   read_csv() tells R what to do with a csv file
2. penguins <- read_csv(...) creates an object in the R Environment, named penguins by using the file name (penguins.csv)
3. View(penguins) opens the data in an Excel-like format
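The copied code will look roughly like the sketch below. The exact path RStudio writes depends on where your files are. To keep the sketch runnable anywhere, it first writes a stand-in file holding only the first four rows and six columns printed later in this section. In your own script, read_csv() should point at data/penguins.csv instead. The if (interactive()) guard around View() is an addition so the sketch also runs outside RStudio.

```r
library(readr)

# Stand-in for data/penguins.csv: the first four rows printed in this section
# (only 6 of the 8 columns; the real file also has sex and year)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g",
  "Adelie,Torgersen,39.1,18.7,181,3750",
  "Adelie,Torgersen,39.5,17.4,186,3800",
  "Adelie,Torgersen,40.3,18,195,3250",
  "Adelie,Torgersen,NA,NA,NA,NA"
), csv_path)

# Line 2 of the copied code, with csv_path standing in for "data/penguins.csv"
penguins <- read_csv(csv_path)

# Line 3: the viewer only opens in an interactive session such as RStudio
if (interactive()) View(penguins)
```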
Close the preview by clicking the cross
After read_csv(), penguins is now in our R Environment
In the Environment Tab, click the broom icon 🧹 to clear the R Environment
Run the code again, check the Environment Tab again and penguins is back
You can delete the line View(penguins)
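Run the current line of code with Ctrl/Cmd + Enter
Click at the start of library(readr), then type # to turn the line into a comment
Click at the end of library(readr), then press Enter to start a new line
penguins is known as a data.frame: the R equivalent to a spreadsheet
Missing values are shown as NA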
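NA matters as soon as you calculate anything. This sketch uses the body masses of the first four penguins shown in the preview below, where the fourth is NA. A summary over a column containing NA is itself NA unless you ask R to drop missing values first with na.rm = TRUE. The name penguins_small is only for illustration.

```r
# Body masses (g) of the first four penguins; the fourth was not recorded
penguins_small <- data.frame(
  species     = c("Adelie", "Adelie", "Adelie", "Adelie"),
  body_mass_g = c(3750, 3800, 3250, NA)
)

mean(penguins_small$body_mass_g)                # NA: depends on the unknown value
mean(penguins_small$body_mass_g, na.rm = TRUE)  # 3600, from the three known values
sum(is.na(penguins_small$body_mass_g))          # 1 missing value
```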
Instead of View() \(\implies\) preview by typing the object name
penguins
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <chr>, year <dbl>
Gives a preview of up to 10 rows with:
A tibble
344 × 8 (rows × columns)
The type of each column: <chr>, <dbl> etc.
I personally find this more informative than View()
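The tibble header packs in information that base R gives through separate functions. This sketch uses the built-in iris data.frame so it runs without any files; any data.frame would do. dim() gives the rows × columns count and sapply(..., class) gives each column's type.

```r
# iris is a data.frame that ships with R
dim(iris)            # rows and columns, like the "344 × 8" in a tibble header
sapply(iris, class)  # the type of each column, like <dbl> or <chr>
```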
readr uses a variant called a tbl_df or tbl (pronounced tibble): a data.frame with convenient features

tidyverse
The tidyverse is a collection of thematically-linked packages
library(tidyverse) loads multiple key packages: dplyr, tidyr, readr, ggplot2 + others
readr is one of these \(\implies\) usually just load the tidyverse
Replace library(readr) with library(tidyverse)
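To check that a tbl_df is still a data.frame underneath, this sketch converts the built-in iris data with as_tibble() and compares the classes. It uses the tibble package, one of the packages library(tidyverse) loads. iris is a stand-in for penguins so no file is needed.

```r
library(tibble)  # also attached by library(tidyverse)

iris_tbl <- as_tibble(iris)

class(iris)      # "data.frame"
class(iris_tbl)  # "tbl_df" "tbl" "data.frame": still a data.frame underneath

# Printing a tibble shows its size, the column types and only the first rows
iris_tbl
```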
glimpse() is loaded with library(tidyverse)
What were the differences between each method?
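We used the R functions head() and glimpse() on penguins
To see how head() works, open its R help page by typing ?head
The usage shows two arguments for head() \(\implies\) x and n
x has no default value \(\implies\) we need to provide something
n = 6L means n has a default value of 6 (L \(\implies\) integer)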
Lower down the page you’ll see
Arguments
x: an object
n: an integer vector of length up to dim(x) (or 1, for non-dimensioned objects). Blah, blah, blah…
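The effect of a default value can be seen on a plain vector. This is a small sketch with x <- 1:10 as an illustrative input. Leaving out n uses the default of 6, and supplying n overrides it. The last two lines show what the L suffix changes.

```r
x <- 1:10

head(x)          # n not supplied, so the default n = 6L is used: 1 2 3 4 5 6
head(x, n = 3)   # 1 2 3
head(x, n = -3)  # a negative n drops that many from the end: 1 to 7

class(6L)        # "integer": the L suffix
class(6)         # "numeric": no suffix gives a double
```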
head() prints the first part of an object (here with n = 4)
# A tibble: 4 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
# ℹ 2 more variables: sex <chr>, year <dbl>
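The 4-row output comes from calling head() with n = 4. This sketch uses the built-in iris data instead of penguins so it runs anywhere. It shows that arguments can be matched by position or by name, in any order once named, and every form gives the same result.

```r
# Arguments can be matched by position or by name
by_position <- head(iris, 4)          # x first, then n
by_name     <- head(iris, n = 4)      # x by position, n by name
any_order   <- head(n = 4, x = iris)  # named arguments can go in any order

identical(by_position, by_name)  # TRUE
identical(by_name, any_order)    # TRUE
```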
glimpse() comes from the package pillar
Change the width argument to see what happens
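A sketch of the width experiment, using the built-in iris data as a stand-in for penguins and assuming the pillar package is installed (it comes with the tidyverse). glimpse() prints one line per column and fits as many values as the width allows. It also returns its input invisibly; the name out is only for illustration.

```r
library(pillar)  # glimpse() is also available after library(tidyverse)

# One line per column: name, type and as many values as fit in the width
glimpse(iris)

# A narrower width shows fewer values per column
glimpse(iris, width = 40)

# glimpse() returns its input invisibly, so the data are unchanged
out <- glimpse(iris, width = 40)
```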
read_csv()
Open the help page for the R function read_csv() by typing ?read_csv
The usage section shows:
read_csv(
file,
col_names = TRUE, col_types = NULL, col_select = NULL,
id = NULL, locale = default_locale(),
na = c("", "NA"), quoted_na = TRUE,
quote = "\"", comment = "",
trim_ws = TRUE,
skip = 0, n_max = Inf,
guess_max = min(1000, n_max),
name_repair = "unique",
num_threads = readr_threads(),
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
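Two of these arguments in action. This sketch uses a few made-up lines of literal CSV text wrapped in I(), which assumes readr 2.0 or later. It shows how na decides which values become NA, and therefore the column type. It also shows what col_names = FALSE does to the header row. show_col_types = FALSE only silences the column-type message.

```r
library(readr)

# Literal CSV text, wrapped in I() so read_csv() treats it as data, not a path
txt <- I("species,body_mass_g\nAdelie,3750\nAdelie,missing\nGentoo,5000")

# Defaults: na = c("", "NA"), so "missing" is kept and the column is read as text
defaults <- read_csv(txt, show_col_types = FALSE)

# Adding "missing" to na turns it into NA, so the column is read as numbers
with_na <- read_csv(txt, na = c("", "NA", "missing"), show_col_types = FALSE)

# col_names = FALSE: the first row is treated as data; columns are named X1, X2
no_header <- read_csv(txt, col_names = FALSE, show_col_types = FALSE)
```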
read_csv() has many arguments (file, col_names etc.)
col_names = TRUE \(\implies\) column names are assumed to be in the first row
All arguments of read_csv() were defined somewhere in the GUI
e.g. col_names corresponds to the First Row as Names check-box
Try clicking/unclicking a few more & try to understand the consequences
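read_csv() and NAs
By default na = c("", "NA"), so empty cells and the text NA are read as missing values

read_csv() Vs read.csv()
RStudio now uses read_csv() from readr by default
You may see read.csv() (from utils) in older scripts
The readr version is: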
Functions from utils are read.*() (csv, delim etc.)
readr has the functions read_*() (csv, tsv, delim etc.)
readxl is for loading .xls and .xlsx files

Import Dataset
Sheet1 looks pretty simple
Sheet2 & Sheet3