RAdelaide 2024
July 9, 2024
x <- 1:5
The R equivalent to a spreadsheet is known as a data.frame
A tibble (tbl_df) is a data.frame with pretty bows & ribbons
Some related objects refer to SQL tables
data.frames are structured with vectors as columns
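To illustrate how a data.frame is structured with vectors as columns, here is a minimal base R sketch. The object names (ids, labels, df) and the values are made up for this example and are not part of the workshop data.

```r
# Two vectors of the same length (made-up values, for illustration only)
ids <- 1:5
labels <- c("a", "b", "c", "d", "e")

# Each vector becomes one column of the data.frame
df <- data.frame(id = ids, label = labels)

# Pulling a column back out returns a vector
df$id
is.vector(df$id)

# Every column has as many elements as the data.frame has rows
nrow(df)
length(df$label)
```

Because every column is a vector, each column holds a single type of value, which is one way a data.frame differs from a free-form spreadsheet.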
If we’re not careful:
Create a new R script: File > New File > R Script (or Ctrl+Shift+N)
Save it as GuineaPigs.R
Then get the data for this exercise.
Download data.zip from the workshop homepage
Extract it into your RAdelaide24 directory
The files should end up in data, not in data/data
Navigate to the data directory using the Files pane (you should see pigs.csv in data)
Preview pigs.csv by clicking on it (View File)
Click on pigs.csv, choose Import Dataset, then stop! 🛑
(Click Update if you don’t see this)
We have a preview of the data
We also have a preview of the code we’re about to execute
Select and copy all of the code in the Code Preview box
Click Import
Now paste the copied code at the top of your script
The code we copied has 3 lines:
library(readr) loads the package readr
readr functions are about importing data
readr contains the function read_csv()
read_csv() tells R what to do with a csv file
The second line creates an object in our R Environment, named pigs by using the file name (pigs.csv)
The third line, View(pigs), opens a preview in an Excel-like format
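As a rough sketch of what those three lines do, the example below follows the same pattern but reads a small temporary file so that it runs anywhere. The file contents are made up, and csv_file is a hypothetical stand-in for the path shown in your own Code Preview box.

```r
# Line 1: load the package that provides read_csv()
library(readr)

# Stand-in for the workshop file so the sketch is self-contained
# (made-up values; in the workshop you would point at pigs.csv instead)
csv_file <- tempfile(fileext = ".csv")
writeLines(c("len,supp,dose", "4.2,VC,Low", "11.5,OJ,High"), csv_file)

# Line 2: read the file and store the result as an object called pigs
pigs <- read_csv(csv_file)

# Line 3: open the Excel-like viewer (only useful in an interactive session)
if (interactive()) View(pigs)
```

The object name on the left of `<-` is what appears in the Environment Tab, and it can be any valid name you choose.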
Close the preview by clicking the cross
We used read_csv() to import the file, and pigs is now in our R Environment
In the Environment Tab, click the broom icon 🧹
This clears the R Environment
Select the code in your script and click Run
Check the Environment Tab again and pigs is back
You can delete the line View(pigs)
Ctrl/Cmd + Enter also runs the current line or selection
On a new line above library(readr), type # followed by some text to be a comment
pigs is known as a data.frame
The R equivalent to a spreadsheet
Missing values are shown as NA
Instead of View()
\(\implies\) preview by typing the object name
Gives a preview up to 10 lines with:
The object type and size: A tibble: 60 × 3
The column names: len, supp, dose
The column types: <dbl>, <chr>, <chr>
I personally find this more informative than View()
readr uses a variant called a tbl_df or tbl (pronounced tibble)
This is a data.frame with convenient features
The tidyverse is a collection of thematically-linked packages
library(tidyverse) loads all of these packages
readr is one of these \(\implies\) usually just load the tidyverse
 [1] "broom" "conflicted" "cli" "dbplyr"
 [5] "dplyr" "dtplyr" "forcats" "ggplot2"
 [9] "googledrive" "googlesheets4" "haven" "hms"
[13] "httr" "jsonlite" "lubridate" "magrittr"
[17] "modelr" "pillar" "purrr" "ragg"
[21] "readr" "readxl" "reprex" "rlang"
[25] "rstudioapi" "rvest" "stringr" "tibble"
[29] "tidyr" "xml2" "tidyverse"
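The package listing above resembles what tidyverse_packages() prints. Assuming the tidyverse is installed, this sketch lists the members of the collection and checks which of them are attached in the current session, since the two sets need not be identical. The names all_pkgs and attached are chosen here for illustration.

```r
library(tidyverse)

# Every package that belongs to the tidyverse collection
all_pkgs <- tidyverse_packages()
all_pkgs

# Packages currently attached to the session (taken from the search path)
attached <- sub("^package:", "", search())

# Which tidyverse members were attached by library(tidyverse)?
intersect(all_pkgs, attached)
```

If a member you need is not in the attached set, it can still be loaded on its own with library().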
Replace library(readr) with library(tidyverse)
glimpse() is from the package pillar, available after library(tidyverse)
What were the differences between each method?
head() and glimpse() are R functions; pigs is an R object
The help page for head() shows two arguments \(\implies\) x and n
x has no default value \(\implies\) we need to provide something
n = 6L means n has a default value of 6 (L \(\implies\) integer)
Lower down the page you’ll see
Arguments
x: an object
n: an integer vector of length up to dim(x) (or 1, for non-dimensioned objects). Blah, blah, blah…
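To see the two arguments of head() in action, here is a small base R sketch. The vector x is made up for the example; any object would do.

```r
# A simple vector to experiment with
x <- 1:20

# n is not supplied, so the default (n = 6L) is used
head(x)

# Supplying n overrides the default
head(x, n = 3)

# The L suffix creates an integer rather than a double
is.integer(6L)
is.integer(6)
```

Calling head() with no x at all gives an error, which is what "no default value" means in practice.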
head() prints the first part of an object
Now look at the help page for glimpse() (from pillar)
Try changing the width argument to see what happens
The R function read_csv()
The usage of read_csv() is:
read_csv(
file,
col_names = TRUE, col_types = NULL, col_select = NULL,
id = NULL, locale = default_locale(),
na = c("", "NA"), quoted_na = TRUE,
quote = "\"", comment = "",
trim_ws = TRUE,
skip = 0, n_max = Inf,
guess_max = min(1000, n_max),
name_repair = "unique",
num_threads = readr_threads(),
progress = show_progress(),
show_col_types = should_show_types(),
skip_empty_rows = TRUE,
lazy = should_read_lazy()
)
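read_csv() has many arguments (file, col_names etc.)
Most were given default values (e.g. col_names = TRUE)
All arguments were defined somewhere in the GUI.
col_names, for example, corresponds to the First Row as Names checkbox
Try clicking/unclicking a few more & try to understand the consequences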
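The effect of changing an argument can also be explored directly in code. This is a minimal sketch assuming readr 2.0 or later, where I() marks a string as literal data. The file contents are made up, and "-" is a hypothetical missing-value code chosen for the example.

```r
library(readr)

# Made-up file contents; "-" stands in for a missing value
csv_text <- "len,supp\n4.2,VC\n-,OJ\n"

# Defaults: the first row becomes column names, only "" and "NA" are missing
default_read <- read_csv(I(csv_text), show_col_types = FALSE)

# col_names = FALSE: the header row is treated as data, columns become X1, X2
no_header <- read_csv(I(csv_text), col_names = FALSE, show_col_types = FALSE)

# na: tell read_csv() that "-" should also be treated as missing
with_na <- read_csv(I(csv_text), na = c("", "NA", "-"), show_col_types = FALSE)

# Compare how the len column was interpreted each time
class(default_read$len)
class(with_na$len)
```

With the defaults, the "-" forces len to be read as text, whereas declaring it in na lets the column be read as numbers with one missing value.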
The na argument controls which values read_csv() treats as NAs
read_csv() Vs read.csv()
RStudio now uses read_csv() from readr by default
You will often see read.csv() (from utils) in older scripts
read_csv() is the newer (readr) version
The functions in utils are read.*() (csv, delim etc.)
readr has the functions read_*() (csv, tsv, delim etc.)
readxl is for loading .xls and .xlsx files.
These can also be imported using Import Dataset
Sheet1 looks pretty simple
Sheet2 & Sheet3
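Workbooks with several sheets can also be handled in code. This sketch assumes readxl is installed and uses one of the example workbooks that ships with the package, not the workshop file, so the sheet names differ from Sheet1, Sheet2 and Sheet3.

```r
library(readxl)

# Path to an example workbook bundled with readxl (a stand-in for the workshop file)
xlsx_file <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
excel_sheets(xlsx_file)

# Read one sheet by name (a sheet can also be chosen by position)
iris_sheet <- read_excel(xlsx_file, sheet = "iris")
iris_sheet
```

Listing the sheets first makes it easier to decide which ones need extra arguments, such as skipping rows, before importing them.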