Understanding How R Sees Data
RAdelaide 2025
Vectors
Setup
- Start a new script: DataTypes.R
- Make sure you have a fresh R Session
- This session will start simply but we will develop some fundamental concepts
- Please ask questions often!!!
Vectors
The key building blocks for R objects: Vectors
- There is no such thing as a scalar in R
- Everything is based around the concept of a vector
What is a vector?
Definition
A vector is zero or more values of the same type
The 4 Atomic Vector Types
- Atomic Vectors are the building blocks for everything in R
- There are four main types
  - Plus two we can ignore
Logical Vectors
- logical: Can only hold the values TRUE or FALSE
## Create a logical vector
logi_vec <- c(TRUE, TRUE, FALSE)
print(logi_vec)
[1]  TRUE  TRUE FALSE
- When you type an object’s name at the console, R calls print() on it
- Here "print" means print the object to the screen, not to paper
Integer Vectors
- logical
- integer: Counts, ranks or indexing positions
## Create an integer vector
int_vec <- 1:5
print(int_vec)
[1] 1 2 3 4 5
Double Precision Vectors
- logical
- integer
- double: Often (& lazily) referred to as numeric
## Create a vector with numbers that include decimal points, i.e. doubles
dbl_vec <- c(0.618, 1.414, 2)
print(dbl_vec)
Why are these called doubles?
Character Vectors
- logical
- integer
- double
- character
## Create a character vector
char_vec <- c("blue", "red", "green")
print(char_vec)
The 4 Atomic Vector Types
These are the basic building blocks for all R objects
- logical
- integer
- double
- character
- There are two more rare types we’ll ignore: complex & raw
- All R data structures are built on these 6 vector types
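As a quick check of the four main types, the sketch below rebuilds the example vectors from the earlier slides and asks R for the type of each. The final two lines are an extra illustration, not from the slides: a bare number is stored as a double unless it carries the `L` suffix.

```r
# One example vector for each of the four main atomic types
logi_vec <- c(TRUE, TRUE, FALSE)
int_vec  <- 1:5
dbl_vec  <- c(0.618, 1.414, 2)
char_vec <- c("blue", "red", "green")

typeof(logi_vec) # "logical"
typeof(int_vec)  # "integer"
typeof(dbl_vec)  # "double"
typeof(char_vec) # "character"

# A number typed without the L suffix is a double, not an integer
typeof(2)  # "double"
typeof(2L) # "integer"
```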
Properties of a vector
What defining properties might a vector have?
- The actual values
- Length, accessed by the function length()
- The type, accessed by the function typeof()
  - Similar but preferable to class()
- Optional attributes \(\implies\) attributes()
  - Holds data such as names etc.
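The properties listed above can be inspected directly. This is a minimal sketch using the same values as `dbl_vec` from the earlier slide; `named_vec` is included only to show a vector that actually has attributes.

```r
dbl_vec <- c(0.618, 1.414, 2)

length(dbl_vec)     # 3
typeof(dbl_vec)     # "double"
class(dbl_vec)      # "numeric"
attributes(dbl_vec) # NULL, a plain vector has no attributes

# Adding names gives the vector a names attribute
named_vec <- c(a = 1, b = 2, c = 3)
attributes(named_vec) # a list with one component, $names
```

Note that typeof() reports how the values are stored, while class() reports the less specific "numeric".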
Working with Vectors
We can combine two vectors in R, using the function c()
c(1, 2)
[1] 1 2
- The numbers 1 & 2 were both vectors with length 1
- We have combined two vectors of length 1, to make a vector of length 2
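The same idea extends to vectors of any length. In the sketch below (object names `x` and `y` are illustrative only), c() always returns a single flat vector, even when one of its arguments was itself built with c().

```r
length(1)  # 1, a single number is already a vector

x <- c(1, 2)
length(x)  # 2

# Combining a length 2 vector, a length 1 vector and another length 2 vector
y <- c(x, 3, c(4, 5))
y          # 1 2 3 4 5
length(y)  # 5
```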
Coercion
What other types could logical vectors be coerced into?
Try using the functions: as.integer(), as.double() & as.character() on logi_vec
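One possible way to work through this exercise is sketched below. The second half goes slightly beyond the question: it shows that c() performs the same coercion implicitly when types are mixed, always moving towards the more flexible type.

```r
logi_vec <- c(TRUE, TRUE, FALSE)

# Explicit coercion
as.integer(logi_vec)   # 1 1 0
as.double(logi_vec)    # 1 1 0
as.character(logi_vec) # "TRUE" "TRUE" "FALSE"

# Implicit coercion when combining different types
typeof(c(logi_vec, 1L))  # "integer"
typeof(c(logi_vec, 1.5)) # "double"
typeof(c(logi_vec, "a")) # "character"

# Because TRUE becomes 1, summing a logical vector counts the TRUE values
sum(logi_vec) # 2
```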
Subsetting Vectors
One or more elements of a vector can be extracted using []
y <- c("A", "B", "C", "D", "E")
y[2]
[1] "B"
y[1:3]
[1] "A" "B" "C"
Extracting Multiple Values
What is really happening in this line?
euro[1:5]
      ATS       BEF       DEM       ESP       FIM
 13.76030  40.33990   1.95583 166.38600   5.94573
We are using the integer vector 1:5 to extract values from euro
int_vec
[1] 1 2 3 4 5
euro[int_vec]
      ATS       BEF       DEM       ESP       FIM
 13.76030  40.33990   1.95583 166.38600   5.94573
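Since the value inside [] is itself a vector, other kinds of vectors can be used there as well. The sketch below is an illustration beyond the slides: it shows non-contiguous positions, negative positions (which drop elements) and logical vectors (which keep elements where the value is TRUE).

```r
y <- c("A", "B", "C", "D", "E")

y[c(1, 3, 5)] # "A" "C" "E"
y[-1]         # "B" "C" "D" "E", a negative index drops that position
y[c(TRUE, FALSE, TRUE, FALSE, TRUE)] # "A" "C" "E"

# A logical vector created by a comparison can be used the same way
dbl_vec <- c(0.618, 1.414, 2)
dbl_vec[dbl_vec > 1] # 1.414 2.000
```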
Vector Operations
R functions are designed to work on vectors
dbl_vec - 1
dbl_vec > 1
dbl_vec^2
mean(dbl_vec)
sd(dbl_vec)
sqrt(int_vec)
This is one of the real strengths of R
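The lines above can be run as a self-contained sketch, with the vectors recreated from the earlier slides. Operators and functions such as sqrt() act on every element, while mean() and sd() summarise the whole vector into a single value.

```r
dbl_vec <- c(0.618, 1.414, 2)
int_vec <- 1:5

# Element-wise: the result has the same length as the input
dbl_vec - 1   # -0.382  0.414  1.000
dbl_vec > 1   # FALSE  TRUE  TRUE
dbl_vec^2     # each value squared
sqrt(int_vec) # square root of each value

# Summaries: the result is a vector of length 1
mean(dbl_vec) # 1.344
sd(dbl_vec)
```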
Special Values
- We can create an empty vector of any type using integer(), character() etc
  - These are zero length vectors \(\implies\) still have a type
- A special value which isn’t quite a vector is NULL
  - This has length zero and its own type, NULL
  - Used to represent the absence of a value
  - We can also create it using c()
- We can use NULL to return empty vectors
int_vec[NULL]
integer(0)
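These special values can be checked at the console. A minimal sketch, where `empty_int` is an illustrative name and `int_vec` is recreated from the earlier slide:

```r
# Zero length vectors still have a type
empty_int <- integer()
length(empty_int)   # 0
typeof(empty_int)   # "integer"
typeof(character()) # "character"

# NULL has length zero and its own type
typeof(NULL) # "NULL"
length(NULL) # 0
is.null(c()) # TRUE, c() with no arguments returns NULL

# Subsetting with NULL returns an empty vector of the original type
int_vec <- 1:5
int_vec[NULL] # integer(0)
```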
Matrices
- Vectors are strictly one dimensional and have a length property
- A matrix is the two dimensional equivalent
int_mat <- matrix(1:6, ncol = 2)
print(int_mat)
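To see how a matrix relates to a vector, the sketch below inspects the same `int_mat`. It is still an integer vector of length 6; the only addition is a dim attribute, which also enables the [row, column] form of subsetting.

```r
int_mat <- matrix(1:6, ncol = 2)

dim(int_mat)        # 3 2, i.e. 3 rows and 2 columns
length(int_mat)     # 6, the underlying vector is unchanged
typeof(int_mat)     # "integer"
attributes(int_mat) # a list with one component, $dim

# Subsetting uses one index per dimension, separated by a comma
int_mat[2, 1] # 2
int_mat[, 2]  # 4 5 6, leaving an index blank selects everything
```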
Arrays
Arrays extend matrices to 3 or more dimensions
Beyond the scope of today, but we just have more commas in the square brackets, e.g.
dim(iris3)
[1] 50 4 3
dimnames(iris3)
iris3[1,,]
         Setosa Versicolor Virginica
Sepal L.    5.1        7.0       6.3
Sepal W.    3.5        3.2       3.3
Petal L.    1.4        4.7       6.0
Petal W.    0.2        1.4       2.5
iris3[1:2,,]
Homogeneous Data Types
- Vectors, Matrices & Arrays are the basic homogeneous data types
- All are essentially just vectors
Heterogeneous Data Types
Summary of main data types in R
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1d | vector | list |
2d | matrix | data.frame |
3d+ | array | |
Lists
A list is a heterogeneous vector.
- Each component is an R object
  - Can be a vector, or matrix
  - Could be another list
  - Any other R object type we haven’t seen yet
These are incredibly common in R
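A minimal sketch of such a list is shown below. The object `my_list` and its component names are hypothetical, chosen only to show a vector, a matrix and another list sitting side by side in one object.

```r
# Hypothetical list: each component is a different kind of R object
my_list <- list(
  numbers = c(0.618, 1.414, 2),
  colours = c("blue", "red", "green"),
  grid    = matrix(1:6, ncol = 2),
  nested  = list(flag = TRUE)
)

typeof(my_list) # "list"
length(my_list) # 4, one per component

# The type of each component
vapply(my_list, typeof, character(1))
# numbers: "double", colours: "character", grid: "integer", nested: "list"
```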
Subsetting Lists
A list is a vector so we can also subset using the [] method
testResults[1]
typeof(testResults[1])
- Using single square brackets returns a list
  - i.e. is a subset of the larger object and of the same type
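The behaviour of single square brackets can be checked on a small stand-in list (`example_list` is hypothetical and is not the `testResults` object used above). For contrast, the sketch also uses `[[ ]]` and `$`, which this section does not cover: these return the component itself rather than a shorter list.

```r
# Hypothetical stand-in for a list such as testResults
example_list <- list(integers = 1:5, doubles = c(0.618, 1.414, 2))

# Single brackets: still a list, just a shorter one
typeof(example_list[1]) # "list"
length(example_list[1]) # 1

# Double brackets or $: the component itself
typeof(example_list[[1]]) # "integer"
example_list$doubles      # 0.618 1.414 2.000
```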
Data Frames
Finally!
- These are the most common type of data you will work with
- Each column is a vector
- Columns can be different types of vectors
- Column vectors MUST be the same length
Data Frames & Lists
Data frames are actually special cases of lists
- Each column of a data.frame is a component of a list
  - The components must all be vectors of the same length
- Data Frames can be treated identically to a list
  - Have additional subsetting operations and attributes
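The relationship between data frames and lists can be seen by inspecting a small example. The sketch below builds a data frame with the same values used for `dbl_vec` and `logi_vec` earlier, then treats it first as a list and then as a two dimensional object.

```r
my_df <- data.frame(
  doubles = c(0.618, 1.414, 2),
  logical = c(TRUE, TRUE, FALSE)
)

# Underneath, a data frame is a list with one component per column
typeof(my_df) # "list"
class(my_df)  # "data.frame"
length(my_df) # 2, the number of columns

# List-style access to a column
my_df[["doubles"]] # 0.618 1.414 2.000
my_df$logical      # TRUE TRUE FALSE

# Additional matrix-style subsetting by row and column
dim(my_df) # 3 2
my_df[2, ] # the second row, returned as a data frame
```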
Common Data Frame Errors
What do you think will happen if we type:
pigs[5]
Error: Column index must be at most 3 if positive, not 5
Working With R Objects
Name Attributes
How do we assign names?
named_vec <- c(a = 1, b = 2, c = 3)
OR we can name an existing vector
names(int_vec) <- c("a", "b", "c", "d", "e")
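Once a vector has names, they are stored as an attribute and can be used for subsetting. This is a small self-contained sketch; the use of unname() at the end is an extra illustration, not part of the slides.

```r
named_vec <- c(a = 1, b = 2, c = 3)
names(named_vec)       # "a" "b" "c"
named_vec["b"]         # the element named b, with its name retained
named_vec[c("c", "a")] # elements returned in the order requested

# Naming an existing vector adds a names attribute
int_vec <- 1:5
names(int_vec) <- c("a", "b", "c", "d", "e")
attributes(int_vec) # a list with one component, $names

# Names can be removed again without changing the values
unname(int_vec) # 1 2 3 4 5
```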
Matrices
We can convert vectors to matrices, as earlier
int_mat <- matrix(1:6, ncol = 2)
R is column major so fills columns by default
row_mat <- matrix(1:6, ncol = 2, byrow = TRUE)
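The effect of byrow is easiest to see by comparing the two matrices directly. In this sketch, as.vector() drops the dimensions and shows the order in which the values are actually stored.

```r
int_mat <- matrix(1:6, ncol = 2)               # filled down the columns
row_mat <- matrix(1:6, ncol = 2, byrow = TRUE) # filled across the rows

int_mat[1, ] # 1 4
row_mat[1, ] # 1 2

# Storage is always column by column, whatever the fill order was
as.vector(int_mat) # 1 2 3 4 5 6
as.vector(row_mat) # 1 3 5 2 4 6
```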
Lists
my_list <- list(int_vec, dbl_vec)
names(my_list) <- c("integers", "doubles")
OR
my_list <- list(integers = int_vec, doubles = dbl_vec)
Data Frames
This is exactly the same as creating lists, but the names attribute will also be the colnames()
my_df <- data.frame(doubles = dbl_vec, logical = logi_vec)
names(my_df) == colnames(my_df)
[1] TRUE TRUE
Loading Data
- All of these principles play a role when parsing and wrangling data
  - e.g. if a numeric column has some text somewhere in a csv
  - If the wrong number of delimiters appear in one or more rows
- If we know what columns should be \(\implies\) can specify types using col_types
  - Will force text in numeric columns to NA
  - Can enforce integers vs numeric
- One final type we haven’t discussed is categorical columns, i.e. factors
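As a rough illustration of the col_types point above, the sketch below assumes the readr package (version 2.0 or later, for the I() literal-data syntax) is installed. The csv text and the column names `id` and `weight` are invented for the example.

```r
library(readr) # assumption: readr >= 2.0 is installed

# Invented csv content with stray text in a numeric column
csv_text <- "id,weight\n1,10.5\n2,unknown\n3,12.1\n"

# Declare the expected type of each column up front
parsed <- read_csv(
  I(csv_text),
  col_types = cols(id = col_integer(), weight = col_double())
)

typeof(parsed$id) # "integer", rather than the default double
parsed$weight     # 10.5 NA 12.1, the text could not be parsed as a number

# readr records what went wrong and where
problems(parsed)
```

Without col_types, readr would have to guess the column types, and the stray text would be expected to turn the whole weight column into character.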