RAdelaide 2024
July 10, 2024
The key building blocks for R objects: Vectors
What is a vector?
Definition
A vector is zero or more values of the same type
A simple vector would be
[1] 1 2 3 4 5 6 7 8 9 10
What type of values are in this vector?
Another vector might be
[1] "a" "cat" "video"
What type of values are in this vector?
[1] "742" "Evergreen" "Tce"
What type of values are in this vector?
The atomic vector types in R
logical: TRUE or FALSE
integer
numeric
Why are these called doubles?
character
These are the basic building blocks for all R objects
Two further types are rarely needed: complex & raw
All R data structures are built on these 6 vector types
What four defining properties might a vector have?
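The values themselves
The length, accessed by length()
The type, accessed by typeof() (compare with class())
Any additional attributes(), such as names etc.
Let’s try them on our vectors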
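A minimal sketch of inspecting these properties. Only the names `int_vec` and `logi_vec` appear in this section; `char_vec` and the values given to `logi_vec` are assumptions made for illustration.

```r
# Assumed example vectors (the values are illustrative only)
int_vec  <- 1:10
char_vec <- c("a", "cat", "video")
logi_vec <- c(TRUE, TRUE, FALSE)

length(int_vec)       # how many values the vector holds
typeof(int_vec)       # the storage type of the values
class(char_vec)       # the class, often (but not always) the same as the type
attributes(logi_vec)  # NULL here, as no names or other attributes are set
```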
Were you surprised by any of the results?
We can combine two vectors in R, using the function c()
1 & 2 were both vectors with length 1
What would happen if we combined two vectors of different types?
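One way to try this, assuming `int_vec` holds the integers 1 to 10 and `logi_vec` holds three logical values (both sets of values are assumptions):

```r
# Assumed example vectors
int_vec  <- 1:10
logi_vec <- c(TRUE, TRUE, FALSE)

c(1, 2)                          # two length-1 vectors become one length-2 vector

new_vec <- c(logi_vec, int_vec)  # combine a logical and an integer vector
new_vec                          # the logical values now appear as numbers
typeof(new_vec)                  # a single common type for the whole vector
```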
Q: What happened to the logical values?
Answer: R will coerce them into a common type (i.e. integers).
What other types could logical vectors be coerced into?
Try using the functions: as.integer(), as.double() & as.character() on logi_vec
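A short sketch of the three coercions, with the values of `logi_vec` assumed for illustration:

```r
logi_vec <- c(TRUE, TRUE, FALSE)   # assumed values

as.integer(logi_vec)    # TRUE becomes 1L, FALSE becomes 0L
as.double(logi_vec)     # the same values, stored as doubles
as.character(logi_vec)  # the text "TRUE" and "FALSE"
```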
One or more elements of a vector can be called using []
Double brackets ([[]]) can be used to return single elements only
If you tried y[[1:3]] you would receive an error message
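The behaviour can be sketched as follows. The values stored in `y` are an assumption, and `tryCatch()` is only used so that the error does not stop the script.

```r
y <- c(10, 20, 30, 40, 50)   # assumed values for y

y[2]      # one element
y[1:3]    # several elements at once
y[[2]]    # double brackets: exactly one element

# Asking [[ ]] for three elements of an atomic vector raises an error
result <- tryCatch(y[[1:3]], error = function(e) "error")
result
```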
If a vector has name attributes, we can call values by name
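As an illustration, the built-in named vector `euro` (also used later in this section) can be subset by name. The particular names chosen here are just examples.

```r
names(euro)             # the names attribute of the built-in euro vector

euro["ESP"]             # one value, called by name
euro[c("DEM", "FRF")]   # several values, called by name
```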
Try repeating the call-by-name approach using double brackets
What was the difference in the output?
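A possible comparison, again using the built-in `euro` vector; the object names `by_single` and `by_double` are hypothetical:

```r
by_single <- euro["ESP"]     # single brackets
by_double <- euro[["ESP"]]   # double brackets

names(by_single)   # the name is still attached
names(by_double)   # NULL: only the bare value remains
```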
[] returned the vector with the identical structure
[[]] removed the attributes & just gave the value
Is it better to call by position, or by name?
Things to consider:
What is really happening in this line?
We are using the integer vector 1:5 to extract values from euro
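The line in question is not preserved here, but the idea can be sketched like this (the name `idx` is hypothetical):

```r
idx <- 1:5      # an integer vector of positions
typeof(idx)

euro[idx]       # use that vector to extract the first five values
identical(euro[idx], euro[1:5])
```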
R functions are designed to work on vectors
This is one of the real strengths of R
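A few vectorised operations as a sketch. The values of `dbl_vec` are assumed; each call is applied to every element without writing a loop.

```r
dbl_vec <- c(0.618, 1, 2.618)   # assumed values

dbl_vec + 1      # arithmetic is applied element by element
dbl_vec^2
sqrt(dbl_vec)
dbl_vec > 1      # a logical test also returns one result per element
```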
We can also combine the above logical test and subsetting
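For instance, assuming the same illustrative values for `dbl_vec` and a test of "greater than 1" (the name `is_big` is hypothetical):

```r
dbl_vec <- c(0.618, 1, 2.618)   # assumed values

is_big <- dbl_vec > 1           # the logical test, one TRUE/FALSE per value
dbl_vec[is_big]                 # keep only the values where the test is TRUE
dbl_vec[dbl_vec > 1]            # the same thing in a single step
```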
An additional logical test: %in% (read as: “is in”)
Returns TRUE/FALSE for each value in dbl_vec if it is in int_vec
NB: int_vec was coerced silently to a double vector
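A small sketch of this test, with the values of both vectors assumed for illustration:

```r
dbl_vec <- c(0.618, 1, 2.618)   # assumed values
int_vec <- 1:10                 # assumed values

dbl_vec %in% int_vec            # FALSE TRUE FALSE: only 1 is found in int_vec
dbl_vec[dbl_vec %in% int_vec]   # the test can also be used for subsetting
```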
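Vectors are one dimensional and have a length attribute.
A matrix is the two dimensional equivalent
Matrices have the additional attributes dim(), nrow() & ncol()
They can also have rownames() & colnames()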
Some commands to try:
Ask questions if anything is confusing
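Some possibilities, as a sketch. The name `int_mat` is used later in this section, but its contents are not shown, so the 3 x 2 matrix below is an assumption.

```r
int_mat <- matrix(1:6, ncol = 2)   # assumed contents: 3 rows, 2 columns

int_mat
dim(int_mat)        # rows, then columns
nrow(int_mat)
ncol(int_mat)
length(int_mat)     # the total number of values
typeof(int_mat)     # still a single type, as for any vector
rownames(int_mat)   # NULL, as no names have been set
```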
Matrices are subset using x[row, col]
Leaving either row or col blank selects the entire row/column
How would we just get the first column?
NB: Forgetting the comma when subsetting will treat the matrix as a single vector running down the columns
Requesting a row or column that doesn’t exist is the source of a very common error message
Error in int_mat[5, ] : subscript out of bounds
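The subsetting rules above can be sketched as follows, assuming `int_mat` is a 3 x 2 matrix (its actual contents are not shown in this section). The error is caught with `tryCatch()` so the script keeps running.

```r
int_mat <- matrix(1:6, ncol = 2)   # assumed contents: 3 rows, 2 columns

int_mat[2, 2]   # row 2, column 2
int_mat[1, ]    # blank col: the whole first row
int_mat[, 1]    # blank row: the whole first column
int_mat[5]      # no comma: the 5th value, counting down the columns

# Row 5 does not exist, so this raises "subscript out of bounds"
msg <- tryCatch(int_mat[5, ], error = function(e) conditionMessage(e))
msg
```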
Arrays extend matrices to 3 or more dimensions
Beyond the scope of today, but we just have more commas in the square brackets, e.g.
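A hypothetical three dimensional example; the name `int_arr` and its dimensions are assumptions:

```r
int_arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 "layers"

dim(int_arr)
int_arr[1, 2, 3]   # one comma per extra dimension: row 1, column 2, layer 3
int_arr[, , 1]     # blank positions select everything, giving a 2 x 3 matrix
```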
Summary of main data types in R
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1d | vector | list |
2d | matrix | data.frame |
3d+ | array | |
A list is a heterogeneous vector.
Each component is an R object
Can be a vector, or matrix
Could be another list
Any other R object type we haven’t seen yet
These are incredibly common in R
Many R functions provide output as a list
NB: There is a function (print.htest()) that tells R how to print the results to the Console
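As a sketch, `t.test()` is one function whose output is a list with class `htest`. The data values and the name `test_results` are made up for illustration.

```r
x <- c(5.1, 4.9, 6.2, 5.8, 5.5)   # made-up values
test_results <- t.test(x)

class(test_results)    # "htest", which is why print.htest() is used
typeof(test_results)   # "list": underneath, the result is just a list
names(test_results)    # the names of the individual components
test_results           # printed neatly to the Console
```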
We can call the individual components of a list using the $ symbol followed by the name
Note that each component is quite different to the others.
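For example, using a t-test on made-up values (the name `test_results` is an assumption):

```r
test_results <- t.test(c(5.1, 4.9, 6.2, 5.8, 5.5))   # made-up values

test_results$statistic   # a named numeric value
test_results$p.value     # a single number
test_results$conf.int    # two numbers, with the confidence level attached
```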
A list is a vector so we can also subset using the [] method
This returns a list
Double brackets again retrieve a single element of the vector
This returns the actual component as an R object
When would we use either method?
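The difference can be sketched with a t-test on made-up values (the name `test_results` is an assumption):

```r
test_results <- t.test(c(5.1, 4.9, 6.2, 5.8, 5.5))   # made-up values

typeof(test_results[1])     # "list": single brackets keep the list wrapper
typeof(test_results[[1]])   # "double": double brackets give the component itself

# Single brackets are handy for keeping several components together
test_results[c("statistic", "p.value")]
```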
Data Frames
Finally!
Each column is a vector
Data frames have many of the same attributes as matrices: dim(), nrow(), ncol(), rownames(), colnames()
colnames() & rownames() are NOT optional \(\implies\) assigned by default
tibble variants have simple row numbers as rownames
Let’s load pigs again
Individual entries can also be extracted using the square brackets
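A sketch of bracket subsetting on a data frame. The `pigs` data is loaded from a file that is not shown in this section, so the built-in `ToothGrowth` data.frame is used here purely as a stand-in.

```r
pigs <- ToothGrowth   # stand-in only: the real pigs data comes from a file

pigs[1, 1]                    # a single entry: row 1, column 1
pigs[1:3, ]                   # the first three rows, all columns
pigs[1:3, c("len", "dose")]   # columns can also be selected by name
```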
Thinking of columns being vectors is quite useful
We can call each column of a data.frame using the $ operator
This does NOT work for rows!!!
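For example, with the built-in `ToothGrowth` data.frame standing in for `pigs` (an assumption, as the real data is loaded from a file not shown here):

```r
pigs <- ToothGrowth   # stand-in only

pigs$len[1:10]        # a column is a vector, so it can be subset like one
typeof(pigs$len)
length(pigs$len) == nrow(pigs)   # one value per row
```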
R is column major by default (as is FORTRAN & Matlab)
R was designed for statistical analysis, but has developed capabilities far beyond this
We will see this advantage this afternoon
Data frames are actually special cases of lists
Each column of a data.frame is a component of a list
A data.frame can therefore be subset like a list
Forgetting the comma now gives a completely different result to a matrix!
Was that what you expected?
Try using the double bracket method
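A sketch of the contrast. The built-in `ToothGrowth` data.frame stands in for `pigs`, and the contents of `int_mat` are assumed.

```r
pigs    <- ToothGrowth             # stand-in only
int_mat <- matrix(1:6, ncol = 2)   # assumed contents

int_mat[1]        # matrix, no comma: a single value

first_col <- pigs[1]   # data frame, no comma: treated as a list
class(first_col)       # still a data.frame
dim(first_col)         # every row, but only the first column

pigs[[1]][1:5]    # double brackets return the column itself, as a vector
```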
What do you think will happen if we type:
Error: Column index must be at most 3 if positive, not 5
R Objects
How do we assign names?
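Two common ways, shown as a sketch. The values of `int_vec`, the chosen names, and the object `named_vec` are all assumptions.

```r
int_vec <- 1:10                   # assumed values

# Assign names to an existing vector
names(int_vec) <- letters[1:10]
int_vec

# Or supply the names when the vector is created
named_vec <- c(a = 1, b = 2, c = 3)
names(named_vec)
```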
Can we remove names?
The NULL, or empty, vector in R is created using c()
We can also use this to remove names
Don’t forget to put the names back…
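Removing and restoring names might look like this. The values of `int_vec` and its names are assumed, and `after_removal` is a hypothetical helper used only to show the intermediate state.

```r
int_vec <- 1:10                    # assumed values
names(int_vec) <- letters[1:10]    # assumed names

is.null(c())                       # c() with no arguments gives NULL

names(int_vec) <- c()              # assigning NULL removes the names
after_removal <- names(int_vec)
after_removal

names(int_vec) <- letters[1:10]    # put the names back
int_vec
```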
We can convert vectors to matrices, as earlier
R is column major so fills columns by default
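A sketch of the default fill order, assuming a 3 x 2 matrix. The `byrow` argument of `matrix()` switches to filling by row; `by_row` is a hypothetical name.

```r
int_mat <- matrix(1:6, ncol = 2)   # assumed contents
int_mat[, 1]                       # 1 2 3: the first column was filled first

by_row <- matrix(1:6, ncol = 2, byrow = TRUE)
by_row[1, ]                        # 1 2: here the first row was filled first
```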
We can assign row names & column names after creation
Or using dimnames()
This is a list of length 2 with rownames then colnames as the components.
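Both approaches as a sketch. The matrix contents and all of the names used are assumptions; `dn` is a hypothetical helper for inspecting the result.

```r
int_mat <- matrix(1:6, ncol = 2)   # assumed contents

# Assign names after creation
colnames(int_mat) <- c("A", "B")
rownames(int_mat) <- c("x", "y", "z")
dn <- dimnames(int_mat)            # a list: rownames first, then colnames
dn

# Or set both at once with dimnames()
dimnames(int_mat) <- list(c("r1", "r2", "r3"), c("c1", "c2"))
int_mat["r2", "c2"]                # names can then be used for subsetting
```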
OR
What happens if we try this?
Creating a data.frame is exactly the same as creating a list, but the names attribute will also be the colnames()
What happens if we try to combine components that aren’t the same length?
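A sketch of both points, using made-up vectors. The names `my_df`, `integers` and `characters` are hypothetical, and `tryCatch()` is used so the error message can be inspected.

```r
int_vec  <- 1:3                       # made-up values
char_vec <- c("a", "cat", "video")

my_df <- data.frame(integers = int_vec, characters = char_vec)
names(my_df)
colnames(my_df)                       # the same as names()

# Components of length 3 and 2 cannot be lined up as columns
msg <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
msg
```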