library(tidyverse)
RAdelaide 2025
July 9, 2025
text.R
Text is stored in R as character vectors
Text often needs cleaning, e.g. the same sample may be labelled Run 1 in one file and Run_001 in another
Regular expressions (regexp) are incredibly powerful tools in this space
regexp syntax is not unique to R
R does have a few unique “quirks” though
stringr contains functions for text manipulation: str_detect(), str_remove(), str_extract() and str_replace()
These are alternatives to grepl(), grep(), gsub() etc. from base R
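A minimal sketch comparing stringr with the base R functions listed above, using made-up labels based on the Run 1 / Run_001 example. One practical difference is argument order: stringr takes the string first and the pattern second, while base R takes the pattern first. stringr also uses the ICU regex engine, whereas base R uses TRE by default (or PCRE with perl = TRUE), which is one source of the "quirks".

```r
library(stringr)

# Made-up sample labels, following the Run 1 / Run_001 example
x <- c("Run 1", "Run_001", "Sample A")

# stringr: the string comes first, then the pattern
s_detect  <- str_detect(x, "Run")
s_replace <- str_replace_all(x, "_", " ")

# base R: the pattern comes first and the string (x) later
b_detect  <- grepl("Run", x)
b_replace <- gsub("_", " ", x)
b_index   <- grep("Run", x)   # grep() returns positions, not TRUE/FALSE

identical(s_detect, b_detect)    # TRUE
identical(s_replace, b_replace)  # TRUE
```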
stringr::str_detect()
str_detect() returns a logical vector (one TRUE/FALSE per element) \(\implies\) same length as the input vector
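A short sketch of the logical vector returned by str_detect(), using made-up labels. Because the result is the same length as the input, it can be used directly to subset the original vector.

```r
library(stringr)

x <- c("Run 1", "Run_001", "Sample A")

# One TRUE/FALSE per element of x
is_run <- str_detect(x, "Run")
is_run                       # TRUE TRUE FALSE
length(is_run) == length(x)  # TRUE

# The logical vector can subset x directly
run_labels <- x[is_run]      # "Run 1" "Run_001"
```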
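In a regular expression, . functions as a wildcard that matches any single character
* has a different meaning from a shell wildcard: it matches zero or more of the preceding character
stringr::str_remove()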
stringr::str_remove_all()
str_remove() will only remove the first match
str_remove_all() will remove all matches
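A minimal sketch of str_remove() and str_remove_all() on made-up labels. It also shows the two regex symbols above: . as a single-character wildcard, and * as "zero or more of the preceding character".

```r
library(stringr)

x <- c("Run 1", "Run_001")

# . matches any single character, so "Run." removes "Run" plus the next character
no_prefix <- str_remove(x, "Run.")           # "1" "001"

# * means zero or more of the preceding character (here "0")
no_zeros <- str_remove("Run_001", "_0*")     # "Run1"

# str_remove() drops only the first "0"; str_remove_all() drops every "0"
first_only <- str_remove("Run_001", "0")     # "Run_01"
all_zeros  <- str_remove_all("Run_001", "0") # "Run_1"
```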
str_extract() can be more useful
stringr::str_extract()
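The pattern H[a-z]+ matches an H and then all following lower-case letters
[a-z] matches any lower-case letter, and + means one or more of the preceding character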
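A short sketch of str_extract() with the H[a-z]+ pattern on a few made-up strings. The match is case-sensitive, so a string with no capital H gives NA.

```r
library(stringr)

x <- c("Hi Mum", "Hello there", "no capital here")

# "H" followed by one or more lower-case letters
greeting <- str_extract(x, "H[a-z]+")
greeting   # "Hi" "Hello" NA
```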
str_extract() returns NA if there is no match
stringr::str_extract_all()
str_extract() will only return the first match
str_extract_all() returns every match for each element of x, as a list
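A minimal sketch contrasting the two functions on made-up strings. str_extract() gives a character vector with one value per input, while str_extract_all() gives a list because each input can have a different number of matches.

```r
library(stringr)

x <- c("Run_001 and Run_002", "Run 1")

# str_extract(): first match only, as a character vector
first_num <- str_extract(x, "[0-9]+")      # "001" "1"

# str_extract_all(): every match, as a list with one
# character vector per element of x
all_nums <- str_extract_all(x, "[0-9]+")
all_nums[[1]]                              # "001" "002"
```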
regex allows more powerful matching.
[a-z] for lower-case
[A-Z] for upper-case
[0-9] for numbers
[:alnum:] represents all alpha-numeric characters
+ for one or more
* for zero or more
| means OR
^ anchors a match to the start
$ anchors a match to the end
^ has a second meaning: inside [] it means not, e.g. [^0-9] matches anything that is not a digit
stringr::str_view()
str_view() shows where a pattern matches within each string
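A sketch that exercises the regex elements listed above on made-up strings. [:alnum:] is written inside a second pair of brackets here (as [[:alnum:]]), which is the bracketed form that also works in base R. The final str_view() call assumes stringr 1.5.0 or later, where matches are printed to the console.

```r
library(stringr)

x <- c("Run_001", "run 2", "Sample A", "A1")

starts_upper <- str_detect(x, "^[A-Z]")           # ^ anchors to the start
ends_digit   <- str_detect(x, "[0-9]$")           # $ anchors to the end
is_run       <- str_detect(x, "Run|run")          # | means OR
non_digits   <- str_extract(x, "[^0-9]+")         # inside [], ^ means NOT
only_alnum   <- str_detect(x, "^[[:alnum:]]+$")   # letters/digits only, start to end

starts_upper  # TRUE FALSE TRUE TRUE
ends_digit    # TRUE TRUE FALSE TRUE
non_digits    # "Run_" "run " "Sample A" "A"
only_alnum    # FALSE FALSE FALSE TRUE

# Print the strings that match, with the matches highlighted
str_view(x, "[0-9]+")
```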
stringr::str_replace()
str_replace() is used for extracting/modifying text strings
Like str_extract(), it takes a string (e.g. “Hi Mum”) and a pattern (e.g. “Mum”), and it also takes a replacement
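A minimal sketch of str_replace(string, pattern, replacement) using the “Hi Mum” example. It also contrasts str_replace(), which only changes the first match, with str_replace_all(), which changes every match.

```r
library(stringr)

# string, pattern, replacement
hi_dad <- str_replace("Hi Mum", "Mum", "Dad")               # "Hi Dad"

# First match only vs every match
first_swap <- str_replace("Mum and Mum", "Mum", "Dad")      # "Dad and Mum"
all_swap   <- str_replace_all("Mum and Mum", "Mum", "Dad")  # "Dad and Dad"
```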
Wrapping part of a pattern in parentheses, (pattern), captures it so it can be reused in the replacement as \\1, \\2, etc.
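A short sketch of capture groups with str_replace(), using made-up labels. Each pair of parentheses creates a numbered group, and \\1, \\2 in the replacement refer back to the captured text.

```r
library(stringr)

x <- c("Run 1", "Run 12")

# ([0-9]+) captures the digits; \\1 inserts them into the replacement
run_ids <- str_replace(x, "Run ([0-9]+)", "Run_\\1")            # "Run_1" "Run_12"

# Two groups can be swapped around
swapped <- str_replace("Hi Mum", "(Hi) (Mum)", "\\2 says \\1")  # "Mum says Hi"
```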
str_replace() only replaces the first match in a string
str_replace_all() replaces all matches
str_detect() \(\implies\) logical vector
str_remove() / str_remove_all() \(\implies\) remove matching patterns
str_extract() \(\implies\) extract matching patterns
str_replace() / str_replace_all() \(\implies\) modify a character vector
These are all regex based operations
paste() is a very useful function for joining text that does not use regex
paste() uses " " as the default separator
paste0() has the default separator as ""
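A small base R sketch of the two default separators on made-up values. The collapse argument joins all elements into a single string.

```r
x <- c("Run", "Sample")

with_space <- paste(x, 1:2)               # default sep = " ": "Run 1" "Sample 2"
no_space   <- paste0(x, "_", 1:2)         # default sep = "":  "Run_1" "Sample_2"
one_string <- paste(x, collapse = "; ")   # "Run; Sample"
```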
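glue has revolutionised text manipulation
glue inserts the value of {expressions} into a string and returns a character vector
The same syntax is used elsewhere in the tidyverse (e.g. rlang)
Common cleaning tasks include writing phone numbers with +61 instead of 0, or recording sex consistently as M/F or Male/Female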
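A minimal sketch combining str_replace() and glue for the +61 case, using a hypothetical contact list. The ^ anchor makes sure only a leading 0 is replaced. The glue result behaves as a character vector but carries the extra class glue.

```r
library(glue)
library(stringr)

# Hypothetical contact list with phone numbers starting with 0
name  <- c("Ann", "Bob")
phone <- c("0412 345 678", "0883 123 456")

# Swap the leading 0 (^ anchors to the start) for +61
intl <- str_replace(phone, "^0", "+61 ")

# glue evaluates the code inside {} and pastes the result into the string
msg <- glue("{name}: {intl}")
msg
class(msg)   # "glue" "character"
```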
A common data type in statistics is a categorical variable (i.e. a factor)
These are often stored as a character vector/column
A character vector can be converted to a factor using as.factor()
We can manually set these categories as levels using factor()
Each category is a level
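A base R sketch of the two approaches above, using a made-up vector. as.factor() sorts the levels, whereas factor(levels = ...) uses the order given. It also shows that values not listed in levels silently become NA.

```r
sex <- c("Male", "Female", "Female", "Male")

# as.factor(): levels are the sorted unique values
f_auto <- as.factor(sex)
levels(f_auto)          # "Female" "Male"

# factor(levels = ...): we choose the order of the levels
f_manual <- factor(sex, levels = c("Male", "Female"))
levels(f_manual)        # "Male" "Female"
as.integer(f_manual)    # 1 2 2 1 (each value is stored as a level index)

# Values not listed in levels silently become NA
f_typo <- factor(sex, levels = c("M", "F"))
```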
forcats
forcats is a part of the core tidyverse
forcats does for factors what stringr does for character strings
as.factor() and factor(levels = ...) are base functions
forcats functions start with fct_ or use _ in place of the . in the base name
as_factor() parallels as.factor()
fct() replicates factor() with stricter error handling
fct_inorder() sets categories in the order they appear
Sort a data.frame, then apply fct_inorder() for nicely structured plots
n entries
stringr
fct_cross()
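A minimal sketch of the forcats functions above, using made-up values. It assumes forcats 1.0.0 or later, which is where fct() was added. For character input, as_factor() orders levels by first appearance instead of alphabetically. fct() raises an error when values are missing from the levels, whereas factor() produces NA. fct_cross() joins values with ":" by default.

```r
library(forcats)

sex <- c("Male", "Female", "Female", "Male")

# as_factor(): like as.factor(), but levels follow the order of appearance
f_appear <- as_factor(sex)
levels(f_appear)    # "Male" "Female"

# fct(): like factor(), but errors instead of silently creating NA
fct_check <- tryCatch(
  fct(sex, levels = c("M", "F")),
  error = function(e) "error: values missing from levels"
)

# fct_inorder(): reorder the levels of an existing factor by first appearance
f_sorted  <- factor(c("b", "a", "c", "a"))   # levels a b c
f_inorder <- fct_inorder(f_sorted)
levels(f_inorder)   # "b" "a" "c"

# fct_cross(): combine two variables into one factor, joined by ":"
crossed <- fct_cross(c("Ann", "Bob"), sex[1:2])
as.character(crossed)  # "Ann:Male" "Bob:Female"
```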