library(tidyverse)
RAdelaide 2024
July 9, 2024
text.R
character vectors
Run 1 in one file and Run_001 in another
Regular expressions (regexp) are incredibly powerful tools in this space
regexp syntax is not unique to R
R does have a few unique “quirks” though
stringr contains functions for text manipulation
str_detect()
str_remove()
str_extract()
str_replace()
Alternatives to grepl(), grep(), gsub() etc. from base
stringr::str_detect()
str_detect() returns a logical vector
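As a minimal sketch of that behaviour, the vector x below is a hypothetical example (it is not taken from the original session); each element gets its own TRUE/FALSE.

```r
library(stringr)

# Hypothetical example vector, assumed for illustration only
x <- c("Hi Mum", "Hi Mother", "Hello Maternal Parent")

# One TRUE/FALSE per element: does the element contain "Mum"?
has_mum <- str_detect(x, "Mum")
has_mum

# A logical vector can be used directly for subsetting
x[has_mum]
```

Because the result is logical, it should also work inside dplyr::filter() or with sum() to count matches.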
stringr::str_detect()
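Regular expressions use . as a wild card, matching any single character
* has a different meaning here than in many other contexts (it is a quantifier, not a wild card)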
A character obviously needed to follow M in this search
stringr::str_detect()
[] defines a set of permitted characters
Either o or u needed to follow the M
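The difference between the wild card and a set of alternatives can be sketched as follows. The vector x is again a hypothetical example chosen so that the two patterns give different answers.

```r
library(stringr)

# Hypothetical example vector, assumed for illustration only
x <- c("Hi Mum", "Hi Mother", "Hello Maternal Parent")

# "." matches any single character, so any character may follow the M
any_after_m <- str_detect(x, "M.")

# "[ou]" only permits an o or a u directly after the M
set_after_m <- str_detect(x, "M[ou]")

any_after_m
set_after_m
```

Under these assumptions, "Maternal" matches the first pattern but not the second, because a follows the M.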
stringr::str_detect()
Anchor a pattern to the start of a string (^)
Anchor a pattern to the end of a string ($)
stringr::str_view()
str_view() displays each string with the matched pattern highlighted
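A short sketch combining the anchors with str_view(). The vector x is hypothetical, and the console output of str_view() assumes stringr 1.5.0 or later (earlier versions returned an HTML widget instead).

```r
library(stringr)

# Hypothetical example vector, assumed for illustration only
x <- c("Hi Mum", "Hi Mother", "Hello Maternal Parent")

# ^ anchors the pattern to the start of each string
starts_with_hi <- str_detect(x, "^Hi")

# $ anchors the pattern to the end of each string
ends_with_r <- str_detect(x, "r$")

# Show which part of each string was matched
# (assumes stringr >= 1.5.0, which prints matches to the console)
str_view(x, "^Hi")
```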
stringr::str_extract()
Use str_extract() to extract patterns
stringr::str_extract()
+ matches the preceding element one or more times
stringr::str_extract()
[:alpha:] matches any alphabetic character
See ?base::regex for other character classes
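The pieces above can be combined in one small sketch. The vector x is hypothetical, and the class is written as [[:alpha:]] (the class name inside a bracket expression), which is the form described in ?base::regex.

```r
library(stringr)

# Hypothetical example vector, assumed for illustration only
x <- c("Hi Mum", "Hi Mother", "Hello Maternal Parent")

# Without +, only a single letter after the M is extracted
m_two <- str_extract(x, "M[[:alpha:]]")

# With +, one or more letters after the M are extracted
m_word <- str_extract(x, "M[[:alpha:]]+")

m_two
m_word
```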
stringr::str_extract_all()
str_extract() will only return the first match
x
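To illustrate the contrast, this sketch extracts capital letters from a hypothetical vector x. str_extract() gives one value per string, whereas str_extract_all() gives a list with one element per string.

```r
library(stringr)

# Hypothetical example vector, assumed for illustration only
x <- c("Hi Mum", "Hi Mother", "Hello Maternal Parent")

# Only the first capital letter in each string
first_caps <- str_extract(x, "[A-Z]")

# Every capital letter in each string, returned as a list
all_caps <- str_extract_all(x, "[A-Z]")

first_caps
all_caps
```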
stringr::str_remove()
str_remove_all() will remove all occurrences
Very useful for removing file suffixes etc.
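A minimal sketch of both uses. The file names are hypothetical and chosen only to show a suffix being stripped.

```r
library(stringr)

# Hypothetical file names, assumed for illustration only
files <- c("sample1.fastq.gz", "sample2.fastq.gz")

# Remove the suffix; \\. matches a literal dot and $ anchors to the end
samples <- str_remove(files, "\\.fastq\\.gz$")

# str_remove() drops only the first match, str_remove_all() drops every match
first_only <- str_remove("a-b-c", "-")
every_one <- str_remove_all("a-b-c", "-")

samples
first_only
every_one
```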
stringr::str_replace()
str_replace() is used for extracting/modifying text strings
str_extract()
The string “Hi Mum” is searched for the pattern “Mum”, and the match is replaced
stringr::str_replace()
Wrapping part of a pattern in brackets, (pattern), captures it for use in the replacement
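As a sketch of both a plain replacement and a captured one: the replacement text "Dad" and the reordering below are assumptions made for illustration, not values from the original session. Captured groups are referred to as \\1, \\2 and so on in the replacement.

```r
library(stringr)

# Plain replacement: "Dad" is a hypothetical replacement value
plain <- str_replace("Hi Mum", "Mum", "Dad")

# Two captured groups, swapped around in the replacement
swapped <- str_replace("Hi Mum", "(Hi) (Mum)", "\\2! \\1!")

plain
swapped
```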
stringr::str_replace()
* is like + except the match is zero or more times
stringr::str_replace()
str_replace() only replaces the first match in a string
str_replace_all() replaces all matches
paste() is a very useful one
The default separator for paste() is " "
paste0() has the default separator as ""
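The separator behaviour can be checked with a few base R calls. The input values are hypothetical.

```r
# paste() joins its arguments with a single space by default
with_space <- paste("Hi", "Mum")

# paste0() joins them with nothing in between
no_space <- paste0("Hi", "Mum")

# sep sets the separator; paste() is vectorised over its arguments
runs <- paste("Run", 1:3, sep = "_")

# collapse joins the elements of one vector into a single string
collapsed <- paste(c("a", "b", "c"), collapse = "+")

with_space
no_space
runs
collapsed
```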
glue has revolutionised text manipulation
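A minimal sketch of the glue approach, assuming the glue package is installed. The variable names and message are hypothetical. Anything inside {} is evaluated as R code and inserted into the string.

```r
library(glue)

# Hypothetical values, assumed for illustration only
name <- "Mum"
n <- 3

# Expressions in {} are evaluated and pasted into the text
msg <- glue("Hi {name}, you have {n} messages")
msg

# The result behaves like a character vector
msg_chr <- as.character(msg)
```

Compared with paste(), the template stays readable because the text and the values are written in the order they appear in the output.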
glue
character
tidyverse syntax (e.g. rlang)
A common data type in statistics is a categorical variable (i.e. a factor)
Categorical variables often start out as a character vector/column
A character vector can be converted to a factor with as.factor()
We can manually set these categories as levels using factor()
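A short base R sketch of the difference, using a hypothetical vector of categories. as.factor() orders the levels alphabetically, while factor() lets the order be given explicitly.

```r
# Hypothetical categories, assumed for illustration only
x <- c("low", "high", "medium", "low")

# Default: levels are sorted alphabetically
f_default <- as.factor(x)
levels(f_default)

# Manual: levels follow the order supplied
f_manual <- factor(x, levels = c("low", "medium", "high"))
levels(f_manual)

# Underneath, each value is stored as an integer index into the levels
codes <- as.integer(f_manual)
codes
```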
level
forcats
forcats is a part of the core tidyverse
factors
stringr
fct_inorder() sets categories in the order they appear
Arrange the data.frame as required, then apply fct_inorder() for nice structured plots
n entries
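The sort-then-fix-the-order idea can be sketched as below. The data.frame, its column names (group, n) and its values are all hypothetical. Sorting is done with base order() here; dplyr::arrange() would be the tidyverse equivalent.

```r
library(forcats)

# Levels follow the order of first appearance, not the alphabet
f <- fct_inorder(c("medium", "low", "high", "low"))
levels(f)

# Hypothetical data.frame, assumed for illustration only
df <- data.frame(group = c("a", "b", "c"), n = c(2, 10, 5))

# Sort rows by n (largest first), then lock in that order as the levels
df <- df[order(df$n, decreasing = TRUE), ]
df$group <- fct_inorder(df$group)
group_levels <- levels(df$group)
group_levels
```

With the levels fixed this way, a plot that puts group on an axis should display the categories in the sorted order rather than alphabetically.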