RAdelaide 2024
. . .
. . .
Tools > Install Packages...
install.packages("pkg_name")
Seurat
for scRNA. . .
. . .
ngsReports
nearly every day (still…)
R
) in 2001
. . .
R
generally has bi-annual releases (R 4.4.0 April 24th, 2024)
BiocManager
is a CRAN package
github
BiocManager::install(c("pkg1", "user/pkg2"))
DESeq2 & edgeR for bulk RNA-Seq Analysis
DiffBind & extraChIPs for ChIP-Seq Analysis
fgsea for GSEA within R
. . .
GenomicRanges for working with GRanges objects
body(function_name)
always has comments removed
R
directory
. . .
browseVignettes()
R has two common types of objects
S3 objects are very common & old (late 1980s)
lm() or t.test()
S4 introduced in ’90s
S4 objects
. . .
summary()
vector
or data.frame
summary(letters)
summary(cars)
. . .
How does summary() know what to do for different data structures?
summary()
it’s a bit odd
body(summary)
UseMethod("summary")
. . .
summary() uses different methods depending on the object class
methods(summary)
[1] summary.aov summary.aovlist*
[3] summary.aspell* summary.check_packages_in_dir*
[5] summary.connection summary.data.frame
[7] summary.Date summary.default
[9] summary.ecdf* summary.factor
[11] summary.glm summary.infl*
[13] summary.lm summary.loess*
[15] summary.manova summary.matrix
[17] summary.mlm* summary.nls*
[19] summary.packageStatus* summary.POSIXct
[21] summary.POSIXlt summary.ppr*
[23] summary.prcomp* summary.princomp*
[25] summary.proc_time summary.rlang_error*
[27] summary.rlang_message* summary.rlang_trace*
[29] summary.rlang_warning* summary.rlang:::list_of_conditions*
[31] summary.srcfile summary.srcref
[33] summary.stepfun summary.stl*
[35] summary.table summary.tukeysmooth*
[37] summary.warnings
see '?methods' for accessing help and source code
The class is given after the dot.
Those marked with an asterisk are hidden.
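The `generic.class` naming rule can be tried out on a small scale. The sketch below uses a made-up generic called `describe()` (not part of any package) with a default method and a `data.frame` method, and only illustrates how `UseMethod()` picks a method from the class of its first argument.

```r
# A hypothetical generic, named describe() purely for illustration
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("An object of class", class(x)[1])
}

# Method chosen for a data.frame: the class comes after the dot
describe.data.frame <- function(x, ...) {
  paste("A data.frame with", nrow(x), "rows and", ncol(x), "columns")
}

describe(cars)     # dispatches to describe.data.frame()
describe(letters)  # no describe.character(), so describe.default() is used
```

Calling `methods(describe)` afterwards should list both methods, in the same way `methods(summary)` does above.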
summary.data.frame(): used when summary() is called on a data.frame
summary.lm(): used for objects of class lm (produced by lm())
summary.prcomp(): used for objects of class prcomp (produced by prcomp())
. . .
summary.default()
body(summary.default)
summary(letters)
methods(class = "data.frame")
[1] [ [[ [[<- [<- $<- aggregate
[7] anyDuplicated anyNA as.data.frame as.list as.matrix as.vector
[13] by cbind coerce dim dimnames dimnames<-
[19] droplevels duplicated edit format formula head
[25] initialize is.na Math merge na.exclude na.omit
[31] Ops plot print prompt rbind row.names
[37] row.names<- rowsum show slotsFromS3 sort_by split
[43] split<- stack str subset summary Summary
[49] t tail transform type.convert unique unstack
[55] within xtfrm
see '?methods' for accessing help and source code
library(tidyverse)
methods(class = "data.frame")
. . .
print()
method
print(my_tbl, n = 20)
print.tbl
(which is hidden)
S3
objects
data.frame, list, htest, lm etc)
. . .
is() instead of class()
is(band_members)
[1] "tbl_df" "tbl" "data.frame" "list" "oldClass" "vector"
. . .
R looks for print.tbl_df() \(\rightarrow\) print.tbl() \(\rightarrow\) print.data.frame() etc
print.default()
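The walk along the class vector can be sketched with a toy example. The generic `whoami()` and the classes `child` and `parent` are invented for illustration; the point is only that R tries each class in order and that `NextMethod()` hands over to the next one.

```r
# Hypothetical generic and classes, for illustration only
whoami <- function(x, ...) UseMethod("whoami")
whoami.default <- function(x, ...) "default"
whoami.parent  <- function(x, ...) "parent"

# The class vector is searched from left to right
obj <- structure(list(), class = c("child", "parent"))

# No whoami.child() exists yet, so R falls through to whoami.parent()
first_try <- whoami(obj)

# Once a method for the first class exists it takes priority;
# NextMethod() then calls the method for the next class in the vector
whoami.child <- function(x, ...) c("child", NextMethod())
second_try <- whoami(obj)
```

Here `first_try` holds the result from the `parent` method, while `second_try` records both methods in the order they ran.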
Many Bioconductor Packages define S4 objects
@ symbol for “slots” as well as $ for list elements
tidyverse
tidyverse by > 10 years
tidyomics is an active area of Bioconductor development
. . .
S4 implementations of S3 objects
data.frame (S3) Vs DataFrame (S4)
list (S3) Vs List (S4)
vector (S3) Vs Vector (S4)
rle (S3) Vs Rle (S4)
DataFrame
and you have a data.frame
Many S4 objects & methods were developed in the days when compute resources were limited
library(S4Vectors)
test <- c(rep("X", 10), rep("Y", 5))
test
[1] "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "Y" "Y" "Y" "Y" "Y"
Rle(test)
character-Rle of length 15 with 2 runs
Lengths: 10 5
Values : "X" "Y"
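The memory argument can be checked without any Bioconductor packages by using base R's `rle()`, the S3 counterpart listed earlier. This is only a sketch: the vector `long_test` is a made-up, longer version of `test`, and the exact sizes reported will vary between systems.

```r
# A longer version of the test vector, so the size difference is visible
long_test <- c(rep("X", 1e5), rep("Y", 5e4))

# base::rle() stores each run once: its length and its value
runs <- rle(long_test)
runs$lengths
runs$values

# Compare the memory used by the full vector and by the run-length encoding
object.size(long_test)
object.size(runs)

# The encoding is lossless: inverse.rle() rebuilds the original vector
identical(inverse.rle(runs), long_test)
```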
data.frame Objects
data.frame
rownames
. . .
tibble aka tbl_df
rownames are always 1:nrow(df)
data.frame
type
DataFrame
objects ?DataFrame
S4 version
tidyverse (dplyr, ggplot2 etc)
tidyomics
tibble directly
as_tibble() for DataFrame objects
extraChIPs
S4 objects to ggplot()
DataFrame objects
dplyr will not work on DataFrame objects
tidyverse)
subset() pre-dates dplyr::filter()
rbind() and combineRows() \(\implies\) bind_rows()
cbind(), combineCols() and merge() \(\implies\) joins
sort() \(\implies\) arrange()
unique() \(\implies\) distinct()
mutate(), summarise(), across(), pivot_*()
DataFrame objects
tbl_df objects)
CharacterList() from IRanges
S4 lists can be typed \(\implies\) memory efficiency
List objects can exist in a compressed form \(\implies\) memory efficiency
DataFrame objects can have S4 objects as columns
S3 data frames (including tibbles) cannot
By typing a list we only need to record the type once, instead of once for each element. This can make a big difference with large objects.
DataFrame objects
library(IRanges)
genes <- c("A", "B")
transcripts <- CharacterList(
  c("A1", "A2", "A3"), c("B1", "B2")
)
transcripts
CharacterList of length 2
[[1]] A1 A2 A3
[[2]] B1 B2
. . .
DF <- DataFrame(Gene = genes, Transcripts = transcripts)
DF
DataFrame with 2 rows and 2 columns
         Gene     Transcripts
  <character> <CharacterList>
1           A        A1,A2,A3
2           B           B1,B2
. . .
library(extraChIPs)
as_tibble(DF)
# A tibble: 2 × 2
Gene Transcripts
<chr> <list>
1 A <chr [3]>
2 B <chr [2]>
DataFrame objects
list
metadata(DF) <- list(details = "Created for RAdelaide 2024")
glimpse(DF)
Formal class 'DFrame' [package "S4Vectors"] with 6 slots
..@ rownames : NULL
..@ nrows : int 2
..@ elementType : chr "ANY"
..@ elementMetadata: NULL
..@ metadata :List of 1
.. ..$ details: chr "Created for RAdelaide 2024"
..@ listData :List of 2
.. ..$ Gene : chr [1:2] "A" "B"
.. ..$ Transcripts:Formal class 'CompressedCharacterList' [package "IRanges"] with 5 slots
. . .
Point out the @ structure
DataFrame objects
mcols()
mcols(DF) <- DataFrame(meta = c("Made-up genes", "Made-up transcripts"))
mcols(DF)
DataFrame with 2 rows and 1 column
                           meta
                    <character>
Gene              Made-up genes
Transcripts Made-up transcripts
. . .
glimpse(DF) # This is in the @elementMetadata slot
Formal class 'DFrame' [package "S4Vectors"] with 6 slots
..@ rownames : NULL
..@ nrows : int 2
..@ elementType : chr "ANY"
..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
..@ metadata :List of 1
.. ..$ details: chr "Created for RAdelaide 2024"
..@ listData :List of 2
.. ..$ Gene : chr [1:2] "A" "B"
.. ..$ Transcripts:Formal class 'CompressedCharacterList' [package "IRanges"] with 5 slots
S4 Object Structure
S4 objects have slots denoted with @
S4 class
NULL) objects
S3 or S4 objects
S3 objects are easy to break. Just change the class attribute…
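The contrast can be sketched in a few lines. The S3 half overwrites the class attribute of a fitted model, and the S4 half uses a made-up class called `Person` (an assumption for illustration, not from any package) to show that typed slots refuse a value of the wrong type.

```r
library(methods)

# S3: the class attribute can be overwritten with anything, unchecked
fit <- lm(dist ~ speed, data = cars)
class(fit) <- "data.frame"      # no error, but the object is now mislabelled
still_lm <- inherits(fit, "lm") # R no longer treats it as a model

# S4: slots are typed when the class is defined
# "Person" is a hypothetical class for illustration
setClass("Person", representation(name = "character", age = "numeric"))
p <- new("Person", name = "Ada", age = 36)

# Assigning a character value to a numeric slot is an error,
# which try() captures here so the script keeps running
res <- try(p@age <- "thirty-six", silent = TRUE)
assignment_failed <- inherits(res, "try-error")
```

After running this, `p@age` is unchanged because the invalid assignment never took place.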
S4 Object Structure
lapply our way through these objects
object@slotName
slot(object, "slotName")
DF@listData
$Gene
[1] "A" "B"
$Transcripts
CharacterList of length 2
[[1]] A1 A2 A3
[[2]] B1 B2
slot(DF, "listData")
$Gene
[1] "A" "B"
$Transcripts
CharacterList of length 2
[[1]] A1 A2 A3
[[2]] B1 B2
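The same two access styles work on any S4 object, which can be seen without loading Bioconductor. The sketch below defines a toy class called `Gene` (hypothetical, for illustration only) and then uses `lapply()` with `slot()` to step through every slot by name.

```r
library(methods)

# "Gene" is a hypothetical class for illustration
setClass("Gene", representation(id = "character", width = "integer"))
g <- new("Gene", id = "A", width = 1200L)

# Two equivalent ways to reach a slot
g@id
slot(g, "id")

# slot() takes the name as a string, so it suits programmatic access:
# here every slot is pulled out into a named list
slot_list <- lapply(slotNames(g), function(nm) slot(g, nm))
names(slot_list) <- slotNames(g)
slot_list
```

Using `slot()` inside `lapply()` is what makes it possible to walk through an object whose slot names are not known in advance.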
S4 Object Structure
slotNames(object)
slotNames(DF)
[1] "rownames" "nrows" "elementType" "elementMetadata" "metadata"
[6] "listData"
. . .
getSlots("DFrame")
rownames nrows elementType elementMetadata metadata
"character_OR_NULL" "integer" "character" "DataFrame_OR_NULL" "list"
listData
"list"
S4 Methods
S3 method dispatch uses the method.class syntax
S4 is very different but has some similarities
. . .
S4 objects almost always have hierarchical classes
S3 objects
. . .
Generic function must be defined for each method/class
S4 Methods
is()
is(DF)
[1] "DFrame" "DataFrame" "SimpleList" "RectangularData"
[5] "List" "DataFrame_OR_NULL" "Vector" "list_OR_List"
[9] "Annotated" "vector_OR_Vector"
. . .
is(DF, "DataFrame")
[1] TRUE
is(DF, "data.frame")
[1] FALSE
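A class hierarchy like this comes from the `contains` argument of `setClass()`. As a small sketch, the made-up classes `Animal` and `Dog` below (assumptions for illustration) show how `is()` reports inherited classes for an object.

```r
library(methods)

# Hypothetical classes for illustration: Dog extends Animal
setClass("Animal", representation(name = "character"))
setClass("Dog", contains = "Animal", representation(breed = "character"))

d <- new("Dog", name = "Rex", breed = "Kelpie")

is(d)            # the object's own class followed by the classes it extends
is(d, "Animal")  # TRUE: a Dog is also an Animal
is(d, "list")    # FALSE: no relationship was declared

# Slots defined on the parent class are inherited
d@name
```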
. . .
methods(class = "DataFrame")
S4 Methods
body() will return standardGeneric()
UseMethod()
. . .
getMethod(f = "nrow", signature = "DataFrame")
Method Definition:
function (x)
x@nrows
<bytecode: 0x56387f2964f0>
<environment: namespace:S4Vectors>
Signatures:
x
target "DataFrame"
defined "DataFrame"
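Defining an S4 generic and its methods follows the same pattern on a small scale. In this sketch the generic `area()` and the classes `Circle` and `Square` are invented for illustration; it shows that the generic must exist before methods are attached, and that `getMethod()` retrieves the code for one signature.

```r
library(methods)

# Hypothetical classes for illustration
setClass("Circle", representation(r = "numeric"))
setClass("Square", representation(side = "numeric"))

# The generic is defined first and only calls standardGeneric()
setGeneric("area", function(shape) standardGeneric("area"))

# Then one method is registered per class (signature)
setMethod("area", "Circle", function(shape) pi * shape@r^2)
setMethod("area", "Square", function(shape) shape@side^2)

sq_area <- area(new("Square", side = 3))

# body() on the generic shows the dispatch call, not the real work
body(area)

# getMethod() returns the method for a given signature
getMethod("area", "Square")
existsMethod("area", "Circle")
```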
R
S4 object classes are common
CRAN packages (spatial/GIS)
. . .
tidyverse