RAdelaide 2024
July 11, 2024
limma was first released in 2004 (Smyth 2004)
Image courtesy of National Human Genome Research Institute
Wang, M. (2021). Next-Generation Sequencing (NGS). In: Pan, S., Tang, J. (eds) Clinical Molecular Diagnostics. Springer, Singapore. https://doi.org/10.1007/978-981-16-1037-0_23
fastq file
fq.gz suffix
{prefix}_R1.fq.gz + {prefix}_R2.fq.gz
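The R1/R2 naming pattern above can be used to match the two files of a paired-end sample. The following is a minimal sketch, assuming every file follows the `{prefix}_R1.fq.gz` / `{prefix}_R2.fq.gz` pattern exactly; the sample names and the `pair_fastq_files()` helper are hypothetical and only for illustration.

```python
import re
from collections import defaultdict

# Assumed naming pattern: {prefix}_R1.fq.gz and {prefix}_R2.fq.gz
PAIR_PATTERN = re.compile(r"^(?P<prefix>.+)_R(?P<mate>[12])\.fq\.gz$")


def pair_fastq_files(filenames):
    """Return {prefix: (R1 file, R2 file)} for prefixes with both mates present."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match:
            mates[match["prefix"]][match["mate"]] = name
    return {
        prefix: (found["1"], found["2"])
        for prefix, found in sorted(mates.items())
        if "1" in found and "2" in found
    }


# Hypothetical file names: sampleB has no R2 file, so it is not returned
files = ["sampleA_R1.fq.gz", "sampleA_R2.fq.gz", "sampleB_R1.fq.gz"]
pairs = pair_fastq_files(files)
print(pairs)
```

Incomplete pairs are silently dropped here; in a real workflow it may be safer to report them.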
@
+
@SRR001666.1 071112_SLXA-EAS1_s_7:5:1:817:345 length=72
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACCAAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9ICIIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
@SRR001666.2 071112_SLXA-EAS1_s_7:5:1:801:338 length=72
GTTCAGGGATACGACGTTTGTATTTTAAGAATCTGAAGCAGAAGTCGATGATAATACGCGTCGTTTTATCAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII6IBIIIIIIIIIIIIIIIIIIIIIIIGII>IIIII-I)8I
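To make the four-line record structure concrete, here is a minimal sketch of a FASTQ reader in Python. It assumes uncompressed text, exactly four lines per record, and Phred+33 quality encoding (an assumption about the file, not something stated above). The `read_fastq()` helper and the short record are hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (read name, sequence, quality scores) from a 4-line-per-record stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in: {header!r}")
        # Assumption: Phred+33 encoding, so score = ASCII code - 33
        scores = [ord(char) - 33 for char in quality]
        yield header[1:], sequence, scores


# Hypothetical short record, for illustration only
example = "@read1\nGATTACA\n+\nII9IG/>\n"
records = list(read_fastq(io.StringIO(example)))
print(records)
```

Under the Phred+33 assumption, the `I` characters that dominate the records above decode to a score of 40, while `9` and `/` decode to 24 and 14.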
FastQC
cutadapt, AdapterRemoval, trimmomatic, TrimGalore etc
fastp combines QC reports with read trimming
FastQC, fastp, cutadapt all return reports after running
MultiQC is an excellent standalone tool for combining all reports
ngsReports is the “go-to” Bioconductor package for this
STAR, hisat2 or bowtie2
bam files produced

Column | Field | Description |
---|---|---|
1 | QNAME | The original FastQ header line |
2 | FLAG | Information regarding pairing, primary alignment, duplicate status, unmapped etc |
3 | RNAME | Reference sequence name (e.g. chr1) |
4 | POS | Left-most co-ordinate in the alignment |
5 | MAPQ | Mapping quality score |
6 | CIGAR | Code summarising exact matches, insertions, deletions etc. |
7 | RNEXT | Reference sequence the mate aligned to |
8 | PNEXT | Left-most co-ordinate the mate aligned to |
9 | TLEN | Observed template length (the inferred fragment size for paired reads) |
10 | SEQ | The original read sequence |
11 | QUAL | The read quality scores |
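The FLAG and CIGAR fields in the table are compact encodings, so a short decoding sketch may help. The bit values and CIGAR operation letters follow the SAM specification; the descriptive labels and the helper names (`decode_flag()`, `parse_cigar()`, `reference_span()`) are my own and only illustrative, as are the example values.

```python
import re

# FLAG bit values from the SAM specification; the labels are informal
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_flag(flag):
    """List the labels of every bit set in a SAM FLAG value."""
    return [label for bit, label in FLAG_BITS.items() if flag & bit]


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) tuples."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar):
    """Count reference bases covered: M, D, N, = and X consume the reference."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")


# Hypothetical values for illustration
flag_labels = decode_flag(99)
cigar_ops = parse_cigar("50M2I20M")
span = reference_span("50M2I20M")
print(flag_labels, cigar_ops, span)
```

Adding the reference span to POS gives the right-most co-ordinate of the alignment, which is why insertions (I) and soft clips (S) are excluded from the sum.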
NH:i:1 indicates this read aligned only once
AS:i:290 the actual alignment score produced by the aligner
NM:i:2 two edits are required to perfectly match the reference
samtools in bash
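Optional tags such as NH:i:1, AS:i:290 and NM:i:2 follow a TAG:TYPE:VALUE layout, which the sketch below splits into a dictionary. It is a minimal illustration that only converts the `i` (integer) and `f` (float) types and leaves everything else as text; `parse_tags()` is a hypothetical helper, not part of samtools or Rsamtools.

```python
def parse_tags(fields):
    """Convert TAG:TYPE:VALUE strings into a {tag: value} dictionary."""
    converters = {"i": int, "f": float}  # other types are kept as strings
    tags = {}
    for field in fields:
        # maxsplit=2 keeps any ':' inside the value intact
        tag, type_code, value = field.split(":", 2)
        tags[tag] = converters.get(type_code, str)(value)
    return tags


tags = parse_tags(["NH:i:1", "AS:i:290", "NM:i:2"])
is_unique = tags["NH"] == 1  # NH of 1: the read aligned only once
print(tags, is_unique)
```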
bam files is Rsamtools
BamFile or BamFileList objects
ScanBamParam()
GRanges
DNAStringSets
GRanges objects
featureCounts from the Subread tool
RSEM, htseq
R: Rsubread or GenomicAlignments