Chapter 4 Prepare Data
4.1 Get human equivalent gene names
This step uses the gprofiler2 package, an R interface to the g:Profiler API (Kolberg and Raudvere 2021), to create an annotation list that maps each Ensembl ID to its human equivalent gene name. Gene names are useful for visualisation and interpretation of the data, but all analysis still uses the Ensembl ID, because only the Ensembl IDs remain unique to each row of the data.
gene_annot <- annotate_gene_ensembl(cts_filtered,
                                    organism = "btaurus",
                                    base_URL = "http://biit.cs.ut.ee/gprofiler_archive3/e100_eg47_p14")
## g:Profiler Version URL: http://biit.cs.ut.ee/gprofiler_archive3/e100_eg47_p14
## Using organism: btaurus
## Important - this produces duplicate gene names from annotation - however gene_ensembl remain unique.
## This is why it's important to use gene_ensembl for all tasks and annotate output at end.
## [1] "duplicate genes in annotation: 1976"
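The duplicate-name issue reported above can be illustrated on a toy table. The following is a minimal sketch in base R, assuming a hypothetical annotation data frame with columns gene_ensembl and gene_name; the IDs and names are placeholders, and the actual output of annotate_gene_ensembl may be structured differently.

```r
# Hypothetical annotation table: column names and values are illustrative only
toy_annot <- data.frame(
  gene_ensembl = c("ENSBTAG_placeholder_1", "ENSBTAG_placeholder_2",
                   "ENSBTAG_placeholder_3", "ENSBTAG_placeholder_4"),
  gene_name    = c("GENE_A", "GENE_B", "GENE_A", "GENE_C"),
  stringsAsFactors = FALSE
)

# Ensembl IDs are unique, so they are safe to use as row keys
# (anyDuplicated() returns 0 when there are no duplicates)
ids_unique <- anyDuplicated(toy_annot$gene_ensembl) == 0

# Gene names are not unique: count entries that repeat an earlier name
n_duplicate_names <- sum(duplicated(toy_annot$gene_name))

# For plot labels only, disambiguate repeated names with a numeric suffix
toy_annot$plot_label <- make.unique(toy_annot$gene_name, sep = "_")
```

Keeping the Ensembl ID as the key and attaching a display label only at the end means that joins and subsetting never depend on a non-unique column.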
4.2 Prepare summarized experiment object
After the data is imported, it must be stored in a SummarizedExperiment object. See https://doi.org/doi:10.18129/B9.bioc.SummarizedExperiment for more details. In short, it is an object class that stores the RNA-seq counts, the gene annotations and the column (sample) data alongside each other, and it can be subset as a single unit.
seq_data <- make_summarized_experiment_object(counts_data = cts_filtered,
                                              gene_annotations = gene_annot,
                                              colData = coldata)
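To make the structure concrete, here is a hedged sketch of what a helper such as make_summarized_experiment_object might do internally; its actual implementation is not shown in this chapter, so this is an assumption. The toy objects (cts_toy, annot_toy, coldata_toy) are hypothetical stand-ins. The sketch first checks that row and column names line up, then builds the object with the SummarizedExperiment constructor only if that Bioconductor package is installed.

```r
# Toy inputs: names echo this chapter's objects, but all values are made up
cts_toy <- matrix(c(10L, 0L, 5L, 3L, 8L, 1L), nrow = 3,
                  dimnames = list(c("gene1", "gene2", "gene3"),
                                  c("sampleA", "sampleB")))
annot_toy <- data.frame(gene_name = c("A", "B", "A"),
                        row.names = c("gene1", "gene2", "gene3"))
coldata_toy <- data.frame(condition = c("control", "treated"),
                          row.names = c("sampleA", "sampleB"))

# Count rows must match the annotation rows; count columns must match the samples
rows_aligned <- identical(rownames(cts_toy), rownames(annot_toy))
cols_aligned <- identical(colnames(cts_toy), rownames(coldata_toy))
stopifnot(rows_aligned, cols_aligned)

# Build the object only when the Bioconductor package is available
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  se_toy <- SummarizedExperiment::SummarizedExperiment(
    assays  = list(counts = cts_toy),
    rowData = S4Vectors::DataFrame(annot_toy),
    colData = S4Vectors::DataFrame(coldata_toy)
  )
  # One subsetting call keeps counts, annotations and sample data in step
  se_sub <- se_toy[c("gene1", "gene3"), "sampleB"]
}
```

Checking the names before construction catches the most common failure, a counts matrix whose rows or columns are ordered differently from the annotation or sample tables.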
References
Kolberg, Liis, and Uku Raudvere. 2021. gprofiler2: Interface to the g:Profiler Toolset. https://CRAN.R-project.org/package=gprofiler2.