Chapter 4 Prepare Data

4.1 Get human equivalent gene names

This step uses the gprofiler2 package (Kolberg and Raudvere 2021), an interface to the g:Profiler API, to create an annotation list with human equivalent gene names instead of Ensembl IDs. The gene names are useful for visualisation and interpretation of the data, but all analysis still uses the Ensembl IDs, as these remain unique to each row of the data.


gene_annot <- annotate_gene_ensembl(cts_filtered, 
                                    organism = "btaurus",
                                    base_URL = "http://biit.cs.ut.ee/gprofiler_archive3/e100_eg47_p14")
## g:Profiler Version URL:  http://biit.cs.ut.ee/gprofiler_archive3/e100_eg47_p14
## Using organism: btaurus
## Important - this produces duplicate gene names from annotation - however gene_ensembl remain unique.
##       This is why it's important to use gene_ensembl for all tasks and annotate output at end.
## [1] "duplicate genes in annotation:  1976"
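The warning above can be illustrated with a small, made-up annotation table. This is only a sketch in base R: the `gene_ensembl` column name is taken from the message above, while `toy_annot`, the `gene_name` column and all values are hypothetical, and the way duplicates are counted here is just one plausible approach, not necessarily how `annotate_gene_ensembl` computes its figure.

```r
# Hypothetical annotation table: two Ensembl IDs map to the same gene name
toy_annot <- data.frame(
  gene_ensembl = c("ENSBTAG00000000001", "ENSBTAG00000000002",
                   "ENSBTAG00000000003", "ENSBTAG00000000004"),
  gene_name    = c("GENE_A", "GENE_B", "GENE_A", "GENE_C"),
  stringsAsFactors = FALSE
)

# Count gene names that repeat an earlier row (assumed counting method)
n_dup_names <- sum(duplicated(toy_annot$gene_name))

# Ensembl IDs are safe to use as row keys only if none of them repeat
ids_unique <- anyDuplicated(toy_annot$gene_ensembl) == 0

# Rows whose gene name occurs more than once, for inspection
dup_rows <- toy_annot[toy_annot$gene_name %in%
                        toy_annot$gene_name[duplicated(toy_annot$gene_name)], ]

print(n_dup_names)
print(ids_unique)
print(dup_rows)
```

Keying every analysis step on `gene_ensembl` and joining the gene names onto the results only at the end avoids accidentally merging rows that share a name.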

4.2 Prepare SummarizedExperiment object

After the data is imported, it must be prepared as a SummarizedExperiment object. See https://doi.org/doi:10.18129/B9.bioc.SummarizedExperiment for more details. In short, this object class stores the RNA-seq count data, gene annotations and column data alongside each other, and keeps them aligned when the object is subset.

seq_data <-
  make_summarized_experiment_object(counts_data = cts_filtered,
                                    gene_annotations = gene_annot,
                                    colData = coldata)
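To show what such an object looks like, the sketch below builds a tiny SummarizedExperiment directly with the Bioconductor constructor and subsets it. It assumes the SummarizedExperiment package is installed; all `toy_*` names and values are made up for illustration, and this is not necessarily what `make_summarized_experiment_object` does internally.

```r
library(SummarizedExperiment)

# Toy count matrix: 3 genes (rows) x 4 samples (columns), filled column by column
toy_counts <- matrix(
  c(10L, 0L, 5L,
    12L, 1L, 7L,
    30L, 2L, 9L,
    28L, 0L, 11L),
  nrow = 3,
  dimnames = list(
    c("ENSBTAG00000000001", "ENSBTAG00000000002", "ENSBTAG00000000003"),
    c("s1", "s2", "s3", "s4")
  )
)

# Row data: one annotation row per gene, keyed by Ensembl ID (hypothetical names)
toy_row_data <- DataFrame(
  gene_name = c("GENE_A", "GENE_B", "GENE_A"),
  row.names = rownames(toy_counts)
)

# Column data: one row per sample (hypothetical design)
toy_col_data <- DataFrame(
  condition = c("control", "control", "treated", "treated"),
  row.names = colnames(toy_counts)
)

toy_se <- SummarizedExperiment(
  assays  = list(counts = toy_counts),
  rowData = toy_row_data,
  colData = toy_col_data
)

# Subsetting by sample keeps counts, row data and column data in sync
treated_se <- toy_se[, toy_se$condition == "treated"]
print(dim(treated_se))
print(colnames(treated_se))
```

Because the counts and the sample table are subset together, there is no risk of a count column drifting out of step with its sample description.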

References

Kolberg, Liis, and Uku Raudvere. 2021. Gprofiler2: Interface to the g:profiler Toolset. https://CRAN.R-project.org/package=gprofiler2.