
Part 1A: Loading data

Ewoud Ewing edited this page Oct 29, 2020 · 5 revisions

Loading the data

We start by creating the PathwayObject. This can be done with the automatic loader (LoadGeneSets) or manually with ObjectCreator. ObjectCreator needs a vector of Gene-Set labels, a vector of genes (one string per Gene-Set), and a vector giving the group each Gene-Set belongs to (e.g. KO or WT). It also needs the Source (e.g. IPA), the Type (e.g. Canonical pathways), the structure (e.g. SYMBOL), and the separator (sep), which is the character that separates the genes within each string. The Gene-Set labels, genes, and groups must all have the same length; every other argument is a single string. See example below in 1.7.
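As a rough illustration of the input shape described above, the sketch below builds the three parallel vectors in base R and checks that they line up. The variable names and the placeholder labels and genes are made up for this example; it does not call ObjectCreator itself.

```r
# Toy inputs: one entry per Gene-Set (all values are placeholders).
geneset_labels <- c("Pathway_A", "Pathway_B", "Pathway_A")
geneset_genes  <- c("GeneA,GeneB,GeneC", "GeneD,GeneE", "GeneA,GeneF")
geneset_groups <- c("KO", "KO", "WT")
gene_separator <- ","  # the character that separates genes within each string

# Labels, genes, and groups must all have the same length.
stopifnot(length(geneset_labels) == length(geneset_genes),
          length(geneset_genes) == length(geneset_groups))

# Split each gene string on the separator to recover the individual genes.
genes_per_set <- strsplit(geneset_genes, split = gene_separator, fixed = TRUE)
n_genes <- lengths(genes_per_set)
print(n_genes)
```

Checking the lengths up front catches a mismatch between labels, genes, and groups before the vectors are handed to the object constructor.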

The LoadGeneSets function automates all of this. It currently supports GREAT TSV files (exported as "all data as TSV") and IPA exports of canonical pathways or functional annotations (both exported as Excel).

Exporting the data

Gene sets from GREAT and IPA can be loaded automatically. All that is needed is a standard export in either Excel or TSV form.

Exporting GREAT

Exporting IPA

Loading IPA

The remaining arguments are: the P-value cutoff; the minimum number of molecules per Gene-Set (recommended >= 5); the source; and, for GREAT, whether it was run with a background. The type does not matter for GREAT, as it runs 18 different types that the loader assigns automatically, but it is important for IPA. The GREAT output is very large, so it is recommended to set topranks so that only the top-ranked Gene-Sets are considered. The structure (e.g. SYMBOL) is important because combinePathways checks that all the data is in the same structure. The organism is needed so the structure can be converted correctly; its value must be the name of an installed annotation package, and org.Mm.eg.db and org.Hs.eg.db are currently supported. Last but not least is the separator (the seperator argument), which is the character that joins the genes within each string.
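To make the effect of the cutoffs concrete, here is a minimal base R sketch that applies a P-value cutoff, a minimum gene count, and a top-ranks limit to a toy table. The table, its column names, and the choice to rank by P-value with a strict "less than" comparison are assumptions made for illustration; this is not the loader's actual implementation.

```r
# Toy enrichment table; every value is made up for illustration.
toy_sets <- data.frame(
  label   = c("Set1", "Set2", "Set3", "Set4", "Set5"),
  p_value = c(0.001, 0.20, 0.01, 0.03, 0.04),
  genes   = c("A,B,C,D,E,F", "A,B,C,D,E", "A,B", "A,B,C,D,E", "A,B,C,D,E,F,G"),
  stringsAsFactors = FALSE
)

p_cutoff   <- 0.05  # keep Gene-Sets with a P-value below this
mol_cutoff <- 5     # keep Gene-Sets with at least this many genes
top_ranks  <- 2     # keep only the best-ranked Gene-Sets

# Count genes per Gene-Set by splitting on the separator.
n_genes <- lengths(strsplit(toy_sets$genes, ",", fixed = TRUE))

# Apply both cutoffs, rank by P-value, then keep the top entries.
kept <- toy_sets[toy_sets$p_value < p_cutoff & n_genes >= mol_cutoff, ]
kept <- kept[order(kept$p_value), ]
kept <- head(kept, top_ranks)
print(kept$label)
```

In this toy table, Set2 fails the P-value cutoff and Set3 has too few genes, so only the two best-ranked of the remaining Gene-Sets are kept.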

When loading data from different sources, the pipeline lets you merge the objects from separate loads into a new object; see MergeObjects.

This is followed by harmonizing and clustering: Harmonize and cluster

Example Code


library(GeneSetCluster)

# Example GREAT exports shipped with the package (KO and WT, with and without background)
Great.files <- c(system.file("extdata", "MM10.GREAT.KO.uGvsMac.bed.tsv", package = "GeneSetCluster"),
                 system.file("extdata", "MM10.GREAT.KO.uGvsMac.bed_BCKGRND.tsv", package = "GeneSetCluster"),
                 system.file("extdata", "MM10.GREAT.WT.uGvsMac.bed.tsv", package = "GeneSetCluster"),
                 system.file("extdata", "MM10.GREAT.WT.uGvsMac.bed_BCKGRND.tsv", package = "GeneSetCluster"))
# Keep only the files from GREAT runs that used a background
Great.files.bckgrnd <- Great.files[grepl("BCKGRND", Great.files)]


Great.bckgnrd.Object1 <- LoadGeneSets(file_location = Great.files.bckgrnd, 
                              groupnames = c("KO", "WT"),
                              P.cutoff = 0.05, 
                              Mol.cutoff = 5,
                              Source = "Great",
                              Great.Background = TRUE, # GREAT output differs depending on whether it was run with a background
                              type = "Canonical_Pathways",
                              topranks = 20, # GREAT output is very large; keeping only the top 20 ranks is recommended
                              structure = "SYMBOL",
                              Organism = "org.Mm.eg.db",
                              seperator = ",")

Links

LoadGeneSets

MergeObjects
