Part 1A: Loading data
We start by creating the PathwayObject. This can be done with the automatic loader (LoadGeneSets) or with ObjectCreator. ObjectCreator needs a vector of Gene-Set labels, a vector of genes (one string per Gene-Set), a vector giving the group each Gene-Set belongs to (e.g. KO or WT), the Source (e.g. IPA), the Type (e.g. Canonical pathways), the structure (e.g. SYMBOL) and the separator (sep), i.e. how the genes within each string are separated. The Gene-Set labels, genes and groups vectors must have the same length; everything else is a single string. See the example below in 1.7.
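The snippet below is a base-R sketch of the input shape described above. It does not call ObjectCreator itself; the toy gene sets, genes and variable names are hypothetical. It shows the three parallel vectors (which must have equal length), the single-string metadata, and how the separator turns each gene string into a list of genes.

```r
# Hypothetical toy inputs illustrating the shape ObjectCreator expects
gene_set_labels <- c("Pathway A", "Pathway B", "Pathway C")
genes  <- c("Cd4,Cd8a,Il2", "Tnf,Il6", "Cd4,Tnf,Ifng,Il10")  # one string per Gene-Set
groups <- c("KO", "KO", "WT")                                 # group of each Gene-Set

# Everything else is a single string
source_name    <- "IPA"
type_name      <- "Canonical_Pathways"
structure_name <- "SYMBOL"
sep            <- ","

# Labels, genes and groups must be parallel vectors of the same length
stopifnot(length(gene_set_labels) == length(genes),
          length(genes) == length(groups),
          length(source_name) == 1, length(sep) == 1)

# The separator splits each gene string into individual genes
gene_lists <- strsplit(genes, split = sep, fixed = TRUE)
names(gene_lists) <- gene_set_labels
lengths(gene_lists)  # number of genes in each Gene-Set
```

If the separator does not match the one used in the export (for example "," versus ", "), the gene names will carry stray whitespace, so it is worth checking a few split results before building the object.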
The LoadGeneSets function automates all of this. It currently supports GREAT TSV files (exported via "all data as TSV"), as well as IPA canonical pathways and IPA functional annotations (both exported as Excel). In other words, gene sets from GREAT and IPA can be loaded automatically from their standard Excel or TSV exports.
The remaining arguments are the P-value cutoff (P.cutoff), the minimum number of molecules per Gene-Set (Mol.cutoff, recommended >= 5), the Source, whether GREAT was run with a background (Great.Background) and the type. For GREAT the type does not matter, because GREAT runs 18 different types that the loader assigns automatically, but for IPA it is important. GREAT output is very large, so it is recommended to set topranks, meaning only the top-ranked Gene-Sets are kept. The structure (e.g. SYMBOL) is important because combinePathways checks that all the data is in the same structure. The organism (Organism) is needed to convert the structure correctly; it must be the name of the installed annotation package to call, and org.Mm.eg.db and org.Hs.eg.db are currently supported. Last but not least is the separator (seperator): how the genes are combined in a string.
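To make the three filtering arguments concrete, here is a conceptual base-R sketch of what P.cutoff, Mol.cutoff and topranks mean when applied to one table of results. It is not LoadGeneSets' internal code: the data frame and its column names are hypothetical, and ranking by ascending P-value and using inclusive cutoffs are assumptions made for illustration.

```r
# Hypothetical enrichment table; column names are illustrative only
results <- data.frame(
  GeneSet = c("GS1", "GS2", "GS3", "GS4", "GS5"),
  Pvalue  = c(0.001, 0.04, 0.20, 0.0005, 0.01),
  Genes   = c("A,B,C,D,E", "A,B", "C,D,E,F,G,H", "B,C,D,E,F,G", "A,C,E,G,I"),
  stringsAsFactors = FALSE
)

P.cutoff   <- 0.05
Mol.cutoff <- 5
topranks   <- 2
seperator  <- ","

# Count molecules per Gene-Set by splitting the gene string on the separator
results$Molecules <- lengths(strsplit(results$Genes, split = seperator, fixed = TRUE))

# Keep significant Gene-Sets that contain enough molecules (inclusive cutoffs assumed)
kept <- results[results$Pvalue <= P.cutoff & results$Molecules >= Mol.cutoff, ]

# Keep only the top-ranked Gene-Sets (ranking by smallest P-value is an assumption)
kept <- head(kept[order(kept$Pvalue), ], topranks)
kept$GeneSet
```

Here GS3 fails the P-value cutoff, GS2 has too few molecules, and of the remaining three only the two with the smallest P-values (GS4, GS1) survive the topranks filter.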
When loading data from different sources, the pipeline allows you to merge objects from separate loadings into a new object (see MergeObjects).
This is followed by harmonizing and clustering (see Harmonize and cluster).
library(GeneSetCluster)
# Example GREAT exports shipped with the package: KO and WT, each run with and without a background
Great.files <- c(system.file("extdata", "MM10.GREAT.KO.uGvsMac.bed.tsv", package = "GeneSetCluster"),
                 system.file("extdata", "MM10.GREAT.KO.uGvsMac.bed_BCKGRND.tsv", package = "GeneSetCluster"),
                 system.file("extdata", "MM10.GREAT.WT.uGvsMac.bed.tsv", package = "GeneSetCluster"),
                 system.file("extdata", "MM10.GREAT.WT.uGvsMac.bed_BCKGRND.tsv", package = "GeneSetCluster"))
# Keep only the exports that were run with a background
Great.files.bckgrnd <- Great.files[grepl("BCKGRND", Great.files)]
Great.bckgnrd.Object1 <- LoadGeneSets(file_location = Great.files.bckgrnd,
                                      groupnames = c("KO", "WT"),
                                      P.cutoff = 0.05,
                                      Mol.cutoff = 5,
                                      Source = "Great",
                                      Great.Background = TRUE, # GREAT output differs when run with or without a background
                                      type = "Canonical_Pathways",
                                      topranks = 20, # GREAT gives a lot of output; keeping the top 20 Gene-Sets is recommended
                                      structure = "SYMBOL",
                                      Organism = "org.Mm.eg.db",
                                      seperator = ",")
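The grepl filter in the example above can be checked without the package installed by running the same selection on the bare file names. This is only a sanity check of the file selection, not part of the loading step.

```r
# Bare file names of the four example GREAT exports
great_names <- c("MM10.GREAT.KO.uGvsMac.bed.tsv",
                 "MM10.GREAT.KO.uGvsMac.bed_BCKGRND.tsv",
                 "MM10.GREAT.WT.uGvsMac.bed.tsv",
                 "MM10.GREAT.WT.uGvsMac.bed_BCKGRND.tsv")

# Same pattern as above: keep only the exports run with a background
bckgrnd_names <- great_names[grepl("BCKGRND", great_names)]
bckgrnd_names
```

The KO export comes before the WT export in the selected vector, which is the same order as the groupnames = c("KO", "WT") passed to LoadGeneSets.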