
Part 1B Creating an Object

Ewoud Ewing edited this page Oct 29, 2020 · 7 revisions

Creating an object with a custom table

When the data was not run in GREAT or IPA, the approach can still be used. In fact, the inputs do not necessarily need to be proper pathways. The code needs a group, a name, and the content belonging to that name (the genes), plus some optional meta information. We can also simulate a dataset.
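As a minimal sketch of what such a custom table could look like, the snippet below simulates one in base R. The gene names, set names, and group labels are made up for illustration, and the column names simply mirror the ObjectCreator arguments used later on this page.

```r
# Simulated input: every name below is hypothetical
set.seed(1)
gene.pool <- paste0("Gene", 1:50)

# Draw 6 gene sets of 8 genes each and collapse each set into one comma separated string
sim.molecules <- vapply(1:6,
                        function(i) paste(sample(gene.pool, 8), collapse = ","),
                        character(1))

# One row per gene set: a name, its content, and the group it belongs to
sim.table <- data.frame(Pathways  = paste0("Set", 1:6),
                        Molecules = sim.molecules,
                        Groups    = rep(c("GroupA", "GroupB"), each = 3),
                        stringsAsFactors = FALSE)
head(sim.table)
```

A table like this carries the same three pieces of information (name, content, group) that the IPA example below extracts from the exported files.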

Loading IPA data

As an example, we show that the PathwayObject can be created from this simple information.


library(GeneSetCluster)
library(readxl) #provides read_excel()
library(limma)  #provides strsplit2()

IPA.files <- c(system.file("extdata", "MM10.IPA.KO.uGvsMac.Canonical_pathways.xls", package = "GeneSetCluster"),
               system.file("extdata", "MM10.IPA.WT.uGvsMac.Canonical_pathways.xls", package = "GeneSetCluster"),
               system.file("extdata", "MM10.IPA.KO.uGvsMac.Functional_annotations.xls", package = "GeneSetCluster"),
               system.file("extdata", "MM10.IPA.WT.uGvsMac.Functional_annotations.xls", package = "GeneSetCluster"))
canonical.files <- IPA.files[grep("Canonical", IPA.files)]

##################
#Loading the data as a table
MM10.IPA.KO.uGvsMac.Canonical <- read_excel(path = system.file("extdata", "MM10.IPA.KO.uGvsMac.Canonical_pathways.xls", package = "GeneSetCluster"),
                                            skip=1, sheet = 1)
MM10.IPA.WT.uGvsMac.Canonical <- read_excel(path = system.file("extdata", "MM10.IPA.WT.uGvsMac.Canonical_pathways.xls", package = "GeneSetCluster"),
                                            skip=1, sheet = 1)

MM10.IPA.KO.uGvsMac.Canonical <- as.data.frame(MM10.IPA.KO.uGvsMac.Canonical)
MM10.IPA.WT.uGvsMac.Canonical <- as.data.frame(MM10.IPA.WT.uGvsMac.Canonical)

head(MM10.IPA.KO.uGvsMac.Canonical)

Making R objects

IPA exports a lot of data, but we are only interested in Gene-Sets with a p-value < 0.05 (i.e. a -log10(p-value) above roughly 1.3; the code below uses 1.31) and more than 5 molecules. When running ObjectCreator, the user is responsible for filtering the relevant pathways beforehand.
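The relation between the p-value cutoff and the -log10 scale can be checked directly in base R. This is only an arithmetic illustration; the example p-values are made up.

```r
# -log10 of the 0.05 cutoff
-log10(0.05)

# Back-transform the 1.31 threshold used in the filtering code to a p-value
10^(-1.31)

# Hypothetical p-values: which of them pass the -log10 threshold of 1.31?
pvals  <- c(0.2, 0.05, 0.04, 0.001)
passes <- -log10(pvals) > 1.31
passes
```

Note that a threshold of 1.31 is slightly stricter than p < 0.05, so a p-value of exactly 0.05 does not pass.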

#Calculating the number of molecules:
#we can see that the string is comma separated for these molecules:
MM10.IPA.KO.uGvsMac.Canonical$MoleculesCount <- NA
for(can.i in seq_len(nrow(MM10.IPA.KO.uGvsMac.Canonical)))
{
  mol.i <- as.vector(strsplit2(as.character(MM10.IPA.KO.uGvsMac.Canonical[can.i,"Molecules"]), split=","))
  MM10.IPA.KO.uGvsMac.Canonical[can.i,"MoleculesCount"]<- length(mol.i)
}

head(MM10.IPA.KO.uGvsMac.Canonical)

head(MM10.IPA.WT.uGvsMac.Canonical)

MM10.IPA.KO.uGvsMac.Canonical.filtered <- MM10.IPA.KO.uGvsMac.Canonical[MM10.IPA.KO.uGvsMac.Canonical$`-log(p-value)` > 1.31 & 
                                                                          MM10.IPA.KO.uGvsMac.Canonical$MoleculesCount > 5,]

nrow(MM10.IPA.KO.uGvsMac.Canonical.filtered)
#We can see that we have 53 Gene-Sets which are significant according to our definition.

#Repeat for WT 
MM10.IPA.WT.uGvsMac.Canonical$MoleculesCount <- NA
for(can.i in seq_len(nrow(MM10.IPA.WT.uGvsMac.Canonical)))
{
  mol.i <- as.vector(strsplit2(as.character(MM10.IPA.WT.uGvsMac.Canonical[can.i,"Molecules"]), split=","))
  MM10.IPA.WT.uGvsMac.Canonical[can.i,"MoleculesCount"]<- length(mol.i)
}

MM10.IPA.WT.uGvsMac.Canonical.filtered <- MM10.IPA.WT.uGvsMac.Canonical[MM10.IPA.WT.uGvsMac.Canonical$`-log(p-value)` > 1.31 & 
                                                                          MM10.IPA.WT.uGvsMac.Canonical$MoleculesCount > 5,]


nrow(MM10.IPA.KO.uGvsMac.Canonical.filtered)
nrow(MM10.IPA.WT.uGvsMac.Canonical.filtered)


We can see that we have 53 (KO) and 281 (WT) Gene-Sets, respectively, that are significant according to our definition.
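Because the count-and-filter step is identical for KO and WT, it could be wrapped in a small helper. The sketch below uses base R's strsplit() and lengths() instead of the loop with strsplit2(); the function name FilterIPATable and the toy table are hypothetical, and the column names are assumed to match the IPA export used above.

```r
# Hypothetical helper: count molecules per row and keep significant, large enough gene sets
FilterIPATable <- function(ipa.table, min.logp = 1.31, min.molecules = 5)
{
  # Split each comma separated string and count the pieces
  ipa.table$MoleculesCount <- lengths(strsplit(as.character(ipa.table$Molecules), split = ","))
  keep <- ipa.table[["-log(p-value)"]] > min.logp & ipa.table$MoleculesCount > min.molecules
  # which() drops rows where the comparison is NA
  ipa.table[which(keep), , drop = FALSE]
}

# Toy table with made-up values, only to show the call
toy <- data.frame(Molecules = c("A,B,C,D,E,F", "A,B", "A,B,C,D,E,F,G"),
                  logp = c(2.5, 3.0, 0.4),
                  stringsAsFactors = FALSE)
names(toy)[2] <- "-log(p-value)"

toy.filtered <- FilterIPATable(toy)
nrow(toy.filtered)
```

With the tables loaded earlier, the same helper would be called as FilterIPATable(MM10.IPA.KO.uGvsMac.Canonical) and FilterIPATable(MM10.IPA.WT.uGvsMac.Canonical).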

Creating the combined object

Now we combine the two filtered tables into a single PathwayObject. The ObjectCreator arguments are:

  • Pathways: the pathway names of both tables, simply concatenated
  • Molecules: the molecules (i.e. the genes) of both tables, simply concatenated
  • Groups: a character vector as long as the combined pathways, repeating the group label for each pathway
  • Source: how the data was generated (metadata only, not necessary to add)
  • Type: what kind of data it is (metadata only, not necessary to add)
  • structure: how the genes are represented (e.g. gene symbols). Only important if you want to combine gene sets: the genes have to match, so the program needs to know they are in the same format
  • organism: like structure, only important for combining gene sets (optional)
  • sep: how the genes in the Molecules strings are separated. Important for reading the individual genes.

IPA.KOvsWT.PathwayObject <- ObjectCreator(Pathways = c(MM10.IPA.KO.uGvsMac.Canonical.filtered$`Ingenuity Canonical Pathways`,
                                                       MM10.IPA.WT.uGvsMac.Canonical.filtered$`Ingenuity Canonical Pathways`), 
                                          Molecules = c(MM10.IPA.KO.uGvsMac.Canonical.filtered$Molecules,
                                                       MM10.IPA.WT.uGvsMac.Canonical.filtered$Molecules),
                                          Groups = c(rep("KO", times = nrow(MM10.IPA.KO.uGvsMac.Canonical.filtered)),
                                                     rep("WT", times = nrow(MM10.IPA.WT.uGvsMac.Canonical.filtered))),
                                          Source = "IPA",
                                          Type = "Canonical_Pathways",#Optional
                                          structure = "SYMBOL",
                                          organism ="org.Mm.eg.db",
                                          sep = ",")
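Before calling ObjectCreator it can help to confirm that the concatenated vectors line up. The check below is a base R sketch on made-up vectors; CheckInputs is a hypothetical helper and not part of GeneSetCluster.

```r
# Hypothetical helper: verify that the three main inputs describe the same gene sets
CheckInputs <- function(Pathways, Molecules, Groups)
{
  stopifnot(length(Pathways) == length(Molecules),
            length(Pathways) == length(Groups))
  # Number of gene sets per group
  table(Groups)
}

# Made-up example vectors
pathways  <- c("Set1", "Set2", "Set3")
molecules <- c("A,B,C", "B,C,D", "C,D,E")
groups    <- c(rep("KO", times = 2), rep("WT", times = 1))

group.counts <- CheckInputs(pathways, molecules, groups)
group.counts
```

If the lengths differ, stopifnot() raises an error, which usually points to a group label vector built from the wrong table.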

Links

https://github.com/TranslationalBioinformaticsUnit/GeneSetCluster/wiki/ObjectCreator
