
Part 2B User Supplied Distance

EwoudEwing edited this page May 30, 2019 · 4 revisions

Distance calculations using user-supplied combination functions

By default, the distance between Gene-Sets is calculated as the RR. In some cases, however, the user may want to combine the data with a different function. CombineGeneSets accepts a user-supplied function for this purpose. For example, the Jaccard similarity index, which measures the overlap between Gene-Sets, can be used. Note: Jaccard is quite suitable for larger Gene-Sets but biased for smaller Gene-Sets.

The pipeline creates a matrix with Gene-Sets as rows and genes as columns. Any supplied function therefore needs to take two arguments, A and B (or X and Y), one for each of the two Gene-Sets whose distance is being calculated. Each argument is a vector with one entry per gene, where 1 marks a gene that is in the Gene-Set and 0 a gene that is not, for example:

GeneSet_1 = c(Gene_1 = 0,Gene_2 = 0,Gene_3 = 1,Gene_4 = 0,Gene_5 = 0,Gene_6 = 1,Gene_7 = 1,Gene_8 = 0)

GeneSet_2 = c(Gene_1 = 1,Gene_2 = 1,Gene_3 = 1,Gene_4 = 0,Gene_5 = 0,Gene_6 = 0,Gene_7 = 0,Gene_8 = 0)

In this case 1 gene is shared (Gene_3).
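The overlap in the two example vectors can be checked by hand. The sketch below is only an illustration: the variable names (`shared`, `n_shared`, `n_only_1`, `n_only_2`, `jaccard_by_hand`) are made up for this example and are not part of the package.

```r
# The two example Gene-Sets: 1 = gene is in the set, 0 = gene is not
GeneSet_1 <- c(Gene_1 = 0, Gene_2 = 0, Gene_3 = 1, Gene_4 = 0,
               Gene_5 = 0, Gene_6 = 1, Gene_7 = 1, Gene_8 = 0)
GeneSet_2 <- c(Gene_1 = 1, Gene_2 = 1, Gene_3 = 1, Gene_4 = 0,
               Gene_5 = 0, Gene_6 = 0, Gene_7 = 0, Gene_8 = 0)

# Genes present in both sets
shared <- GeneSet_1 == 1 & GeneSet_2 == 1
shared_genes <- names(GeneSet_1)[shared]   # "Gene_3"
n_shared <- sum(shared)                    # 1

# Genes present in only one of the two sets
n_only_1 <- sum(GeneSet_1 == 1 & GeneSet_2 == 0)   # 2 (Gene_6, Gene_7)
n_only_2 <- sum(GeneSet_1 == 0 & GeneSet_2 == 1)   # 2 (Gene_1, Gene_2)

# Jaccard index by hand: shared / (shared + unique to either set)
jaccard_by_hand <- n_shared / (n_shared + n_only_1 + n_only_2)   # 1/5 = 0.2
```

A two-argument function supplied to the pipeline would be expected to return a single number like this for each pair of Gene-Sets.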


jaccard <- function(A,B)
{
  # The Jaccard similarity index compares the members
  # of two sets to see which members are shared and which are distinct.
  # It is a measure of similarity for the two sets of data, returned here
  # as a fraction between 0 and 1 (i.e. 0% to 100%).
  # The higher the value, the more similar the two sets.
  
  M <- sum(as.vector(A) == 1 & as.vector(B) == 1)
  A.c <- sum(as.vector(A) == 1 & as.vector(B) == 0)
  B.c <- sum(as.vector(A) == 0 & as.vector(B) == 1)
  J <- M/(A.c+B.c+M)
  return(J)
}
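Any other function with the same two-argument shape could be supplied in the same way. As a hedged sketch, the block below defines an overlap coefficient as a hypothetical alternative and tries it on a toy matrix before it would be handed to the pipeline. The names `overlap_coefficient`, `pairwise_apply` and `toy` are assumptions made for this example. `pairwise_apply` is only a local testing helper and does not claim to reflect how CombineGeneSets works internally.

```r
# Toy membership matrix: Gene-Sets as rows, genes as columns
toy <- rbind(
  GeneSet_1 = c(0, 0, 1, 0, 0, 1, 1, 0),
  GeneSet_2 = c(1, 1, 1, 0, 0, 0, 0, 0)
)
colnames(toy) <- paste0("Gene_", 1:8)

# Hypothetical alternative to jaccard(): the overlap coefficient,
# i.e. shared genes divided by the size of the smaller set
overlap_coefficient <- function(A, B) {
  A <- as.vector(A)
  B <- as.vector(B)
  shared  <- sum(A == 1 & B == 1)
  smaller <- min(sum(A == 1), sum(B == 1))
  if (smaller == 0) return(0)   # avoid dividing by zero for an empty set
  shared / smaller
}

# Hypothetical helper: apply a two-argument function to every pair of rows
pairwise_apply <- function(mat, fun) {
  n <- nrow(mat)
  out <- matrix(NA_real_, n, n,
                dimnames = list(rownames(mat), rownames(mat)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- fun(mat[i, ], mat[j, ])
    }
  }
  out
}

res <- pairwise_apply(toy, overlap_coefficient)
# res["GeneSet_1", "GeneSet_2"] is 1/3: one shared gene, smaller set has 3 genes
```

Checking a function on a small matrix like this makes it easier to spot problems such as NaN values or an unexpected range before running the full analysis.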


IPA.Object.J <- CombineGeneSets(Object = IPA.object1, combineMethod = "Jaccard", combineMethod.supplied = jaccard)
IPA.Object.J <- ClusterGeneSets(Object = IPA.Object.J,
                                clusters = 4,
                                method = "kmeans",
                                order = "group")

PlotGeneSets(Object = IPA.Object.J, fontsize = 5,
             legend = TRUE,
             annotation.mol = FALSE,
             main = "Jaccard distance", RR.max = 50)

[Figure: Jaccard]
