
ClusterGeneSets: User supplied clustering function

Ewoud Ewing edited this page May 5, 2020 · 1 revision

The ClusterGeneSets function provides two types of clustering: kmeans and hierarchical clustering. Either can be run group by group if you don't want different groups to be mixed. For most applications these options are enough to group the gene sets together. If you want a different clustering method, ClusterGeneSets also accepts a clustering function supplied by the user.

In the code there is the following option:

>     canonical.df$cluster <- user_function(Object@Data.RR)

The user function receives the calculated distance between gene sets (the default is RR) and uses it to cluster them. Any function defined by the user can be used; the only requirement is that its output contains, for every pathway, a number that corresponds to the cluster it belongs to.
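The requirement above can be checked on toy data before handing a function to ClusterGeneSets. The following is a minimal sketch, not part of the package: `check_user_function`, `example.cluster` and the toy matrix are hypothetical names introduced for illustration, and the sketch assumes the distance data is a square gene set by gene set matrix.

```r
# Hypothetical helper (not part of the package): checks that a user-supplied
# clustering function returns one whole-number cluster label per gene set.
check_user_function <- function(user_function, distance_matrix) {
  labels <- user_function(distance_matrix)
  stopifnot(
    is.numeric(labels),
    length(labels) == nrow(distance_matrix),
    !anyNA(labels),
    all(labels == round(labels))
  )
  labels
}

# Toy symmetric distance matrix standing in for Object@Data.RR
# (assumed shape: gene sets x gene sets).
set.seed(1)
toy <- as.matrix(dist(matrix(rnorm(40), nrow = 8)))
rownames(toy) <- colnames(toy) <- paste0("GeneSet", 1:8)

# Example user function: average-linkage hierarchical clustering,
# cut into 3 clusters.
example.cluster <- function(data) {
  tree <- hclust(as.dist(data), method = "average")
  cutree(tree, k = 3)
}

labels <- check_user_function(example.cluster, toy)
```

If the check passes, `labels` holds one cluster number per row of the toy matrix, which is the shape of output described above.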

Examples

If you want to run a clustering method where the output is 1:29, this needs to be cut so that the function returns one cluster number per gene set, for example 1 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 5 5 5 5 5 6 6 6 6 6 6

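user.cluster <- function(data)
{
  # Hierarchical clustering (Ward's method) on Euclidean distances between the columns of the input
  x <- hclust(dist(t(data)), method = "ward.D2")
  # Cut the tree into 5 clusters, giving one cluster number per gene set
  x <- cutree(x, k = 5)
  return(x)
}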

Then to run the ClusterGeneSets function:

Object <- ClusterGeneSets(Object,
                          clusters = 5,
                          method = "User_supplied",
                          order = "group",
                          molecular.signature = "All",
                          user_function = user.cluster)
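Because the user function is called with the distance data as its only argument, the number of clusters in `user.cluster` is fixed inside the function. One possible way to make it configurable is a small function factory. This is a sketch under that assumption; `make.user.cluster` and `user.cluster.k4` are hypothetical names, and the toy matrix only stands in for the real distance data.

```r
# Hypothetical factory (not part of the package): returns a one-argument
# clustering function with the number of clusters k baked in.
make.user.cluster <- function(k) {
  function(data) {
    # Same approach as user.cluster: Ward clustering, then cut the tree
    tree <- hclust(dist(t(data)), method = "ward.D2")
    cutree(tree, k = k)
  }
}

# Build a function that cuts the tree into 4 clusters
user.cluster.k4 <- make.user.cluster(4)

# Try it on a toy 10 x 10 matrix before using it on real data
set.seed(42)
toy <- matrix(runif(100), nrow = 10)
toy_labels <- user.cluster.k4(toy)
```

The resulting function can then be passed as `user_function = user.cluster.k4` in the same way as `user.cluster`.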