Postprocessing: Gene Ranking

Jordi Abante edited this page Jun 13, 2018 · 7 revisions

A provided utility ranks all human genes in the Bioconductor package TxDb.Hsapiens.UCSC.hg19.knownGene by Jensen-Shannon distance (JSD), following the method described in [2]. The utility must be run within an R session.

Usage in R (when replicate reference data is available):

    setwd("path/to/informME/src/R_src/")
    source("jsGrank.R")
    rankGenes(refVrefFiles,testVrefFiles,inFolder,outFolder,tName,rName)

where

  • refVrefFiles is a vector of BIGWIG files that contain the JSD values of available reference/reference comparisons

  • testVrefFiles is a vector of BIGWIG files that contain the JSD values of available test/reference comparisons

  • inFolder is the directory that contains the JSD files

  • outFolder is the directory used to write the result in an .xlsx file

  • tName is a string providing a name for the test phenotype

  • rName is a string providing a name for the reference phenotype

In this case, the function generates the file gRank-JSD-tName-VS-rName.xlsx.
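Before starting a long run, it can help to confirm that every input file is where rankGenes will look for it. The following is a minimal sketch of such a check, not part of informME: the file names and folders are the placeholder values from the README excerpt, `allFiles` and `missingFiles` are hypothetical helper variables, and the call is only attempted when jsGrank.R is in the working directory.

```r
# Placeholder inputs that follow the naming pattern of the README excerpt;
# replace them with your own files and folders.
refVrefFiles <- c("JSD-lungnormal-1-VS-lungnormal-2.bw",
                  "JSD-lungnormal-3-VS-lungnormal-1.bw",
                  "JSD-lungnormal-3-VS-lungnormal-2.bw")
testVrefFiles <- c("JSD-lungcancer-1-VS-lungnormal-1.bw",
                   "JSD-lungcancer-2-VS-lungnormal-2.bw",
                   "JSD-lungcancer-3-VS-lungnormal-3.bw")
inFolder  <- "/path/to/in-folder/"
outFolder <- "/path/to/out-folder/"
tName <- "lungcancer"
rName <- "lungnormal"

# Hypothetical pre-flight check (not an informME function):
# collect the input files that cannot be found in inFolder.
allFiles <- c(refVrefFiles, testVrefFiles)
missingFiles <- allFiles[!file.exists(file.path(inFolder, allFiles))]

if (length(missingFiles) == 0 && file.exists("jsGrank.R")) {
  # All inputs found and the script is available: run the ranking.
  source("jsGrank.R")
  rankGenes(refVrefFiles, testVrefFiles, inFolder, outFolder, tName, rName)
} else {
  message("Not running rankGenes: jsGrank.R or ", length(missingFiles),
          " input file(s) were not found.")
}
```

The check only tests that the files exist; it does not validate their contents.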

Usage in R (when no replicate reference data is available):

    setwd("path/to/informME/src/R_src/")
    source("jsGrank.R")
    rankGenes(c(),testVrefFiles,inFolder,outFolder,tName,rName)

where

  • testVrefFiles is a vector of BIGWIG files that contain the JSD values of available test/reference comparisons

  • inFolder is the directory that contains the JSD files

  • outFolder is the directory used to write the result in an .xlsx file

  • tName is a string providing a name for the test phenotype

  • rName is a string providing a name for the reference phenotype

In this case, the function generates the file gRankRRD-JSD-tName-VS-rName.xlsx.
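The two usages differ only in the first argument. As a rough sketch, the choice can be made from the file names alone, assuming the files follow the JSD-<phenotype>-<id>-VS-<phenotype>-<id>.bw pattern seen in the README excerpt. The helper `splitJsdFiles` is hypothetical and not part of informME, and it assumes tName and rName contain no regular-expression metacharacters.

```r
# Hypothetical helper (not part of informME): split JSD BIGWIG file names into
# reference/reference and test/reference comparisons based on their names.
# Assumes names of the form JSD-<phenotype>-<id>-VS-<phenotype>-<id>.bw.
splitJsdFiles <- function(files, tName, rName) {
  refPattern  <- paste0("^JSD-", rName, "-.+-VS-", rName, "-.+\\.bw$")
  testPattern <- paste0("^JSD-", tName, "-.+-VS-", rName, "-.+\\.bw$")
  list(refVrefFiles  = files[grepl(refPattern, files)],
       testVrefFiles = files[grepl(testPattern, files)])
}

files <- c("JSD-lungnormal-1-VS-lungnormal-2.bw",
           "JSD-lungcancer-1-VS-lungnormal-1.bw",
           "JSD-lungcancer-2-VS-lungnormal-2.bw")

# With a reference/reference file present, both groups are populated.
groups <- splitJsdFiles(files, "lungcancer", "lungnormal")

# Without one, fall back to c() as in the usage above.
noRef  <- splitJsdFiles(files[-1], "lungcancer", "lungnormal")
refArg <- if (length(noRef$refVrefFiles) > 0) noRef$refVrefFiles else c()

# refArg (or groups$refVrefFiles) would then be passed as the first argument:
# rankGenes(refArg, noRef$testVrefFiles, inFolder, outFolder, tName, rName)
```

Passing c() explicitly, rather than an empty character vector, keeps the call identical to the documented form.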

NOTE 1: This utility requires the following R packages to be installed: GenomicFeatures, GenomicRanges, Homo.sapiens, rtracklayer, TxDb.Hsapiens.UCSC.hg19.knownGene, XLConnect
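One way to check these requirements up front is to compare the list against the packages installed in the current R library. This is a minimal sketch using base R only; the variable names `requiredPkgs` and `missingPkgs` are illustrative.

```r
# Packages listed in NOTE 1.
requiredPkgs <- c("GenomicFeatures", "GenomicRanges", "Homo.sapiens",
                  "rtracklayer", "TxDb.Hsapiens.UCSC.hg19.knownGene",
                  "XLConnect")

# Compare against the installed packages without loading any of them.
missingPkgs <- requiredPkgs[!requiredPkgs %in% rownames(installed.packages())]

if (length(missingPkgs) > 0) {
  # Installation is left to the user, for example via BiocManager::install().
  message("Missing packages: ", paste(missingPkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```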

NOTE 2: More information about this utility can be found in informME/src/R_src/postprocess/README.txt, with a relevant excerpt reproduced below for convenience:

jsGrank.R
---------

This is an R script that ranks all Human genes in the 
Bioconductor library TxDb.Hsapiens.UCSC.hg19.knownGene using 
the Jensen-Shannon distance (JSD) based on the method described 
in [1]. It should be run within an R session.

  default usage (replicate reference data is available):

   source("jsGrank.R")
   rankGenes(refVrefFiles,testVrefFiles,inFolder,outFolder,
             tName,rName)

   # refVrefFiles is a vector of BIGWIG files that contain the
   # JSD values of available reference/reference comparisons.
   # For example: if
   #
   # JSD-lungnormal-1-VS-lungnormal-2.bw 
   # JSD-lungnormal-3-VS-lungnormal-1.bw
   # JSD-lungnormal-3-VS-lungnormal-2.bw
   # 
   # are available, then set 
   # 
   # refVrefFiles <- c("JSD-lungnormal-1-VS-lungnormal-2.bw",
   #                    "JSD-lungnormal-3-VS-lungnormal-1.bw",
   #                    "JSD-lungnormal-3-VS-lungnormal-2.bw")
   #
   # testVrefFiles is a vector of BIGWIG files that contain the  
   # JSD values of available test/reference comparisons. 
   # For example: if 
   #
   # JSD-lungcancer-1-VS-lungnormal-1.bw  
   # JSD-lungcancer-2-VS-lungnormal-2.bw 
   # JSD-lungcancer-3-VS-lungnormal-3.bw 
   # 
   # are available, then set 
   # 
   # testVrefFiles <- c("JSD-lungcancer-1-VS-lungnormal-1.bw",
   #                    "JSD-lungcancer-2-VS-lungnormal-2.bw",
   #                    "JSD-lungcancer-3-VS-lungnormal-3.bw")
   #
   # inFolder is the directory that contains the JSD files
   # outFolder is the directory used to write the result  
   # (a .xlsx file).
   # 
   # For example:
   # 
   # inFolder  <- "/path/to/in-folder/"
   # outFolder <- "/path/to/out-folder/"
   #
   # tName and rName are strings providing names for the 
   # test and reference phenotypes.
   #
   # For example: 
   #
   # tName <- "lungcancer"
   # rName <- "lungnormal"

  default usage (no replicate reference data is available):  

   source("jsGrank.R")
   rankGenes(c(),testVrefFiles,inFolder,outFolder,
             tName,rName)

   # testVrefFiles is a vector of BIGWIG files that contain the  
   # JSD values of available test/reference comparisons. 
   # For example: if 
   #
   # JSD-lungcancer-1-VS-lungnormal-1.bw 
   # JSD-lungcancer-2-VS-lungnormal-2.bw 
   # JSD-lungcancer-3-VS-lungnormal-3.bw 
   # 
   # are available, then set 
   # 
   # testVrefFiles <- c("JSD-lungcancer-1-VS-lungnormal-1.bw",
   #                    "JSD-lungcancer-2-VS-lungnormal-2.bw",
   #                    "JSD-lungcancer-3-VS-lungnormal-3.bw")
   #
   # inFolder is the directory that contains the JSD files
   # outFolder is the directory used to write the result 
   # (a .xlsx file).
   # 
   # For example:
   # 
   # inFolder  <- "/path/to/in-folder/"
   # outFolder <- "/path/to/out-folder/"
   #
   # tName and rName are strings providing names for the 
   # test and reference phenotypes.
   #
   # For example: 
   #
   # tName <- "lungcancer"
   # rName <- "lungnormal"
   
  requirements:

   The following R libraries must be installed:
   - GenomicFeatures
   - GenomicRanges
   - Homo.sapiens
   - rtracklayer
   - TxDb.Hsapiens.UCSC.hg19.knownGene
   - XLConnect


REFERENCES
----------

[1] Jenkinson, G., Abante, J., Feinberg, A.P., and 
    Goutsias, J. (2018) An information-theoretic approach 
    to the modeling and analysis of whole-genome bisulfite 
    sequencing data, BMC Bioinformatics, 19:87, 
    https://doi.org/10.1186/s12859-018-2086-5.