This directory contains the code and resources of the following paper:
"Disease Gene Prediction with Privileged Information and Heteroscedastic Dropout". Under review.
- LUPI_RGCN is a relational GNN-based classification algorithm for graphs. It takes a disease gene network as input and predicts the possible associations.
- Our experimental results are based on gene disease network data developed in [1]. The data can be obtained here data
We introduce the LUPI_RGCN algorithm to address the gene disease prioritization problem. To achieve this goal, we develop a Heteroscedastic Gaussian Dropout algorithm, in which the dropout probability of the GNN model is determined by another GNN model with a mirrored architecture. The model is trained under a VAE framework with the reparameterization trick.
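The idea of noise whose scale is predicted by a second network can be illustrated with a minimal numpy sketch. This is not the implementation in LUPI_RGCN.py: the function name `heteroscedastic_gaussian_dropout`, the plain linear maps standing in for the two GNNs, and the assumption that the second network reads the privileged features are all hypothetical choices made for illustration. The noise is sampled as `1 + sigma * eps` with `eps` drawn from a standard normal, which is the reparameterization trick applied to multiplicative Gaussian dropout.

```python
import numpy as np


def heteroscedastic_gaussian_dropout(hidden, log_variance, rng, training=True):
    """Multiply activations by Gaussian noise with a per-element variance.

    The noise is 1 + sigma * eps, where eps ~ N(0, 1) and
    sigma = exp(0.5 * log_variance). Sampling eps separately keeps the
    result a deterministic function of log_variance (reparameterization).
    """
    if not training:
        # No noise at inference time: activations pass through unchanged.
        return hidden
    sigma = np.exp(0.5 * log_variance)
    eps = rng.standard_normal(hidden.shape)
    return hidden * (1.0 + sigma * eps)


rng = np.random.default_rng(0)
n_genes, n_feat, n_priv, n_hidden = 5, 4, 3, 2

# Toy inputs. Linear maps stand in for the main GNN and the mirrored GNN.
x = rng.standard_normal((n_genes, n_feat))        # regular features
x_priv = rng.standard_normal((n_genes, n_priv))   # assumed privileged features
w_main = rng.standard_normal((n_feat, n_hidden))  # hypothetical main weights
w_var = rng.standard_normal((n_priv, n_hidden))   # hypothetical variance weights

hidden = x @ w_main             # activations of the main model
log_variance = x_priv @ w_var   # noise scale predicted by the second model

noisy = heteroscedastic_gaussian_dropout(hidden, log_variance, rng)
clean = heteroscedastic_gaussian_dropout(hidden, log_variance, rng, training=False)
```

In this sketch the second model only shapes the noise during training, so `clean` equals `hidden` at inference time.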
- [figures] contains the figures that are used in the paper.
- [LUPI_RGCN] contains implementation of our model for the disease gene network.
There are three datasets in the link.
- GeneGene_Hs: The HumanNet gene interaction network of size 12331 x 12331.
- GenePhene: a cell array containing Gene-Phenotype networks of 9 species.
- GP_SPECIES: The names of the species corresponding to the networks in 'GenePhene' variable.
- geneIds: The entrezdb ids of genes, corresponding to the rows of the matrix 'GeneGene_Hs' (or 'GenePhene' matrices).
- pheneIds: a cell array containing OMIM ids for phenotypes of 9 species.
- PhenotypeSimilaritiesLog: Similarity network between OMIM diseases.
- Microarray expression data - vector of real-valued features for a gene per row (refer to the paper for details).
- OMIM word-count data - term-document matrix for OMIM diseases (refer to the paper for details).
- To run the code, you need Python (3.7 was used for development) and other packages, such as pytorch (1.5.0), pytorch-geometric (1.6.1), numpy, pandas, and matplotlib.
- The data is in .mat format. To use it in Python, first process the data (processing.py) and then run python LUPI_RGCN.py.
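As a rough illustration of reading .mat data in Python, the sketch below loads the 'GeneGene_Hs' and 'geneIds' variables listed above with `scipy.io.loadmat` and checks that their sizes agree. It is not a substitute for processing.py: the helper `load_gene_network`, the file name `toy_genes.mat`, and the three-gene network are made up for the demonstration, and scipy is an extra dependency assumed here. Note that `loadmat` does not read MATLAB v7.3 files, which need an HDF5 reader instead.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat
from scipy.sparse import csr_matrix


def load_gene_network(mat_path):
    """Return (sparse gene-gene adjacency, 1-D array of gene ids)."""
    mat = loadmat(mat_path)
    adjacency = csr_matrix(mat["GeneGene_Hs"])
    gene_ids = np.asarray(mat["geneIds"]).ravel()
    n_rows, n_cols = adjacency.shape
    if n_rows != n_cols or n_rows != gene_ids.size:
        raise ValueError("GeneGene_Hs must be square and match geneIds in size")
    return adjacency, gene_ids


# Synthetic stand-in for the real file: a 3-gene network with made-up ids.
with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_genes.mat")
    savemat(
        toy_path,
        {
            "GeneGene_Hs": np.array(
                [[0.0, 1.0, 0.0], [1.0, 0.0, 0.5], [0.0, 0.5, 0.0]]
            ),
            "geneIds": np.array([[101], [202], [303]]),
        },
    )
    adjacency, gene_ids = load_gene_network(toy_path)

print(adjacency.shape, gene_ids.tolist())
```

With the real data, the same size check would confirm that the 12331 x 12331 network lines up with the gene id vector.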
[1] Li, Y. et al. (2019) PGCN: Disease gene prioritization by disease and gene embedding through graph convolutional neural networks. bioRxiv.