PrioritizingSusceptibilityGenes_GWAS

This Java code was used to analyze the CRC dataset described in the following manuscript: "A Network Approach to Prioritizing Susceptibility Genes for Genome-wide Association Studies"

If this file does not answer your questions, please e-mail Ting Hu or Somayeh Kafaie at:

ting.hu@mun.ca
somayeh.kafaie@mun.ca

DIRECTORIES:

1-Filtering:    	 
This program reads the genome information of the samples from a file (inputFile) in the
following format and writes the result to a file (outputFile) in a similar format,
keeping only the selected SNPs instead of all of them.
    - inputFileFormat:
        A table in which each row represents a sample. The first 6 columns hold the
        personal information of that sample and the remaining columns are SNPs. Each
        SNP column label is the SNP name (e.g. snp1) with the name of the minor allele
        appended for the additive component (e.g. snp1_2 when 2 is the minor allele).
        Assuming A is the minor allele, genotypes are recoded as follows:
            SNP       SNP_A
            ---       -----
            A A   ->    0
            A T   ->    1
            T T   ->    2
            0 0   ->   NA
   The filtering code can use either (ReliefF + TURF) or (SURF + TURF).
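
The input layout described above can be read along the following lines. This is a minimal sketch, not the repository's parser: the class `GenotypeRowParser`, the whitespace delimiter, the sample row, and the use of -1 as an internal stand-in for NA are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the repository.
class GenotypeRowParser {
    static final int INFO_COLUMNS = 6;  // leading personal-information columns
    static final int MISSING = -1;      // internal stand-in for "NA" (assumption)

    /** Returns the SNP codes of one sample row: 0, 1, 2, or MISSING for NA. */
    static int[] parseGenotypes(String row) {
        // Assumes whitespace-delimited columns.
        String[] fields = row.trim().split("\\s+");
        if (fields.length <= INFO_COLUMNS) {
            throw new IllegalArgumentException("row has no SNP columns: " + row);
        }
        int[] genotypes = new int[fields.length - INFO_COLUMNS];
        for (int i = 0; i < genotypes.length; i++) {
            String value = fields[INFO_COLUMNS + i];
            genotypes[i] = value.equals("NA") ? MISSING : Integer.parseInt(value);
        }
        return genotypes;
    }

    public static void main(String[] args) {
        // Made-up row: 6 information columns followed by 4 SNP columns.
        String row = "FAM1 IND1 0 0 1 2 0 1 NA 2";
        System.out.println(Arrays.toString(parseGenotypes(row)));  // prints [0, 1, -1, 2]
    }
}
```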

2-NetworkCreation:       
2-1-calculatingIG_pcount: 
   For all SNPs selected after filtering, we calculate the pairwise information
   gain using the Java code in this directory. We also generate 1000 permutations
   of the dataset by shuffling the phenotypes across the samples, and calculate the
   pairwise information gain for all SNPs of every permuted dataset as well.
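
As an illustration of the pairwise calculation, the sketch below assumes that the information gain of a SNP pair is the interaction term IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C), where C is the phenotype. This README does not spell out the formula, so treat that definition, the class `PairwiseInformationGain`, and its method names as assumptions rather than the repository's implementation. Missing genotypes are not handled here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the repository's implementation.
class PairwiseInformationGain {
    /** Shannon entropy (in bits) of the joint distribution of the given columns. */
    static double entropy(int[]... columns) {
        int n = columns[0].length;
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < n; i++) {
            StringBuilder key = new StringBuilder();
            for (int[] column : columns) key.append(column[i]).append(',');
            counts.merge(key.toString(), 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / n;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    /** Assumed definition: IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C). */
    static double informationGain(int[] a, int[] b, int[] phenotype) {
        double hC = entropy(phenotype);
        double iA = entropy(a) + hC - entropy(a, phenotype);
        double iB = entropy(b) + hC - entropy(b, phenotype);
        double iAB = entropy(a, b) + hC - entropy(a, b, phenotype);
        return iAB - iA - iB;
    }

    public static void main(String[] args) {
        // Toy data: the phenotype is the XOR of the two SNPs, so neither SNP
        // is informative alone but the pair determines the phenotype.
        int[] snpA = {0, 0, 1, 1};
        int[] snpB = {0, 1, 0, 1};
        int[] phenotype = {0, 1, 1, 0};
        System.out.println(informationGain(snpA, snpB, phenotype));
    }
}
```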
   
   As implemented in “calculatingIG_pcount/MainNetPermutations.java”, to get the
   result faster we ran 100 separate instances of the code. Each instance computes
   the IG between all SNPs in the original dataset and in 10 permuted datasets
   (100 x 10 = 1000 permutations in total), and calculates P. Here P is a short
   integer counting the permutations whose IG is greater than or equal to the IG
   observed in our dataset.
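
The counting scheme described above can be sketched as follows. The class `PermutationPCount`, its method names, the seed handling, and the toy statistic in `main` are hypothetical; in the actual pipeline the statistic would be the pairwise IG of a SNP pair under the shuffled phenotypes.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch, not the code in MainNetPermutations.java.
class PermutationPCount {
    /** Fisher-Yates shuffle of a copy of the phenotype labels. */
    static int[] shuffled(int[] phenotype, Random rng) {
        int[] copy = phenotype.clone();
        for (int i = copy.length - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
        }
        return copy;
    }

    /** Counts the permutations whose statistic is >= the observed statistic. */
    static short pCount(int[] phenotype, ToDoubleFunction<int[]> statistic,
                        int permutations, long seed) {
        double observed = statistic.applyAsDouble(phenotype);
        Random rng = new Random(seed);
        short count = 0;
        for (int p = 0; p < permutations; p++) {
            if (statistic.applyAsDouble(shuffled(phenotype, rng)) >= observed) count++;
        }
        return count;
    }

    /** Sums the partial counts produced by independent instances. */
    static int mergeCounts(short[] partialCounts) {
        int total = 0;
        for (short c : partialCounts) total += c;
        return total;
    }

    public static void main(String[] args) {
        int[] snp = {0, 0, 1, 2, 2, 1, 0, 2};
        int[] phenotype = {0, 0, 1, 1, 1, 1, 0, 1};
        // Toy statistic (assumption): samples where carrying the allele matches case status.
        ToDoubleFunction<int[]> agreement = labels -> {
            int hits = 0;
            for (int i = 0; i < labels.length; i++) {
                if ((snp[i] > 0) == (labels[i] == 1)) hits++;
            }
            return hits;
        };
        short p = pCount(phenotype, agreement, 10, 42L);
        System.out.println("p-count over 10 permutations: " + p);
    }
}
```

Giving each instance its own seed keeps the permutations of the parallel runs independent, and summing the per-instance counts yields the count over all permutations.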
   
2-2-mergingPerms_netParameters:
   The results of the previous step are used to create the network for different
   IG cutoff values (from 0.02 down to 0.008, in steps of 0.001) and to observe
   how features such as the number of vertices, the number of edges, the degree
   distribution, and the size of the largest component change.
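
A cutoff sweep of this kind can be sketched as below. The class `CutoffSweep`, the toy edge list, and the rule that an edge is kept when its IG is greater than or equal to the cutoff are assumptions; p-count filtering and the degree distribution are left out to keep the example short.

```java
// Hypothetical sketch, not the repository's implementation.
class CutoffSweep {
    /** Union-find lookup with path halving. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    /** Returns {vertices, edges, largestComponentSize} for edges with IG >= cutoff. */
    static int[] summarize(int nSnps, int[][] pairs, double[] ig, double cutoff) {
        int[] parent = new int[nSnps];
        for (int i = 0; i < nSnps; i++) parent[i] = i;
        boolean[] inNetwork = new boolean[nSnps];
        int edges = 0;
        for (int e = 0; e < pairs.length; e++) {
            if (ig[e] < cutoff) continue;
            edges++;
            int a = pairs[e][0], b = pairs[e][1];
            inNetwork[a] = true;
            inNetwork[b] = true;
            parent[find(parent, a)] = find(parent, b);  // merge the two components
        }
        int vertices = 0, largest = 0;
        int[] size = new int[nSnps];
        for (int v = 0; v < nSnps; v++) {
            if (!inNetwork[v]) continue;  // SNPs without edges are not vertices
            vertices++;
            largest = Math.max(largest, ++size[find(parent, v)]);
        }
        return new int[] {vertices, edges, largest};
    }

    public static void main(String[] args) {
        int[][] pairs = {{0, 1}, {1, 2}, {3, 4}};  // toy SNP pairs
        double[] ig = {0.021, 0.015, 0.009};       // toy IG values
        // Integer steps avoid accumulating floating-point error in the cutoff.
        for (int step = 20; step >= 8; step--) {
            double cutoff = step / 1000.0;
            int[] s = summarize(5, pairs, ig, cutoff);
            System.out.printf("cutoff=%.3f vertices=%d edges=%d largest=%d%n",
                    cutoff, s[0], s[1], s[2]);
        }
    }
}
```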

3-NetworkAnalysis:   
3-1-General:
   This code reads the IG values and p-counts for all pairs of SNPs and creates the
   network based on the given thresholds (IG_th and p_th). It builds the adjacency
   matrix and the graph and computes the main network parameters.
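
The thresholding step might look like the sketch below. How IG_th and p_th are combined is not stated in this README, so the rule used here (keep a pair when its IG is at least `igTh` and its p-count is at most `pTh`) is an assumption, as are the class `ThresholdedNetwork` and the toy matrices.

```java
import java.util.Arrays;

// Hypothetical sketch, not the repository's implementation.
class ThresholdedNetwork {
    /** Builds a symmetric adjacency matrix from pairwise IG values and p-counts. */
    static boolean[][] adjacency(double[][] ig, int[][] pCount, double igTh, int pTh) {
        int n = ig.length;
        boolean[][] adj = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Assumed rule: IG large enough and few enough permutations reaching it.
                if (ig[i][j] >= igTh && pCount[i][j] <= pTh) {
                    adj[i][j] = true;
                    adj[j][i] = true;
                }
            }
        }
        return adj;
    }

    /** Degree of every vertex. */
    static int[] degrees(boolean[][] adj) {
        int[] degree = new int[adj.length];
        for (int i = 0; i < adj.length; i++) {
            for (int j = 0; j < adj.length; j++) {
                if (adj[i][j]) degree[i]++;
            }
        }
        return degree;
    }

    /** Number of undirected edges: half the sum of the degrees. */
    static int edgeCount(boolean[][] adj) {
        return Arrays.stream(degrees(adj)).sum() / 2;
    }

    public static void main(String[] args) {
        double[][] ig = {{0, 0.020, 0.005}, {0.020, 0, 0.012}, {0.005, 0.012, 0}};
        int[][] pCount = {{0, 0, 400}, {0, 0, 30}, {400, 30, 0}};
        boolean[][] adj = adjacency(ig, pCount, 0.01, 10);
        System.out.println("degrees: " + Arrays.toString(degrees(adj)));
        System.out.println("edges: " + edgeCount(adj));
    }
}
```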

3-2-calSignificance_Clustering_Assortativity:
   This program reads the edge list of the network (for a given IG_th) and
   creates its adjacency matrix and graph. It then creates a random network by
   swapping edges (n_swapped_edges * #edges) times, and repeats this to generate
   n_rand_networks (i.e., 1000) random networks. By measuring the clustering
   coefficient and the assortativity coefficient of the real network and of all
   random networks, it calculates the p-value and returns it.
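
The randomization test can be sketched as follows for the clustering coefficient only. Several choices here are assumptions rather than details confirmed by this README: the swap rewires edges (a,b) and (c,d) into (a,d) and (c,b) and skips swaps that would create self-loops or duplicate edges, the clustering coefficient is the average local one, and the p-value is the fraction of random networks whose coefficient is at least the observed one. The class `EdgeSwapSignificance` and the parameter `swapsPerEdge` (standing in for n_swapped_edges) are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch, not the repository's implementation.
class EdgeSwapSignificance {
    /** Order-independent id of an undirected edge. */
    static long key(int a, int b) {
        return ((long) Math.min(a, b) << 32) | Math.max(a, b);
    }

    /** Degree-preserving randomization by repeated edge swaps. */
    static int[][] randomize(int[][] edges, int attempts, Random rng) {
        int[][] e = new int[edges.length][];
        Set<Long> present = new HashSet<>();
        for (int i = 0; i < edges.length; i++) {
            e[i] = edges[i].clone();
            present.add(key(e[i][0], e[i][1]));
        }
        for (int t = 0; t < attempts; t++) {
            int i = rng.nextInt(e.length), j = rng.nextInt(e.length);
            int a = e[i][0], b = e[i][1], c = e[j][0], d = e[j][1];
            // Skip swaps that would create a self-loop or a duplicate edge.
            if (i == j || a == d || c == b) continue;
            if (present.contains(key(a, d)) || present.contains(key(c, b))) continue;
            present.remove(key(a, b));
            present.remove(key(c, d));
            present.add(key(a, d));
            present.add(key(c, b));
            e[i][1] = d;
            e[j][1] = b;
        }
        return e;
    }

    /** Average local clustering coefficient; vertices of degree < 2 contribute 0. */
    static double clustering(int n, int[][] edges) {
        List<Set<Integer>> nb = new ArrayList<>();
        for (int v = 0; v < n; v++) nb.add(new HashSet<>());
        for (int[] e : edges) {
            nb.get(e[0]).add(e[1]);
            nb.get(e[1]).add(e[0]);
        }
        double sum = 0.0;
        for (int v = 0; v < n; v++) {
            List<Integer> ns = new ArrayList<>(nb.get(v));
            int k = ns.size(), links = 0;
            if (k < 2) continue;
            for (int x = 0; x < k; x++)
                for (int y = x + 1; y < k; y++)
                    if (nb.get(ns.get(x)).contains(ns.get(y))) links++;
            sum += 2.0 * links / (k * (k - 1));
        }
        return sum / n;
    }

    /** Fraction of random networks whose clustering is >= the observed value. */
    static double pValue(int n, int[][] edges, int nRandNetworks, int swapsPerEdge, long seed) {
        double observed = clustering(n, edges);
        Random rng = new Random(seed);
        int atLeast = 0;
        for (int r = 0; r < nRandNetworks; r++) {
            int[][] random = randomize(edges, swapsPerEdge * edges.length, rng);
            if (clustering(n, random) >= observed) atLeast++;
        }
        return (double) atLeast / nRandNetworks;
    }

    public static void main(String[] args) {
        // Toy network: two triangles joined by one edge.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {2, 3}, {3, 4}, {4, 5}, {3, 5}};
        System.out.println("clustering = " + clustering(6, edges));
        System.out.println("p-value    = " + pValue(6, edges, 1000, 10, 1L));
    }
}
```

The same loop could compute the assortativity coefficient of each random network alongside the clustering coefficient, since both are measured on the same rewired edge list.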

PACKAGES TO INSTALL:

The code is written in Java and was developed in the Eclipse IDE.
