lntran26/ConfuseNN

Interpreting supervised machine learning inferences in population genomics using haplotype matrix permutations

Supervised machine learning methods, such as convolutional neural networks (CNNs), that use haplotype matrices as input data have become powerful tools for population genomics inference. However, these methods often lack interpretability, making it difficult to understand which population genetic features drive their predictions, a critical limitation for method development and biological interpretation. Here we introduce a systematic permutation approach that progressively disrupts population genetic features within input test haplotype matrices, including linkage disequilibrium, haplotype structure, and allele frequencies. By measuring performance degradation after each permutation, we can assess the importance of each feature. We applied our approach to three published CNNs for positive selection and demographic history inference.

In this repository, we use the term "ConfuseNN" to refer to our permutation approach, since we are attempting to "confuse" the networks by testing them on disrupted data.
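
The measurement idea described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the paper: `predict`, `accuracy`, and `performance_drop` are hypothetical names, the model is assumed to be a classifier that maps a stack of haplotype matrices to class labels, and accuracy is assumed as the performance metric (a regression task such as demographic inference would need an error metric instead).

```python
import numpy as np


def accuracy(predict, matrices, labels):
    """Fraction of test matrices whose predicted class matches the label."""
    return float(np.mean(predict(matrices) == labels))


def performance_drop(predict, matrices, labels, permutations, seed=0):
    """Baseline accuracy minus accuracy on permuted test data, per permutation.

    predict      : hypothetical callable mapping an array of shape
                   (n, haplotypes, sites) to an array of n class labels
    matrices     : unpermuted test haplotype matrices, shape (n, haplotypes, sites)
    labels       : true class labels, shape (n,)
    permutations : dict of name -> function(matrix, rng) returning a permuted copy
    """
    rng = np.random.default_rng(seed)
    baseline = accuracy(predict, matrices, labels)
    drops = {}
    for name, permute in permutations.items():
        # Permute every test matrix independently; the model is never retrained.
        permuted = np.stack([permute(m, rng) for m in matrices])
        drops[name] = baseline - accuracy(predict, permuted, labels)
    return drops
```

A large drop for a given permutation suggests that the model relies on the feature that permutation destroys, while a drop near zero suggests the feature is not needed for the prediction.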

Preprint

bioRxiv

Reproduce

To reproduce the results for each of the three CNNs evaluated in this work, refer to the corresponding subdirectory; each has its own specifications.

Example code for haplotype matrix permutations

To make the approach easy to adopt, we provide code for all permutations described in our paper in minimal_example.ipynb, with visualizations. How these permutations are applied in practice will likely vary with the simulation and training procedure. For examples of how we customized our permutation approach to each CNN, refer to the corresponding subdirectory, which contains further description.
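
For orientation, a few permutations of this kind might look like the sketch below. It is not taken from minimal_example.ipynb, and the exact set of permutations in the paper may differ. The function names are hypothetical, and the matrix is assumed to be a NumPy array with haplotypes as rows and segregating sites as columns (0 = ancestral, 1 = derived).

```python
import numpy as np


def permute_within_sites(hap, rng):
    """Shuffle alleles among haplotypes independently at each site (column).

    Keeps each site's allele frequency; breaks linkage between sites and
    therefore haplotype structure.
    """
    return rng.permuted(hap, axis=0)


def permute_site_order(hap, rng):
    """Shuffle the order of sites (whole columns).

    Keeps allele frequencies and the set of alleles each haplotype carries;
    breaks the positional arrangement of sites along the sequence.
    """
    return hap[:, rng.permutation(hap.shape[1])]


def permute_within_haplotypes(hap, rng):
    """Shuffle alleles among sites independently within each haplotype (row).

    Keeps each haplotype's derived-allele count; breaks per-site allele
    frequencies and linkage.
    """
    return rng.permuted(hap, axis=1)


def permute_all(hap, rng):
    """Shuffle every entry of the matrix; keeps only the total derived count."""
    return rng.permutation(hap.ravel()).reshape(hap.shape)


# Example on a small random matrix (8 haplotypes x 12 sites).
rng = np.random.default_rng(0)
hap = rng.integers(0, 2, size=(8, 12))
shuffled = permute_within_sites(hap, rng)
```

Each function returns a new array and leaves the input unchanged, so the same test matrix can be passed through several permutations in turn.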

Extra: Conference presentations of this work
