Adversarial attack of sequence-free enhancer prediction identifies patterns of chromatin architecture

Jamil Gafur¹, Olivia W Lang², & William KM Lai^2,3

¹ Department of Computer Science, University of Iowa, Iowa City, Iowa 52242, USA

² Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14850, USA

³ Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA

Abstract

The wide range of cellular complexity created by multicellular organisms is due in large part to the intricate and synergistic interplay of regulatory complexes throughout the eukaryotic genome. These regulatory elements ‘enhance’ specific gene programs and have been shown to operate in diverse networks that are distinct across cell states of the same organism. Attempts to characterize and predict enhancers have typically focused on leveraging information-dense DNA sequence in parallel with epigenomic assays. We examined the viability of enhancer prediction using only a minimal set of epigenomic datasets without direct DNA information. We demonstrate that chromatin datasets are sufficient to identify enhancers genome-wide with high accuracy. By training networks leveraging data from multiple cell types simultaneously, we generated a cell-type invariant enhancer prediction platform that utilized only the patterns of protein binding for inference. We also showed the utility of swarm-based adversarial attacks (APSO) to deconvolute trained genomic neural networks for the first time. Critically, unlike saliency mapping or other game-theory based approaches, APSO is completely network-architecture independent and can be applied to any prediction engine to derive the features that drive inference.

Network Training and Analysis

This repo has been designed to completely re-generate the analysis and figures contained within the manuscript (DOI: XXXXXX). Random seeds have been set as neccessary to maximize reproducibility. Scripts should be executed in numerical order per sub-folder.

1. Preprocessing

Navigate to the Preprocessing folder and follow the instructions

2. Building the environment

conda create --prefix ~/work/ChromEnh python=3.9

conda activate ~/work/ChromEnh

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

Validate pytorch is able to see GPU:

python
import torch
torch.cuda.is_available()

pip install h5py tqdm matplotlib scikit-learn seaborn

The APSO code should be cloned from here:

git clone https://github.com/EpiGenomicsCode/Adversarial_Observation

And moved into the 'util' folder.

3. Download data

Shell scripts should be executed sequentially in:

00_preprocessing

4. Train networks

There are currently three different studies that can be performed:

Chromosome dropout traininig
Cell line independent training
Large model training

These networks can be executed asynchronously in:

01_network_training

5. Explainable AI analysis

Shell scripts should be executed sequentially in:

03_xai

6. Large network analysis

Shell scripts should be executed sequentially in:

04_largenetwork

FAQ

How do I add my own model?

Navigate to the util/Model folder and add your model as a new python file
Edit the loadModel function in the util.py file

Name		Name	Last commit message	Last commit date
Latest commit History 364 Commits
00_preprocessing		00_preprocessing
01_network_training		01_network_training
02_enhancer_validation		02_enhancer_validation
03_xai		03_xai
04_largenetwork		04_largenetwork
bin		bin
data		data
figures		figures
tables		tables
util		util
.gitignore		.gitignore
README.md		README.md
environment.yml		environment.yml
swarm_analysis.py		swarm_analysis.py
swarm_analysis_inverse.py		swarm_analysis_inverse.py
train_network.py		train_network.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Adversarial attack of sequence-free enhancer prediction identifies patterns of chromatin architecture

Jamil Gafur¹, Olivia W Lang², & William KM Lai^2,3

Abstract

Network Training and Analysis

1. Preprocessing

2. Building the environment

3. Download data

4. Train networks

5. Explainable AI analysis

6. Large network analysis

FAQ

About

Uh oh!

Releases

Packages

Contributors 3

Languages

EpiGenomicsCode/ChromEnhancer

Folders and files

Latest commit

History

Repository files navigation

Adversarial attack of sequence-free enhancer prediction identifies patterns of chromatin architecture

Jamil Gafur1, Olivia W Lang2, & William KM Lai2,3

Abstract

Network Training and Analysis

1. Preprocessing

2. Building the environment

3. Download data

4. Train networks

5. Explainable AI analysis

6. Large network analysis

FAQ

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Jamil Gafur¹, Olivia W Lang², & William KM Lai^2,3

Packages