- Download using git clone https://github.com/uhlerlab/APOLLO.git
- In a new conda environment, install the required packages: conda create --name <env_name> --file requirement.txt
This repository contains notebooks demonstrating APOLLO in four applications: two to paired sequencing-based modalities and two to multiplexed imaging data. Each notebook contains detailed instructions for applying APOLLO to user-provided data, including how to preprocess data for training. For paired sequencing-based modalities, please refer to the section "Application to paired scRNA-seq and scATAC-seq". For multiplexed imaging data, please refer to the section "Application to paired chromatin and protein images".
- Preprocessing scATAC-seq data: atac_rna/preprocess_shareseq.ipynb
- Normalization of scRNA-seq and scATAC-seq data for training input: atac_rna/train_lord_randNoise_sharedRecon_shareseq_filter_bce_morefilter.ipynb
- Preprocessing imaging data: chromark/preprocess.ipynb. This requires nuclear segmentation masks.
Notebooks for step 1 and 2 training can be found in the "APOLLO training" sections for both the SHARE-seq application and the multiplexed imaging application.
train_lord_randNoise_sharedRecon_shareseq_filter_bce_morefilter.ipynb
train_lord_randNoise_sharedRecon_shareseq_morefilter_reverse_bce.ipynb
train_clf_lord_shareseq_celltype_bce_morefilter.ipynb
Genes or gene ontology terms with significant changes along each principal component of the latent spaces
plot_lord_bce_pca_sampling.ipynb - identify differentially expressed genes or peaks along each principal component of the shared or modality-specific latent spaces
plot_lord_bce_pca_sampling_withAnnotations_curve.ipynb - plot differentially expressed genes or peaks along each principal component of the shared or modality-specific latent spaces
plot_lord_bce_pca_sampling_withAnnotations.ipynb - plot the enriched gene ontology terms of the genes or peaks represented by the shared or modality-specific latent spaces
preprocess_shareseq.ipynb
train_cnnvae_splitChannels_conditional_lord_randNoise_bce.ipynb - BCE loss used for reconstruction
train_cnnvae_splitChannels_conditional_lord_randNoise.ipynb - MSE loss used for reconstruction
train_cnnvae_splitChannels_conditional_lord_randNoise_reverse_bce.ipynb - inference step for the model trained with BCE loss
train_cnnvae_splitChannels_conditional_lord_randNoise_reverse.ipynb - inference step for the model trained with MSE loss
train_cnnvae_splitChannels_conditional_lord_randNoise_fullyJoint.ipynb - step 1, latent optimization
train_cnnvae_splitChannels_conditional_lord_randNoise_reverse_fullyJoint.ipynb - step 2, inference
train_cnnvae_splitChannels_conditional_lord_randNoise_correctBCE_noSharedRecon.ipynb - step 1, latent optimization (without decoders mapping from the shared latent space to reconstruction)
train_cnnvae_splitChannels_conditional_lord_randNoise_reverse_correctBCEvalLoss_noSharedRecon.ipynb - step 2, inference
train_cnnvae_splitChannels_conditional.ipynb
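
Several of the training notebooks above differ only in the reconstruction loss (BCE or MSE). As a rough illustration of that difference, and not the notebooks' actual implementation, the sketch below computes both losses on a flattened image with pixel values in [0, 1]. The function names and example values are hypothetical, and the code uses only the Python standard library for clarity; a real training loop would use a deep learning framework's built-in losses.

```python
import math

def mse_loss(target, recon):
    """Mean squared error, averaged over pixels."""
    return sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)

def bce_loss(target, recon, eps=1e-7):
    """Binary cross-entropy, averaged over pixels; expects values in [0, 1]."""
    total = 0.0
    for t, r in zip(target, recon):
        r = min(max(r, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(r) + (1.0 - t) * math.log(1.0 - r)
    return total / len(target)

# Hypothetical flattened image and its reconstruction.
target = [0.0, 1.0, 0.5, 0.9]
recon = [0.1, 0.8, 0.5, 0.6]

print("MSE:", mse_loss(target, recon))
print("BCE:", bce_loss(target, recon))
```

BCE penalizes confident errors near 0 or 1 more strongly than MSE, which is one common reason to prefer it for images normalized to [0, 1].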
Phenotype classification using real images, reconstructed images from full latent space, reconstructed images from shared latent space, or protein images predicted from chromatin
plot_Clf_conditions_sampling.ipynb - plot results
train_clf_conditions_c2c_fullrecon_sampling.ipynb - train classifiers using reconstructed chromatin images from the full latent space
train_clf_conditions_c2c_sharedrecon_sampling.ipynb - train classifiers using reconstructed chromatin images from the shared latent space
train_Clf_conditions_c2p_sampling.ipynb - train classifiers using protein images predicted from chromatin
train_clf_conditions_originalImg_chromatin_sampling.ipynb - train classifiers using the original chromatin images
train_clf_conditions_originalImg_sampling.ipynb - train classifiers using the original protein images
train_Clf_conditions_p2p_fullrecon_sampling.ipynb - train classifiers using reconstructed protein images from the full latent space
train_Clf_conditions_p2p_sharedRecon_sampling.ipynb - train classifiers using reconstructed protein images from the shared latent space
getNMCO_allFeatures.ipynb - preprocess
getNMCOgroups.ipynb - group chromatin features by correlation and select one representative feature for each group
getNMCOgroups_protein.ipynb - group protein features by correlation and select one representative feature for each group
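
The idea of grouping features by correlation and keeping one representative per group can be sketched as follows. This is a minimal greedy illustration under stated assumptions, not the procedure used in the notebooks, which may rely on a different method such as hierarchical clustering. The function names, the 0.8 threshold, and the feature names are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def group_by_correlation(features, threshold=0.8):
    """Greedily group features; the first feature seen in a group is its representative.

    features: dict mapping feature name -> list of per-cell values.
    Returns a dict mapping representative name -> list of member names.
    """
    groups = {}
    for name, values in features.items():
        for rep in groups:
            # Join the first existing group whose representative is strongly correlated.
            if abs(pearson(values, features[rep])) >= threshold:
                groups[rep].append(name)
                break
        else:
            # No sufficiently correlated representative found: start a new group.
            groups[name] = [name]
    return groups

# Hypothetical per-cell morphological features.
features = {
    "area": [1.0, 2.0, 3.0, 4.0],
    "perimeter": [2.1, 3.9, 6.2, 8.0],
    "eccentricity": [0.9, 0.1, 0.8, 0.2],
}
groups = group_by_correlation(features)
print(groups)
```

In this toy example, "perimeter" joins the group represented by "area", while "eccentricity" forms its own group. The greedy pass depends on feature order, so a clustering-based approach would be more robust for real data.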
plot_examples_centerPCs_percentiles_noHeldOut.ipynb
plot_nmco_centerPCs_percentiles_chromatin_allfeatures_sampling_groupNMCOde.ipynb - identify chromatin features with significant changes along PCs of the latent spaces
plot_nmco_centerPCs_percentiles_chromatin_allfeatures_sampling_groupNMCO.ipynb - plot the significant chromatin features
plot_nmco_centerPCs_percentiles_protein_allfeatures_sampling.ipynb - identify protein features with significant changes along PCs of the latent spaces
plot_nmco_centerPCs_percentiles_protein_allfeatures_sampling_groupNMCO.ipynb - plot the significant protein features
train_clf_conditions_nmco_sampling.ipynb - train phenotype classifier using all representative morphological features
train_clf_conditions_nmco_sampling_featureAblation.ipynb - train phenotype classifier with feature ablation
plot_clf_conditions_nmco_sampling.ipynb - plot results
train_pred_chromatinImg2proteinFeatures.ipynb - train regression models
compareImg2Features.ipynb - plot results
preprocess.ipynb
benchmarking_inpainting.ipynb - compare to the previous image inpainting method for protein image prediction: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007348
./citeseq directory
./hpa contains the three notebooks for the three models, one trained on each pair of the chromatin, ER, and microtubule markers.
./simulation contains the results of applying APOLLO to 5 simulated datasets with known ground truth of disentanglement.