
This repository is a work in progress. The aim is to deconvolve samples using single-cell (SC) and single-nucleus (SN) references and to compare their performance.


greenelab/deconvolution_sc_sn_comparison


Deconvolution using a single-cell RNA sequencing dataset combined with a single-nucleus cell type

The aim of this research project is to evaluate transformation methods that can be applied to single-nucleus RNA sequencing datasets in order to improve deconvolution of bulk RNA-seq. Single-cell RNA-seq and bulk RNA-seq both capture cytoplasmic and nuclear RNA, and so share similar expression profiles, while single-nucleus RNA-seq contains only nuclear RNA. Our central hypothesis is that this difference makes single-nucleus RNA-seq a poor deconvolution reference. We compare the two modalities in both simulations and real data, and we present some transformation options.

Reproducing the results

Example of submitting a script as a Slurm job on an HPC cluster:

sbatch scripts/0_preprocess_data.sh
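If you are unsure whether Slurm is available on your machine, a small dry-run check like the one below may help. This is only a sketch: the fallback to plain `bash` is an assumption and will only work if the script does not depend on Slurm-specific environment variables or resources.

```shell
#!/usr/bin/env bash
# Illustrative dry run: pick a launcher depending on whether sbatch is installed.
script="scripts/0_preprocess_data.sh"

if command -v sbatch >/dev/null 2>&1; then
    launcher="sbatch"   # Slurm is available: submit as a batch job
else
    launcher="bash"     # Assumed fallback: run the script directly
fi

# Only print the command; nothing is submitted or executed here.
echo "Would run: ${launcher} ${script}"
```

Replace the `echo` with the actual command once you have confirmed which launcher applies to your environment.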

Contribute to the research!

Instructions for adding your own method to the analysis:

  • Preprocess data as usual (if you want to add more data, see instructions below).

  • If your method requires training, you can add the training code to the same script where we train the scVI models. You can also train it independently.

  • You can now create references for deconvolution with your method:

    • Add your transformation to the same datasets as seen in the simulations script. See where we highlight "Add your transformation here!" (line 698).
  • Each of the notebooks has a "Settings" cell at the top. Add your reference identifier to these variables, add a color for it to the palette, and add a display name for the plots. You might need to adjust the figure size if the plots don't look right, depending on the number of transforms you add.

Instructions for adding more data to the analysis:

  • Start by preprocessing the single-cell and single-nucleus datasets. You can add your own Jupyter notebook to the notebooks folder. Then, run the preprocessing shell script. Choose a new identifier for your dataset and add it to the data folder: data/YOURS.
  • After that, it's just a matter of adding your new dataset identifier to the shell scripts:

Example:

datasets=("ADP" "PBMC" "MBC" "MSB") to datasets=("ADP" "PBMC" "MBC" "MSB" "YOURS")
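As a minimal sketch of how such an array might be used, the loop below walks over the identifiers and checks that each dataset folder exists before any jobs are submitted. The `data/<identifier>` layout follows the convention described above; the loop body, the `missing` array, and the messages are illustrative assumptions, not the repository's actual code.

```shell
#!/usr/bin/env bash
# Illustrative only: the real scripts may consume the array differently.
datasets=("ADP" "PBMC" "MBC" "MSB" "YOURS")

missing=()
for ds in "${datasets[@]}"; do
    # Assumes each dataset lives under data/<identifier>.
    if [ -d "data/${ds}" ]; then
        echo "Found data/${ds}"
    else
        echo "Missing data/${ds}" >&2
        missing+=("${ds}")
    fi
done

echo "${#datasets[@]} datasets listed, ${#missing[@]} missing"
```

Running a check like this before submitting jobs can catch a mistyped identifier early.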

  • If you only want to add data to the "real bulk" analysis, add it to the shell scripts that contain "Real_ADP", and add an array to the job at the top (we only use one real bulk dataset):

Example:

dataset=("Real_ADP") to dataset=("Real_ADP" "YOURS")
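One way to "add an array to the job" is a Slurm job array, where each task picks one entry of the dataset list. The sketch below assumes this layout. `#SBATCH --array` and `SLURM_ARRAY_TASK_ID` are standard Slurm features, but the variable names `task_id` and `current` are hypothetical, and the repository's scripts may be organized differently.

```shell
#!/usr/bin/env bash
#SBATCH --array=0-1   # one task per entry in the dataset array (assumed layout)

dataset=("Real_ADP" "YOURS")

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 0 for local runs.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
current="${dataset[$task_id]}"

echo "Processing dataset: ${current}"
```

The upper bound of `--array` should be the number of datasets minus one, so it needs updating whenever an identifier is added.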

Data Access and Processing

Please download the Excel sheet: data/details/Data_Details.xlsx. It contains all links, names, filtering steps, and details on all datasets used.
