Deconvolution using single-cell RNA sequencing dataset combined with a single-nucleus cell type.

The aim of this research project is to evaluate transformation methods that can be applied to single-nucleus RNA sequencing (snRNA-seq) datasets to improve deconvolution of bulk RNA-seq. Single-cell RNA-seq and bulk RNA-seq capture similar RNA populations (cytoplasmic and nuclear RNA), while single-nucleus RNA-seq captures only nuclear RNA. Our central hypothesis is that this difference makes single-nucleus RNA-seq a poor deconvolution reference. We compare the two modalities in both simulations and real data, and we present some transformation options.

Reproducing the results

Fork or branch and clone the repository. GitHub has many tutorials on this.
Run the bash script to create the conda environments needed. This creates two conda environments from the yml files in the environments folder: one for R (env_deconv_r) and one for Python (env_deconv).
Download the data we use and place it in the appropriate folder (data/ID). All data is publicly available and easily downloadable. All links and details can be found in the Excel sheet (data/details/Data_Details.xlsx).
After downloading the data, run the shell scripts, in order:
- 0_preprocess_data.sh
  - Runs preprocessing notebooks. Preprocessing and QC for all datasets.
- 1_train_scvi_models_sim.sh
  - Runs training scripts. Trains scVI models (conditional and not conditional), with and without groups of differentially expressed genes.
- 2_prepare_deconvolution_sim.sh
  - Runs script to prepare files for deconvolution (only simulations). Files needed are one reference for each transform, where we transform one cell type at a time.
- 3_run_bayesprism
  - Runs script for deconvolution through BayesPrism/InstaPrism using the references and pseudobulks created in step 2. Tutorials on InstaPrism are available here.
- 4_process_results_sim.sh
  - Runs script to process the results from deconvolution (simulations only), computes RMSE and Pearson correlation, and puts them in a format for analysis.
- 5_results_notebook_sim.sh
  - Runs notebook to visualize the results from the simulation deconvolution. Includes the plots from the paper's figures.
You can look at the results in the results notebooks afterwards. All plots included in the figures of the paper will be available in these.
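
The two metrics computed in step 4 can be illustrated with a minimal sketch. This is not the repository's processing script; the function names and the toy proportion matrices below are assumptions made for illustration only. It assumes proportions are stored as arrays with one row per pseudobulk sample and one column per cell type.

```python
import numpy as np


def rmse(true_props, est_props):
    """Root-mean-square error between true and estimated proportions."""
    true_props = np.asarray(true_props, dtype=float)
    est_props = np.asarray(est_props, dtype=float)
    return float(np.sqrt(np.mean((true_props - est_props) ** 2)))


def pearson(true_props, est_props):
    """Pearson correlation over all (sample, cell type) entries."""
    true_flat = np.asarray(true_props, dtype=float).ravel()
    est_flat = np.asarray(est_props, dtype=float).ravel()
    return float(np.corrcoef(true_flat, est_flat)[0, 1])


# Toy data (made up): 2 pseudobulk samples x 3 cell types.
true_props = np.array([[0.5, 0.3, 0.2],
                       [0.1, 0.6, 0.3]])
est_props = np.array([[0.4, 0.4, 0.2],
                      [0.2, 0.5, 0.3]])

print("RMSE:", rmse(true_props, est_props))
print("Pearson:", pearson(true_props, est_props))
```

Lower RMSE and higher Pearson correlation indicate that a reference recovers the simulated proportions more closely.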

Example of submitting a script as a job on an HPC cluster (Slurm):

sbatch scripts/0_preprocess_data.sh

Contribute to the research!

Instructions for adding your own method to the analysis:

Preprocess data as usual (if you want to add more data, see instructions below).
If your method requires training, you can add the training code to the same script where we train the scVI models. You can also train independently.
You can now create references for deconvolution with your method:
- Add your transformation to the same datasets as seen in the simulations script. See where we highlight "Add your transformation here!" (line 698).
Each of the notebooks has a "Settings" cell at the top. Add your reference identifier to these variables, add a color to the palette, and add a display name for it for the plots. You might need to adjust the figure size if the plots don't look right, depending on the number of transforms you add.
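
As a rough illustration of the "Settings" step, the sketch below shows what registering a new reference could look like. Every name in it (reference_ids, palette, display_names, register_reference, the identifiers, and the colors) is a hypothetical placeholder, not the variables used in the repository's notebooks; check the actual "Settings" cell for the real names.

```python
# Hypothetical "Settings" cell. All variable names, identifiers, and colors
# here are illustrative assumptions, not the ones used in the notebooks.
reference_ids = ["sc_ref", "sn_ref", "sn_transformed"]
palette = {
    "sc_ref": "#1f77b4",
    "sn_ref": "#ff7f0e",
    "sn_transformed": "#2ca02c",
}
display_names = {
    "sc_ref": "Single-cell reference",
    "sn_ref": "Single-nucleus reference",
    "sn_transformed": "Transformed single-nucleus",
}


def register_reference(ref_id, color, display_name):
    """Add a new reference to all three settings so they stay in sync."""
    if ref_id in reference_ids:
        raise ValueError(f"Reference '{ref_id}' is already registered")
    reference_ids.append(ref_id)
    palette[ref_id] = color
    display_names[ref_id] = display_name


# Register a new (hypothetical) method.
register_reference("my_method", "#8c564b", "My method")

# Scale the figure width with the number of references so plots stay readable.
fig_width = max(6.0, 1.5 * len(reference_ids))
print(reference_ids, fig_width)
```

Keeping the identifier, color, and display name together in one helper is only one way to avoid forgetting one of the three entries when adding a transform.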

Instructions for adding more data to the analysis:

Start by preprocessing the single-cell and single-nucleus datasets. You can add your own Jupyter notebook to the notebooks folder. Then, run the preprocessing shell script. Choose a new identifier for your dataset and add it to the data folder: data/YOURS.
After, it's just a matter of adding your new dataset identifier to the shell scripts:

Example:

datasets=("ADP" "PBMC" "MBC" "MSB") to datasets=("ADP" "PBMC" "MBC" "MSB" "YOURS")

If you only want to add data to the "real bulk" analysis, add it to the shell scripts that contain "Real_ADP", and add an array to the job at the top (we only use one real bulk dataset):

Example:

dataset=("Real_ADP") to dataset=("Real_ADP" "YOURS")

Data Access and Processing

Please download the Excel sheet: data/details/Data_Details.xlsx. This contains all links, names, filtering steps, and details on all datasets used.

greenelab/deconvolution_sc_sn_comparison
