🏭 ENCODE sc/snATAC Automated Processing

Note: This pipeline is currently a work in progress.

This is the automated portion of the ENCODE single-cell/single-nucleus ATAC-seq pipeline.

Information on the specific analysis steps can be found in the pipeline specification document.

Requirements

  • A Linux-based OS
  • A conda-based Python 3 installation
  • Snakemake v6.6.1+ (full installation; an example setup follows this list)
  • An ENCODE DCC account with access to the necessary datasets

Additional requirements for cloud execution:

  • Kubectl
  • A cloud provider CLI for Kubernetes cluster creation
  • A cloud provider CLI for remote storage (if different from above)

All other dependencies are handled by the pipeline itself.
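
One way to satisfy the Snakemake requirement, assuming a fresh conda-based setup, is the mamba-based full installation recommended in the Snakemake documentation:

    # Install mamba into the base environment, then create a dedicated
    # environment containing the full Snakemake installation
    conda install -n base -c conda-forge mamba
    mamba create -c conda-forge -c bioconda -n snakemake snakemake

This creates the snakemake conda environment that is activated in the local execution steps below.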

Running the Pipeline

Local Execution

  1. Install the requirements listed above
  2. Download the pipeline
    git clone https://github.com/kundajelab/ENCODE_scatac
    
  3. Activate the snakemake conda environment:
    conda activate snakemake
    
  4. Configure the pipeline in the /config directory. Detailed information can be found here.
  5. Run the pipeline:
    snakemake -k --use-conda --cores $NCORES 
    
    Here, $NCORES is the number of cores to use. (A dry-run example follows the note below.)

Note: When run for the first time, the pipeline will take some time to install conda packages.
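
Before committing to a full run, you can preview the execution plan with Snakemake's standard dry-run flag; this is a general Snakemake feature rather than anything specific to this pipeline:

    # Print the jobs that would be executed, without running them
    snakemake -n --use-conda --cores $NCORES

    # Also print the reason each job is scheduled
    snakemake -n -r --use-conda --cores $NCORES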

Cloud Execution with Kubernetes

  1. Install and configure the pipeline as specified above
  2. Create a cloud cluster. Note that setup specifics may differ depending on the cloud provider. Example setup instructions are available for GCP and for Azure.
  3. Configure remote storage. Instructions for each provider can be found here. For our purposes, only the environment variables and command-line configuration are needed.
  4. Run the pipeline:
    snakemake -k --kubernetes --use-conda --default-remote-provider $REMOTE --default-remote-prefix $PREFIX --jobs $NJOBS --envvars $VARS
    
    Here:
    • $REMOTE is the cloud storage provider, and should be one of {S3,GS,FTP,SFTP,S3Mocked,gfal,gridftp,iRODS,AzBlob,XRootD}
    • $PREFIX is the target bucket name or subfolder in storage
    • $NJOBS is the maximum number of jobs to be run in parallel
    • $VARS is a list of environment variables for accessing remote storage. The --envvars flag can be omitted if no variables are required. (A worked example follows this list.)
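
As a concrete sketch, a run against Google Cloud Storage might look like the following. The bucket name and job count are placeholders, and GOOGLE_APPLICATION_CREDENTIALS is the standard GCP credential variable:

    # Hypothetical GCS-backed Kubernetes run; substitute your own bucket
    snakemake -k --kubernetes --use-conda \
        --default-remote-provider GS \
        --default-remote-prefix my-scatac-bucket \
        --jobs 50 \
        --envvars GOOGLE_APPLICATION_CREDENTIALS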

Additional Execution Modes

This pipeline has been tested locally and on the cloud via Kubernetes. However, Snakemake offers a number of additional execution modes.

  • Documentation on cluster execution
  • Documentation on cloud execution
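
For instance, cluster execution on a SLURM system would use Snakemake's generic --cluster flag. This mode is untested with this pipeline, so treat the following as a starting point:

    # Untested sketch: submit each job via sbatch, mapping Snakemake's
    # per-rule threads and memory onto SLURM resources
    snakemake -k --use-conda \
        --cluster "sbatch --cpus-per-task={threads} --mem={resources.mem_mb}" \
        --jobs $NJOBS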

Authors

Austin Wang
Primary developer
atwang@stanford.edu

Surag Nair
Secondary developer and advisor
surag@stanford.edu

Ben Parks
Secondary developer and advisor
bparks@stanford.edu

Laksshman Sundaram
Advisor
lakss@stanford.edu

Caleb Lareau
Advisor
clareau@stanford.edu

William Greenleaf
Supervisor
wjg@stanford.edu

Anshul Kundaje
Supervisor
akundaje@stanford.edu
