Skip to content

Core-Bioinformatics/cell-types-train-control-workflow

 
 

Repository files navigation

cell-types-train-control-workflow

Workflow for programmatic generation of classifiers for multiple tools.

Schematic representation of the process is shown below:

This is a workflow-of-workflows that sets up and triggers the execution of the following child workflows:

Each of these workflows generates classifiers based on all specified training datasets.

This workflow scans a user-provided comma-separated text file for specified SCXA dataests, imports them and trains a range of classifiers for each dataset. The following columns are expected in the config file:

  • dataset id (from SCXA)
  • technology type ("droplet" or "smart-seq")
  • matrix type (raw, filtered, CPM- or TPM-normalised)
  • number of clusters in marker gene file
  • barcode column (in SDRF file)
  • cell type column

Example config file can be found here.

Setting up config

The control workflow's and individual methods' parameters can be set from nextflow.config. You can speficy the path to the training datasets file there (data/datasets.txt is used by default). See the comments in the config file for further information.

Running the workflow

Prior to running the workflow, you will need to fetch and update the submodules for individual tool workflows. Run the following command from the workflow directory:

./bin/fetch_tool_training_workflows.sh 

You will need to have conda installed to run the workflow. It is recommended to use a clean environment to avoid dependency conflicts. Issue the following commands:

conda create -n nextflow && conda activate nextflow 
conda install nextflow 
./bin/run_control_workflow.sh <profile>

In the run_control_workflow.sh script, the <profile> parameter might be either standard or cluster depending on where you run the process. More information provided here.

Outputs

Workflow outputs can be found in data/<DATASET_ID> directory.

About

Control workflow for generating pre-trained classifiers on SCXA datasets

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Nextflow 95.6%
  • Shell 4.4%