cell-types-train-control-workflow

Workflow for programmatic generation of classifiers for multiple tools.

A schematic representation of the process is provided in classifier_training.png.

This is a workflow-of-workflows that sets up and triggers the execution of the following child workflows:

Each of these workflows generates classifiers based on all specified training datasets.

This workflow scans a user-provided comma-separated text file for the specified SCXA datasets, imports them, and trains a range of classifiers for each dataset. The following columns are expected in the config file:

dataset id (from SCXA)
technology type ("droplet" or "smart-seq")
matrix type (raw, filtered, CPM- or TPM-normalised)
number of clusters in marker gene file
barcode column (in SDRF file)
cell type column

An example config file can be found in example_config.txt.
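As an illustration of the layout described above, the sketch below writes a small config file and checks that every row has six comma-separated fields in the listed column order. The dataset IDs, the column names, the matrix type spellings, and the file name sample_datasets.txt are made-up placeholders rather than values from this repository. The sketch also assumes the file has no header row, which is not stated here.

```shell
# Write a hypothetical two-row config (placeholder values, no header row assumed).
cat > sample_datasets.txt <<'EOF'
E-XXXX-0001,droplet,raw,10,id,inferred cell type
E-XXXX-0002,smart-seq,TPM,5,id,inferred cell type
EOF

# Report rows that lack exactly six fields or carry an unknown technology type.
# check_config is a hypothetical helper, not part of the workflow.
check_config() {
    awk -F',' '
        BEGIN { bad = 0 }
        NF != 6 { printf "line %d: expected 6 fields, found %d\n", NR, NF; bad = 1 }
        NF == 6 && $2 != "droplet" && $2 != "smart-seq" {
            printf "line %d: unknown technology type \"%s\"\n", NR, $2; bad = 1
        }
        END { exit bad }
    ' "$1"
}

check_config sample_datasets.txt && echo "config looks well-formed"
```

A check of this kind catches a missing or extra comma before any dataset is downloaded.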

Setting up config

The parameters of the control workflow and of the individual methods can be set in nextflow.config. You can specify the path to the training datasets file there (data/datasets.txt is used by default). See the comments in the config file for further information.

Running the workflow

Prior to running the workflow, you will need to fetch and update the submodules for individual tool workflows. Run the following command from the workflow directory:

./bin/fetch_tool_training_workflows.sh

You will need to have conda installed to run the workflow. It is recommended to use a clean environment to avoid dependency conflicts. Issue the following commands:

conda create -n nextflow && conda activate nextflow 
conda install nextflow 
./bin/run_control_workflow.sh <profile>

The <profile> argument to run_control_workflow.sh can be either standard or cluster, depending on where you run the process. More information is provided here.
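A small guard can catch a mistyped profile before anything is launched. The sketch below only checks the argument against the two profile names mentioned above and prints the command it would run. The function name launch_cmd is hypothetical and is not part of the repository.

```shell
# Print the launch command for a given profile, or fail for an unknown one.
launch_cmd() {
    case "$1" in
        standard|cluster)
            echo "./bin/run_control_workflow.sh $1"
            ;;
        *)
            echo "unknown profile: '$1' (expected standard or cluster)" >&2
            return 1
            ;;
    esac
}

launch_cmd standard   # prints: ./bin/run_control_workflow.sh standard
```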

Outputs

Workflow outputs can be found in the data/<DATASET_ID> directory.
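To see which datasets have not yet produced an output directory, the config file can be compared against the data directory. This is a minimal sketch that assumes the dataset ID is the first comma-separated column and that the file has no header row. The helper name missing_outputs is hypothetical.

```shell
# List dataset IDs (first column of the config) whose output directory is missing.
missing_outputs() {
    config_file="$1"
    data_dir="${2:-data}"
    # cut extracts the first comma-separated field, assumed to be the dataset ID.
    cut -d',' -f1 "$config_file" | while IFS= read -r dataset_id; do
        [ -n "$dataset_id" ] || continue              # skip blank lines
        [ -d "$data_dir/$dataset_id" ] || echo "$dataset_id"
    done
}

# Usage: missing_outputs data/datasets.txt data
```

An empty result means every dataset listed in the config has a matching directory. It does not confirm that the classifiers inside are complete.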
