
Commit f9dec7c

[DEV_Metagenomics_Illumina] Minor documentation updates (#151)
- Pipeline doc
  - fixed typo in sed command
- NF workflow doc
  - added explicit definitions for all available profiles
  - removed slurm from example command-line calls
  - updated headings to reflect the slurm removal
  - simplified instructions for the different run approaches
  - added conda env configuration information under the conda profile definition
1 parent 2a16111 commit f9dec7c


2 files changed (+30, -34 lines)


Metagenomics/Illumina/Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107-A.md

Lines changed: 1 addition & 1 deletion
@@ -543,7 +543,7 @@ awk -F $'\t' ' BEGIN { OFS=FS } { if ( $3 == "lineage" ) { print $1,$3,$5,$6,$7,
 else if ( $2 == "ORF has no hit to database" || $2 ~ /^no taxid found/ ) \
 { print $1,"NA","NA","NA","NA","NA","NA","NA","NA" } else { n=split($3,lineage,";"); \
 print $1,lineage[n],$5,$6,$7,$8,$9,$10,$11 } } ' sample-1-gene-tax-out.tmp | \
-sed no support/NA/g' | sed 's/superkingdom/domain/' | sed 's/# ORF/gene_ID/' | \
+sed 's/no support/NA/g' | sed 's/superkingdom/domain/' | sed 's/# ORF/gene_ID/' | \
 sed 's/lineage/taxid/' > sample-1-gene-tax-out.tsv
 ```
 
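The typo fixed in this hunk dropped the opening quote of the first `sed` expression, which left the pipeline's quoting unbalanced. With the quote restored, the chain replaces every `no support` string with `NA` and renames the header fields `superkingdom`, `# ORF`, and `lineage` to `domain`, `gene_ID`, and `taxid`. The sketch below wraps the corrected chain in a hypothetical function, `tidy_gene_tax`, and feeds it two made-up rows (not real CAT output) to show the effect.

```bash
#!/usr/bin/env bash
# Sketch: the corrected sed chain from this hunk as a reusable filter.
# tidy_gene_tax is a hypothetical name; real input would come from the awk
# step that reads sample-1-gene-tax-out.tmp.
tidy_gene_tax() {
    sed 's/no support/NA/g' | sed 's/superkingdom/domain/' | sed 's/# ORF/gene_ID/' | \
        sed 's/lineage/taxid/'
}

# Two hypothetical tab-separated rows: a header and one gene.
printf '# ORF\tlineage\tsuperkingdom\tphylum\nORF_1\t2\tno support\tno support\n' | tidy_gene_tax
# prints (tab-separated):
# gene_ID  taxid  domain  phylum
# ORF_1    2      NA      NA
```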
Metagenomics/Illumina/Workflow_Documentation/NF_MGIllumina/README.md

Lines changed: 29 additions & 33 deletions
@@ -4,7 +4,7 @@
 
 ### Implementation Tools
 
-The current GeneLab Illumina metagenomics sequencing data processing pipeline (MGIllumina-A), [GL-DPPD-7107-A.md](../../Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107-A.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) containers, [docker](https://docs.docker.com/get-started/) containers, or [conda](https://docs.conda.io/en/latest/) environments to install/run all tools. This workflow is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
+The current GeneLab Illumina metagenomics sequencing data processing pipeline (MGIllumina-A), [GL-DPPD-7107-A.md](../../Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107-A.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) containers, [Docker](https://docs.docker.com/get-started/) containers, or [conda](https://docs.conda.io/en/latest/) environments to install/run all tools. This workflow is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
 
 > **Note on reference databases**
 > Many reference databases are relied upon throughout this workflow. They will be installed and setup automatically the first time the workflow is run. All together, after installed and unpacked, they will take up about about 340 GB of storage, but they may also require up to 500GB during installation and initial un-packing, so be sure there is enough room on your system before running the workflow.
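The storage note in this hunk (about 340 GB once unpacked, up to 500 GB during installation) can be turned into a quick check before the first run. This is a minimal sketch, assuming the databases will be written to the same filesystem as the directory being checked; the note shown here does not say where they are installed, and `has_free_gb` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: succeed if the filesystem holding a directory has at least N GB free.
# has_free_gb is a hypothetical helper; 1 GB is treated as 1024 * 1024 KiB.
has_free_gb() {
    local dir=$1 required_gb=$2 avail_kb
    # -P gives POSIX single-line output and -k reports 1024-byte blocks;
    # column 4 of the data row is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Check the current directory against the ~500 GB installation peak.
if has_free_gb . 500; then
    echo "at least 500 GB available"
else
    echo "less than 500 GB available; free up space or choose another location" >&2
fi
```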
@@ -13,24 +13,18 @@ The current GeneLab Illumina metagenomics sequencing data processing pipeline (M
 
 ## Utilizing the Workflow
 
-1. [Installing Nextflow, Singularity, and conda](#1-install-nextflow-and-singularity)
+1. [Installing Nextflow, Singularity, and conda](#1-installing-nextflow-singularity-and-conda)
 1a. [Install Nextflow and conda](#1a-install-nextflow-and-conda)
 1b. [Install Singularity](#1b-install-singularity)
-
 2. [Download the workflow files](#2-download-the-workflow-files)
-
 3. [Fetch Singularity Images](#3-fetch-singularity-images)
-
 4. [Run the workflow](#4-run-the-workflow)
-4a. [Approach 1: Run slurm jobs in Singularity containers with OSD or GLDS accession as input](#4a-approach-1-run-slurm-jobs-in-singularity-containers-with-osd-or-glds-accession-as-input)
-4b. [Approach 2: Run slurm jobs in Singularity containers with a csv file as input](#4b-approach-2-run-slurm-jobs-in-singularity-containers-with-a-csv-file-as-input)
-4c. [Approach 3: Run jobs locally in conda environments and specify the path to one or more existing conda environments](#4c-approach-3-run-jobs-locally-in-conda-environments-and-specify-the-path-to-one-or-more-existing-conda-environments)
-4d. [Modify parameters and cpu resources in the Nextflow config file](#4d-modify-parameters-and-cpu-resources-in-the-nextflow-config-file)
-
+4a. [Approach 1: Start with OSD or GLDS accession as input](#4a-approach-1-start-with-an-osd-or-glds-accession-as-input)
+4b. [Approach 2: Start with a runsheet csv file as input](#4b-approach-2-start-with-a-runsheet-csv-file-as-input)
+4c. [Modify parameters and compute resources in the Nextflow config file](#4c-modify-parameters-and-compute-resources-in-the-nextflow-config-file)
 5. [Workflow outputs](#5-workflow-outputs)
 5a. [Main outputs](#5a-main-outputs)
 5b. [Resource logs](#5b-resource-logs)
-
 6. [Post Processing](#6-post-processing)
 
 <br>
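The table-of-contents fixes in this hunk depend on how GitHub derives an anchor from a heading. A rough sketch of that conversion for plain ASCII headings (lowercase, drop characters other than letters, digits, spaces, hyphens, and underscores, then turn spaces into hyphens) can help check links like these. It approximates GitHub's behavior under those stated rules and is not GitHub's own implementation; `heading_anchor` is a hypothetical name. As one example, the renamed heading `4c. Modify parameters and compute resources in the Nextflow config file` maps to `4c-modify-parameters-and-compute-resources-in-the-nextflow-config-file`, so a link that still targets `#4d-modify-parameters-and-cpu-resources-in-the-nextflow-config-file` would not resolve.

```bash
#!/usr/bin/env bash
# Sketch: approximate GitHub's heading-to-anchor conversion for ASCII headings.
# Steps: lowercase; delete everything except a-z, 0-9, space, underscore, and
# hyphen; replace each space with a hyphen. heading_anchor is hypothetical.
heading_anchor() {
    printf '%s' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | LC_ALL=C sed 's/[^a-z0-9 _-]//g' \
        | tr ' ' '-'
}

printf '%s\n' "$(heading_anchor "1. Installing Nextflow, Singularity, and conda")"
# prints: 1-installing-nextflow-singularity-and-conda
printf '%s\n' "$(heading_anchor "4a. Approach 1: Start with an OSD or GLDS accession as input")"
# prints: 4a-approach-1-start-with-an-osd-or-glds-accession-as-input
```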
@@ -125,26 +119,18 @@ Take care to use the proper number of hyphens for each argument.
 
 <br>
 
-#### 4a. Approach 1: Run slurm jobs in Singularity containers with OSD or GLDS accession as input
+#### 4a. Approach 1: Start with an OSD or GLDS accession as input
 
 ```bash
-nextflow run main.nf -resume -profile slurm,singularity --accession OSD-574
+nextflow run main.nf -resume -profile singularity --accession OSD-574
 ```
 
 <br>
 
-#### 4b. Approach 2: Run slurm jobs in Singularity containers with a csv file as input
+#### 4b. Approach 2: Start with a runsheet csv file as input
 
 ```bash
-nextflow run main.nf -resume -profile slurm,singularity --input_file PE_file.csv
-```
-
-<br>
-
-#### 4c. Approach 3: Run jobs locally in conda environments and specify the path to one or more existing conda environment(s)
-
-```bash
-nextflow run main.nf -resume -profile mamba --input_file SE_file.csv --conda_megahit <path/to/existing/conda/environment>
+nextflow run main.nf -resume -profile singularity --input_file PE_file.csv
 ```
 
 <br>
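The commands in this hunk now load only the `singularity` profile. The updated `-profile` description in this README still allows a comma-separated list that pairs exactly one software-environment profile (`singularity`, `docker`, `conda`, or `mamba`) with the optional `slurm` profile. A minimal sketch of that rule follows; `compose_profile` is a hypothetical helper, and the script prints the command lines instead of running Nextflow.

```bash
#!/usr/bin/env bash
# Sketch: build a -profile value from one software-environment profile plus an
# optional slurm profile. compose_profile is a hypothetical helper name.
compose_profile() {
    local env_profile=$1 use_slurm=${2:-no}
    case $env_profile in
        singularity|docker|conda|mamba) ;;
        *) echo "not a software environment profile: $env_profile" >&2; return 1 ;;
    esac
    if [ "$use_slurm" = yes ]; then
        # slurm is combined with the environment profile as a comma-separated list
        printf 'slurm,%s\n' "$env_profile"
    else
        printf '%s\n' "$env_profile"
    fi
}

# Print (not run) the commands: first without the slurm profile, then with it.
echo "nextflow run main.nf -resume -profile $(compose_profile singularity) --accession OSD-574"
echo "nextflow run main.nf -resume -profile $(compose_profile singularity yes) --accession OSD-574"
# prints:
# nextflow run main.nf -resume -profile singularity --accession OSD-574
# nextflow run main.nf -resume -profile slurm,singularity --accession OSD-574
```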
@@ -155,24 +141,34 @@ nextflow run main.nf -resume -profile mamba --input_file SE_file.csv --conda_meg
 
 * `-resume` - Resumes workflow execution using previously cached results
 
-* `-profile` – Specifies the configuration profile(s) to load; `singularity` instructs Nextflow to setup and use Singularity for all software called in the workflow.
-> Note: Use `docker` to instruct Nextflow to use the Docker container environment instead.
+* `-profile` – Specifies the configuration profile(s) to load (multiple options can be provided as a comma-separated list)
+* Software environment profile options (choose one):
+* `singularity` - instructs Nextflow to use Singularity container environments
+* `docker` - instructs Nextflow to use Docker container environments
+* `conda` - instructs Nextflow to use conda environments via the conda package manager. By default, Nextflow will create environments at runtime using the yaml files in the [workflow_code/envs](workflow_code/envs/) folder. You can change this behavior by using the `--conda_*` workflow parameters or by editing the [nextflow.config](workflow_code/nextflow.config) file to specify a centralized conda environments directory via the `conda.cacheDir` parameter
+* `mamba` - instructs Nextflow to use conda environments via the mamba package manager.
+* Other option (can be combined with the software environment option above):
+* `slurm` - instructs Nextflow to use the [Slurm cluster management and job scheduling system](https://slurm.schedmd.com/overview.html) to schedule and run the jobs on a Slurm HPC cluster.
 
 * `--accession` – A Genelab / OSD accession number e.g. OSD-574.
-> *Required only if you would like to pull and process data directly from OSDR*
+> *Required only if you would like to download and process data directly from OSDR*
+
+* `--input_file` – A single-end or paired-end runsheet csv file containing assay metadata for each sample, including sample_id, forward, reverse, and/or paired. Please see the [runsheet documentation](./examples/runsheet) in this repository for examples on how to format this file.
+> *Required only if `--accession` is not passed as an argument*
+
+<br>
 
-* `--input_file` – A single-end or paired-end input csv file containing assay metadata for each sample, including sample_id, forward, reverse, and/or paired. Please see the [runsheet documentation](./examples/runsheet) in this repository for examples on how to format this file.
-> *Required only if --accession is not passed as an argument*
+> See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details on how to run Nextflow.
+> For additional information on editing the `nextflow.config` file, see [Step 4d](#4d-modify-parameters-and-cpu-resources-in-the-nextflow-config-file) below.
 
-> See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details on how to run Nextflow.
 
 <br>
 
-#### 4d. Modify parameters and cpu resources in the nextflow config file
+#### 4c. Modify parameters and compute resources in the Nextflow config file
 
-Additionally, the parameters and workflow resources can be directly specified in the nextflow.config file. For detailed instructions on how to modify and set parameters in the nextflow.config file, please see the [documentation here](https://www.nextflow.io/docs/latest/config.html).
+Additionally, all parameters and workflow resources can be directly specified in the [nextflow.config](./workflow_code/nextflow.config) file. For detailed instructions on how to modify and set parameters in the config file, please see the [documentation here](https://www.nextflow.io/docs/latest/config.html).
 
-Once you've downloaded the workflow template, you can modify the parameters in the `params` scope and cpus/memory requirements in the `process` scope in your downloaded version of the [nextflow.config](workflow_code/nextflow.config) file as needed in order to match your dataset and system setup. Additionally, if necessary, you'll need to modify each variable in the [nextflow.config](workflow_code/nextflow.config) file to be consistent with the study you want to process and the machine you're using.
+Once you've downloaded the workflow template, you can modify the parameters in the `params` scope and cpus/memory requirements in the `process` scope in your downloaded version of the [nextflow.config](workflow_code/nextflow.config) file as needed in order to match your dataset and system setup. Additionally, if necessary, you can modify each variable in the [nextflow.config](workflow_code/nextflow.config) file to be consistent with the study you want to process and the computer you're using for processing.
 
 <br>
 
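The `--input_file` option takes a runsheet, which this README describes as a single-end or paired-end CSV with sample_id, forward, reverse, and/or paired columns. A quick header check can catch a malformed file before a long run. This is a sketch only: the helper `has_columns`, the sample name, the read paths, and the `true` value in the paired column are hypothetical, and the examples in `./examples/runsheet` remain the reference for the actual format.

```bash
#!/usr/bin/env bash
# Sketch: write a hypothetical paired-end runsheet, then confirm its header
# contains the expected columns before handing it to --input_file.

# Return 0 if every named column appears in the first (header) line of the CSV.
has_columns() {
    local csv=$1 header col
    shift
    header=$(head -n 1 "$csv" | tr -d '\r')   # tolerate Windows line endings
    for col in "$@"; do
        # Surround with commas so "forward" cannot match inside a longer name.
        case ",$header," in
            *",$col,"*) ;;
            *) echo "missing column: $col" >&2; return 1 ;;
        esac
    done
}

# Hypothetical sample name and read paths, for illustration only.
cat > PE_file.csv <<'EOF'
sample_id,forward,reverse,paired
sample-1,/path/to/sample-1_R1.fastq.gz,/path/to/sample-1_R2.fastq.gz,true
EOF

if has_columns PE_file.csv sample_id forward reverse paired; then
    echo "nextflow run main.nf -resume -profile singularity --input_file PE_file.csv"
fi
```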
@@ -214,7 +210,7 @@ nextflow run post_processing.nf --help
 To generate the post-processing files after running the main processing workflow successfully, modify and set the parameters in [post_processing.config](workflow_code/post_processing.config), then run the following command:
 
 ```bash
-nextflow -C post_processing.config run post_processing.nf -resume -profile slurm,singularity
+nextflow -C post_processing.config run post_processing.nf -resume -profile singularity
 ```
 
 The outputs of the post-processing workflow are described below:
