Commit 91c99f7

add updates to nf rcp read me
1 parent 5b38f64 commit 91c99f7

2 files changed

+34
-16
lines changed

RNAseq/Workflow_Documentation/NF_RCP/README.md

Lines changed: 33 additions & 14 deletions
@@ -4,7 +4,7 @@
44

55
### Implementation Tools <!-- omit in toc -->
66

7-
The current GeneLab RNAseq consensus processing pipeline (RCP), [GL-DPPD-7101-G](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_RCP) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
7+
The current GeneLab RNAseq consensus processing pipeline (RCP), [GL-DPPD-7101-G](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md), and the GeneLab Prokaryotic RNAseq consensus pipeline, [GL-DPPD-7115](../../Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md), are implemented as a single [Nextflow](https://nextflow.io/) DSL2 workflow that uses [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This unified workflow (NF_RCP) can process both eukaryotic and prokaryotic RNAseq data through a configurable parameter (`--mode`) and is run using the command line interface (CLI) of any Unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
88

99
### Workflow & Subworkflows <!-- omit in toc -->
1010

@@ -30,7 +30,11 @@ document](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md):
3030
2. **RNAseq Consensus Pipeline Subworkflow**
3131

3232
- Description:
33-
- This subworkflow uses the staged raw data and metadata parameters from the Analysis Staging Subworkflow to generate processed data using [version G of the GeneLab RCP](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md).
33+
- This subworkflow uses the staged raw data and metadata parameters from the Analysis Staging Subworkflow to generate processed data using either:
34+
- [Version G of the GeneLab RCP](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md) when the `--mode` parameter is omitted (default)
35+
- [The GeneLab Prokaryotic RCP](../../Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md) when using `--mode microbes`
36+
37+
The selection impacts the choice of aligner and read counter tools used in the pipeline.
3438

3539
3. **V&V Pipeline Subworkflow**
3640

@@ -60,6 +64,7 @@ document](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md):
6064
4a. [Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of Ensembl reference fasta and gtf files](#4a-approach-1-run-the-workflow-on-a-genelab-rnaseq-dataset-with-automatic-retrieval-of-ensembl-reference-fasta-and-gtf-files)
6165
4b. [Approach 2: Run the workflow on a GeneLab RNAseq dataset using local Ensembl reference fasta and gtf files](#4b-approach-2-run-the-workflow-on-a-genelab-rnaseq-dataset-using-local-reference-fasta-and-gtf-files)
6266
4c. [Approach 3: Run the workflow on a non-GLDS dataset using a user-created runsheet](#4c-approach-3-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
67+
4d. [Approach 4: Run the workflow on a GeneLab prokaryotic RNAseq dataset](#4d-approach-4-run-the-workflow-on-a-genelab-prokaryotic-rnaseq-dataset)
6368
5. [Additional Output Files](#5-additional-output-files)
6469

6570
<br>
@@ -134,7 +139,7 @@ export NXF_SINGULARITY_CACHEDIR=$(pwd)/singularity
134139
135140
### 4. Run the Workflow
136141
137-
While in the location containing the `NF_RCP_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_RCP workflow:
142+
While in the location containing the `NF_RCP_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are four examples of how to run the NF_RCP workflow:
138143
> Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.
139144
140145
<br>
@@ -177,19 +182,30 @@ nextflow run NF_RCP_2.0.0/main.nf \
177182
178183
<br>
179184
185+
#### 4d. Approach 4: Run the workflow on a GeneLab prokaryotic RNAseq dataset
186+
187+
```bash
188+
nextflow run NF_RCP_2.0.0/main.nf \
189+
-profile singularity \
190+
--mode microbes \
191+
--accession OSD-185
192+
```
193+
194+
<br>
195+
180196
**Required Parameters For All Approaches:**
181197
182198
* `NF_RCP_2.0.0/main.nf` - Instructs Nextflow to run the NF_RCP workflow
183199
184200
* `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow
185-
> Note: The output directory will be named `OSD-#` when using a OSDR or GLDS accession as input, or `results` when running the workflow with only a runsheet as input.
201+
> Note: The output directory will be named `GLDS-#` when using an OSDR or GLDS accession as input, or `results` when running the workflow with only a runsheet as input.
186202
187203
188204
<br>
189205
190206
**Additional Required Parameters For [Approach 2](#4b-approach-2-run-the-workflow-on-a-genelab-rnaseq-dataset-using-local-ensembl-reference-fasta-and-gtf-files):**
191207
192-
* `--reference_version` - specifies the Ensembl version to use for the reference genome (Ensembl release `107` is used in this example)
208+
* `--reference_version` - specifies the Ensembl version to use for the reference genome (Ensembl release `107` is used in this example); only needed when using Ensembl as the reference source
193209
194210
* `--reference_source` - specifies the source of the reference files used (the source indicated in the Approach 2 example is `ensembl`)
195211
@@ -215,7 +231,10 @@ nextflow run NF_RCP_2.0.0/main.nf \
215231
216232
* `--runsheet_path` - specifies the path to a local runsheet (Default: a runsheet is automatically generated using the metadata on the GeneLab Repository for the OSD dataset being processed)
217233
> This is required when processing a non-OSD dataset as indicated in [Approach 3 above](#4c-approach-3-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
218-
234+
235+
* `--mode` - specifies which pipeline to use: set to `default` to run the GL-DPPD-7101-G pipeline, or set to `microbes` to run the GL-DPPD-7115 prokaryotic pipeline (Default value: `default`)
236+
> This allows the workflow to process either eukaryotic (default) or prokaryotic RNAseq data using the appropriate pipeline
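
Because `--mode` accepts only the two documented values (`default` and `microbes`), a launch script can check the value before starting Nextflow. The following is a minimal sketch, not part of the workflow itself: the helper names `validate_mode` and `build_rcp_command` are hypothetical, and the command it prints simply mirrors the Approach 4 example with the accession passed in as an argument.

```shell
# Hypothetical helper (not part of NF_RCP): accept only the two documented --mode values.
validate_mode() {
  case "$1" in
    default|microbes) return 0 ;;
    *)
      echo "Unsupported --mode value: $1 (expected 'default' or 'microbes')" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: print the nextflow command for a given mode and accession.
# The command layout follows the Approach 4 example; nothing is executed here.
build_rcp_command() {
  rcp_mode="$1"
  rcp_accession="$2"
  validate_mode "$rcp_mode" || return 1
  printf 'nextflow run NF_RCP_2.0.0/main.nf -profile singularity --mode %s --accession %s\n' \
    "$rcp_mode" "$rcp_accession"
}

# Example: print the command for the prokaryotic dataset used in Approach 4.
build_rcp_command microbes OSD-185
```

Printing the command rather than running it keeps the check cheap; the output can be reviewed and then pasted or passed to `sh -c` once it looks right.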
237+
219238
<br>
220239
221240
**Additional Optional Parameters:**
@@ -247,14 +266,14 @@ The outputs from the Analysis Staging and V&V Pipeline Subworkflows are describe
247266
**V&V Pipeline Subworkflow**
248267
249268
- Output:
250-
- VV_Logs/VV_log_final_GLbulkRNAseq.tsv (table containing V&V flags for all checks performed)
251-
- VV_Logs/VV_log_final_only_issues_GLbulkRNAseq.tsv (table containing V&V flags ONLY for checks that produced a flag code >= 30)
252-
- VV_Logs/VV_log_VV_RAW_READS_GLbulkRNAseq.tsv (table containing V&V flags ONLY for raw reads checks)
253-
- VV_Logs/VV_log_VV_TRIMMED_READS_GLbulkRNAseq.tsv (table containing V&V flags for trimmed reads checks ONLY)
254-
- VV_Logs/VV_log_VV_ALIGNMENT_GLbulkRNAseq.tsv (table containing V&V flags for alignment file checks ONLY)
255-
- VV_Logs/VV_log_VV_RSEQC_GLbulkRNAseq.tsv (table containing V&V flags for RSeQC file checks ONLY)
256-
- VV_Logs/VV_log_VV_COUNTS_GLbulkRNAseq.tsv (table containing V&V flags for gene quantification file checks ONLY)
257-
- VV_Logs/VV_log_VV_DESEQ2_ANALYSIS_GLbulkRNAseq.tsv (table containing V&V flags for DESeq2 Analysis output checks ONLY)
269+
- VV_Logs/VV_log_final_GLbulkRNAseq.csv (table containing V&V flags for all checks performed)
270+
- VV_Logs/VV_log_final_only_issues_GLbulkRNAseq.csv (table containing V&V flags ONLY for checks that produced a flag code >= 30)
271+
- VV_Logs/VV_log_VV_RAW_READS_GLbulkRNAseq.csv (table containing V&V flags ONLY for raw reads checks)
272+
- VV_Logs/VV_log_VV_TRIMMED_READS_GLbulkRNAseq.csv (table containing V&V flags for trimmed reads checks ONLY)
273+
- VV_Logs/VV_log_VV_ALIGNMENT_GLbulkRNAseq.csv (table containing V&V flags for alignment file checks ONLY)
274+
- VV_Logs/VV_log_VV_RSEQC_GLbulkRNAseq.csv (table containing V&V flags for RSeQC file checks ONLY)
275+
- VV_Logs/VV_log_VV_COUNTS_GLbulkRNAseq.csv (table containing V&V flags for gene quantification file checks ONLY)
276+
- VV_Logs/VV_log_VV_DESEQ2_ANALYSIS_GLbulkRNAseq.csv (table containing V&V flags for DESeq2 Analysis output checks ONLY)
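
Since this change renames every V&V log from `.tsv` to `.csv`, a quick way to confirm a finished run produced the renamed files is to check for each documented path. The sketch below assumes the logs sit in a `VV_Logs` directory directly under the output directory passed in; the function name `missing_vv_logs` is hypothetical, and the file names are taken from the list above.

```shell
# Hypothetical helper: print the path of every documented V&V log that is
# missing under "<output dir>/VV_Logs". Prints nothing when all eight exist.
missing_vv_logs() {
  vv_outdir="$1"
  for vv_name in final final_only_issues VV_RAW_READS VV_TRIMMED_READS \
                 VV_ALIGNMENT VV_RSEQC VV_COUNTS VV_DESEQ2_ANALYSIS; do
    vv_file="$vv_outdir/VV_Logs/VV_log_${vv_name}_GLbulkRNAseq.csv"
    [ -f "$vv_file" ] || echo "$vv_file"
  done
}

# Example: check the output directory given as the first argument,
# falling back to `results` (the documented name for runsheet-only runs).
missing_vv_logs "${1:-results}"
```

An empty result means all eight `.csv` logs are present; any printed path points at a log that was not written, or that still carries the old `.tsv` extension.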
258277
259278
<br>
260279

RNAseq/Workflow_Documentation/NF_RCP/workflow_code/workflows/rnaseq.nf

Lines changed: 1 addition & 2 deletions
@@ -353,8 +353,7 @@ workflow RNASEQ {
353353
DGE_DESEQ2_RRNA_RM.out.contrasts,
354354
ADD_GENE_ANNOTATIONS_RRNA_RM.out.annotated_dge_table
355355
)
356-
VV_CONCAT_FILTER( ch_outdir,
357-
VV_RAW_READS.out.log | mix( VV_TRIMMED_READS.out.log, // Concatenate and filter V&V logs
356+
VV_CONCAT_FILTER( ch_outdir, VV_RAW_READS.out.log | mix( VV_TRIMMED_READS.out.log, // Concatenate and filter V&V logs
358357
VV_STAR_ALIGNMENT.out.log,
359358
VV_RSEQC.out.log,
360359
VV_RSEM_COUNTS.out.log,
