Commit eead37a

add nfrcp readme revisions
1 parent 6b22f44 commit eead37a

1 file changed: +25 -30 lines changed (RNAseq/Workflow_Documentation/NF_RCP)

RNAseq/Workflow_Documentation/NF_RCP/README.md

Lines changed: 25 additions & 30 deletions
@@ -4,7 +4,7 @@

  ### Implementation Tools <!-- omit in toc -->

- The current GeneLab RNAseq consensus processing pipeline (RCP) for eukaryotic organisms, [GL-DPPD-7101-G](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md) and the GeneLab RNAseq consensus pipeline [GL-DPPD-7115](../../Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md), are implemented as a single [Nextflow](https://nextflow.io/) DSL2 workflow that utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_RCP) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
+ The current GeneLab RNAseq consensus processing pipeline (RCP) for eukaryotic organisms ([GL-DPPD-7101-G](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md)) and prokaryotic organisms ([GL-DPPD-7115](../../Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md)) are implemented as a single [Nextflow](https://nextflow.io/) DSL2 workflow that utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_RCP) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.

  ### Workflow & Subworkflows <!-- omit in toc -->

@@ -28,14 +28,13 @@ The current GeneLab RNAseq consensus processing pipeline (RCP) for eukaryotic or

  ---
  The NF_RCP workflow is composed of three subworkflows as shown in the image above.
- Below is a description of each subworkflow and the additional output files generated that are not already indicated in the [GL-DPPD-7101-G pipeline
- document](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md):
+ Below is a description of each subworkflow and the additional output files generated that are not already indicated in the [GL-DPPD-7101-G](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md) and [GL-DPPD-7115](../../Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md) pipeline documents:

  1. **Analysis Staging Subworkflow**

  - Description:
  - This subworkflow extracts the metadata parameters (e.g. organism, library layout) needed for processing from the OSD/GLDS ISA archive and retrieves the raw reads files hosted on the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).
- > *OSD/GLDS ISA archive*: ISA directory containing Investigation, Study, and Assay (ISA) metadata files for a respective GLDS dataset - the *ISA.zip file is located in the [OSDR](https://osdr.nasa.gov/bio/repo/) under 'Files' -> 'Study Metadata Files' for any GeneLab Data Set (GLDS) in the OSDR.
+ > *OSD/GLDS ISA archive*: ISA directory containing Investigation, Study, and Assay (ISA) metadata files for a respective GLDS dataset - the *ISA.zip file is located under 'Files' -> 'Study Metadata Files' for any GeneLab Data Set (GLDS) in the [OSDR](https://osdr.nasa.gov/bio/repo/).

  2. **RNAseq Consensus Pipeline Subworkflow**

@@ -71,10 +70,9 @@ document](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md):
  2. [Download the Workflow Files](#2-download-the-workflow-files)
  3. [Fetch Singularity Images](#3-fetch-singularity-images)
  4. [Run the Workflow](#4-run-the-workflow)
- 4a. [Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of Ensembl reference fasta and gtf files](#4a-approach-1-run-the-workflow-on-a-genelab-rnaseq-dataset-with-automatic-retrieval-of-ensembl-reference-fasta-and-gtf-files)
- 4b. [Approach 2: Run the workflow on a GeneLab RNAseq dataset using local Ensembl reference fasta and gtf files](#4b-approach-2-run-the-workflow-on-a-genelab-rnaseq-dataset-using-local-reference-fasta-and-gtf-files)
- 4c. [Approach 3: Run the workflow on a non-GLDS dataset using a user-created runsheet](#4c-approach-3-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
- 4d. [Approach 4: Run the workflow on a GeneLab prokaryotic RNAseq dataset](#4d-approach-4-run-the-workflow-on-a-genelab-prokaryotic-rnaseq-dataset)
+ 4a. [Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of reference fasta and gtf files](#4a-approach-1-run-the-workflow-on-a-genelab-rnaseq-dataset-with-automatic-retrieval-of-reference-fasta-and-gtf-files)
+ 4b. [Approach 2: Run the workflow on a GeneLab RNAseq dataset using local reference fasta and gtf files](#4b-approach-2-run-the-workflow-on-a-genelab-rnaseq-dataset-using-local-reference-fasta-and-gtf-files)
+ 4c. [Approach 3: Run the workflow on a non-GeneLab dataset using a user-created runsheet](#4c-approach-3-run-the-workflow-on-a-non-genelab-dataset-using-a-user-created-runsheet)
  5. [Additional Output Files](#5-additional-output-files)

  <br>
@@ -150,18 +148,20 @@ export NXF_SINGULARITY_CACHEDIR=$(pwd)/singularity
  ### 4. Run the Workflow

  While in the location containing the `NF_RCP_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are four examples of how to run the NF_RCP workflow:
- > Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.
+ > Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --reference_version) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.

  <br>

- #### 4a. Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of Ensembl reference fasta and gtf files
+ #### 4a. Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of reference fasta and gtf files

  ```bash
  nextflow run NF_RCP_2.0.0/main.nf \
  -profile singularity \
  --accession OSD-194
  ```

+ > Note: For prokaryotic RNAseq datasets, add the parameter `--mode microbes` to run the workflow using the prokaryotic pipeline ([GL-DPPD-7115](../../Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md)). The default value of this parameter is `default`, which will use the eukaryotic pipeline ([GL-DPPD-7101-G](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md)).
+
  <br>

  #### 4b. Approach 2: Run the workflow on a GeneLab RNAseq dataset using local reference fasta and gtf files
@@ -172,15 +172,15 @@ nextflow run NF_RCP_2.0.0/main.nf \
  nextflow run NF_RCP_2.0.0/main.nf \
  -profile singularity \
  --accession OSD-194 \
- --reference_version 107 \
+ --reference_version 112 \
  --reference_source ensembl \
  --reference_fasta </path/to/fasta> \
  --reference_gtf </path/to/gtf>
  ```

  <br>

- #### 4c. Approach 3: Run the workflow on a non-OSD dataset using a user-created runsheet
+ #### 4c. Approach 3: Run the workflow on a non-GeneLab dataset using a user-created runsheet

  > Note: Specifications for creating a runsheet manually are described [here](examples/runsheet/README.md).

@@ -192,30 +192,25 @@ nextflow run NF_RCP_2.0.0/main.nf \

  <br>

- #### 4d. Approach 4: Run the workflow on a GeneLab prokaryotic RNAseq dataset
-
- ```bash
- nextflow run NF_RCP_2.0.0/main.nf \
- -profile singularity \
- --mode microbes \
- --accession OSD-185
- ```
-
- <br>
-
  **Required Parameters For All Approaches:**

  * `NF_RCP_2.0.0/main.nf` - Instructs Nextflow to run the NF_RCP workflow

  * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow
- > Note: The output directory will be named `GLDS-#` when using an OSDR or GLDS accession as input, or `results` when running the workflow with only a runsheet as input.
+ > Note: The output directory will be named `GLDS-#` when using an OSD or GLDS accession as input, or `results` when running the workflow with only a runsheet as input.
+
+
+ <br>
+
+ **Additional Required Parameters For [Approach 1](#4a-approach-1-run-the-workflow-on-a-genelab-rnaseq-dataset-with-automatic-retrieval-of-reference-fasta-and-gtf-files):**

+ * `--accession` - The OSD or GLDS ID for the dataset to be processed, e.g. `GLDS-194` or `OSD-194`

  <br>

- **Additional Required Parameters For [Approach 2](#4b-approach-2-run-the-workflow-on-a-genelab-rnaseq-dataset-using-local-ensembl-reference-fasta-and-gtf-files):**
+ **Additional Required Parameters For [Approach 2](#4b-approach-2-run-the-workflow-on-a-genelab-rnaseq-dataset-using-local-reference-fasta-and-gtf-files):**

- * `--reference_version` - specifies the Ensembl version to use for the reference genome (Ensembl release `107` is used in this example); only needed when using Ensembl as the reference source
+ * `--reference_version` - specifies the reference source version to use for the reference genome (Ensembl release `112` is used in this example); only needed when using Ensembl as the reference source

  * `--reference_source` - specifies the source of the reference files used (the source indicated in the Approach 2 example is `ensembl`)

@@ -235,12 +230,12 @@ nextflow run NF_RCP_2.0.0/main.nf \

  * `--force_single_end` - forces the analysis to use single end processing; for paired end datasets, this means only R1 is used; for single end datasets, this should have no effect

- * `--reference_store_path` - specifies the directory to store the Ensembl fasta and gtf files (Default: within the directory structure created by default in the launch directory)
+ * `--reference_store_path` - specifies the directory to store the reference fasta and gtf files (Default: within the directory structure created by default in the launch directory)

- * `--derived_store_path` - specifies the directory to store the tool-specific indices created during processing (Default: within the directory structure created by default in the launch directory)
+ * `--derived_store_path` - specifies the directory to store the tool-specific indices created during processing (Default: within the directory structure created by default in the launch directory) `

- * `--runsheet_path` - specifies the path to a local runsheet (Default: a runsheet is automatically generated using the metadata on the GeneLab Repository for the OSD dataset being processed)
- > This is required when processing a non-OSD dataset as indicated in [Approach 3 above](#4c-approach-3-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
+ * `--runsheet_path` - specifies the path to a local runsheet (Default: a runsheet is automatically generated using the metadata on the OSDR for the dataset being processed)
+ > This is required when processing a non-OSDR dataset as indicated in [Approach 3 above](#4c-approach-3-run-the-workflow-on-a-non-genelab-dataset-using-a-user-created-runsheet)

  * `--mode` - specifies which pipeline to use: set to `default` to run GL-DPPD-7101-G pipeline or set to `microbes` for the GL-DPPD-7115 prokaryotic pipeline (Default value: `default`)
  > This allows the workflow to process either eukaryotic (default) or prokaryotic RNAseq data using the appropriate pipeline
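
Since this commit folds the separate prokaryotic approach into a `--mode` option, the sketch below illustrates how the same base command might cover both cases. It is only a dry run that prints the command rather than launching Nextflow. The helper name `build_rcp_cmd` is hypothetical and not part of NF_RCP; the flags (`-profile singularity`, `--accession`, `--mode`) and the `OSD-185` example are taken from the README text in this diff.

```shell
# Hypothetical helper (not part of NF_RCP): prints the nextflow command
# for a given accession and mode instead of executing it.
build_rcp_cmd() {
  accession="$1"
  mode="${2:-default}"   # README states the default value of --mode is `default`

  cmd="nextflow run NF_RCP_2.0.0/main.nf -profile singularity --accession $accession"

  # Only append --mode when it differs from the default (eukaryotic) pipeline
  if [ "$mode" != "default" ]; then
    cmd="$cmd --mode $mode"
  fi

  printf '%s\n' "$cmd"
}

# Prokaryotic dataset: adds `--mode microbes`
build_rcp_cmd OSD-185 microbes
```

Calling `build_rcp_cmd OSD-194` with no second argument prints the Approach 1 command unchanged, which mirrors the README's statement that the eukaryotic pipeline is used unless `--mode microbes` is given.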
