-Added separate pipeline document: [GL-DPPD-7XXX.md](../Pipeline_GL-DPPD-7XXX_Versions/GL-DPPD-7XXX.md) to document the pipeline steps for Bowtie2 alignment, used when the `--microbes` parameter is specified. In short, reads are aligned to a reference genome using Bowtie2 rather than STAR, gene counts are quantified using FeatureCounts rather than RSEM. Other steps remain unchanged.
+Added separate pipeline document: [GL-DPPD-7115.md](../Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md) to document the pipeline steps for Bowtie2 alignment, used when the `--microbes` parameter is specified. In short, reads are aligned to a reference genome using Bowtie2 rather than STAR, gene counts are quantified using FeatureCounts rather than RSEM. Other steps remain unchanged.

 Added "_GLbulkRNAseq" suffix to output files to prevent naming conflicts with files relevant to other assays.
-- Prokaryotes pipeline support via `--microbes` parameter
-- In short, reads are aligned to a reference genome using Bowtie 2 rather than STAR, gene counts are quantified using featureCounts rather than RSEM. Other steps remain unchanged.
- Parallel rRNA-removed DGE analysis and results. Additional 04-DESeq2_NormCounts_rRNArm/ and 05-DESeq2_DGE_rRNArm/ directories are created for rRNA-removed DGE results.
+- Prokaryotes pipeline support via `--microbes` parameter:
+- Reads are aligned to a reference genome using Bowtie 2 rather than STAR, and gene counts are quantified using featureCounts instead of RSEM. Other steps remain unchanged.
+- Added software versions:
+- Bowtie 2 2.5.4
+- subread 2.0.8
+- Read alignment now outputs unaligned reads as FASTQ files.
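As a rough illustration of the `--microbes` changelog entry above, the sketch below assembles (without executing) a run command that would select the Bowtie 2 / featureCounts path. The directory name `NF_RCP-G_2.0.0` and `-profile singularity` are borrowed from the README examples in this change. The accession `OSD-000` is a hypothetical placeholder, and passing `--microbes` as a bare flag is an assumption, so confirm the exact usage with the workflow's `--help` output.

```bash
#!/usr/bin/env bash
# Dry-run sketch: build the command as an array and print it; nothing is executed.
set -euo pipefail

workflow_dir="NF_RCP-G_2.0.0"   # directory name used in the README examples
accession="OSD-000"             # hypothetical placeholder, not a real dataset

cmd=(
  nextflow run "${workflow_dir}/main.nf"
  -profile singularity            # single hyphen: general Nextflow option
  --gldsAccession "${accession}"  # double hyphen: workflow parameter
  --microbes                      # assumption: supplied as a bare flag
)

# Print the assembled command for review before running it by hand.
printf '%s ' "${cmd[@]}"
echo
```

Building the command as an array keeps each argument intact when it is later executed with `"${cmd[@]}"`.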
RNAseq/Workflow_Documentation/NF_RCP/README.md
Lines changed: 20 additions & 20 deletions
@@ -1,10 +1,10 @@
-# NF_RCP-F Workflow Information and Usage Instructions <!-- omit in toc -->
+# NF_RCP Workflow Information and Usage Instructions <!-- omit in toc -->

 ## General Workflow Info <!-- omit in toc -->

 ### Implementation Tools <!-- omit in toc -->

-The current GeneLab RNAseq consensus processing pipeline (RCP), [GL-DPPD-7101-F](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_RCP-F) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
+The current GeneLab RNAseq consensus processing pipeline (RCP), [GL-DPPD-7101-G](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_RCP-F) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.

 ### Workflow & Subworkflows <!-- omit in toc -->
@@ -17,20 +17,20 @@ The current GeneLab RNAseq consensus processing pipeline (RCP), [GL-DPPD-7101-F]
 </p>

 ---
-The NF_RCP-F workflow is composed of three subworkflows as shown in the image above.
-Below is a description of each subworkflow and the additional output files generated that are not already indicated in the [GL-DPPD-7101-F pipeline
 - This subworkflow extracts the metadata parameters (e.g. organism, library layout) needed for processing from the OSD/GLDS ISA archive and retrieves the raw reads files hosted on the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).
 > *OSD/GLDS ISA archive*: ISA directory containing Investigation, Study, and Assay (ISA) metadata files for a respective GLDS dataset - the *ISA.zip file is located in the [OSDR](https://osdr.nasa.gov/bio/repo/) under 'Files' -> 'Study Metadata Files' for any GeneLab Data Set (GLDS) in the OSDR.

-2.**RNASeq Consensus Pipeline Subworkflow**
+2.**RNAseq Consensus Pipeline Subworkflow**

 - Description:
-- This subworkflow uses the staged raw data and metadata parameters from the Analysis Staging Subworkflow to generate processed data using [version F of the GeneLab RCP](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md).
+- This subworkflow uses the staged raw data and metadata parameters from the Analysis Staging Subworkflow to generate processed data using [version G of the GeneLab RCP](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md).

 3.**V&V Pipeline Subworkflow**
@@ -97,13 +97,13 @@ We recommend installing Singularity on a system wide level as per the associated

 ### 2. Download the Workflow Files

-All files required for utilizing the NF_RCP-F GeneLab workflow for processing RNASeq data are in the [workflow_code](workflow_code) directory. To get a
+All files required for utilizing the NF_RCP-F GeneLab workflow for processing RNAseq data are in the [workflow_code](workflow_code) directory. To get a
 copy of latest NF_RCP-F version on to your system, the code can be downloaded as a zip file from the release page then unzipped after downloading by running the following commands:
 Although Nextflow can fetch Singularity images from a url, doing so may cause issues as detailed [here](https://github.com/nextflow-io/nextflow/issues/1210).

-To avoid this issue, run the following command to fetch the Singularity images prior to running the NF_RCP-F workflow:
-> Note: This command should be run in the location containing the `NF_RCP-F_1.0.4` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes.
+To avoid this issue, run the following command to fetch the Singularity images prior to running the NF_RCP-G workflow:
+> Note: This command should be run in the location containing the `NF_RCP-G_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes.
-While in the location containing the `NF_RCP-F_1.0.4` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_RCP-F workflow:
+While in the location containing the `NF_RCP-G_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_RCP-F workflow:
 > Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.

 <br>

 #### 4a. Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of Ensembl reference fasta and gtf files

 ```bash
-nextflow run NF_RCP-F_1.0.4/main.nf \
+nextflow run NF_RCP-G_2.0.0/main.nf \
 -profile singularity \
 --gldsAccession OSD-194
 ```
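The Approach 1 command mixes both argument styles described in the README note: single-hyphen Nextflow options and double-hyphen workflow parameters. A minimal sketch of that distinction follows. `classify_arg` is a hypothetical helper written only for illustration, is not part of the workflow, and looks only at the hyphen prefix.

```bash
#!/usr/bin/env bash
# Sketch: label each argument by its hyphen prefix, following the README note.

classify_arg() {
  case "$1" in
    --*) echo "workflow parameter" ;;  # e.g. --gldsAccession, --ensemblVersion
    -*)  echo "nextflow option" ;;     # e.g. -profile, -help
    *)   echo "value" ;;               # e.g. singularity, OSD-194
  esac
}

# Walk through the arguments used in the Approach 1 command.
for arg in -profile singularity --gldsAccession OSD-194; do
  printf '%-16s %s\n' "$arg" "$(classify_arg "$arg")"
done
```

The `--*` pattern is tested before `-*` because a double-hyphen argument would otherwise match the single-hyphen pattern first.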
@@ -154,7 +154,7 @@ nextflow run NF_RCP-F_1.0.4/main.nf \
 > Note: The `--ref_source` and `--ensemblVersion` parameters should match the reference source and version number of the local reference fasta and gtf files used

 ```bash
-nextflow run NF_RCP-F_1.0.4/main.nf \
+nextflow run NF_RCP-G_2.0.0/main.nf \
 -profile singularity \
 --gldsAccession OSD-194 \
 --ensemblVersion 107 \
@@ -170,7 +170,7 @@ nextflow run NF_RCP-F_1.0.4/main.nf \
 > Note: Specifications for creating a runsheet manually are described [here](examples/runsheet/README.md).

 ```bash
-nextflow run NF_RCP-F_1.0.4/main.nf \
+nextflow run NF_RCP-G_2.0.0/main.nf \
 -profile singularity \
 --gldsAccession output_directory \
 --runsheetPath </path/to/runsheet>
@@ -180,7 +180,7 @@ nextflow run NF_RCP-F_1.0.4/main.nf \

 **Required Parameters For All Approaches:**

-* `NF_RCP-F_1.0.4/main.nf` - Instructs Nextflow to run the NF_RCP-F workflow
+* `NF_RCP-G_2.0.0/main.nf` - Instructs Nextflow to run the NF_RCP-F workflow

 * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow
@@ -230,7 +230,7 @@ nextflow run NF_RCP-F_1.0.4/main.nf \
 All parameters listed above and additional optional arguments for the RCP workflow, including debug related options that may not be immediately useful for most users, can be viewed by running the following command:

 ```bash
-nextflow run NF_RCP-F_1.0.4/main.nf --help
+nextflow run NF_RCP-G_2.0.0/main.nf --help
 ```

 See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details common to all nextflow workflows.
@@ -242,7 +242,7 @@ See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nex
 ### 5. Additional Output Files

 The outputs from the Analysis Staging and V&V Pipeline Subworkflows are described below:
-> Note: The outputs from the RNASeq Consensus Pipeline Subworkflow are documented in the [GL-DPPD-7101-F](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md) processing protocol.
+> Note: The outputs from the RNAseq Consensus Pipeline Subworkflow are documented in the [GL-DPPD-7101-F](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md) processing protocol.