Commit 6c92e0e

update pipeline number
1 parent 8fe2072 commit 6c92e0e

5 files changed (+46, -39 lines)

RNAseq/Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md

Lines changed: 6 additions & 6 deletions
```diff
@@ -4,7 +4,7 @@
 
 ---
 
-**Date:** January 24, 2025
+**Date:** January 28, 2025
 **Revision:** G
 **Document Number:** GL-DPPD-7101-G
 
@@ -23,14 +23,14 @@ Lauren Sanders (GeneLab Project Scientist)
 
 ## Updates from previous version
 
-Added separate pipeline document: [GL-DPPD-7XXX.md](../Pipeline_GL-DPPD-7XXX_Versions/GL-DPPD-7XXX.md) to document the pipeline steps for Bowtie2 alignment, used when the `--microbes` parameter is specified. In short, reads are aligned to a reference genome using Bowtie2 rather than STAR, gene counts are quantified using FeatureCounts rather than RSEM. Other steps remain unchanged.
+Added separate pipeline document: [GL-DPPD-7115.md](../Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md) to document the pipeline steps for Bowtie2 alignment, used when the `--microbes` parameter is specified. In short, reads are aligned to a reference genome using Bowtie2 rather than STAR, gene counts are quantified using FeatureCounts rather than RSEM. Other steps remain unchanged.
 
 Added "_GLbulkRNAseq" suffix to output files to prevent naming conflicts with files relevant to other assays.
 
-Updated [Ensembl Reference Files](../../GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110_annotations.csv) to use:
-- Animals: Ensembl release 111 → 112
-- Plants: Ensembl plants release 57 → 59
-- Bacteria: Ensembl bacteria release 57 → 59
+Updated [Ensembl Reference Files](../../GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv) now use:
+- Animals: Ensembl release 112
+- Plants: Ensembl plants release 59
+- Bacteria: Ensembl bacteria release 59
 
 Software Updates:
 
```

RNAseq/Pipeline_GL-DPPD-7XXX_Versions/GL-DPPD-7XXX.md renamed to RNAseq/Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md

Lines changed: 3 additions & 2 deletions
```diff
@@ -4,8 +4,8 @@
 
 ---
 
-**Date:** January 24, 2025
-**Document Number:** GL-DPPD-7XXX
+**Date:** January 28, 2025
+**Document Number:** GL-DPPD-7115
 
 **Submitted by:**
 Alexis Torres (GeneLab Data Processing Team)
@@ -93,6 +93,7 @@ Differences with default workflow:
 |Cutadapt|4.2|[https://cutadapt.readthedocs.io/en/stable/](https://cutadapt.readthedocs.io/en/stable/)|
 |TrimGalore!|0.6.10|[https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/](https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/)|
 |Bowtie 2|2.5.4|[https://github.com/BenLangmead/bowtie2](https://github.com/BenLangmead/bowtie2)|
+|subread|2.0.8|[https://subread.sourceforge.net/](https://subread.sourceforge.net/)|
 |Samtools|1.21|[http://www.htslib.org/](http://www.htslib.org/)|
 |infer_experiment|5.0.4|[http://rseqc.sourceforge.net/#infer-experiment-py](http://rseqc.sourceforge.net/#infer-experiment-py)|
 |geneBody_coverage|5.0.4|[http://rseqc.sourceforge.net/#genebody-coverage-py](http://rseqc.sourceforge.net/#genebody-coverage-py)|
```

RNAseq/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -17,7 +17,7 @@ RNAseq (Prokaryotes):
 
 - Contains the current and previous GeneLab RNAseq consensus processing pipeline (RCP) versions documentation
 
-* [**Pipeline_GL-DPPD-7XXX_Versions**](Pipeline_GL-DPPD-7XXX_Versions)
+* [**Pipeline_GL-DPPD-7115_Versions**](Pipeline_GL-DPPD-7115_Versions)
 
 - Contains the current and previous GeneLab RNAseq (Prokaryotes) consensus processing pipeline (RCP) versions documentation
 
```

RNAseq/Workflow_Documentation/NF_RCP/CHANGELOG.md

Lines changed: 16 additions & 10 deletions
```diff
@@ -5,15 +5,21 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [2.0.0](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_RCP-G_2.0.0/RNAseq/Workflow_Documentation/NF_RCP) - 2025-01-24
+## [2.0.0](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_RCP-G_2.0.0/RNAseq/Workflow_Documentation/NF_RCP) - 2025-01-28
 
 ### Added
 
-- Prokaryotes pipeline support via `--microbes` parameter
-- In short, reads are aligned to a reference genome using Bowtie 2 rather than STAR, gene counts are quantified using featureCounts rather than RSEM. Other steps remain unchanged.
-- Unaligned reads FASTQ output from STAR
-- Variance-stabilizing transformation (VST) counts output
-- Parallel rRNA-removed DGE analysis and results. Additional 04-DESeq2_NormCounts_rRNArm/ and 05-DESeq2_DGE_rRNArm/ directories are created for rRNA-removed DGE results.
+- Prokaryotes pipeline support via `--microbes` parameter:
+- Reads are aligned to a reference genome using Bowtie 2 rather than STAR, and gene counts are quantified using featureCounts instead of RSEM. Other steps remain unchanged.
+- Added software versions:
+- Bowtie 2 2.5.4
+- subread 2.0.8
+- Read alignment now outputs unaligned reads as FASTQ files.
+- Added Variance-stabilizing transformation (VST) counts table.**
+- Incorporated rRNA removal into gene counts and differential gene expression (DGE) analysis.
+- Separate results are generated for rRNA-removed DGE analysis, with new output directories:
+- `04-DESeq2_NormCounts_rRNArm/`
+- `05-DESeq2_DGE_rRNArm/`
 
 ### Changed
 
@@ -39,10 +45,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - matplotlib 3.8.3
 - numpy 1.26.4
 - scipy 1.14.1
-- Updated Ensembl reference files:
-- Animals: release 111 → 112
-- Plants: release 57 → 59
-- Bacteria: release 57 → 59
+- Updated [Ensembl Reference Files](../../../GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv) now use:
+- Animals: Ensembl release 112
+- Plants: Ensembl plants release 59
+- Bacteria: Ensembl bacteria release 59
 - Added "_GLbulkRNAseq" suffix to output files
 - RSeQC inner_distance minimum value now dynamically set based on read length
 - DESeq2 analysis now handles technical replicates
```

RNAseq/Workflow_Documentation/NF_RCP/README.md

Lines changed: 20 additions & 20 deletions
````diff
@@ -1,10 +1,10 @@
-# NF_RCP-F Workflow Information and Usage Instructions <!-- omit in toc -->
+# NF_RCP Workflow Information and Usage Instructions <!-- omit in toc -->
 
 ## General Workflow Info <!-- omit in toc -->
 
 ### Implementation Tools <!-- omit in toc -->
 
-The current GeneLab RNAseq consensus processing pipeline (RCP), [GL-DPPD-7101-F](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_RCP-F) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
+The current GeneLab RNAseq consensus processing pipeline (RCP), [GL-DPPD-7101-G](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md), is implemented as a [Nextflow](https://nextflow.io/) DSL2 workflow and utilizes [Singularity](https://docs.sylabs.io/guides/3.10/user-guide/introduction.html) to run all tools in containers. This workflow (NF_RCP-F) is run using the command line interface (CLI) of any unix-based system. While knowledge of creating workflows in Nextflow is not required to run the workflow as is, [the Nextflow documentation](https://nextflow.io/docs/latest/index.html) is a useful resource for users who want to modify and/or extend this workflow.
 
 ### Workflow & Subworkflows <!-- omit in toc -->
 
@@ -17,20 +17,20 @@ The current GeneLab RNAseq consensus processing pipeline (RCP), [GL-DPPD-7101-F]
 </p>
 
 ---
-The NF_RCP-F workflow is composed of three subworkflows as shown in the image above.
-Below is a description of each subworkflow and the additional output files generated that are not already indicated in the [GL-DPPD-7101-F pipeline
-document](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md):
+The NF_RCP-G workflow is composed of three subworkflows as shown in the image above.
+Below is a description of each subworkflow and the additional output files generated that are not already indicated in the [GL-DPPD-7101-G pipeline
+document](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md):
 
 1. **Analysis Staging Subworkflow**
 
 - Description:
 - This subworkflow extracts the metadata parameters (e.g. organism, library layout) needed for processing from the OSD/GLDS ISA archive and retrieves the raw reads files hosted on the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).
 > *OSD/GLDS ISA archive*: ISA directory containing Investigation, Study, and Assay (ISA) metadata files for a respective GLDS dataset - the *ISA.zip file is located in the [OSDR](https://osdr.nasa.gov/bio/repo/) under 'Files' -> 'Study Metadata Files' for any GeneLab Data Set (GLDS) in the OSDR.
 
-2. **RNASeq Consensus Pipeline Subworkflow**
+2. **RNAseq Consensus Pipeline Subworkflow**
 
 - Description:
-- This subworkflow uses the staged raw data and metadata parameters from the Analysis Staging Subworkflow to generate processed data using [version F of the GeneLab RCP](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md).
+- This subworkflow uses the staged raw data and metadata parameters from the Analysis Staging Subworkflow to generate processed data using [version G of the GeneLab RCP](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md).
 
 3. **V&V Pipeline Subworkflow**
 
@@ -97,13 +97,13 @@ We recommend installing Singularity on a system wide level as per the associated
 
 ### 2. Download the Workflow Files
 
-All files required for utilizing the NF_RCP-F GeneLab workflow for processing RNASeq data are in the [workflow_code](workflow_code) directory. To get a
+All files required for utilizing the NF_RCP-F GeneLab workflow for processing RNAseq data are in the [workflow_code](workflow_code) directory. To get a
 copy of latest NF_RCP-F version on to your system, the code can be downloaded as a zip file from the release page then unzipped after downloading by running the following commands:
 
 ```bash
-wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_RCP-F_1.0.4/NF_RCP-F_1.0.4.zip
+wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_RCP-G_2.0.0/NF_RCP-G_2.0.0.zip
 
-unzip NF_RCP-F_1.0.4.zip
+unzip NF_RCP-G_2.0.0.zip
 ```
 
 <br>
@@ -114,11 +114,11 @@ unzip NF_RCP-F_1.0.4.zip
 
 Although Nextflow can fetch Singularity images from a url, doing so may cause issues as detailed [here](https://github.com/nextflow-io/nextflow/issues/1210).
 
-To avoid this issue, run the following command to fetch the Singularity images prior to running the NF_RCP-F workflow:
-> Note: This command should be run in the location containing the `NF_RCP-F_1.0.4` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes.
+To avoid this issue, run the following command to fetch the Singularity images prior to running the NF_RCP-G workflow:
+> Note: This command should be run in the location containing the `NF_RCP-G_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes.
 
 ```bash
-bash NF_RCP-F_1.0.4/bin/prepull_singularity.sh NF_RCP-F_1.0.4/config/software/by_docker_image.config
+bash NF_RCP-G_2.0.0/bin/prepull_singularity.sh NF_RCP-G_2.0.0/config/software/by_docker_image.config
 ```
 
````
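After this commit the release tag `NF_RCP-G_2.0.0` is repeated in the README's wget, unzip, image pre-pull, and `nextflow run` commands, so every version bump has to edit each of them. One way to reduce that is to derive every release-specific path from a single tag. The block below is a minimal sketch, not part of the workflow: `release_url` and `fetch_release` are hypothetical helper names, and it assumes the release layout implied by the README (a `<tag>/<tag>.zip` asset that unpacks into a `<tag>/` directory containing `bin/prepull_singularity.sh`).

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helpers, not part of NF_RCP): derive every
# release-specific path from one tag so the steps cannot drift apart.

BASE_URL="https://github.com/nasa/GeneLab_Data_Processing/releases/download"

# Print the download URL of a release zip, assuming the <tag>/<tag>.zip
# asset layout used by the README's wget command.
release_url() {
  local tag="$1"
  printf '%s/%s/%s.zip\n' "$BASE_URL" "$tag" "$tag"
}

# Download and unpack a release, then pre-pull its Singularity images.
# Requires wget, unzip, and Singularity; it is defined here, not invoked.
fetch_release() {
  local tag="$1"
  wget "$(release_url "$tag")" &&
    unzip "${tag}.zip" &&
    bash "${tag}/bin/prepull_singularity.sh" \
      "${tag}/config/software/by_docker_image.config"
}
```

For example, `fetch_release NF_RCP-G_2.0.0` would mirror the README's download and pre-pull steps for this release; the workflow is then started with `nextflow run NF_RCP-G_2.0.0/main.nf -profile singularity --gldsAccession OSD-194` as before.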
124124
````diff
@@ -134,15 +134,15 @@ export NXF_SINGULARITY_CACHEDIR=$(pwd)/singularity
 
 ### 4. Run the Workflow
 
-While in the location containing the `NF_RCP-F_1.0.4` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_RCP-F workflow:
+While in the location containing the `NF_RCP-G_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_RCP-F workflow:
 > Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.
 
 <br>
 
 #### 4a. Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of Ensembl reference fasta and gtf files
 
 ```bash
-nextflow run NF_RCP-F_1.0.4/main.nf \
+nextflow run NF_RCP-G_2.0.0/main.nf \
 -profile singularity \
 --gldsAccession OSD-194
 ```
@@ -154,7 +154,7 @@ nextflow run NF_RCP-F_1.0.4/main.nf \
 > Note: The `--ref_source` and `--ensemblVersion` parameters should match the reference source and version number of the local reference fasta and gtf files used
 
 ```bash
-nextflow run NF_RCP-F_1.0.4/main.nf \
+nextflow run NF_RCP-G_2.0.0/main.nf \
 -profile singularity \
 --gldsAccession OSD-194 \
 --ensemblVersion 107 \
@@ -170,7 +170,7 @@ nextflow run NF_RCP-F_1.0.4/main.nf \
 > Note: Specifications for creating a runsheet manually are described [here](examples/runsheet/README.md).
 
 ```bash
-nextflow run NF_RCP-F_1.0.4/main.nf \
+nextflow run NF_RCP-G_2.0.0/main.nf \
 -profile singularity \
 --gldsAccession output_directory \
 --runsheetPath </path/to/runsheet>
@@ -180,7 +180,7 @@ nextflow run NF_RCP-F_1.0.4/main.nf \
 
 **Required Parameters For All Approaches:**
 
-* `NF_RCP-F_1.0.4/main.nf` - Instructs Nextflow to run the NF_RCP-F workflow
+* `NF_RCP-G_2.0.0/main.nf` - Instructs Nextflow to run the NF_RCP-F workflow
 
 * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow
 
@@ -230,7 +230,7 @@ nextflow run NF_RCP-F_1.0.4/main.nf \
 All parameters listed above and additional optional arguments for the RCP workflow, including debug related options that may not be immediately useful for most users, can be viewed by running the following command:
 
 ```bash
-nextflow run NF_RCP-F_1.0.4/main.nf --help
+nextflow run NF_RCP-G_2.0.0/main.nf --help
 ```
 
 See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details common to all nextflow workflows.
@@ -242,7 +242,7 @@ See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nex
 ### 5. Additional Output Files
 
 The outputs from the Analysis Staging and V&V Pipeline Subworkflows are described below:
-> Note: The outputs from the RNASeq Consensus Pipeline Subworkflow are documented in the [GL-DPPD-7101-F](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md) processing protocol.
+> Note: The outputs from the RNAseq Consensus Pipeline Subworkflow are documented in the [GL-DPPD-7101-F](../../Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md) processing protocol.
 
 **Analysis Staging Subworkflow**
 
````
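Several lines added in this commit still reference the previous revision: the new Implementation Tools paragraph keeps "This workflow (NF_RCP-F)", and the updated Additional Output Files note still links GL-DPPD-7101-F. A quick search before tagging a release can list such leftovers. The block below is a minimal sketch, assuming GNU or BSD grep; `find_stale_refs` is a hypothetical helper, and its default pattern (`NF_RCP-F`, `GL-DPPD-7101-F`, `GL-DPPD-7XXX`) is an illustrative choice. Matches are candidates for review rather than errors, since CHANGELOG history and previous-version documents legitimately mention older identifiers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the GeneLab repository): list Markdown
# lines that still mention a previous workflow revision or the placeholder
# document number. Exit status follows grep: 0 if anything matched, 1 if not.
#
# Usage: find_stale_refs [directory] [extended-regex]
find_stale_refs() {
  local dir="${1:-.}"
  # Illustrative default: the old workflow tag, the old protocol revision,
  # and the placeholder number that this commit replaced with 7115.
  local pattern="${2:-NF_RCP-F|GL-DPPD-7101-F|GL-DPPD-7XXX}"
  # -r: recurse into subdirectories, -n: prefix each match with its line
  # number, -E: extended regular expressions, --include: Markdown files only.
  grep -rnE --include='*.md' -- "$pattern" "$dir"
}
```

Run from the repository root, `find_stale_refs RNAseq` would print each match as `path:line:text`, which makes it easy to decide line by line whether a reference should be updated.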
