RNASeq workflow 2.0.1 patch update #166

Merged 17 commits on Jul 2, 2025
8 changes: 8 additions & 0 deletions RNAseq/Workflow_Documentation/NF_RCP/CHANGELOG.md
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [2.0.1](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_RCP_2.0.1/RNAseq/Workflow_Documentation/NF_RCP) - 2025-07-02
+
+### Fixed
+
+- Fixed fastqc metrics extraction in `parse_multiqc.py` script
+- Added qc file validation output listing missing entries
+- Updated multiqc parsing for fastqc metrics
+
## [2.0.0](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_RCP_2.0.0/RNAseq/Workflow_Documentation/NF_RCP) - 2025-04-10

### Added
213 changes: 213 additions & 0 deletions RNAseq/Workflow_Documentation/NF_RCP/QC_metrics_README.md

Large diffs are not rendered by default.

26 changes: 15 additions & 11 deletions RNAseq/Workflow_Documentation/NF_RCP/README.md
@@ -128,9 +128,9 @@ All files required for utilizing the NF_RCP GeneLab workflow for processing RNAs
copy of latest NF_RCP version on to your system, the code can be downloaded as a zip file from the release page then unzipped after downloading by running the following commands:

```bash
-wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_RCP_2.0.0/NF_RCP_2.0.0.zip
+wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_RCP_2.0.1/NF_RCP_2.0.1.zip

-unzip NF_RCP_2.0.0.zip
+unzip NF_RCP_2.0.1.zip
```
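
After unzipping, it can help to confirm that the extracted directory contains the files the later steps of this README call. The following is a minimal sketch, not part of the workflow: the function name `check_nf_rcp_dir` is hypothetical, and the three relative paths are simply the ones referenced elsewhere in this README.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper, not shipped with NF_RCP): confirm that the
# unzipped workflow directory contains the files used in later README steps.
check_nf_rcp_dir() {
    local dir="${1:-NF_RCP_2.0.1}"   # directory produced by unzip
    local missing=0
    local rel
    for rel in main.nf \
               bin/prepull_singularity.sh \
               config/software/by_docker_image.config; do
        if [ ! -f "${dir}/${rel}" ]; then
            echo "missing: ${dir}/${rel}" >&2
            missing=1
        fi
    done
    # Returns 0 when every file is present, 1 otherwise.
    return "${missing}"
}

# Example call, run from the location where the zip was extracted:
#   check_nf_rcp_dir NF_RCP_2.0.1 && echo "workflow files look complete"
```

The function only tests for file presence; it does not verify file contents or the release version.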

<br>
@@ -142,10 +142,10 @@ unzip NF_RCP_2.0.0.zip
Although Nextflow can fetch Singularity images from a url, doing so may cause issues as detailed [here](https://github.com/nextflow-io/nextflow/issues/1210).

To avoid this issue, run the following command to fetch the Singularity images prior to running the NF_RCP workflow:
-> Note: This command should be run in the location containing the `NF_RCP_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes. Approximately 8GB of RAM is needed to download and build the Singularity images.
+> Note: This command should be run in the location containing the `NF_RCP_2.0.1` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes. Approximately 8GB of RAM is needed to download and build the Singularity images.

```bash
-bash NF_RCP_2.0.0/bin/prepull_singularity.sh NF_RCP_2.0.0/config/software/by_docker_image.config
+bash NF_RCP_2.0.1/bin/prepull_singularity.sh NF_RCP_2.0.1/config/software/by_docker_image.config
```


@@ -161,7 +161,7 @@ export NXF_SINGULARITY_CACHEDIR=$(pwd)/singularity

### 4. Run the Workflow

-While in the location containing the `NF_RCP_2.0.0` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow.
+While in the location containing the `NF_RCP_2.0.1` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow.

Both workflows automatically load reference files and organism-specific gene annotation files from the [GeneLab annotations table](https://github.com/nasa/GeneLab_Data_Processing/blob/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv). For organisms not listed in the table or to use alternative reference files, additional workflow parameters can be specified.

@@ -175,7 +175,7 @@ Both workflows automatically load reference files and organism-specific gene ann
#### 4a. Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of reference fasta and gtf files

```bash
-nextflow run NF_RCP_2.0.0/main.nf \
+nextflow run NF_RCP_2.0.1/main.nf \
-profile singularity,local \
--accession OSD-194
```
@@ -187,7 +187,7 @@ nextflow run NF_RCP_2.0.0/main.nf \
#### 4b. Approach 2: Run the workflow on a GeneLab RNAseq dataset with custom reference fasta and gtf files

```bash
-nextflow run NF_RCP_2.0.0/main.nf \
+nextflow run NF_RCP_2.0.1/main.nf \
-profile singularity,local \
--accession OSD-194 \
--reference_version 112 \
@@ -205,7 +205,7 @@ nextflow run NF_RCP_2.0.0/main.nf \
#### 4c. Approach 3: Run the workflow on a non-GeneLab dataset using a user-created runsheet with automatic retrieval of reference fasta and gtf files

```bash
-nextflow run NF_RCP_2.0.0/main.nf \
+nextflow run NF_RCP_2.0.1/main.nf \
-profile singularity,local \
--runsheet_path </path/to/runsheet>
```
@@ -217,7 +217,7 @@ nextflow run NF_RCP_2.0.0/main.nf \
#### 4d. Approach 4: Run the workflow on a non-GeneLab dataset using a user-created runsheet with custom reference fasta and gtf files

```bash
-nextflow run NF_RCP_2.0.0/main.nf \
+nextflow run NF_RCP_2.0.1/main.nf \
-profile singularity \
--accession OSD-194 \
--reference_version 112 \
@@ -235,7 +235,7 @@ nextflow run NF_RCP_2.0.0/main.nf \

#### Required Parameters For All Approaches:

-* `NF_RCP_2.0.0/main.nf` - Instructs Nextflow to run the NF_RCP workflow
+* `NF_RCP_2.0.1/main.nf` - Instructs Nextflow to run the NF_RCP workflow

* `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow; use `local` for local execution ([local.config](workflow_code/conf/local.config)) or `slurm` for SLURM cluster execution ([slurm.config](workflow_code/conf/slurm.config))
> Note: The output directory will be named `GLDS-#` when using a OSD or GLDS accession as input, or `results` when running the workflow with only a runsheet as input.
@@ -313,7 +313,7 @@ nextflow run NF_RCP_2.0.0/main.nf \
All parameters listed above and additional optional arguments for the RCP workflow, including debug related options that may not be immediately useful for most users, can be viewed by running the following command:

```bash
-nextflow run NF_RCP_2.0.0/main.nf --help
+nextflow run NF_RCP_2.0.1/main.nf --help
```

See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details common to all nextflow workflows.
@@ -354,6 +354,10 @@ The outputs from the Analysis Staging and V&V Pipeline Subworkflows are describe
- processing_info/nextflow_log_GLbulkRNAseq.txt (Nextflow execution logs captured via `nextflow log`)
- processing_info/nextflow_run_command_GLbulkRNAseq.txt (Exact command line used to initiate the workflow)

+**QC metrics summary**
+
+- Output:
+- GeneLab/qc_metrics_GLbulkRNAseq.csv (comma-separated text file containing a summary of qc metrics and metadata for the dataset, see the [QC metrics README](./QC_metrics_README.md) for a complete list of field definitions)
<br>
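
Because the QC metrics summary is a plain comma-separated file, a quick scan for empty fields can be done with standard tools. The following is a minimal sketch under stated assumptions: the file has a single header row and no quoted fields containing commas, and the function name `count_empty_fields` is hypothetical. It is not the validation performed by the workflow itself, and the actual column names are defined in the QC metrics README.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): for each column of a CSV, print the column
# name and the number of data rows in which that column is empty.
# Assumes one header row and no quoted fields that contain commas.
count_empty_fields() {
    awk -F',' '
        NR == 1 {                      # header row: remember column names
            ncol = NF
            for (i = 1; i <= NF; i++) name[i] = $i
            next
        }
        {                              # data rows: tally empty fields
            for (i = 1; i <= ncol; i++) if ($i == "") empty[i]++
        }
        END {                          # report only columns with gaps
            for (i = 1; i <= ncol; i++)
                if (empty[i] > 0) printf "%s\t%d\n", name[i], empty[i]
        }
    ' "$1"
}

# Example call, where <output_dir> stands for the workflow output directory:
#   count_empty_fields <output_dir>/GeneLab/qc_metrics_GLbulkRNAseq.csv
```

Columns with no empty values are omitted from the output, so an empty report means every field is populated.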

Standard Nextflow resource usage logs are also produced as follows: