
Commit bde13d4

Merge pull request #60 from torres-alexis/DEV_NF_RCP-F_1.0.4_pr_archived
NF_RCP-F_1.0.4
- TrimGalore! will now use autodetect for adaptor type
- V&V migrated from dp_tools version 1.1.8 to 1.3.3
- Fix for sample wise checks reusing same sample
- Workflow usage files will all follow output directory set by workflow user
- Added '_GLbulkRNAseq' filename suffixes
- ERCC notebook updates:
  - Empty subplots now hidden
  - Changed box-whisker plots from ascending to descending reference concentration order, added x-axis label
  - Added matching order to bar plots
2 parents 6c6688b + 5f7b00f commit bde13d4

Some content is hidden

62 files changed (+4885 / -2110 lines changed)

RNAseq/Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-F.md

Lines changed: 360 additions & 440 deletions
Large diffs are not rendered by default.

RNAseq/Workflow_Documentation/NF_RCP-F/CHANGELOG.md

Lines changed: 18 additions & 0 deletions
@@ -5,6 +5,24 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.0.4](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_RCP-F_1.0.4/RNAseq/Workflow_Documentation/NF_RCP-F) - 2024-02-08
+
+### Fixed
+
+- Workflow usage files will all follow output directory set by workflow user
+- ERCC Notebook:
+  - Moved gene prefix definition to start of notebook
+  - Added fallback for scenarios where every gene has zeros: use "poscounts" estimator to calculate a modified geometric mean
+  - Reordered box-whisker plots from descending to ascending reference concentration order, ordered bar plots similarly
+
+### Changed
+
+- TrimGalore! will now use autodetect for adaptor type
+- V&V migrated from dp_tools version 1.1.8 to 1.3.4 including:
+  - Migration of V&V protocol code to this codebase instead of dp_tools
+  - Fix for sample wise checks reusing same sample
+- Added '_GLbulkRNAseq' to output file names
+
 ## [1.0.3](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_RCP-F_1.0.3/RNAseq/Workflow_Documentation/NF_RCP-F) - 2023-01-25

 ### Added
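The ERCC fallback above refers to the "poscounts" size-factor estimator, which is the name DESeq2 gives it (`estimateSizeFactors(..., type = "poscounts")`). The default median-of-ratios estimator needs at least one gene with no zero counts, because a single zero makes that gene's geometric mean zero; when every gene contains a zero, no size factors can be computed. The following is a plain-Python sketch of how we understand the poscounts variant, not the notebook's code: each gene's geometric mean is the n-th root of the product of its non-zero counts (n = total number of samples), each sample's median ratio is taken only over genes with a positive geometric mean and a positive count, and the factors are rescaled to a geometric mean of 1. The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def poscounts_geomean(gene_counts: Sequence[float]) -> float:
    """Modified geometric mean of one gene across samples.

    Zeros are left out of the product but still count toward n (the number
    of samples), so the result is the n-th root of the product of the
    non-zero counts. A gene with only zeros gets 0.
    """
    positive = [c for c in gene_counts if c > 0]
    if not positive:
        return 0.0
    return math.exp(sum(math.log(c) for c in positive) / len(gene_counts))


def poscounts_size_factors(counts: Sequence[Sequence[float]]) -> List[float]:
    """Median-of-ratios size factors built on poscounts geometric means.

    `counts` is a genes x samples matrix given as a list of rows. For each
    sample, the median log ratio is taken over genes whose modified
    geometric mean is > 0 and whose count in that sample is > 0.
    """
    geo_means = [poscounts_geomean(row) for row in counts]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(row[j]) - math.log(g)
            for row, g in zip(counts, geo_means)
            if g > 0 and row[j] > 0
        ]
        # statistics.median raises StatisticsError if no gene qualifies.
        factors.append(math.exp(statistics.median(log_ratios)))
    # Rescale so the size factors have a geometric mean of 1.
    log_center = sum(math.log(f) for f in factors) / len(factors)
    return [f / math.exp(log_center) for f in factors]
```

For example, `poscounts_size_factors([[0, 10], [4, 16], [9, 0]])` returns two factors whose product is 1 (up to rounding), even though every gene in that toy matrix contains a zero. A sample with no usable genes raises `statistics.StatisticsError`; the sketch leaves that case to the caller.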

RNAseq/Workflow_Documentation/NF_RCP-F/README.md

Lines changed: 18 additions & 18 deletions
@@ -101,9 +101,9 @@ All files required for utilizing the NF_RCP-F GeneLab workflow for processing RN
 copy of latest NF_RCP-F version on to your system, the code can be downloaded as a zip file from the release page then unzipped after downloading by running the following commands:

 ```bash
-wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_RCP-F_1.0.3/NF_RCP-F_1.0.3.zip
+wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_RCP-F_1.0.4/NF_RCP-F_1.0.4.zip

-unzip NF_RCP-F_1.0.3.zip
+unzip NF_RCP-F_1.0.4.zip
 ```

 <br>
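The download URL repeats the version twice, once in the release tag and once in the archive name, so deriving both from a single string keeps them from drifting apart. A minimal Python sketch, assuming release assets follow the `releases/download/NF_RCP-F_<version>/NF_RCP-F_<version>.zip` pattern of the 1.0.3 URL above; `release_zip_url` is a hypothetical helper, not part of the workflow:

```python
# Hypothetical helper. ASSUMPTION: the release tag and the zip asset share
# the name "NF_RCP-F_<version>", as in the 1.0.3 download URL.
BASE_URL = "https://github.com/nasa/GeneLab_Data_Processing/releases/download"


def release_zip_url(version: str) -> str:
    """Return the assumed download URL for NF_RCP-F `version`, e.g. "1.0.4"."""
    name = f"NF_RCP-F_{version}"
    # Tag directory and archive name come from the same string, so they
    # cannot disagree with each other.
    return f"{BASE_URL}/{name}/{name}.zip"
```

Passing `"1.0.4"` yields the URL to hand to `wget`, with `NF_RCP-F_1.0.4` appearing as both the tag and the zip name.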
@@ -115,10 +115,10 @@ unzip NF_RCP-F_1.0.3.zip
 Although Nextflow can fetch Singularity images from a url, doing so may cause issues as detailed [here](https://github.com/nextflow-io/nextflow/issues/1210).

 To avoid this issue, run the following command to fetch the Singularity images prior to running the NF_RCP-F workflow:
-> Note: This command should be run in the location containing the `NF_RCP-F_1.0.3` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes.
+> Note: This command should be run in the location containing the `NF_RCP-F_1.0.4` directory that was downloaded in [step 2](#2-download-the-workflow-files) above. Depending on your network speed, fetching the images will take ~20 minutes.

 ```bash
-bash NF_RCP-F_1.0.3/bin/prepull_singularity.sh NF_RCP-F_1.0.3/config/software/by_docker_image.config
+bash NF_RCP-F_1.0.4/bin/prepull_singularity.sh NF_RCP-F_1.0.4/config/software/by_docker_image.config
 ```


@@ -134,15 +134,15 @@ export NXF_SINGULARITY_CACHEDIR=$(pwd)/singularity

 ### 4. Run the Workflow

-While in the location containing the `NF_RCP-F_1.0.3` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_RCP-F workflow:
+While in the location containing the `NF_RCP-F_1.0.4` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_RCP-F workflow:
 > Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.

 <br>

 #### 4a. Approach 1: Run the workflow on a GeneLab RNAseq dataset with automatic retrieval of Ensembl reference fasta and gtf files

 ```bash
-nextflow run NF_RCP-F_1.0.3/main.nf \
+nextflow run NF_RCP-F_1.0.4/main.nf \
 -profile singularity \
 --gldsAccession OSD-194
 ```
@@ -154,7 +154,7 @@ nextflow run NF_RCP-F_1.0.3/main.nf \
 > Note: The `--ref_source` and `--ensemblVersion` parameters should match the reference source and version number of the local reference fasta and gtf files used

 ```bash
-nextflow run NF_RCP-F_1.0.3/main.nf \
+nextflow run NF_RCP-F_1.0.4/main.nf \
 -profile singularity \
 --gldsAccession OSD-194 \
 --ensemblVersion 107 \
@@ -170,7 +170,7 @@ nextflow run NF_RCP-F_1.0.3/main.nf \
 > Note: Specifications for creating a runsheet manually are described [here](examples/runsheet/README.md).

 ```bash
-nextflow run NF_RCP-F_1.0.3/main.nf \
+nextflow run NF_RCP-F_1.0.4/main.nf \
 -profile singularity \
 --gldsAccession output_directory \
 --runsheetPath </path/to/runsheet>
@@ -180,7 +180,7 @@ nextflow run NF_RCP-F_1.0.3/main.nf \

 **Required Parameters For All Approaches:**

-* `NF_RCP-F_1.0.3/main.nf` - Instructs Nextflow to run the NF_RCP-F workflow
+* `NF_RCP-F_1.0.4/main.nf` - Instructs Nextflow to run the NF_RCP-F workflow

 * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow

@@ -230,7 +230,7 @@ nextflow run NF_RCP-F_1.0.3/main.nf \
 All parameters listed above and additional optional arguments for the RCP workflow, including debug related options that may not be immediately useful for most users, can be viewed by running the following command:

 ```bash
-nextflow run NF_RCP-F_1.0.3/main.nf --help
+nextflow run NF_RCP-F_1.0.4/main.nf --help
 ```

 See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details common to all nextflow workflows.
@@ -255,14 +255,14 @@ The outputs from the Analysis Staging and V&V Pipeline Subworkflows are describe
 **V&V Pipeline Subworkflow**

 - Output:
-  - VV_Logs/VV_log_final.tsv (table containing V&V flags for all checks performed)
-  - VV_Logs/VV_log_final_only_issues.tsv (table containing V&V flags ONLY for checks that produced a flag code >= 30)
-  - VV_Logs/VV_log_VV_RAW_READS.tsv (table containing V&V flags ONLY for raw reads checks)
-  - VV_Logs/VV_log_VV_TRIMMED_READS.tsv (table containing V&V flags for trimmed reads checks ONLY)
-  - VV_Logs/VV_log_VV_STAR_ALIGNMENTS.tsv (table containing V&V flags for alignment file checks ONLY)
-  - VV_Logs/VV_log_VV_RSEQC.tsv (table containing V&V flags for RSeQC file checks ONLY)
-  - VV_Logs/VV_log_VV_RSEM_COUNTS.tsv (table containing V&V flags for RSEM raw count file checks ONLY)
-  - VV_Logs/VV_log_VV_DESEQ2_ANALYSIS.tsv (table containing V&V flags for DESeq2 Analysis output checks ONLY)
+  - VV_Logs/VV_log_final_GLbulkRNAseq.tsv (table containing V&V flags for all checks performed)
+  - VV_Logs/VV_log_final_only_issues_GLbulkRNAseq.tsv (table containing V&V flags ONLY for checks that produced a flag code >= 30)
+  - VV_Logs/VV_log_VV_RAW_READS_GLbulkRNAseq.tsv (table containing V&V flags ONLY for raw reads checks)
+  - VV_Logs/VV_log_VV_TRIMMED_READS_GLbulkRNAseq.tsv (table containing V&V flags for trimmed reads checks ONLY)
+  - VV_Logs/VV_log_VV_STAR_ALIGNMENTS_GLbulkRNAseq.tsv (table containing V&V flags for alignment file checks ONLY)
+  - VV_Logs/VV_log_VV_RSEQC_GLbulkRNAseq.tsv (table containing V&V flags for RSeQC file checks ONLY)
+  - VV_Logs/VV_log_VV_RSEM_COUNTS_GLbulkRNAseq.tsv (table containing V&V flags for RSEM raw count file checks ONLY)
+  - VV_Logs/VV_log_VV_DESEQ2_ANALYSIS_GLbulkRNAseq.tsv (table containing V&V flags for DESeq2 Analysis output checks ONLY)

 <br>

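The README describes `VV_log_final_only_issues_GLbulkRNAseq.tsv` as the subset of checks with a flag code >= 30. The following Python sketch shows one way to reproduce that subset from the full log, for instance when re-checking a run. It assumes the log is tab-separated with a header row and that the flag code sits in a column named `flag_code`; that column name is a placeholder, not taken from the workflow, so read the real header of `VV_log_final_GLbulkRNAseq.tsv` first.

```python
import csv
from typing import Dict, Iterable, List

# ASSUMPTION: "flag_code" is a placeholder column name; replace it with the
# column that holds the flag code in the real V&V log header.
FLAG_COLUMN = "flag_code"
# Threshold the README gives for the "only issues" table.
ISSUE_THRESHOLD = 30


def only_issues(lines: Iterable[str], flag_column: str = FLAG_COLUMN,
                threshold: int = ISSUE_THRESHOLD) -> List[Dict[str, str]]:
    """Return rows of a tab-separated V&V log whose flag code is >= threshold.

    `lines` can be an open file or any iterable of text lines; the first
    line must be the header. Flag codes are assumed to be integers.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if int(row[flag_column]) >= threshold]


# Example usage (path from the README's output list):
# with open("VV_Logs/VV_log_final_GLbulkRNAseq.tsv", newline="") as fh:
#     for row in only_issues(fh):
#         print(row)
```

Keeping the column name and threshold as parameters means the sketch can be pointed at the real header without editing the function body.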

RNAseq/Workflow_Documentation/NF_RCP-F/workflow_code/bin/Quantitate_non-zero_genes_per_sample.R

Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@ txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

 ##### Export unnormalized gene counts table
 #setwd(file.path(counts_dir))
-write.csv(txi.rsem$counts,file='RSEM_Unnormalized_Counts.csv')
+write.csv(txi.rsem$counts,file='RSEM_Unnormalized_Counts_GLbulkRNAseq.csv')

 ##### Count the number of genes with non-zero counts for each sample
 rawCounts <- txi.rsem$counts
@@ -29,7 +29,7 @@ colnames(NumNonZeroGenes) <- c("Number of genes with non-zero counts")

 ##### Export the number of genes with non-zero counts for each sample
 #setwd(file.path(counts_dir))
-write.csv(NumNonZeroGenes,file='RSEM_NumNonZeroGenes.csv')
+write.csv(NumNonZeroGenes,file='RSEM_NumNonZeroGenes_GLbulkRNAseq.csv')

 ## print session info ##
 print("Session Info below: ")
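The scripts count, for each sample, the genes with a non-zero count and export that as `*_NumNonZeroGenes_GLbulkRNAseq.csv`. A Python sketch of the same per-sample count, useful for cross-checking an exported table; it assumes the input is an unnormalized counts CSV as written by R's `write.csv` (header row of sample names after a gene-ID column, then one row per gene). Function names are hypothetical, and values are parsed as floats because RSEM expected counts need not be integers.

```python
import csv
from typing import Dict, Iterable, List


def num_nonzero_genes(rows: Iterable[List[str]]) -> Dict[str, int]:
    """Count genes with a count > 0 in each sample column.

    `rows` is a parsed CSV: a header row (gene-ID column, then sample
    names) followed by one row per gene.
    """
    it = iter(rows)
    samples = next(it)[1:]
    totals = {sample: 0 for sample in samples}
    for row in it:
        for sample, value in zip(samples, row[1:]):
            # float() because RSEM expected counts can be fractional.
            if float(value) > 0:
                totals[sample] += 1
    return totals


def num_nonzero_genes_from_csv(path: str) -> Dict[str, int]:
    """Read a counts CSV written by R's write.csv and count non-zero genes."""
    with open(path, newline="") as fh:
        return num_nonzero_genes(csv.reader(fh))
```

Calling `num_nonzero_genes_from_csv("RSEM_Unnormalized_Counts_GLbulkRNAseq.csv")` should agree with the values in `RSEM_NumNonZeroGenes_GLbulkRNAseq.csv`, provided the R script counts values strictly above zero, as the STAR variant does with `colSums(df_full > 0)`.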

RNAseq/Workflow_Documentation/NF_RCP-F/workflow_code/bin/Quantitate_non-zero_genes_per_sample_STAR.R

Lines changed: 2 additions & 2 deletions
@@ -67,15 +67,15 @@ for (i in 2:length(files)) {

 ##### Export unnormalized gene counts table
 # setwd(file.path(counts_dir))
-write.csv(df_full, file = "STAR_Unnormalized_Counts.csv")
+write.csv(df_full, file = "STAR_Unnormalized_Counts_GLbulkRNAseq.csv")

 ##### Count the number of genes with non-zero counts for each sample
 num_nonzero_genes <- (as.matrix(colSums(df_full > 0), row.names = 1))
 colnames(num_nonzero_genes) <- c("Number of genes with non-zero counts")

 ##### Export the number of genes with non-zero counts for each sample
 # setwd(file.path(counts_dir))
-write.csv(num_nonzero_genes, file = "STAR_NumNonZeroGenes.csv")
+write.csv(num_nonzero_genes, file = "STAR_NumNonZeroGenes_GLbulkRNAseq.csv")

 ## print session info ##
 print("Session Info below: ")
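Because 1.0.4 appends `_GLbulkRNAseq` to output file names (for example `STAR_Unnormalized_Counts.csv` becomes `STAR_Unnormalized_Counts_GLbulkRNAseq.csv`), downstream scripts that hard-code the old names need updating. A small Python sketch, assuming the suffix always goes between the file stem and its extension, as it does for every renamed file in this diff; `with_gl_suffix` is a hypothetical helper, and outputs not shown here may follow a different pattern.

```python
from pathlib import PurePosixPath

GL_SUFFIX = "_GLbulkRNAseq"


def with_gl_suffix(filename: str, suffix: str = GL_SUFFIX) -> str:
    """Insert `suffix` between the file stem and its extension.

    Directory parts are kept, and names that already carry the suffix are
    returned unchanged, so applying the function twice is harmless.
    """
    path = PurePosixPath(filename)
    if path.stem.endswith(suffix):
        return str(path)
    return str(path.with_name(path.stem + suffix + path.suffix))
```

`PurePosixPath` keeps forward slashes on every platform, and only the last extension counts as the extension, so a name like `a.tar.gz` would become `a.tar_GLbulkRNAseq.gz`.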
