diff --git a/RNAseq/Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md b/RNAseq/Pipeline_GL-DPPD-7101_Versions/GL-DPPD-7101-G.md
Lines changed: 9 additions & 8 deletions
diff --git a/RNAseq/Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md b/RNAseq/Pipeline_GL-DPPD-7115_Versions/GL-DPPD-7115.md
Lines changed: 437 additions & 282 deletions
diff --git a/RNAseq/Workflow_Documentation/NF_RCP/README.md b/RNAseq/Workflow_Documentation/NF_RCP/README.md
Lines changed: 25 additions & 28 deletions
diff --git a/RNAseq/Workflow_Documentation/NF_RCP/examples/runsheet/README.md b/RNAseq/Workflow_Documentation/NF_RCP/examples/runsheet/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/RNAseq/Workflow_Documentation/NF_RCP/workflow_code/bin/assess_strandedness.py b/RNAseq/Workflow_Documentation/NF_RCP/workflow_code/bin/assess_strandedness.py
old mode 100644
new mode 100755
@@ -4,7 +4,7 @@
 
 ---
 
-**Date:** January 28, 2025 [CHANGE TO BASELINE DATE]  
+**Date:** February 19, 2025  
 **Revision:** G  
 **Document Number:** GL-DPPD-7101-G  
 
@@ -1113,10 +1113,10 @@ echo "*: ${rRNA_count} rRNA entries removed." > *_rRNA_counts.txt
 
 ### 9a. Create Sample RunSheet
 
-> Note: Rather than running the command below to create the runsheet needed for processing, the runsheet may also be created manually by following the [file specification](../Workflow_Documentation/NF_RCP-F/examples/runsheet/README.md).
+> Note: Rather than running the command below to create the runsheet needed for processing, the runsheet may also be created manually by following the [file specification](../Workflow_Documentation/NF_RCP/examples/runsheet/README.md).
 
 ```bash
-### Download the *ISA.zip file from the GeneLab Repository ###
+### Download the *ISA.zip file from the Open Science Data Repository ###
 
 dpt-get-isa-archive \
  --accession GLDS-###
@@ -1144,7 +1144,7 @@ dpt-isa-to-runsheet --accession GLDS-### \
 
 **Output Data:**
 
-- *ISA.zip (compressed ISA directory containing Investigation, Study, and Assay (ISA) metadata files for the respective GLDS dataset, used to define sample groups - the *ISA.zip file is located in the [OSDR repository]([https://genelab-data.ndc.nasa.gov/genelab/projects](https://osdr.nasa.gov/bio/repo/)) under 'Files' -> 'Study Metadata Files')
+- *ISA.zip (compressed ISA directory containing Investigation, Study, and Assay (ISA) metadata files for the respective GLDS dataset, used to define sample groups - the *ISA.zip file is located in the [OSDR repository](https://osdr.nasa.gov/bio/repo/) under 'Files' -> 'Study Metadata Files')
 
 - **{GLDS-Accession-ID}_bulkRNASeq_v{version}_runsheet.csv** (table containing metadata required for processing, version denotes the dp_tools schema used to specify the metadata to extract from the ISA archive)
 
@@ -1192,7 +1192,7 @@ organism <- "organism_that_samples_were_derived_from"
 
 runsheet_path="/path/to/directory/containing/runsheet.csv/file" ## This is the runsheet created in Step 9a above
 work_dir="/path/to/working/directory/where/script/is/executed/from" 
-counts_dir="/path/to/directory/containing/RSEM/counts/files"
+input_counts="/path/to/directory/containing/RSEM/counts/files"
 norm_output="/path/to/normalized/counts/output/directory"
 DGE_output="/path/to/DGE/output/directory"
 
@@ -1297,7 +1297,7 @@ rm(contrast.names)
 ```R
 ### Import RSEM gene count data ###
 files <- list.files(
-    path = counts_dir, 
+    path = input_counts, 
     pattern = ".genes.results", 
     full.names = TRUE
 )
@@ -1592,9 +1592,10 @@ sessionInfo()
 
 
 **Input Data:**
-* `sampleTable` (data frame mapping samples to groups, output from [Step 9e](#9e-perform-dge-analysis))
-* `contrasts` (matrix defining pairwise comparisons between groups, output from [Step 9c](#9c-create-study-group-and-contrasts))
+
+* `contrasts` (matrix defining pairwise comparisons between groups, output from [Step 9c](#9c-configure-metadata-sample-grouping-and-group-comparisons))
 * `txi.rsem` (imported RSEM count data, output from [Step 9d](#9d-import-rsem-genecounts))
+* `sampleTable` (data frame mapping samples to groups, output from [Step 9e](#9e-perform-dge-analysis))
 * `normCounts` (normalized counts, output from [Step 9e](#9e-perform-dge-analysis))
 * `VSTCounts` (variance stabilized transformed counts, output from [Step 9e](#9e-perform-dge-analysis)) 
 * `output_table` (DGE output table, output from [Step 9f](#9f-add-statistics-and-gene-annotations-to-dge-results))
 
@@ -144,23 +144,23 @@ While in the location containing the `NF_RCP-G_2.0.0` directory that was downloa
 ```bash
 nextflow run NF_RCP-G_2.0.0/main.nf \
    -profile singularity \
-   --gldsAccession OSD-194 
+   --accession OSD-194 
 ```
 
 <br>
 
 #### 4b. Approach 2: Run the workflow on a GeneLab RNAseq dataset using local reference fasta and gtf files
 
-> Note: The `--ref_source` and `--ensemblVersion` parameters should match the reference source and version number of the local reference fasta and gtf files used
+> Note: The `--reference_source` and `--reference_version` parameters should match the reference source and version number of the local reference fasta and gtf files used
 
 ```bash
 nextflow run NF_RCP-G_2.0.0/main.nf \
    -profile singularity \
-   --gldsAccession OSD-194 \
-   --ensemblVersion 107 \
-   --ref_source ensembl \ 
-   --ref_fasta </path/to/fasta> \ 
-   --ref_gtf </path/to/gtf> 
+   --accession OSD-194 \
+   --reference_version 107 \
+   --reference_source ensembl \
+   --reference_fasta </path/to/fasta> \
+   --reference_gtf </path/to/gtf> 
 ```
 
 <br>
@@ -172,8 +172,8 @@ nextflow run NF_RCP-G_2.0.0/main.nf \
 ```bash
 nextflow run NF_RCP-G_2.0.0/main.nf \
    -profile singularity \
-   --gldsAccession output_directory \
-   --runsheetPath </path/to/runsheet> 
+   --accession output_directory \
+   --runsheet_path </path/to/runsheet> 
 ```
 
 <br>
@@ -184,43 +184,39 @@ nextflow run NF_RCP-G_2.0.0/main.nf \
 
 * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow
 
-* `--gldsAccession OSD-###` – specifies the OSD dataset to process through the RCP workflow (replace ### with the OSD number)  
-  > Note: The primary output directory will be titled "OSD-###"
-
-* `--gldsAccession output_directory` – specifies the output directory name to use when processing a non-OSD dataset, as indicated in [Approach 3 above](#4c-approach-3-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
+* `--accession [OSD-###|GLDS-###]` – specifies the OSDR dataset to process through the RCP workflow (replace ### with the OSD or GLDS number)  
+  > Note: The primary output directory will be named after the accession input, e.g. "OSD-194" or "GLDS-194"
 
 
 <br>
 
 **Additional Required Parameters For [Approach 2](#4b-approach-2-run-the-workflow-on-a-genelab-rnaseq-dataset-using-local-reference-fasta-and-gtf-files):**
 
-* `--ensemblVersion` - specifies the Ensembl version to use for the reference genome (Ensembl release `107` is used in this example) 
+* `--reference_version` - specifies the Ensembl version to use for the reference genome (Ensembl release `107` is used in this example) 
 
-* `--ref_source` - specifies the source of the reference files used (the source indicated in the Approach 2 example is `ensembl`) 
+* `--reference_source` - specifies the source of the reference files used (the source indicated in the Approach 2 example is `ensembl`) 
 
-* `--ref_fasta` - specifices the path to a local fasta file 
+* `--reference_fasta` - specifies the path to a local fasta file 
 
-* `--ref_gtf` - specifices the path to a local gtf file  
+* `--reference_gtf` - specifies the path to a local gtf file  
 
-  > Note: If the local reference files specified are different than the Ensembl reference files used to create the [GeneLab annotations table](https://github.com/nasa/GeneLab_Data_Processing/blob/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110_annotations.csv), additional gene annotations associated with any Ensembl/TAIR IDs from the specified files that are not shared in the GeneLab annotations will not be added to the DGE output table(s). 
+  > Note: If the local reference files specified are different than the reference files used to create the [GeneLab annotations table](https://github.com/nasa/GeneLab_Data_Processing/blob/master/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110_annotations.csv), additional gene annotations associated with any gene IDs from the specified files that are not shared in the GeneLab annotations will not be added to the DGE output table(s). 
 
 <br>
 
 **Optional Parameters:**
 
-* `--skipVV` - skip the automated V&V processes (Default: the automated V&V processes are active) 
+* `--skip_vv` - skip the automated V&V processes (Default: the automated V&V processes are active) 
 
-* `--outputDir` - specifies the directory to save the raw and processed data files (Default: files are saved in the launch directory)  
+* `--outdir` - specifies the directory to save the raw and processed data files (Default: files are saved in a folder named `results` created in the launch directory)  
 
 * `--force_single_end` - forces the analysis to use single end processing; for paired end datasets, this means only R1 is used; for single end datasets, this should have no effect  
 
-* `--stageLocal TRUE|FALSE` - TRUE = download the raw reads files for the OSD dataset indicated, FALSE = disable raw reads download and processing (Default: TRUE)  
-
-* `--referenceStorePath` - specifies the directory to store the Ensembl fasta and gtf files (Default: within the directory structure created by default in the launch directory)  
+* `--reference_store_path` - specifies the directory to store the Ensembl fasta and gtf files (Default: within the directory structure created by default in the launch directory)  
 
-* `--derivedStorePath` - specifies the directory to store the tool-specific indices created during processing (Default: within the directory structure created by default in the launch directory) 
+* `--derived_store_path` - specifies the directory to store the tool-specific indices created during processing (Default: within the directory structure created by default in the launch directory) 
 
-* `--runsheetPath` - specifies the path to a local runsheet (Default: a runsheet is automatically generated using the metadata on the GeneLab Repository for the OSD dataset being processed)
+* `--runsheet_path` - specifies the path to a local runsheet (Default: a runsheet is automatically generated using the metadata on the GeneLab Repository for the OSD dataset being processed)
   > This is required when processing a non-OSD dataset as indicated in [Approach 3 above](#4c-approach-3-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
    
 <br>
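
For anyone updating existing launch scripts, the parameter renames in the hunks above can be summarized as a lookup table. The following is a minimal Python sketch and not part of the workflow: the `RENAMED_FLAGS` mapping is read directly off the removed and added lines in this diff, while the names `RENAMED_FLAGS`, `REMOVED_FLAGS`, and `migrate_args` are hypothetical. `--stageLocal` is removed here with no replacement shown, so the sketch drops it and reports it, assuming it always takes one value.

```python
# Old -> new parameter names, taken from the renames shown in this diff.
RENAMED_FLAGS = {
    "--gldsAccession": "--accession",
    "--ensemblVersion": "--reference_version",
    "--ref_source": "--reference_source",
    "--ref_fasta": "--reference_fasta",
    "--ref_gtf": "--reference_gtf",
    "--runsheetPath": "--runsheet_path",
    "--skipVV": "--skip_vv",
    "--outputDir": "--outdir",
    "--referenceStorePath": "--reference_store_path",
    "--derivedStorePath": "--derived_store_path",
}

# Removed in this diff; no replacement is documented.
REMOVED_FLAGS = {"--stageLocal"}


def migrate_args(args):
    """Return (new_args, dropped) for a list of command-line tokens.

    Assumption: a removed flag takes exactly one value (e.g. TRUE or FALSE),
    so the token that follows it is dropped as well.
    """
    new_args, dropped = [], []
    skip_next = False
    for token in args:
        if skip_next:
            skip_next = False
            continue
        if token in REMOVED_FLAGS:
            dropped.append(token)
            skip_next = True
            continue
        # Unknown tokens (values, unchanged flags) pass through untouched.
        new_args.append(RENAMED_FLAGS.get(token, token))
    return new_args, dropped


old = ["--gldsAccession", "OSD-194", "--ensemblVersion", "107",
       "--stageLocal", "TRUE", "--skipVV"]
new, dropped = migrate_args(old)
print(" ".join(new))        # --accession OSD-194 --reference_version 107 --skip_vv
print("dropped:", dropped)  # dropped: ['--stageLocal']
```

Renaming alone does not reproduce the old behavior in every case: per the diff, `--outdir` now defaults to a `results` folder in the launch directory, whereas `--outputDir` defaulted to the launch directory itself.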
@@ -272,8 +268,9 @@ Standard Nextflow resource usage logs are also produced as follows:
 **Nextflow Resource Usage Logs**
 
    - Output:
-     - Resource_Usage/execution_report_{timestamp}.html (an html report that includes metrics about the workflow execution including computational resources and exact workflow process commands)
-     - Resource_Usage/execution_timeline_{timestamp}.html (an html timeline for all processes executed in the workflow)
-     - Resource_Usage/execution_trace_{timestamp}.txt (an execution tracing file that contains information about each process executed in the workflow, including: submission time, start time, completion time, cpu and memory used, machine-readable output)
+     - nextflow_logs/execution_report_{timestamp}.html (an html report that includes metrics about the workflow execution including computational resources and exact workflow process commands)
+     - nextflow_logs/execution_timeline_{timestamp}.html (an html timeline for all processes executed in the workflow)
+     - nextflow_logs/execution_trace_{timestamp}.txt (an execution tracing file that contains information about each process executed in the workflow, including: submission time, start time, completion time, cpu and memory used, machine-readable output)
+     - nextflow_info/pipeline_dag_{timestamp}.html (a visualization of the workflow process DAG)
 
 <br>
@@ -18,7 +18,7 @@
 | Sample Name | string | Sample Name, added as a prefix to sample-specific processed data output files. Should not include spaces or weird characters. | Mmus_BAL-TAL_LRTN_BSL_Rep1_B7 |
 | has_ERCC | bool | Set to True if ERCC spike-ins are included in the samples. This ensures ERCC normalized DGE is performed in addition to standard DGE. | True |
 | paired_end | bool | Set to True if the samples were sequenced as paired-end. If set to False, samples are assumed to be single-end. | False |
-| organism | string | Species name used to map to the appropriate gene annotations file. Supported species can be found in the `species` column of the [GL-DPPD-7110_annotations.csv](../../../../../GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110_annotations.csv) file. | Mus musculus |
+| organism | string | Species name used to map to the appropriate gene annotations file. Supported species can be found in the `species` column of the [GL-DPPD-7110-A_annotations.csv](../../../../../GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110-A/GL-DPPD-7110-A_annotations.csv) file. | Mus musculus |
 | read1_path | string (url or local path) | Location of the raw reads file. For paired-end data, this specifies the forward reads fastq.gz file. | /my/data/sample_1.fastq.gz |
 | read2_path | string (url or local path) | Location of the raw reads file. For paired-end data, this specifies the reverse reads fastq.gz file. For single-end data, this column should be omitted. | /my/data/sample_2.fastq.gz |
 | Factor Value[<name, e.g. Spaceflight>] | string | A set of one or more columns specifying the experimental group the sample belongs to. In the simplest form, a column named 'Factor Value[group]' is sufficient. | Space Flight |
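
The column rules in the table rows above lend themselves to a quick pre-flight check on a manually created runsheet. Below is a minimal Python sketch under stated assumptions: it checks only the columns visible in this hunk (the full specification may define more), it assumes booleans are written literally as `True` or `False` as in the table's examples, and the function name `check_runsheet` is hypothetical rather than anything shipped with the workflow.

```python
import csv
import io

# Columns visible in this hunk that every runsheet row is expected to carry.
# read2_path is excluded because it should be omitted for single-end data.
REQUIRED = ["Sample Name", "has_ERCC", "paired_end", "organism", "read1_path"]


def check_runsheet(text):
    """Return a list of problems found in runsheet CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED if c not in columns]
    if not any(c.startswith("Factor Value[") for c in columns):
        problems.append("missing column: Factor Value[...]")
    if problems:
        return problems

    # Header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        name = row["Sample Name"] or ""
        if " " in name:
            problems.append(f"line {line_no}: Sample Name contains a space")
        for col in ("has_ERCC", "paired_end"):
            if row[col] not in ("True", "False"):
                problems.append(f"line {line_no}: {col} must be True or False")
        if row["paired_end"] == "True" and not row.get("read2_path"):
            problems.append(f"line {line_no}: paired-end sample lacks read2_path")
    return problems


example = (
    "Sample Name,has_ERCC,paired_end,organism,read1_path,Factor Value[group]\n"
    "Mmus_BAL-TAL_LRTN_BSL_Rep1_B7,True,False,Mus musculus,"
    "/my/data/sample_1.fastq.gz,Space Flight\n"
)
print(check_runsheet(example))  # []
```

The check is deliberately shallow: it does not confirm that `organism` appears in the annotations table or that the read paths exist, both of which would need access to files outside the runsheet.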