update amplicon workflow to version SW_AmpIllumina-B_1.2.3

asaravia-butler · asaravia-butler · commit 60fe37a0a2db · 2024-12-04T21:29:34.000-08:00
diff --git a/Amplicon/Illumina/Workflow_Documentation/README.md b/Amplicon/Illumina/Workflow_Documentation/README.md
@@ -6,7 +6,7 @@
 
 |Pipeline Version|Current Workflow Version (for respective pipeline version)|
 |:---------------|:---------------------------------------------------------|
-|*[GL-DPPD-7104-B.md](../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md)|[1.2.2](SW_AmpIllumina-B)|
+|*[GL-DPPD-7104-B.md](../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md)|[1.2.3](SW_AmpIllumina-B)|
 |[GL-DPPD-7104-A.md](../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-A.md)|[1.1.1](SW_AmpIllumina-A)|
 
 *Current GeneLab Pipeline/Workflow Implementation
diff --git a/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/CHANGELOG.md b/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/CHANGELOG.md
@@ -1,5 +1,13 @@
 # Workflow change log
 
+## [1.2.3](https://github.com/nasa/GeneLab_Data_Processing/tree/SW_AmpIllumina-B_1.2.3/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B)
+- Fixed broken DECIPHER reference database links; they now point to:
+  - 16S: https://www2.decipher.codes/data/Downloads/TrainingSets/SILVA_SSU_r138_2019.RData
+  - ITS: https://www2.decipher.codes/data/Downloads/TrainingSets/UNITE_v2023_July2023.RData
+  - 18S: https://www2.decipher.codes/data/Downloads/TrainingSets/PR2_v4_13_March2021.RData
+- Visualizations are now enabled by default (default: TRUE)
+  - Disable with the optional `run_workflow.py` argument `--visualizations FALSE` or by setting `config.yaml` `enable_visualizations` to "FALSE"
+
 ## [1.2.2](https://github.com/nasa/GeneLab_Data_Processing/tree/SW_AmpIllumina-B_1.2.2/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B)
 - Visualizations are now optional with the default being off.
   - Enable with optional `run_workflow.py` argument `--visualizations TRUE` or setting `config.yaml` `enable_visualizations` to "TRUE"
@@ -36,4 +44,4 @@
 
 <br> 
 
-All previous workflow changes were associated with [version A of the GeneLab Amplicon Seq Illumina Pipeline](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-A.md), and can be found in the [change log of the SW_AmpIllumina-A workflow](../SW_AmpIllumina-A/CHANGELOG.md).
+All previous workflow changes were associated with [version A of the GeneLab Amplicon Seq Illumina Pipeline](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-A.md), and can be found in the [change log of the SW_AmpIllumina-A workflow](../SW_AmpIllumina-A/CHANGELOG.md).
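The three replacement links in the 1.2.3 entry share one base path and differ only in the training-set file chosen for each target region. Below is a minimal Python sketch of that lookup. It assumes the region labels `16S`, `ITS` and `18S` used by the R processing scripts. `DECIPHER_TRAINING_SETS`, `REFERENCE_FILES` and `reference_url` are hypothetical names and are not part of the workflow code.

```python
# Hypothetical helper mirroring the target_region branches in the R scripts;
# not part of the SW_AmpIllumina-B code base.
DECIPHER_TRAINING_SETS = "https://www2.decipher.codes/data/Downloads/TrainingSets"

# Training-set file per amplicon target, as listed in the 1.2.3 change log entry
REFERENCE_FILES = {
    "16S": "SILVA_SSU_r138_2019.RData",
    "ITS": "UNITE_v2023_July2023.RData",
    "18S": "PR2_v4_13_March2021.RData",
}


def reference_url(target_region: str) -> str:
    """Return the DECIPHER training-set URL for a supported target region."""
    try:
        filename = REFERENCE_FILES[target_region]
    except KeyError:
        raise ValueError(
            f"unsupported target region {target_region!r}; "
            f"expected one of {sorted(REFERENCE_FILES)}"
        ) from None
    return f"{DECIPHER_TRAINING_SETS}/{filename}"


if __name__ == "__main__":
    for region in REFERENCE_FILES:
        print(region, reference_url(region))
```

Keeping the mapping in one place means a future link change touches a single constant. In the R scripts, by contrast, the URL is repeated in each branch of both the paired-end and the single-end script.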
diff --git a/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md b/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md
@@ -39,10 +39,11 @@ Once mamba is installed, you can install the genelab-utils conda package in a ne
 mamba create -n genelab-utils -c conda-forge -c bioconda -c defaults -c astrobiomike 'genelab-utils==1.3.35'
 ```
 
-The environment then needs to be activated by running the following command:
+The environment then needs to be activated and updated by running the following commands:
 
 ```bash
 conda activate genelab-utils
+pip install --upgrade pyOpenSSL
 ```
 <br>
 
@@ -52,15 +53,15 @@ ___
 <!-- All files required for utilizing the GeneLab workflow for processing Illumina amplicon sequencing data are in the [workflow_code](workflow_code) directory. To get a copy of latest SW_AmpIllumina-B version on to your system, the code can be downloaded as a zip file from the release page then unzipped after downloading by running the following commands:
 
 ```bash
-wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/SW_AmpIllumina-B_1.2.2/SW_AmpIllumina-B_1.2.2.zip
+wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/SW_AmpIllumina-B_1.2.3/SW_AmpIllumina-B_1.2.3.zip
 
-unzip SW_AmpIllumina-B_1.2.2.zip
+unzip SW_AmpIllumina-B_1.2.3.zip
 ```
 
-This downloaded the workflow into a directory called `SW_AmpIllumina-B_1.2.2`. To run the workflow, you will need to move into that directory by running the following command:
+This downloaded the workflow into a directory called `SW_AmpIllumina-B_1.2.3`. To run the workflow, you will need to move into that directory by running the following command:
 
 ```bash
-cd SW_AmpIllumina-B_1.2.2
+cd SW_AmpIllumina-B_1.2.3
 ``` -->
 
 All files required for utilizing the GeneLab workflow for processing Illumina amplicon sequencing data are in the [workflow_code](workflow_code) directory. To get a copy of the latest SW_AmpIllumina-B version on to your system, run the following command:
@@ -132,7 +133,7 @@ ___
 * `--run` - specifies the command used to execute the snakemake workflow; snakemake-specific parameters are defined below
 
 * `--outputDir` - specifies the output directory for the output files generated by the workflow 
-   > *This is an optional command that can be added outside the quotation marks in either approach to specify the output directory. If this option is not used, the output files will be printed to the current working directory, i.e. in the `SW_AmpIllumina-B_1.2.2` directory that was downloaded in [step 2](#2-download-the-workflow-template-files).*
+   > *This is an optional command that can be added outside the quotation marks in either approach to specify the output directory. If this option is not used, the output files will be printed to the current working directory, i.e. in the `SW_AmpIllumina-B_1.2.3` directory that was downloaded in [step 2](#2-download-the-workflow-template-files).*
 
 * `--trim-primers TRUE/FALSE` - specifies to trim primers (TRUE) or not (FALSE). Default: TRUE
    > *Note: Primers should virtually always be trimmed from amplicon datasets. This option is here for cases where they have already been removed.*
@@ -167,7 +168,7 @@ ___
 * `--specify-runsheet` - specifies the runsheet to use when multiple runsheets are generated
    > *Optional parameter used in Approach 1 for datasets that have multiple assays for the same amplicon target (e.g. [OSD-249](https://osdr.nasa.gov/bio/repo/data/studies/OSD-249)).*
 
-* `--visualizations TRUE/FALSE` - if set to TRUE, the [visualizations script](workflow_code/visualizations/Illumina-R-visualizations.R) will be run. Default: FALSE
+* `--visualizations TRUE/FALSE` - if set to TRUE, the [visualizations script](workflow_code/visualizations/Illumina-R-visualizations.R) will be run. Default: TRUE
    > *Note: For instructions on manually executing the visualizations script, refer to the [stand-alone execution documentation](./workflow_code/visualizations/README.md).*
 
 <br>
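Every version bump in this README rewrites the same strings: the release tag in the `wget` URL and the directory name created by `unzip`. The sketch below derives both from the version number. It assumes the `SW_AmpIllumina-B_<version>` tag convention visible in the download instructions, which are currently inside an HTML comment in this README. `release_zip` is a hypothetical helper.

```python
from typing import Tuple

# Release download base taken from the wget command in this README
RELEASE_BASE = "https://github.com/nasa/GeneLab_Data_Processing/releases/download"


def release_zip(version: str, workflow: str = "SW_AmpIllumina-B") -> Tuple[str, str]:
    """Return (zip URL, unzipped directory name) for a workflow release.

    Hypothetical helper: assumes the tag, the zip name and the directory name
    all follow the '<workflow>_<version>' pattern used for SW_AmpIllumina-B.
    """
    tag = f"{workflow}_{version}"
    return f"{RELEASE_BASE}/{tag}/{tag}.zip", tag


if __name__ == "__main__":
    url, directory = release_zip("1.2.3")
    print(f"wget {url}")
    print(f"unzip {directory}.zip")
    print(f"cd {directory}")
```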
diff --git a/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/config.yaml b/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/config.yaml
@@ -132,7 +132,7 @@ plots_dir:
     "workflow_output/Final_Outputs/Plots/"
 
 enable_visualizations:
-    "FALSE"
+    "TRUE"
 
 ############################################################
 ###################### GENERAL INFO ########################
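The new default is stored as the quoted string "TRUE" rather than as a YAML boolean. The quoting matters. A YAML 1.1 parser such as PyYAML reads an unquoted `TRUE` as the boolean `True`. The quoted form stays the same string that the `--visualizations` choices in `run_workflow.py` use.

The sketch below validates that string once the config has been loaded into a dict. `visualizations_enabled` is a hypothetical helper, and this diff does not show how the workflow itself consumes `enable_visualizations`.

```python
def visualizations_enabled(config: dict) -> bool:
    """Interpret config['enable_visualizations'] as a bool.

    Hypothetical helper: accepts only the "TRUE"/"FALSE" strings used in
    config.yaml and assumes a missing key means the 1.2.3 default ("TRUE").
    """
    value = config.get("enable_visualizations", "TRUE")
    if value == "TRUE":
        return True
    if value == "FALSE":
        return False
    # e.g. a bare boolean produced by an unquoted TRUE/FALSE in the YAML file
    raise ValueError(
        f'enable_visualizations must be "TRUE" or "FALSE", got {value!r}'
    )
```

Rejecting anything other than the two strings surfaces a quoting mistake early. Otherwise a comparison like `value == "TRUE"` would quietly evaluate to false for a parsed boolean `True`.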
diff --git a/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/scripts/Illumina-PE-R-processing.R b/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/scripts/Illumina-PE-R-processing.R
@@ -160,7 +160,7 @@ dna <- DNAStringSet(getSequences(seqtab.nochim))
     # downloading reference R taxonomy object (at some point this will be stored somewhere on GeneLab's server and we won't download it, but should leave the code here, just commented out)
 cat("\n\n  Downloading reference database...\n\n")
 if ( target_region == "16S" ) { 
-    download.file("http://www2.decipher.codes/Classification/TrainingSets/SILVA_SSU_r138_2019.RData", "SILVA_SSU_r138_2019.RData")
+    download.file("https://www2.decipher.codes/data/Downloads/TrainingSets/SILVA_SSU_r138_2019.RData", "SILVA_SSU_r138_2019.RData")
     # loading reference taxonomy object
     load("SILVA_SSU_r138_2019.RData")
     # removing downloaded file
@@ -170,7 +170,7 @@ if ( target_region == "16S" ) {
 
 } else if (target_region == "ITS" ) {
 
-    download.file("http://www2.decipher.codes/Classification/TrainingSets/UNITE_v2023_July2023.RData", "UNITE_v2023_July2023.RData")    
+    download.file("https://www2.decipher.codes/data/Downloads/TrainingSets/UNITE_v2023_July2023.RData", "UNITE_v2023_July2023.RData")    
     # loading reference taxonomy object
     load("UNITE_v2023_July2023.RData")
     # removing downloaded file
@@ -180,7 +180,7 @@ if ( target_region == "16S" ) {
 
 } else if (target_region == "18S" ) {
 
-    download.file("http://www2.decipher.codes/Classification/TrainingSets/PR2_v4_13_March2021.RData", "PR2_v4_13_March2021.RData")    
+    download.file("https://www2.decipher.codes/data/Downloads/TrainingSets/PR2_v4_13_March2021.RData", "PR2_v4_13_March2021.RData")    
     # loading reference taxonomy object
     load("PR2_v4_13_March2021.RData")
     # removing downloaded file
diff --git a/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/scripts/Illumina-SE-R-processing.R b/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/scripts/Illumina-SE-R-processing.R
@@ -128,7 +128,7 @@ dna <- DNAStringSet(getSequences(seqtab.nochim))
     # downloading reference R taxonomy object (at some point this will be stored somewhere on GeneLab's server and we won't download it, but should leave the code here, just commented out)
 cat("\n\n  Downloading reference database...\n\n")
 if ( target_region == "16S" ) { 
-    download.file("http://www2.decipher.codes/Classification/TrainingSets/SILVA_SSU_r138_2019.RData", "SILVA_SSU_r138_2019.RData")
+    download.file("https://www2.decipher.codes/data/Downloads/TrainingSets/SILVA_SSU_r138_2019.RData", "SILVA_SSU_r138_2019.RData")
     # loading reference taxonomy object
     load("SILVA_SSU_r138_2019.RData")
     # removing downloaded file
@@ -138,7 +138,7 @@ if ( target_region == "16S" ) {
 
 } else if (target_region == "ITS" ) {
 
-    download.file("http://www2.decipher.codes/Classification/TrainingSets/UNITE_v2023_July2023.RData", "UNITE_v2023_July2023.RData")    
+    download.file("https://www2.decipher.codes/data/Downloads/TrainingSets/UNITE_v2023_July2023.RData", "UNITE_v2023_July2023.RData")    
     # loading reference taxonomy object
     load("UNITE_v2023_July2023.RData")
     # removing downloaded file
@@ -148,7 +148,7 @@ if ( target_region == "16S" ) {
 
 } else if (target_region == "18S" ) {
 
-    download.file("http://www2.decipher.codes/Classification/TrainingSets/PR2_v4_13_March2021.RData", "PR2_v4_13_March2021.RData")    
+    download.file("https://www2.decipher.codes/data/Downloads/TrainingSets/PR2_v4_13_March2021.RData", "PR2_v4_13_March2021.RData")    
     # loading reference taxonomy object
     load("PR2_v4_13_March2021.RData")
     # removing downloaded file
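Both R processing scripts follow the same pattern for the reference database: download the `.RData` file, load it, then remove the downloaded copy. Below is a rough Python analogue of that pattern, shown only to illustrate the cleanup step.

- It wraps the cleanup in `try`/`finally`, so the file is removed even if loading fails.
- That wrapping is a design choice of this sketch, not necessarily how the R scripts behave.
- `with_downloaded_file` is a hypothetical name.
- The demo uses a local `file://` URL so it runs offline.

```python
import os
import pathlib
import tempfile
import urllib.request
from typing import Callable, TypeVar

T = TypeVar("T")


def with_downloaded_file(url: str, filename: str, use: Callable[[str], T]) -> T:
    """Download url to filename, run use(filename), then delete the file.

    Hypothetical helper illustrating the download -> load -> remove pattern
    of the R scripts; the file is deleted even if use() raises.
    """
    urllib.request.urlretrieve(url, filename)
    try:
        return use(filename)
    finally:
        # Counterpart of the "removing downloaded file" step in the R scripts
        os.remove(filename)


if __name__ == "__main__":
    # Offline demo: "download" a local file through a file:// URL
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp, "reference.txt")
        source.write_text("reference data")
        target = os.path.join(tmp, "downloaded.txt")
        print(with_downloaded_file(source.as_uri(), target,
                                   lambda p: pathlib.Path(p).read_text()))
        print(os.path.exists(target))
```

For a real run, `url` would be one of the `https://www2.decipher.codes/data/Downloads/TrainingSets/...` links introduced in this commit.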
diff --git a/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/scripts/run_workflow.py b/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/scripts/run_workflow.py
@@ -643,8 +643,8 @@ def main():
     
     parser.add_argument('--visualizations',
                         choices=['TRUE', 'FALSE'],
-                        default='FALSE',
-                    help='If set to TRUE, enables visualization of workflow results. Default: FALSE')
+                        default='TRUE',
+                        help='If set to FALSE, disables visualization of workflow results. Default: TRUE')
     
     # Check if no arguments were provided
     if len(sys.argv) == 1:
@@ -765,4 +765,4 @@ def main():
 
 
 if __name__ == "__main__":
-    main()
+    main()
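The flipped `--visualizations` default can be checked in isolation. The sketch below rebuilds only this option with `argparse`, copying its `choices`, `default` and help text from the diff. `build_parser` is a hypothetical wrapper, and the rest of the arguments in `run_workflow.py` are omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the --visualizations option from run_workflow.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--visualizations',
                        choices=['TRUE', 'FALSE'],
                        default='TRUE',
                        help='If set to FALSE, disables visualization of workflow results. Default: TRUE')
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args([]).visualizations)                             # TRUE
    print(parser.parse_args(['--visualizations', 'FALSE']).visualizations)  # FALSE
```

Because `choices` is case-sensitive, lowercase `true`/`false` are rejected with a usage error rather than silently falling back to the default.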