Merge pull request #69 from torres-alexis/DEV2_SW_AmpIllumina-B

asaravia-butler · web-flow · commit a3e1bac008c4 · 2024-03-19T20:08:26.000-07:00
SW_AmpIllumina-B: Vis script readme updates, fixes:
Convert local DPPD links to main repo URLs
Improve step 2 instructions in vis script readme
Add input and output files sections
diff --git a/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md b/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md
@@ -2,7 +2,7 @@
 
 
 ## General workflow info <!-- omit in toc -->
-The current GeneLab Illumina amplicon sequencing data processing pipeline (AmpIllumina), [GL-DPPD-7104-B.md](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md), is implemented as a [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflow and utilizes [conda](https://docs.conda.io/en/latest/) environments to install/run all tools. This workflow (SW_AmpIllumina-B) is run using the command line interface (CLI) of any unix-based system. The workflow can be used even if you are unfamiliar with Snakemake and conda, but if you want to learn more about those, [this Snakemake tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html) within [Snakemake's documentation](https://snakemake.readthedocs.io/en/stable/) is a good place to start for that, and an introduction to conda with installation help and links to other resources can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/unix/conda-intro).  
+The current GeneLab Illumina amplicon sequencing data processing pipeline (AmpIllumina), [GL-DPPD-7104-B.md](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md), is implemented as a [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflow and utilizes [conda](https://docs.conda.io/en/latest/) environments to install/run all tools. This workflow (SW_AmpIllumina-B) is run using the command line interface (CLI) of any unix-based system. The workflow can be used even if you are unfamiliar with Snakemake and conda, but if you want to learn more about those, [this Snakemake tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html) within [Snakemake's documentation](https://snakemake.readthedocs.io/en/stable/) is a good place to start for that, and an introduction to conda with installation help and links to other resources can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/unix/conda-intro).  
 
 <br>
 
@@ -190,7 +190,7 @@ ___
 ### 5. Additional output files
 
 The outputs from the `run_workflow.py` and differential abundance analysis (DAA) / visualizations scripts are described below:
-> Note: Outputs from the Amplicon Seq - Illumina pipeline are documented in the [GL-DPPD-7104-B.md](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md) processing protocol.
+> Note: Outputs from the Amplicon Seq - Illumina pipeline are documented in the [GL-DPPD-7104-B.md](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md) processing protocol.
 
 - **Metadata Outputs:**
   - \*_AmpSeq_v1_runsheet.csv (table containing metadata required for processing, including the raw reads files location)
diff --git a/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/visualizations/README.md b/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/visualizations/README.md
@@ -13,7 +13,6 @@ The documentation for this script and its outputs can be found in steps 6-10 of
 
 - [1. Set up the execution environment](#1-set-up-the-execution-environment)  
 - [2. Run the visualization script manually](#2-run-the-visualization-script-manually)  
-- [3. Parameter definitions](#3-parameter-definitions)
 
 <br>
 
@@ -51,42 +50,51 @@ ___
 
 ### 2. Run the visualization script manually  
 
-To run the script, the variables `runsheet_file`, `sample_info`, `counts`, `taxonomy`, `assay_suffix`, `plots_dir`, and `output_prefix` must be specified. The [Illumina-R-visualizations.R](Illumina-R-visualizations.R) script can be executed from the command line by providing these variables as positional arguments.
+The [Illumina-R-visualizations.R](./Illumina-R-visualizations.R) script can be executed from the command line by providing `runsheet_file`, `sample_info`, `counts`, `taxonomy`, `assay_suffix`, `plots_dir`, and `output_prefix` as positional arguments, in that order.
 
-Additionally, the `RColorBrewer_Palette` variable can be modified in the script.  This variable determines the color palette from the RColorBrewer package that is applied to the plots.
-
-```R
-# Store command line args as variables #
-args <- commandArgs(trailingOnly = TRUE)
-runsheet_file <- paste0(args[1])
-sample_info <- paste0(args[2])
-counts <- paste0(args[3])
-taxonomy <- paste0(args[4])
-assay_suffix <- paste(args[5])
-plots_dir <- paste0(args[6])
-output_prefix <- paste0(args[7])
-########################################
-
-RColorBrewer_Palette <- "Set1"
-```
+The example command below shows how to execute the script with the following parameters:
+ * runsheet_file: /path/to/runsheet.csv  
+ * sample_info: /path/to/unique-sample-IDs.txt
+ * counts: /path/to/counts_GLAmpSeq.tsv
+ * taxonomy: /path/to/taxonomy_GLAmpSeq.tsv
+ * assay_suffix: _GL_Ampseq
+ * plots_dir: /path/to/Plots/
+ * output_prefix: my_prefix_
 
-Example run command: 
 ```bash
-Rscript /path/to/visualizations/Illumina-R-visualizations.R "{runsheet_file}" "{sample_info}" "{counts}" "{taxonomy}" "{assay_suffix}" "{plots_dir}" "{output_prefix}"
+Rscript /path/to/visualizations/Illumina-R-visualizations.R "/path/to/runsheet.csv" "/path/to/unique-sample-IDs.txt" "/path/to/counts_GLAmpSeq.tsv" "/path/to/taxonomy_GLAmpSeq.tsv" "_GL_Ampseq" "/path/to/Plots/" "my_prefix_"
 ```
 
-<br>
-
-___
-
-### 3. Parameter definitions 
+Additionally, the `RColorBrewer_Palette` variable can be modified in the script.  This variable determines the color palette from the RColorBrewer package that is applied to the plots.
 
-**Parameter Definitions for Illumina-R-visualizations.R:**
-* `runsheet_file` – specifies the runsheet containing sample metadata required for processing (output from [GL-DPPD-7104-B step 6a](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#6a-create-sample-runsheet))
-* `sample_info` – specifies the text file containing the IDs of each sample used, required for running the SW_AmpIllumina workflow (output from [run_workflow.py](/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md#5-additional-output-files))
-* `counts` – specifies the ASV counts table (output from [GL-DPPD-7104-B step 5g](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
-* `taxonomy` – specifies the taxonomy table (output from [GL-DPPD-7104-B step 5g](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
-* `assay_suffix` – specifies a string that is prepended to the start of the output file names. Default: ""
+**Parameter Definitions:**
+* `runsheet_file` – specifies the table containing sample metadata required for processing 
+* `sample_info` – specifies the text file containing the IDs of each sample used, required for running the SW_AmpIllumina workflow 
+* `counts` – specifies the ASV counts table 
+* `taxonomy` – specifies the taxonomy table 
+* `assay_suffix` – specifies a string that is appended to the end of the output file names. Default: "_GLAmpSeq"
 * `plots_dir` – specifies the path where output files will be saved
-* `output_prefix` – specifies a string that is appended to the end of the output file names. Default: "_GLAmpSeq"
+* `output_prefix` – specifies a string that is prepended to the start of the output file names. Default: ""
 * `RColorBrewer_Palette` – specifies the RColorBrewer palette that will be used for coloring in the plots. Options include "Set1", "Accent", "Dark2", "Paired", "Pastel1", "Pastel2", "Set2", and "Set3". Default: "Set1"
+
+**Input Data:**
+* *runsheet.csv (output from [GL-DPPD-7104-B step 6a](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#6a-create-sample-runsheet))
+* unique-sample-IDs.txt (output from [run_workflow.py](/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md#5-additional-output-files))
+* counts_GLAmpSeq.tsv (output from [GL-DPPD-7104-B step 5g](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
+* taxonomy_GLAmpSeq.tsv (output from [GL-DPPD-7104-B step 5g](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
+
+**Output Data:**
+* **{output_prefix}dendrogram_by_group{assay_suffix}.png** (dendrogram of Euclidean distance-based hierarchical clustering of the samples, colored by experimental groups)
+* **{output_prefix}rarefaction_curves{assay_suffix}.png** (rarefaction curves plot for all samples)
+* **{output_prefix}richness_and_diversity_estimates_by_sample{assay_suffix}.png** (richness and diversity estimates plot for all samples)
+* **{output_prefix}richness_and_diversity_estimates_by_group{assay_suffix}.png** (richness and diversity estimates plot for all groups)
+* **{output_prefix}relative_phyla{assay_suffix}.png** (taxonomic summaries plot based on phyla, for all samples)
+* **{output_prefix}relative_classes{assay_suffix}.png** (taxonomic summaries plot based on class, for all samples)
+* **{output_prefix}samplewise_phyla{assay_suffix}.png** (taxonomic summaries plot based on phyla, for all samples)
+* **{output_prefix}samplewise_classes{assay_suffix}.png** (taxonomic summaries plot based on class, for all samples)
+* **{output_prefix}PCoA_w_labels{assay_suffix}.png** (Principal Coordinates Analysis plot of VST transformed ASV counts, with sample labels)
+* **{output_prefix}PCoA_without_labels{assay_suffix}.png** (Principal Coordinates Analysis plot of VST transformed ASV counts, without sample labels)
+* **{output_prefix}normalized_counts{assay_suffix}.tsv** (size factor normalized ASV counts table)
+* **{output_prefix}group1_vs_group2.csv** (differential abundance tables for all pairwise contrasts of groups)
+* **{output_prefix}volcano_group1_vs_group2.png** (volcano plots for all pairwise contrasts of groups)
+* **{output_prefix}color_legend_{assay_suffix}.png** (color legend for all groups)