Commit a3e1bac

Merge pull request #69 from torres-alexis/DEV2_SW_AmpIllumina-B
SW_AmpIllumina-B: Vis script readme updates, fixes:
- Convert local DPPD links to main repo URLs
- Improve step 2 instructions in vis script readme
- Add input and output files sections
2 parents 35975a6 + 1eb0870 commit a3e1bac

2 files changed: +42 -34 lines changed
  • Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B

Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@


 ## General workflow info <!-- omit in toc -->
-The current GeneLab Illumina amplicon sequencing data processing pipeline (AmpIllumina), [GL-DPPD-7104-B.md](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md), is implemented as a [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflow and utilizes [conda](https://docs.conda.io/en/latest/) environments to install/run all tools. This workflow (SW_AmpIllumina-B) is run using the command line interface (CLI) of any unix-based system. The workflow can be used even if you are unfamiliar with Snakemake and conda, but if you want to learn more about those, [this Snakemake tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html) within [Snakemake's documentation](https://snakemake.readthedocs.io/en/stable/) is a good place to start for that, and an introduction to conda with installation help and links to other resources can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/unix/conda-intro).
+The current GeneLab Illumina amplicon sequencing data processing pipeline (AmpIllumina), [GL-DPPD-7104-B.md](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md), is implemented as a [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflow and utilizes [conda](https://docs.conda.io/en/latest/) environments to install/run all tools. This workflow (SW_AmpIllumina-B) is run using the command line interface (CLI) of any unix-based system. The workflow can be used even if you are unfamiliar with Snakemake and conda, but if you want to learn more about those, [this Snakemake tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html) within [Snakemake's documentation](https://snakemake.readthedocs.io/en/stable/) is a good place to start for that, and an introduction to conda with installation help and links to other resources can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/unix/conda-intro).

 <br>

@@ -190,7 +190,7 @@ ___
 ### 5. Additional output files

 The outputs from the `run_workflow.py` and differential abundance analysis (DAA) / visualizations scripts are described below:
-> Note: Outputs from the Amplicon Seq - Illumina pipeline are documented in the [GL-DPPD-7104-B.md](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md) processing protocol.
+> Note: Outputs from the Amplicon Seq - Illumina pipeline are documented in the [GL-DPPD-7104-B.md](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md) processing protocol.

 - **Metadata Outputs:**
 - \*_AmpSeq_v1_runsheet.csv (table containing metadata required for processing, including the raw reads files location)

Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/workflow_code/visualizations/README.md

Lines changed: 40 additions & 32 deletions
@@ -13,7 +13,6 @@ The documentation for this script and its outputs can be found in steps 6-10 of

 - [1. Set up the execution environment](#1-set-up-the-execution-environment)
 - [2. Run the visualization script manually](#2-run-the-visualization-script-manually)
-- [3. Parameter definitions](#3-parameter-definitions)

 <br>

@@ -51,42 +50,51 @@ ___

 ### 2. Run the visualization script manually

-To run the script, the variables `runsheet_file`, `sample_info`, `counts`, `taxonomy`, `assay_suffix`, `plots_dir`, and `output_prefix` must be specified. The [Illumina-R-visualizations.R](Illumina-R-visualizations.R) script can be executed from the command line by providing these variables as positional arguments.
+The [Illumina-R-visualizations.R](./Illumina-R-visualizations.R) script can be executed from the command line by providing `runsheet_file`, `sample_info`, `counts`, `taxonomy`, `assay_suffix`, `plots_dir`, and `output_prefix` as positional arguments, in their respective order.

-Additionally, the `RColorBrewer_Palette` variable can be modified in the script. This variable determines the color palette from the RColorBrewer package that is applied to the plots.
-
-```R
-# Store command line args as variables #
-args <- commandArgs(trailingOnly = TRUE)
-runsheet_file <- paste0(args[1])
-sample_info <- paste0(args[2])
-counts <- paste0(args[3])
-taxonomy <- paste0(args[4])
-assay_suffix <- paste(args[5])
-plots_dir <- paste0(args[6])
-output_prefix <- paste0(args[7])
-########################################
-
-RColorBrewer_Palette <- "Set1"
-```
+The example command below shows how to execute the script with the following parameters:
+* runsheet_file: /path/to/runsheet.csv
+* sample_info: /path/to/unique-sample-IDs.txt
+* counts: /path/to/counts_GLAmpSeq.tsv
+* taxonomy: /path/to/taxonomy_GLAmpSeq.tsv
+* assay_suffix: _GL_Ampseq
+* plots_dir: /path/to/Plots/
+* output_prefix: my_prefix_

-Example run command:
 ```bash
-Rscript /path/to/visualizations/Illumina-R-visualizations.R "{runsheet_file}" "{sample_info}" "{counts}" "{taxonomy}" "{assay_suffix}" "{plots_dir}" "{output_prefix}"
+Rscript /path/to/visualizations/Illumina-R-visualizations.R "/path/to/runsheet.csv" "/path/to/unique-sample-IDs.txt" "/path/to/counts_GLAmpSeq.tsv" "/path/to/taxonomy_GLAmpSeq.tsv" "_GL_Ampseq" "/path/to/Plots/" "my_prefix_"
 ```
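
Because the arguments are positional, their order matters more than their names. The following is a minimal shell sketch, assuming the placeholder paths from the example command, that collects the seven values in the documented order and prints the resulting command as a dry run. The variable names simply mirror the parameter names; nothing here is part of the workflow itself.

```bash
# Placeholder values copied from the example command; replace with real paths.
script="/path/to/visualizations/Illumina-R-visualizations.R"
runsheet_file="/path/to/runsheet.csv"
sample_info="/path/to/unique-sample-IDs.txt"
counts="/path/to/counts_GLAmpSeq.tsv"
taxonomy="/path/to/taxonomy_GLAmpSeq.tsv"
assay_suffix="_GL_Ampseq"
plots_dir="/path/to/Plots/"
output_prefix="my_prefix_"

# Order matters: the R script reads these as positional arguments 1 through 7.
set -- "$runsheet_file" "$sample_info" "$counts" "$taxonomy" \
       "$assay_suffix" "$plots_dir" "$output_prefix"

# Sanity check before running anything.
if [ "$#" -ne 7 ]; then
  echo "expected 7 arguments, got $#" >&2
fi

# Dry run: print the command instead of executing it.
# Replace 'echo Rscript' with 'Rscript' to run the script for real.
echo Rscript "$script" "$@"
```

Keeping the values in named variables makes it harder to swap `assay_suffix` and `output_prefix` by accident, which is easy to do when everything is passed as bare strings.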

-<br>
-
-___
-
-### 3. Parameter definitions
+Additionally, the `RColorBrewer_Palette` variable can be modified in the script. This variable determines the color palette from the RColorBrewer package that is applied to the plots.

-**Parameter Definitions for Illumina-R-visualizations.R:**
-* `runsheet_file` – specifies the runsheet containing sample metadata required for processing (output from [GL-DPPD-7104-B step 6a](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#6a-create-sample-runsheet))
-* `sample_info` – specifies the text file containing the IDs of each sample used, required for running the SW_AmpIllumina workflow (output from [run_workflow.py](/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md#5-additional-output-files))
-* `counts` – specifies the ASV counts table (output from [GL-DPPD-7104-B step 5g](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
-* `taxonomy` – specifies the taxonomy table (output from [GL-DPPD-7104-B step 5g](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
-* `assay_suffix` – specifies a string that is prepended to the start of the output file names. Default: ""
+**Parameter Definitions:**
+* `runsheet_file` – specifies the table containing sample metadata required for processing
+* `sample_info` – specifies the text file containing the IDs of each sample used, required for running the SW_AmpIllumina workflow
+* `counts` – specifies the ASV counts table
+* `taxonomy` – specifies the taxonomy table
+* `assay_suffix` – specifies a string that is appended to the end of the output file names. Default: "_GLAmpSeq"
 * `plots_dir` – specifies the path where output files will be saved
-* `output_prefix` – specifies a string that is appended to the end of the output file names. Default: "_GLAmpSeq"
+* `output_prefix` – specifies a string that is prepended to the start of the output file names. Default: ""
 * `RColorBrewer_Palette` – specifies the RColorBrewer palette that will be used for coloring in the plots. Options include "Set1", "Accent", "Dark2", "Paired", "Pastel1", "Pastel2", "Set2", and "Set3". Default: "Set1"
+
+**Input Data:**
+* \*runsheet.csv (output from [GL-DPPD-7104-B step 6a](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#6a-create-sample-runsheet))
+* unique-sample-IDs.txt (output from [run_workflow.py](/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md#5-additional-output-files))
+* counts_GLAmpSeq.tsv (output from [GL-DPPD-7104-B step 5g](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
+* taxonomy_GLAmpSeq.tsv (output from [GL-DPPD-7104-B step 5g](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
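
Since all four inputs must exist before the script can run, it can help to check them first. The sketch below is a hypothetical helper, not part of the workflow, and the paths are the placeholders used in the example command.

```bash
# Hypothetical helper (not part of the workflow): report any missing input file.
# Returns 0 if every path given is a regular file, 1 otherwise.
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing input: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder paths; replace with the real locations of the four inputs.
if check_inputs /path/to/runsheet.csv /path/to/unique-sample-IDs.txt \
                /path/to/counts_GLAmpSeq.tsv /path/to/taxonomy_GLAmpSeq.tsv; then
  echo "all inputs found"
else
  echo "fix the paths above before running Rscript" >&2
fi
```

The helper only confirms that the files are present; it does not validate their contents or column layout.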
+
+**Output Data:**
+* **{output_prefix}dendrogram_by_group{assay_suffix}.png** (dendrogram of Euclidean distance-based hierarchical clustering of the samples, colored by experimental groups)
+* **{output_prefix}rarefaction_curves{assay_suffix}.png** (Rarefaction curves plot for all samples)
+* **{output_prefix}richness_and_diversity_estimates_by_sample{assay_suffix}.png** (Richness and diversity estimates plot for all samples)
+* **{output_prefix}richness_and_diversity_estimates_by_group{assay_suffix}.png** (Richness and diversity estimates plot for all groups)
+* **{output_prefix}relative_phyla{assay_suffix}.png** (taxonomic summaries plot based on phyla, for all samples)
+* **{output_prefix}relative_classes{assay_suffix}.png** (taxonomic summaries plot based on class, for all samples)
+* **{output_prefix}samplewise_phyla{assay_suffix}.png** (taxonomic summaries plot based on phyla, for all samples)
+* **{output_prefix}samplewise_classes{assay_suffix}.png** (taxonomic summaries plot based on class, for all samples)
+* **{output_prefix}PCoA_w_labels{assay_suffix}.png** (Principal Coordinates Analysis plot of VST-transformed ASV counts, with sample labels)
+* **{output_prefix}PCoA_without_labels{assay_suffix}.png** (Principal Coordinates Analysis plot of VST-transformed ASV counts, without sample labels)
+* **{output_prefix}normalized_counts{assay_suffix}.tsv** (size factor normalized ASV counts table)
+* **{output_prefix}group1_vs_group2.csv** (differential abundance tables for all pairwise contrasts of groups)
+* **{output_prefix}volcano_group1_vs_group2.png** (volcano plots for all pairwise contrasts of groups)
+* {output_prefix}color_legend_{assay_suffix}.png (color legend for all groups)
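
To make the `{output_prefix}name{assay_suffix}` pattern in this list concrete, here is a small shell sketch that prints the file names expected from the example command's values. It assumes the script joins `plots_dir` directly with the file name, which the list does not state, and `expected_plot` is a hypothetical helper.

```bash
# Values from the example command. The join order follows the
# {output_prefix}name{assay_suffix}.png pattern in the output list;
# prepending plots_dir directly is an assumption.
plots_dir="/path/to/Plots/"
output_prefix="my_prefix_"
assay_suffix="_GL_Ampseq"

# Hypothetical helper: build the expected path for one plot name.
expected_plot() {
  printf '%s%s%s%s.png\n' "$plots_dir" "$output_prefix" "$1" "$assay_suffix"
}

# Print the expected path for a few of the listed plots.
for name in dendrogram_by_group rarefaction_curves relative_phyla PCoA_w_labels; do
  expected_plot "$name"
done
```

A list like this can be compared against the contents of `plots_dir` after a run to spot outputs that were not produced.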
