Merge pull request #69 from torres-alexis/DEV2_SW_AmpIllumina-B
SW_AmpIllumina-B: Vis script readme updates, fixes:
Convert local DPPD links to main repo URLs
Improve step 2 instructions in vis script readme
Add input and output files sections
Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md
2 additions & 2 deletions
@@ -2,7 +2,7 @@
## General workflow info <!-- omit in toc -->
-
The current GeneLab Illumina amplicon sequencing data processing pipeline (AmpIllumina), [GL-DPPD-7104-B.md](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md), is implemented as a [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflow and utilizes [conda](https://docs.conda.io/en/latest/) environments to install/run all tools. This workflow (SW_AmpIllumina-B) is run using the command line interface (CLI) of any unix-based system. The workflow can be used even if you are unfamiliar with Snakemake and conda, but if you want to learn more about those, [this Snakemake tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html) within [Snakemake's documentation](https://snakemake.readthedocs.io/en/stable/) is a good place to start for that, and an introduction to conda with installation help and links to other resources can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/unix/conda-intro).
+
The current GeneLab Illumina amplicon sequencing data processing pipeline (AmpIllumina), [GL-DPPD-7104-B.md](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md), is implemented as a [Snakemake](https://snakemake.readthedocs.io/en/stable/) workflow and utilizes [conda](https://docs.conda.io/en/latest/) environments to install/run all tools. This workflow (SW_AmpIllumina-B) is run using the command line interface (CLI) of any unix-based system. The workflow can be used even if you are unfamiliar with Snakemake and conda, but if you want to learn more about those, [this Snakemake tutorial](https://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html) within [Snakemake's documentation](https://snakemake.readthedocs.io/en/stable/) is a good place to start for that, and an introduction to conda with installation help and links to other resources can be found [here at Happy Belly Bioinformatics](https://astrobiomike.github.io/unix/conda-intro).
<br>
@@ -190,7 +190,7 @@ ___
### 5. Additional output files
The outputs from the `run_workflow.py` and differential abundance analysis (DAA) / visualizations scripts are described below:
-
> Note: Outputs from the Amplicon Seq - Illumina pipeline are documented in the [GL-DPPD-7104-B.md](../../Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md) processing protocol.
+
> Note: Outputs from the Amplicon Seq - Illumina pipeline are documented in the [GL-DPPD-7104-B.md](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md) processing protocol.
- **Metadata Outputs:**
- \*_AmpSeq_v1_runsheet.csv (table containing metadata required for processing, including the raw reads files location)
To run the script, the variables `runsheet_file`, `sample_info`, `counts`, `taxonomy`, `assay_suffix`, `plots_dir`, and `output_prefix` must be specified. The [Illumina-R-visualizations.R](Illumina-R-visualizations.R) script can be executed from the command line by providing these variables as positional arguments.
+
The [Illumina-R-visualizations.R](./Illumina-R-visualizations.R) script can be executed from the command line by providing `runsheet_file`, `sample_info`, `counts`, `taxonomy`, `assay_suffix`, `plots_dir`, and `output_prefix` as positional arguments, in that order.
-
Additionally, the `RColorBrewer_Palette` variable can be modified in the script. This variable determines the color palette from the RColorBrewer package that is applied to the plots.
-
-
```R
# Store command line args as variables #
args <- commandArgs(trailingOnly=TRUE)
runsheet_file <- paste0(args[1])
sample_info <- paste0(args[2])
counts <- paste0(args[3])
taxonomy <- paste0(args[4])
assay_suffix <- paste(args[5])
plots_dir <- paste0(args[6])
output_prefix <- paste0(args[7])
########################################

RColorBrewer_Palette <- "Set1"
```
+
The example command below shows how to execute the script with the following parameters:
Additionally, the `RColorBrewer_Palette` variable can be modified in the script. This variable determines the color palette from the RColorBrewer package that is applied to the plots.
-
**Parameter Definitions for Illumina-R-visualizations.R:**
-
* `runsheet_file` – specifies the runsheet containing sample metadata required for processing (output from [GL-DPPD-7104-B step 6a](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#6a-create-sample-runsheet))
-
* `sample_info` – specifies the text file containing the IDs of each sample used, required for running the SW_AmpIllumina workflow (output from [run_workflow.py](/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md#5-additional-output-files))
-
* `counts` – specifies the ASV counts table (output from [GL-DPPD-7104-B step 5g](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
-
* `taxonomy` – specifies the taxonomy table (output from [GL-DPPD-7104-B step 5g](/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
-
* `assay_suffix` – specifies a string that is prepended to the start of the output file names. Default: ""
+
**Parameter Definitions:**
+
* `runsheet_file` – specifies the table containing sample metadata required for processing
+
* `sample_info` – specifies the text file containing the IDs of each sample used, required for running the SW_AmpIllumina workflow
+
* `counts` – specifies the ASV counts table
+
* `taxonomy` – specifies the taxonomy table
+
* `assay_suffix` – specifies a string that is appended to the end of the output file names. Default: "_GLAmpSeq"
* `plots_dir` – specifies the path where output files will be saved
-
* `output_prefix` – specifies a string that is appended to the end of the output file names. Default: "_GLAmpSeq"
+
* `output_prefix` – specifies a string that is prepended to the start of the output file names. Default: ""
* `RColorBrewer_Palette` – specifies the RColorBrewer palette that will be used for coloring in the plots. Options include "Set1", "Accent", "Dark2", "Paired", "Pastel1", "Pastel2", "Set2", and "Set3". Default: "Set1"
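As a rough illustration of how the seven positional arguments could map onto the parameter names defined above, here is a minimal sketch. The helper `parse_vis_args()` and the example values are hypothetical and are not taken from Illumina-R-visualizations.R; in a real script the vector would come from `commandArgs(trailingOnly = TRUE)`.

```R
# Hypothetical helper (not part of Illumina-R-visualizations.R): pairs seven
# positional arguments with the documented parameter names, in order.
parse_vis_args <- function(args) {
  param_names <- c("runsheet_file", "sample_info", "counts", "taxonomy",
                   "assay_suffix", "plots_dir", "output_prefix")
  if (length(args) != length(param_names)) {
    stop("Expected ", length(param_names), " positional arguments, got ",
         length(args))
  }
  # setNames() attaches each parameter name to the value in the same position
  as.list(setNames(as.character(args), param_names))
}

# Palette options listed in the parameter definitions
valid_palettes <- c("Set1", "Accent", "Dark2", "Paired",
                    "Pastel1", "Pastel2", "Set2", "Set3")
RColorBrewer_Palette <- "Set1"
stopifnot(RColorBrewer_Palette %in% valid_palettes)

# Placeholder values for illustration only
params <- parse_vis_args(c("runsheet.csv", "unique-sample-IDs.txt",
                           "counts_GLAmpSeq.tsv", "taxonomy_GLAmpSeq.tsv",
                           "_GLAmpSeq", "plots/", ""))
print(params$assay_suffix)
```

Failing early on a wrong argument count avoids silently assigning `NA` to the trailing parameters when an argument is left out.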
+
+
**Input Data:**
+
* \*runsheet.csv (output from [GL-DPPD-7104-B step 6a](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#6a-create-sample-runsheet))
+
* unique-sample-IDs.txt (output from [run_workflow.py](/Amplicon/Illumina/Workflow_Documentation/SW_AmpIllumina-B/README.md#5-additional-output-files))
+
* counts_GLAmpSeq.tsv (output from [GL-DPPD-7104-B step 5g](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
+
* taxonomy_GLAmpSeq.tsv (output from [GL-DPPD-7104-B step 5g](https://github.com/nasa/GeneLab_Data_Processing/blob/master/Amplicon/Illumina/Pipeline_GL-DPPD-7104_Versions/GL-DPPD-7104-B.md#5g-generating-and-writing-standard-outputs))
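Before running the script, it can help to confirm that the four input files listed above are present. The sketch below is one possible pre-flight check; `check_inputs()` is a hypothetical helper, not something the visualization script is known to provide, and the demonstration uses empty temporary stand-in files rather than real pipeline outputs.

```R
# Hypothetical helper, not part of Illumina-R-visualizations.R:
# stops with a readable message if any expected input file is absent.
check_inputs <- function(paths) {
  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Missing input file(s): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Self-contained demonstration using temporary, empty stand-in files
demo_dir <- tempfile("vis_inputs_")
dir.create(demo_dir)
inputs <- file.path(demo_dir, c("runsheet.csv", "unique-sample-IDs.txt",
                                "counts_GLAmpSeq.tsv", "taxonomy_GLAmpSeq.tsv"))
invisible(file.create(inputs))
check_inputs(inputs)  # passes silently because all four files exist
```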
+
+
**Output Data:**
+
* **{output_prefix}dendrogram_by_group{assay_suffix}.png** (dendrogram of Euclidean distance-based hierarchical clustering of the samples, colored by experimental groups)
+
* **{output_prefix}rarefaction_curves{assay_suffix}.png** (rarefaction curves plot for all samples)
+
* **{output_prefix}richness_and_diversity_estimates_by_sample{assay_suffix}.png** (richness and diversity estimates plot for all samples)
+
* **{output_prefix}richness_and_diversity_estimates_by_group{assay_suffix}.png** (richness and diversity estimates plot for all groups)
+
* **{output_prefix}relative_phyla{assay_suffix}.png** (taxonomic summaries plot based on phyla, for all samples)
+
* **{output_prefix}relative_classes{assay_suffix}.png** (taxonomic summaries plot based on class, for all samples)
+
* **{output_prefix}samplewise_phyla{assay_suffix}.png** (taxonomic summaries plot based on phyla, for all samples)
+
* **{output_prefix}samplewise_classes{assay_suffix}.png** (taxonomic summaries plot based on class, for all samples)
+
* **{output_prefix}PCoA_w_labels{assay_suffix}.png** (Principal Coordinates Analysis plot of VST-transformed ASV counts, with sample labels)
+
* **{output_prefix}PCoA_without_labels{assay_suffix}.png** (Principal Coordinates Analysis plot of VST-transformed ASV counts, without sample labels)
+
* **{output_prefix}normalized_counts{assay_suffix}.tsv** (size factor normalized ASV counts table)
+
* **{output_prefix}group1_vs_group2.csv** (differential abundance tables for all pairwise contrasts of groups)
+
* **{output_prefix}volcano_group1_vs_group2.png** (volcano plots for all pairwise contrasts of groups)
+
* {output_prefix}color_legend_{assay_suffix}.png (color legend for all groups)
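The naming pattern in the list above (`{output_prefix}`, then a fixed name, then `{assay_suffix}`) can be sketched as follows. `build_output_path()`, the group names, and the use of `file.path()` are assumptions made for illustration; the script itself may assemble its paths differently and may order the contrasts differently.

```R
# Hypothetical helper: builds an output path following the
# {output_prefix}<name>{assay_suffix}.<ext> pattern from the Output Data list.
build_output_path <- function(plots_dir, output_prefix, name, assay_suffix,
                              ext = "png") {
  file.path(plots_dir, paste0(output_prefix, name, assay_suffix, ".", ext))
}

# Placeholder values for illustration only
plots_dir <- "plots"
output_prefix <- ""
assay_suffix <- "_GLAmpSeq"

dendrogram_path <- build_output_path(plots_dir, output_prefix,
                                     "dendrogram_by_group", assay_suffix)
print(dendrogram_path)

# One differential abundance table per pairwise contrast of groups;
# these group names are made up.
groups <- c("group1", "group2", "group3")
contrasts <- combn(groups, 2,
                   FUN = function(pair) paste(pair, collapse = "_vs_"))
contrast_files <- file.path(plots_dir, paste0(output_prefix, contrasts, ".csv"))
print(contrast_files)
```

With three groups, `combn()` yields three contrasts, so three `.csv` tables (and, by the same logic, three volcano plots) would be expected.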