Microarray/Affymetrix/Pipeline_GL-DPPD-7114_Versions/GL-DPPD-7114.md
Lines changed: 70 additions & 29 deletions
@@ -1,7 +1,6 @@
 # GeneLab bioinformatics processing pipeline for Affymetrix microarray data <!-- omit in toc -->
 
-
 > **This page holds an overview and instructions for how GeneLab processes Affymetrix microarray datasets. Exact processing commands and GL-DPPD-7114 version used for specific GeneLab datasets (GLDS) are provided with their processed data in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo).**
 >
 > \* The pipeline detailed below is currently used for animal and Arabidopsis thaliana studies only; it will be updated soon for processing microbe microarray data and other plant data.
# General processing overview with example commands
 
-> Exact processing commands for a specific GLDS that has been released are provided with the processed data in the [OSDR](https://osdr.nasa.gov/bio/repo).
->
-> All output files in **bold** are published with the Affymetrix microarray processed data in the [OSDR](https://osdr.nasa.gov/bio/repo).
+> Exact processing commands and output files listed in **bold** below are included with each Microarray processed dataset in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).
 
 ---
 
@@ -167,6 +164,36 @@ dir.create(DIR_DGE)
 original_par <- par()
 options(preferRaster=TRUE) # use Raster when possible to avoid antialiasing artifacts in images
 
+# Utility function to improve robustness of function calls
+# Used to remedy intermittent internet issues during runtime
+retry_with_delay <- function(func, ...) {
+  max_attempts = 5
+  initial_delay = 10
+  delay_increase = 30
+  attempt <- 1
+  current_delay <- initial_delay
+  while (attempt <= max_attempts) {
+    result <- tryCatch(
+      expr = func(...),
+      error = function(e) e
+    )
+
+    if (!inherits(result, "error")) {
+      return(result)
+    } else {
+      if (attempt < max_attempts) {
+        message(paste("Retry attempt", attempt, "failed for function with name <", deparse(substitute(func)), ">. Retrying in", current_delay, "second(s)..."))
+        Sys.sleep(current_delay)
+        current_delay <- current_delay + delay_increase
+      } else {
+        stop(paste("Max retry attempts reached. Last error:", result$message))
+      }
+    }
+
+    attempt <- attempt + 1
+  }
+}
+
 df_rs <- read.csv(runsheet, check.names = FALSE) %>%
   dplyr::mutate_all(function(x) iconv(x, "latin1", "ASCII", sub="")) # Convert all characters to ascii, when not possible, remove the character
 ## Determines the organism specific annotation file to use based on the organism in the runsheet
 main = "" # This function uses 'main' as a suffix to the sample name. Here we want just the sample name, thus here main is an empty string
+)
+} else {
+stop(glue::glue("No strategy for MA plots for {raw_data}"))
 }
 ```
 
@@ -677,11 +715,12 @@ if (organism %in% c("athaliana")) {
 ensembl_genomes_portal = "plants"
 print(glue::glue("Using ensembl genomes ftp to get specific version of probeset id mapping table. Ensembl genomes portal: {ensembl_genomes_portal}, version: {ensembl_genomes_version}"))
-- **differential_expression.csv** (table containing normalized probeset expression values for each sample, group statistics, Limma probeset DE results for each pairwise comparison, and gene annotations. The ProbesetID is the unique index column.)
-- **normalized_expression_probeset.csv** (table containing the background corrected, normalized probeset expression values for each sample. The ProbesetID is the unique index column.)
-- visualization_PCA_table.csv (file used to generate GeneLab PCA plots)
-- **raw_intensities_probe.csv** (table containing the background corrected, unnormalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
-- **normalized_intensities_probe.csv** (table containing the background corrected, normalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
+- **differential_expression_GLmicroarray.csv** (table containing normalized probeset expression values for each sample, group statistics, Limma probeset DE results for each pairwise comparison, and gene annotations. The ProbesetID is the unique index column.)
+- **normalized_expression_probeset_GLmicroarray.csv** (table containing the background corrected, normalized probeset expression values for each sample. The ProbesetID is the unique index column.)
+- visualization_PCA_table_GLmicroarray.csv (file used to generate GeneLab PCA plots)
+- **raw_intensities_probe_GLmicroarray.csv** (table containing the background corrected, unnormalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
+- **normalized_intensities_probe_GLmicroarray.csv** (table containing the background corrected, normalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
+
+> All steps of the Microarray pipeline are performed using R markdown and the completed R markdown is rendered (via Quarto) as an html file (**NF_MAAffymetrix_v\*_GLmicroarray.html**) and published in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/) for the respective dataset.
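
The renamed outputs all follow one pattern: the `_GLmicroarray` suffix is inserted before the `.csv` extension. As a minimal sketch of that mapping, the shell function below derives the new name from an old one. The function name `add_gl_suffix` is hypothetical and not part of the pipeline, and it assumes `.csv` inputs only.

```bash
# Hypothetical helper (not part of the pipeline): map a pre-change output
# file name to its _GLmicroarray counterpart. Assumes a .csv file name.
add_gl_suffix() {
  base="${1%.csv}"                      # strip the .csv extension
  case "$base" in
    *_GLmicroarray) printf '%s\n' "$1" ;;                       # already renamed
    *)              printf '%s\n' "${base}_GLmicroarray.csv" ;; # add the suffix
  esac
}

# Print the old -> new mapping for the five tables listed in the diff.
for f in differential_expression.csv \
         normalized_expression_probeset.csv \
         visualization_PCA_table.csv \
         raw_intensities_probe.csv \
         normalized_intensities_probe.csv; do
  echo "$f -> $(add_gl_suffix "$f")"
done
```

Scripts that still read the old file names from an earlier pipeline version could use a mapping like this to locate the equivalent tables.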
+- Retry wrapper for functions that utilize internet resources. This aims to reduce failures caused solely by intermittent network issues. (ceb6d9a3)
+
+### Fixed
+
+- Missing Raw Data MA Plots when handling designs that loaded as `ExpressionFeatureSet` objects. (7af7192e)
+  - Additionally, future unhandled raw data classes will raise an exception rather than fail to plot silently.
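
For a sense of how long the retry wrapper can hold up a run, the defaults hard-coded in `retry_with_delay` (5 attempts, a 10-second initial delay, 30 seconds added after each failure) imply a fixed wait schedule. The shell sketch below only reproduces that arithmetic; it is an illustration, not pipeline code, and `total_wait` is a name introduced here.

```bash
# Defaults copied from the retry_with_delay definition in the diff.
max_attempts=5
initial_delay=10
delay_increase=30

attempt=1
current_delay=$initial_delay
total_wait=0   # illustrative accumulator, not present in the R function

# A wait only follows a failed attempt that is not the last one;
# the final failure raises an error instead of sleeping.
while [ "$attempt" -lt "$max_attempts" ]; do
  echo "after failed attempt ${attempt}: wait ${current_delay}s"
  total_wait=$((total_wait + current_delay))
  current_delay=$((current_delay + delay_increase))
  attempt=$((attempt + 1))
done

echo "worst-case total wait before the final error: ${total_wait}s"
```

Under these defaults the waits are 10, 40, 70 and 100 seconds, so a call that never succeeds spends 220 seconds sleeping, plus the time of the five attempts themselves, before the error is raised.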
Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix/README.md
Lines changed: 9 additions & 9 deletions
@@ -97,9 +97,9 @@ All files required for utilizing the NF_MAAffymetrix GeneLab workflow for proces
 copy of latest NF_MAAffymetrix version on to your system, the code can be downloaded as a zip file from the release page then unzipped after downloading by running the following commands:
-While in the location containing the `NF_MAAffymetrix_1.0.2` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_MAAffymetrix workflow:
+While in the location containing the `NF_MAAffymetrix_1.0.3` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_MAAffymetrix workflow:
 > Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.
 
 <br>
 
 #### 3a. Approach 1: Run the workflow on a GeneLab Affymetrix Microarray dataset
 
 ```bash
-nextflow run NF_MAAffymetrix_1.0.2/main.nf \
+nextflow run NF_MAAffymetrix_1.0.3/main.nf \
    -profile singularity \
    --osdAccession OSD-266 \
    --gldsAccession GLDS-266
@@ -129,7 +129,7 @@ nextflow run NF_MAAffymetrix_1.0.2/main.nf \
 > Note: Specifications for creating a runsheet manually are described [here](examples/runsheet/README.md).
 
 ```bash
-nextflow run NF_MAAffymetrix_1.0.2/main.nf \
+nextflow run NF_MAAffymetrix_1.0.3/main.nf \
    -profile singularity \
    --runsheetPath </path/to/runsheet>
 ```
@@ -141,7 +141,7 @@ nextflow run NF_MAAffymetrix_1.0.2/main.nf \
 > Note: Specifications for the ISA Tab Archive format can be found [here](https://isa-specs.readthedocs.io/en/latest/isatab.html).
 
 ```bash
-nextflow run NF_MAAffymetrix_1.0.2/main.nf \
+nextflow run NF_MAAffymetrix_1.0.3/main.nf \
    -profile singularity \
    --isaArchivePath </path/to/isaArchive>
 ```
@@ -150,7 +150,7 @@ nextflow run NF_MAAffymetrix_1.0.2/main.nf \
 
 **Required Parameters For All Approaches:**
 
-* `NF_MAAffymetrix_1.0.2/main.nf` - Instructs Nextflow to run the NF_MAAffymetrix workflow
+* `NF_MAAffymetrix_1.0.3/main.nf` - Instructs Nextflow to run the NF_MAAffymetrix workflow
 
 * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to set up and use singularity for all software called in the workflow
 
@@ -182,7 +182,7 @@ nextflow run NF_MAAffymetrix_1.0.2/main.nf \
 All parameters listed above and additional optional arguments for the NF_MAAffymetrix workflow, including debug related options that may not be immediately useful for most users, can be viewed by running the following command:
 
 ```bash
-nextflow run NF_MAAffymetrix_1.0.2/main.nf --help
+nextflow run NF_MAAffymetrix_1.0.3/main.nf --help
 ```
 
 See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details common to all nextflow workflows.
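
Because the workflow directory name embeds the release number, every command in the README changes with each release. One way to keep local scripts in sync, sketched below, is to hold the version in a single shell variable. `WORKFLOW_VERSION`, `WORKFLOW_DIR` and `cmd` are illustrative names and not part of the workflow. The sketch only prints the command and does not call Nextflow.

```bash
# Illustrative variable names; only the version string needs editing per release.
WORKFLOW_VERSION="1.0.3"
WORKFLOW_DIR="NF_MAAffymetrix_${WORKFLOW_VERSION}"

# Single-hyphen options (-profile) belong to Nextflow itself; double-hyphen
# options (--osdAccession, --gldsAccession) are workflow parameters.
cmd="nextflow run ${WORKFLOW_DIR}/main.nf -profile singularity --osdAccession OSD-266 --gldsAccession GLDS-266"

# Print the assembled command for review before running it by hand.
echo "$cmd"
```

The printed command matches the Approach 1 example from the README, with the directory name derived from the variable.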
@@ -196,7 +196,7 @@ See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nex
 All R code steps and output are rendered within a Quarto document yielding the following:
 
 - Output:
-  - NF_MAAffymetrix_1.0.2.html (html report containing executed code and output including QA plots)
+  - NF_MAAffymetrix_1.0.3.html (html report containing executed code and output including QA plots)
 
 
 The outputs from the Analysis Staging and V&V Pipeline Subworkflows are described below: