Microarray/Affymetrix/Pipeline_GL-DPPD-7114_Versions/GL-DPPD-7114.md
Lines changed: 67 additions & 25 deletions
@@ -164,6 +164,36 @@ dir.create(DIR_DGE)
 original_par <- par()
 options(preferRaster=TRUE) # use Raster when possible to avoid antialiasing artifacts in images
 
+# Utility function to improve robustness of function calls
+# Used to remedy intermittent internet issues during runtime
+retry_with_delay <- function(func, ...) {
+  max_attempts = 5
+  initial_delay = 10
+  delay_increase = 30
+  attempt <- 1
+  current_delay <- initial_delay
+  while (attempt <= max_attempts) {
+    result <- tryCatch(
+      expr = func(...),
+      error = function(e) e
+    )
+
+    if (!inherits(result, "error")) {
+      return(result)
+    } else {
+      if (attempt < max_attempts) {
+        message(paste("Retry attempt", attempt, "failed for function with name <", deparse(substitute(func)) ,">. Retrying in", current_delay, "second(s)..."))
+        Sys.sleep(current_delay)
+        current_delay <- current_delay + delay_increase
+      } else {
+        stop(paste("Max retry attempts reached. Last error:", result$message))
+      }
+    }
+
+    attempt <- attempt + 1
+  }
+}
+
 df_rs <- read.csv(runsheet, check.names = FALSE) %>%
   dplyr::mutate_all(function(x) iconv(x, "latin1", "ASCII", sub="")) # Convert all characters to ascii, when not possible, remove the character
 ## Determines the organism specific annotation file to use based on the organism in the runsheet
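To illustrate how a wrapper like `retry_with_delay` behaves, here is a minimal sketch rather than the committed code. `retry_sketch`, `flaky_download`, and the example URL are hypothetical names introduced only for this illustration. The delays are exposed as arguments (an assumption; the committed version hard-codes 10 seconds plus 30 per retry) so the example finishes instantly.

```r
# Hypothetical variant of the retry wrapper with configurable delays.
# Not the committed function: names and defaults here are illustrative only.
retry_sketch <- function(func, ..., max_attempts = 5,
                         initial_delay = 0, delay_increase = 0) {
  current_delay <- initial_delay
  for (attempt in seq_len(max_attempts)) {
    # Capture any error as a condition object instead of aborting
    result <- tryCatch(func(...), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt == max_attempts) {
      stop(paste("Max retry attempts reached. Last error:",
                 conditionMessage(result)))
    }
    message("Attempt ", attempt, " failed; retrying in ",
            current_delay, " second(s)...")
    Sys.sleep(current_delay)
    current_delay <- current_delay + delay_increase
  }
}

# Stand-in for a network call: fails on the first two calls, then succeeds.
calls <- 0
flaky_download <- function(url) {
  calls <<- calls + 1
  if (calls < 3) stop("simulated timeout")
  paste("downloaded", url)
}

result <- retry_sketch(flaky_download, "https://example.org/annotations.csv")
print(result)
print(calls)
```

As with the committed wrapper, the function to retry is passed first and its own arguments follow, so an existing call `f(x, y)` becomes `retry_with_delay(f, x, y)`.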
main=""# This function uses 'main' as a suffix to the sample name. Here we want just the sample name, thus here main is an empty string
+  )
+} else {
+  stop(glue::glue("No strategy for MA plots for {raw_data}"))
 }
 ```
 
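The pattern in this hunk, choosing a plotting strategy by object class and stopping when none applies, can be sketched in base R as follows. `ma_plot_strategy` and the returned strategy labels are hypothetical, and plain lists tagged with class names stand in for the real Bioconductor objects. Reporting `class(raw_data)` in the error message is a choice made for this sketch, not something taken from the commit.

```r
# Hypothetical dispatcher: returns a label for the plotting strategy that
# would be used, and fails loudly for classes it does not know about.
ma_plot_strategy <- function(raw_data) {
  if (inherits(raw_data, "ExpressionSet")) {
    "expression_set"
  } else if (inherits(raw_data, "ExpressionFeatureSet")) {
    "feature_set"
  } else {
    stop(sprintf("No strategy for MA plots for class <%s>",
                 paste(class(raw_data), collapse = "/")))
  }
}

# Stand-ins for real raw data objects (assumption: only the class matters here)
fake_feature_set <- structure(list(), class = "ExpressionFeatureSet")
print(ma_plot_strategy(fake_feature_set))

# An unknown class raises an error instead of silently skipping the plot
unknown <- structure(list(), class = "SomeFutureClass")
outcome <- tryCatch(ma_plot_strategy(unknown),
                    error = function(e) conditionMessage(e))
print(outcome)
```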
@@ -674,11 +715,12 @@ if (organism %in% c("athaliana")) {
   ensembl_genomes_portal = "plants"
   print(glue::glue("Using ensembl genomes ftp to get specific version of probeset id mapping table. Ensembl genomes portal: {ensembl_genomes_portal}, version: {ensembl_genomes_version}"))
-- **differential_expression.csv** (table containing normalized probeset expression values for each sample, group statistics, Limma probeset DE results for each pairwise comparison, and gene annotations. The ProbesetID is the unique index column.)
-- **normalized_expression_probeset.csv** (table containing the background corrected, normalized probeset expression values for each sample. The ProbesetID is the unique index column.)
-- visualization_PCA_table.csv (file used to generate GeneLab PCA plots)
-- **raw_intensities_probe.csv** (table containing the background corrected, unnormalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
-- **normalized_intensities_probe.csv** (table containing the background corrected, normalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
+- **differential_expression_GLmicroarray.csv** (table containing normalized probeset expression values for each sample, group statistics, Limma probeset DE results for each pairwise comparison, and gene annotations. The ProbesetID is the unique index column.)
+- **normalized_expression_probeset_GLmicroarray.csv** (table containing the background corrected, normalized probeset expression values for each sample. The ProbesetID is the unique index column.)
+- visualization_PCA_table_GLmicroarray.csv (file used to generate GeneLab PCA plots)
+- **raw_intensities_probe_GLmicroarray.csv** (table containing the background corrected, unnormalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
+- **normalized_intensities_probe_GLmicroarray.csv** (table containing the background corrected, normalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
 
-> All steps of the Microarray pipeline are performed using R markdown and the completed R markdown is rendered (via Quarto) as an html file (**NF_MAAffymetrix_\*.html**) and published in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/) for the respective dataset.
+> All steps of the Microarray pipeline are performed using R markdown and the completed R markdown is rendered (via Quarto) as an html file (**NF_MAAffymetrix_v\*_GLmicroarray.html**) and published in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/) for the respective dataset.
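The renaming in this hunk inserts a `_GLmicroarray` suffix before each file extension. A minimal sketch of that naming rule follows; `add_assay_suffix` is a hypothetical helper written for illustration and is not taken from the pipeline.

```r
# Hypothetical helper: insert an assay suffix between the file stem and
# its extension, e.g. "x.csv" -> "x_GLmicroarray.csv".
add_assay_suffix <- function(filename, suffix = "_GLmicroarray") {
  ext <- tools::file_ext(filename)
  stem <- tools::file_path_sans_ext(filename)
  if (nzchar(ext)) {
    paste0(stem, suffix, ".", ext)
  } else {
    # No extension: just append the suffix
    paste0(stem, suffix)
  }
}

old_names <- c("differential_expression.csv",
               "normalized_expression_probeset.csv",
               "visualization_PCA_table.csv")
new_names <- vapply(old_names, add_assay_suffix, character(1), USE.NAMES = FALSE)
print(new_names)
```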
# GeneLab bioinformatics processing pipeline for Affymetrix microarray data
 
-> **The document [`GL-DPPD-7114.md`](Pipeline_GL-DPPD-7114_Versions/GL-DPPD-7114.md) holds an overview and example commands for how GeneLab processes Affymetrix microarray datasets. See the [Repository Links](#repository-links) descriptions below for more information. Processed data output files and processing code is provided for each GLDS dataset along with the processed data in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).**
+> **The document [`GL-DPPD-7114.md`](Pipeline_GL-DPPD-7114_Versions/GL-DPPD-7114.md) holds an overview and example commands for how GeneLab processes Affymetrix microarray datasets. See the [Repository Links](#repository-links) descriptions below for more information. Processed data output files and processing code is provided for each GLDS dataset along with the processed data in the [Open Science Data Repository (OSDR)](https://osdr.nasa.gov/bio/repo/).**
- Retry wrapper for functions that utilize internet resources. This is intended to reduce failures caused solely by intermittent network issues. (ceb6d9a3)
+
+### Fixed
+
+- Missing Raw Data MA Plots when handling designs that loaded as `ExpressionFeatureSet` objects. (7af7192e)
+  - Additionally, future unhandled raw data classes will raise an exception rather than fail to plot silently.
- Workflow now produces a file called meta.sh (in the 'GeneLab' sub-directory) that contains information about the workflow run. This file is used by the post processing workflow to generate a protocol description. (5a8a255)
+- POST_PROCESSING will now generate a protocol description using the contents of meta.sh and text templates. (801e2ad)
+- Workflow can now be run using an ISA archive by supplying parameter: 'isaArchivePath' (as either a local path or public web uri) (8822069)
+
+### Changed
+
+- Update dp_tools from 1.3.2 to 1.3.4 (158ce5e)
+  - This updates the POST_PROCESSING workflow assay table to join multiple files by ',' instead of ',<SPACE>' and enables max flag code setting.
+- Slightly reduced stringency in V&V check for log2fc computation to account for rounding errors, specifically from 99.9% of rows within tolerance to 99.5%. (9fd2c11)
+- Publish directory behavior reworked to use the OSD accession as part of the default name. Now uses `resultsDir` instead of `outputDir` as the parameter name when a user does control the published files directory. (97cba72)
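The relaxed log2fc check described above can be pictured as "pass if at least a given fraction of rows fall within tolerance". The sketch below is an assumption about the general shape of such a check, not the dp_tools implementation. The function name, the tolerance value, and the use of group means already on the log2 scale are all hypothetical.

```r
# Hypothetical check: TRUE when at least `min_fraction` of rows have a
# recomputed log2 fold change within `tolerance` of the reported value.
log2fc_within_tolerance <- function(reported, group_mean_a, group_mean_b,
                                    tolerance = 0.001, min_fraction = 0.995) {
  # Assumes group means are already log2-transformed
  recomputed <- group_mean_a - group_mean_b
  within <- abs(recomputed - reported) <= tolerance
  mean(within) >= min_fraction
}

# 1000 rows, 4 of which are deliberately off by 0.01 (99.6% within tolerance)
mean_a <- seq(6, 10, length.out = 1000)
mean_b <- rev(mean_a)
reported <- mean_a - mean_b
reported[1:4] <- reported[1:4] + 0.01

passes_relaxed <- log2fc_within_tolerance(reported, mean_a, mean_b)
passes_strict <- log2fc_within_tolerance(reported, mean_a, mean_b,
                                         min_fraction = 0.999)
print(c(relaxed = passes_relaxed, strict = passes_strict))
```

With 99.6% of rows in tolerance, this toy dataset passes at the 99.5% threshold but would have failed at 99.9%.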
+
+### Fixed
+
+- Halt level flags now properly trigger workflow halt. (0885175)
+- Boxplots now show all y-axis labels when working with many samples. (7ec10d4s)
+- Density plot legend cex (character expansion) now has a minimum of 0.35 (rather than raising an exception for very large numbers of samples) (9a54fdc)
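The legend `cex` floor mentioned in the last item amounts to clamping a computed text size at 0.35. In the sketch below only the 0.35 minimum comes from the changelog. The function name and the scaling rule (shrinking once there are more than 20 samples) are invented for illustration.

```r
# Hypothetical legend sizing: shrink text as the sample count grows,
# but never below `min_cex` (0.35 per the changelog entry).
legend_cex <- function(n_samples, base_cex = 1, min_cex = 0.35) {
  scaled <- base_cex * min(1, 20 / n_samples)  # illustrative scaling rule
  max(min_cex, scaled)
}

sizes <- vapply(c(10, 40, 1000), legend_cex, numeric(1))
print(sizes)
```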
- Support for Arabidopsis thaliana datasets using the plants ensembl FTP server.
+- Support for raw data FeatureSets (building on existing support for ExpressionSets)
+- Better support for non-ascii characters in the runsheet, usually caused by such characters in the original ISA archive the runsheet is generated from.
+
+### Fixed
+
+- Typos related to shared code with Agilent 1 Channel platform.
+
+### Changed
+
+- Error message when encountering unique columns when reordering tables is now clearer about what unique columns were found.
+- Post Processing Workflow: Assay Table Update now adds '_array_' prefix to processed files instead of '_microarray_' prefix.