 # note: '!!sym(VAR)' syntax allows usage of variable 'VAR' in dplyr functions due to NSE. ref: https://dplyr.tidyverse.org/articles/programming.html # NON_DPPD
 dplyr::mutate(dplyr::across(!!sym(expected_attribute_name), as.character)) %>% # Ensure probeset ids treated as character type
-dplyr::left_join(annot, by="ENSEMBL") %>% # Join with GeneLab Reference Annotation Table
-dplyr::mutate( count_ENSEMBL_mappings= ifelse(is.na(ENSEMBL), 0, count_ENSEMBL_mappings) ) # Convert NA mapping to 0
+dplyr::left_join(annot, by= c("ENSEMBL"=map_primary_keytypes[[unique(df_rs$organism)]])) %>% # Join with GeneLab Reference Annotation Table using key name expected in organism specific annotation table
+dplyr::mutate( count_ENSEMBL_mappings= ifelse(is.na(ENSEMBL), 0, count_ENSEMBL_mappings) ) %>% # Convert NA mapping to 0
-- **differential_expression.csv** (table containing normalized probeset expression values for each sample, group statistics, Limma probe DE results for each pairwise comparison, and gene annotations. The ProbesetID is the unique index column.)
+- **differential_expression.csv** (table containing normalized probeset expression values for each sample, group statistics, Limma probeset DE results for each pairwise comparison, and gene annotations. The ProbesetID is the unique index column.)
 - **normalized_expression_probeset.csv** (table containing the background corrected, normalized probeset expression values for each sample. The ProbesetID is the unique index column.)
 - visualization_PCA_table.csv (file used to generate GeneLab PCA plots)
 - **raw_intensities_probe.csv** (table containing the background corrected, unnormalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
-- **normalized_intensities_probe.csv** (table containing the background corrected, normalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
+- **normalized_intensities_probe.csv** (table containing the background corrected, normalized probe intensity values for each sample including gene annotations. The ProbeID is the unique index column.)
+- Workflow now produces a file called meta.sh (in the 'GeneLab' sub-directory) that contains information about the workflow run. This file is used by the post processing workflow to generate a protocol description. (5a8a255)
+- POST_PROCESSING will now generate a protocol description using the contents of meta.sh and text templates. (801e2ad)
+- Workflow can now be run using an ISA archive by supplying parameter: 'isaArchivePath' (as either a local path or public web uri) (8822069)
+
+### Changed
+
+- Update dp_tools from 1.3.2 to 1.3.4 (158ce5e)
+  - This updates the POST_PROCESSING workflow assay table to join multiple files by ',' instead of ',<SPACE>' and enables max flag code setting.
+- Slightly reduced stringency in V&V check for log2fc computation to account for rounding errors, specifically from 99.9% of rows within tolerance to 99.5%. (9fd2c11)
+- Publish directory behavior reworked to use the OSD accession as part of the default name. Now uses `resultsDir` instead of `outputDir` as the parameter name when a user does control the published files directory. (97cba72)
+
+### Fixed
+
+- Halt level flags now properly trigger workflow halt. (0885175)
+- Boxplots now show all y-axis labels when working with many samples. (7ec10d4s)
+- Density plot legend cex (character expansion) now has a minimum of 0.35 (rather than raising an exception for very large numbers of samples) (9a54fdc)
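The relaxed log2fc V&V check listed under "Changed" can be pictured as a pass-rate threshold over table rows. The sketch below is illustrative only: the function names and the absolute tolerance value are assumptions and are not taken from dp_tools. Only the 99.9% and 99.5% thresholds come from the changelog entry.

```python
import math


def fraction_within_tolerance(reported, recomputed, abs_tol=1e-3):
    """Return the fraction of rows whose reported log2fc matches the recomputed value.

    abs_tol is a hypothetical tolerance chosen for illustration only.
    """
    if len(reported) != len(recomputed):
        raise ValueError("inputs must have the same number of rows")
    if not reported:
        raise ValueError("at least one row is required")
    matches = sum(
        math.isclose(r, c, rel_tol=0.0, abs_tol=abs_tol)
        for r, c in zip(reported, recomputed)
    )
    return matches / len(reported)


def log2fc_check_passes(reported, recomputed, min_fraction=0.995):
    """Pass when at least min_fraction of rows are within tolerance."""
    return fraction_within_tolerance(reported, recomputed) >= min_fraction


# 1000 rows, 3 of which disagree: 99.7% of rows are within tolerance.
reported = [1.0] * 1000
recomputed = [1.0] * 997 + [1.5] * 3
```

With these example values, the table passes at the new 99.5% threshold but would have failed at the previous 99.9% threshold, which is the kind of rounding-driven failure the change is meant to avoid.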
 - [3a. Approach 1: Run the workflow on a GeneLab Affymetrix Microarray dataset](#3a-approach-1-run-the-workflow-on-a-genelab-affymetrix-microarray-dataset)
 - [3b. Approach 2: Run the workflow on a non-GLDS dataset using a user-created runsheet](#3b-approach-2-run-the-workflow-on-a-non-glds-dataset-using-a-user-created-runsheet)
+- [3c. Approach 3: Run the workflow using an ISA Archive](#3c-approach-3-run-the-workflow-using-an-isa-archive)
@@ -96,9 +97,9 @@ All files required for utilizing the NF_MAAffymetrix GeneLab workflow for proces
 copy of latest NF_MAAffymetrix version on to your system, the code can be downloaded as a zip file from the release page then unzipped after downloading by running the following commands:
-While in the location containing the `NF_MAAffymetrix_1.0.1` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_MAAffymetrix workflow:
+While in the location containing the `NF_MAAffymetrix_1.0.2` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_MAAffymetrix workflow:
 > Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.

 <br>

 #### 3a. Approach 1: Run the workflow on a GeneLab Affymetrix Microarray dataset

 ```bash
-nextflow run NF_MAAffymetrix_1.0.1/main.nf \
+nextflow run NF_MAAffymetrix_1.0.2/main.nf \
 -profile singularity \
 --osdAccession OSD-266 \
 --gldsAccession GLDS-266
@@ -128,16 +129,28 @@ nextflow run NF_MAAffymetrix_1.0.1/main.nf \
 > Note: Specifications for creating a runsheet manually are described [here](examples/runsheet/README.md).

 ```bash
-nextflow run NF_MAAffymetrix_1.0.1/main.nf \
+nextflow run NF_MAAffymetrix_1.0.2/main.nf \
 -profile singularity \
 --runsheetPath </path/to/runsheet>
 ```

 <br>
+#### 3c. Approach 3: Run the workflow using an ISA Archive
+
+> Note: Specifications for the ISA Tab Archive format can be found [here](https://isa-specs.readthedocs.io/en/latest/isatab.html).
+
+```bash
+nextflow run NF_MAAffymetrix_1.0.2/main.nf \
+-profile singularity \
+--isaArchivePath </path/to/isaArchive>
+```
+
+<br>
+
 **Required Parameters For All Approaches:**

-* `NF_MAAffymetrix_1.0.1/main.nf` - Instructs Nextflow to run the NF_MAAffymetrix workflow
+* `NF_MAAffymetrix_1.0.2/main.nf` - Instructs Nextflow to run the NF_MAAffymetrix workflow

 * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow
@@ -162,14 +175,14 @@ nextflow run NF_MAAffymetrix_1.0.1/main.nf \
 * `--skipVV` - skip the automated V&V processes (Default: the automated V&V processes are active)

-* `--outputDir` - specifies the directory to save the raw and processed data files (Default: files are saved in the launch directory)
+* `--resultsDir` - specifies the output directory for all files produced by the workflow (Default: <OSD-NNN_GLDS-NNN> if OSD and GLDS accessions are specified. Otherwise, the workflow launch directory.)

 <br>

 All parameters listed above and additional optional arguments for the NF_MAAffymetrix workflow, including debug related options that may not be immediately useful for most users, can be viewed by running the following command:

 ```bash
-nextflow run NF_MAAffymetrix_1.0.1/main.nf --help
+nextflow run NF_MAAffymetrix_1.0.2/main.nf --help
 ```

 See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details common to all nextflow workflows.
@@ -183,7 +196,7 @@ See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nex
 All R code steps and output are rendered within a Quarto document yielding the following:

 - Output:
-  - NF_MAAffymetrix_1.0.1.html (html report containing executed code and output including QA plots)
+  - NF_MAAffymetrix_1.0.2.html (html report containing executed code and output including QA plots)


 The outputs from the Analysis Staging and V&V Pipeline Subworkflows are described below:
Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix/workflow_code/bin/dp_tools__affymetrix/checks.py
Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix/workflow_code/config/default.config (24 additions & 5 deletions)
@@ -1,19 +1,38 @@
 nextflow.enable.moduleBinaries = true

 params {
-    /*
-    Parameters that MUST be supplied
+
+    /* Here GLDS and OSD accession are defined.
+    Default behaviour is as follows:
+    - If accessions are not set, then either runsheet or an ISA Archive MUST be supplied
+    - If both accessions are set:
+        - If runsheet and ISA archive are left unset, then the ISA archive will be fetched from the GeneLab API and runsheet generated from the ISA archive.
+        - If either runsheet or ISA archive are set, they will be used but the output directory and tags will reflect the appropriate accessions. This is useful when processing from the OSDR but OSDR metadata is not ready as is.
+        - If both runsheet and ISA archive are set, the workflow will halt.
+    - If only one accession is set, then the workflow will halt.
+
     */
-    gldsAccession = null // GeneLab Data Accession Number, e.g. GLDS-104
-    osdAccession = null // OSD Data Accession Number, e.g. OSD-367
+    gldsAccession = "NOT_OSDR" // GeneLab Data Accession Number, e.g. GLDS-104
+    osdAccession = "NOT_OSDR" // OSD Data Accession Number, e.g. OSD-367
+
+    // Catch case where only one is set
+    if (params.gldsAccession != "NOT_OSDR" && params.osdAccession == "NOT_OSDR") {
+        println "ERROR: GLDS accession set but OSD accession is not set. Please set both or neither."
+        System.exit(1)
+    }
+    if (params.gldsAccession == "NOT_OSDR" && params.osdAccession != "NOT_OSDR") {
+        println "ERROR: OSD accession set but GLDS accession is not set. Please set both or neither."
+        System.exit(1)
+    }

+    resultsDir = (params.gldsAccession != "NOT_OSDR" && params.osdAccession != "NOT_OSDR") ? "./${params.osdAccession}_${params.gldsAccession}" : "." // the location for the output from the pipeline (also includes raw data and metadata)

     /*
     Parameters that CAN be overwritten
     */
     runsheetPath = false
     biomart_attribute = false // Must be supplied if runsheet 'Array design REF' column doesn't indicate it
-    outputDir = "." // the location for the output from the pipeline (also includes raw data and metadata)
+    isaArchivePath = false // Alternative to fetching the ISA archive for an associated OSD/GLDS dataset
     publish_dir_mode = "link" // method for creating publish directory. Default here for hardlink
     help = false // display help menu and exit workflow program
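The accession and input-source rules in the config comment above can be restated as a small decision procedure. The Python sketch below is a paraphrase for reading purposes, not code from the workflow: both function names and the returned labels are hypothetical, and treating "both runsheet and ISA archive set" as an error regardless of accessions is an assumption about an ambiguous line in the comment.

```python
NOT_OSDR = "NOT_OSDR"  # sentinel the config uses for an accession that is not set


def resolve_results_dir(glds_accession=NOT_OSDR, osd_accession=NOT_OSDR):
    """Mirror the resultsDir default: accession-based name, else the launch directory."""
    glds_set = glds_accession != NOT_OSDR
    osd_set = osd_accession != NOT_OSDR
    if glds_set != osd_set:
        # The config halts when only one of the two accessions is supplied.
        raise ValueError("Set both GLDS and OSD accessions, or neither.")
    if glds_set:
        return f"./{osd_accession}_{glds_accession}"
    return "."


def select_metadata_source(accessions_set, runsheet_path=None, isa_archive_path=None):
    """Hypothetical helper: pick where the runsheet comes from."""
    if runsheet_path and isa_archive_path:
        # Assumption: supplying both is rejected whether or not accessions are set.
        raise ValueError("Supply a runsheet or an ISA archive, not both.")
    if runsheet_path:
        return "runsheet"
    if isa_archive_path:
        return "isa_archive"
    if accessions_set:
        # Neither supplied: fetch the ISA archive and derive the runsheet from it.
        return "fetch_isa_archive"
    raise ValueError("Without accessions, supply a runsheet or an ISA archive.")
```

For example, `resolve_results_dir("GLDS-266", "OSD-266")` yields `./OSD-266_GLDS-266`, matching the `<OSD-NNN_GLDS-NNN>` default described for `--resultsDir`.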