Commit 8383bb5

Merge pull request #124 from cyouh95/DEV_NF_MAAgilent_1ch
NF_MAAgilent1ch: Update workflow version from 1.0.3 to 1.0.4
2 parents 2e6d5bd + 20f9e54 commit 8383bb5

12 files changed, +138 -24 lines changed
Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch/CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.0.4](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_MAAgilent1ch_1.0.4/Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch) - 2024-10-02
+
+### Added
+
+- Add automatic generation of processed data protocol ([#85](https://github.com/nasa/GeneLab_Data_Processing/issues/85))
+
+### Changed
+
+- Small bug fixes in `Agile1CMP.qmd`
+  - Check if `getBM()` returned results before concatenating it to dataframe to avoid error in `bind_rows()` ([#96](https://github.com/nasa/GeneLab_Data_Processing/issues/96))
+  - When renaming column names, specify which columns to rename to avoid unintentional renaming ([#97](https://github.com/nasa/GeneLab_Data_Processing/issues/97))
+  - When renaming factor names, prevent cases where a factor is partially renamed because it contains a substring that is another factor ([#100](https://github.com/nasa/GeneLab_Data_Processing/issues/100))
+- Update software table generation to exclude `R.utils` from table if data files are not compressed ([#99](https://github.com/nasa/GeneLab_Data_Processing/issues/99))
+
 ## [1.0.3](https://github.com/nasa/GeneLab_Data_Processing/tree/NF_MAAgilent1ch_1.0.3/Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch) - 2024-05-17

 ### Changed

Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch/README.md

Lines changed: 8 additions & 8 deletions
@@ -93,9 +93,9 @@ We recommend installing Singularity on a system wide level as per the associated
 All files required for utilizing the NF_MAAgilent1ch GeneLab workflow for processing Agilent 1 Channel Microarray data are in the [workflow_code](workflow_code) directory. To get a copy of latest NF_MAAgilent1ch version on to your system, the code can be downloaded as a zip file from the release page then unzipped after downloading by running the following commands:

 ```bash
-wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_MAAgilent1ch_1.0.3/NF_MAAgilent1ch_1.0.3.zip
+wget https://github.com/nasa/GeneLab_Data_Processing/releases/download/NF_MAAgilent1ch_1.0.4/NF_MAAgilent1ch_1.0.4.zip

-unzip NF_MAAgilent1ch_1.0.3.zip
+unzip NF_MAAgilent1ch_1.0.4.zip
 ```

 <br>
@@ -104,15 +104,15 @@ unzip NF_MAAgilent1ch_1.0.3.zip

 ### 3. Run the Workflow

-While in the location containing the `NF_MAAgilent1ch_1.0.3` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_MAAgilent1ch workflow:
+While in the location containing the `NF_MAAgilent1ch_1.0.4` directory that was downloaded in [step 2](#2-download-the-workflow-files), you are now able to run the workflow. Below are three examples of how to run the NF_MAAgilent1ch workflow:
 > Note: Nextflow commands use both single hyphen arguments (e.g. -help) that denote general nextflow arguments and double hyphen arguments (e.g. --ensemblVersion) that denote workflow specific parameters. Take care to use the proper number of hyphens for each argument.

 <br>

 #### 3a. Approach 1: Run the workflow on a GeneLab Agilent 1 Channel Microarray dataset

 ```bash
-nextflow run NF_MAAgilent1ch_1.0.3/main.nf \
+nextflow run NF_MAAgilent1ch_1.0.4/main.nf \
    -profile singularity \
    --osdAccession OSD-548 \
    --gldsAccession GLDS-548
@@ -125,7 +125,7 @@ nextflow run NF_MAAgilent1ch_1.0.3/main.nf \
 > Note: Specifications for creating a runsheet manually are described [here](examples/runsheet/README.md).

 ```bash
-nextflow run NF_MAAgilent1ch_1.0.3/main.nf \
+nextflow run NF_MAAgilent1ch_1.0.4/main.nf \
    -profile singularity \
    --runsheetPath </path/to/runsheet>
 ```
@@ -134,7 +134,7 @@ nextflow run NF_MAAgilent1ch_1.0.3/main.nf \

 **Required Parameters For All Approaches:**

-* `NF_MAAgilent1ch_1.0.3/main.nf` - Instructs Nextflow to run the NF_MAAgilent1ch workflow
+* `NF_MAAgilent1ch_1.0.4/main.nf` - Instructs Nextflow to run the NF_MAAgilent1ch workflow

 * `-profile` - Specifies the configuration profile(s) to load, `singularity` instructs Nextflow to setup and use singularity for all software called in the workflow

@@ -166,7 +166,7 @@ nextflow run NF_MAAgilent1ch_1.0.3/main.nf \
 All parameters listed above and additional optional arguments for the NF_MAAgilent1ch workflow, including debug related options that may not be immediately useful for most users, can be viewed by running the following command:

 ```bash
-nextflow run NF_MAAgilent1ch_1.0.3/main.nf --help
+nextflow run NF_MAAgilent1ch_1.0.4/main.nf --help
 ```

 See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nextflow.io/docs/latest/cli.html#run) for more options and details common to all nextflow workflows.
@@ -180,7 +180,7 @@ See `nextflow run -h` and [Nextflow's CLI run command documentation](https://nex
 All R code steps and output are rendered within a Quarto document yielding the following:

 - Output:
-  - NF_MAAgilent1ch_1.0.3.html (html report containing executed code and output including QA plots)
+  - NF_MAAgilent1ch_1.0.4.html (html report containing executed code and output including QA plots)


 The outputs from the Analysis Staging and V&V Pipeline Subworkflows are described below:

Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch/workflow_code/bin/Agile1CMP.qmd

Lines changed: 7 additions & 4 deletions
@@ -1,6 +1,6 @@
 ---
 title: "Agilent 1 Channel Processing"
-subtitle: "Workflow Version: NF_MAAgilent1ch_1.0.3"
+subtitle: "Workflow Version: NF_MAAgilent1ch_1.0.4"
 date: now
 title-block-banner: true
 format:
@@ -530,7 +530,10 @@ if (organism %in% c("athaliana")) {
         values = probe_id_chunk,
         mart = ensembl)

-    df_mapping <- df_mapping %>% dplyr::bind_rows(chunk_results)
+    if (nrow(chunk_results) > 0) {
+      df_mapping <- df_mapping %>% dplyr::bind_rows(chunk_results)
+    }
+
     Sys.sleep(10) # Slight break between requests to prevent back-to-back requests
   }
 }
@@ -712,7 +715,7 @@ reformat_names <- function(colname, group_name_mapping) {
     stringr::str_replace(pattern = ".condition", replacement = "v")

   # remap to group names before make.names was applied
-  unique_group_name_mapping <- unique(group_name_mapping)
+  unique_group_name_mapping <- unique(group_name_mapping) %>% arrange(-nchar(safe_name))
   for ( i in seq(nrow(unique_group_name_mapping)) ) {
     safe_name <- unique_group_name_mapping[i,]$safe_name
     original_name <- unique_group_name_mapping[i,]$original_name
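
The `arrange(-nchar(safe_name))` change in this hunk orders the mapping so that longer sanitized names are substituted first. The following is a minimal Python sketch of why that order matters. It is an illustration only, not the workflow's R code; the function name `restore_group_names` and the group names are hypothetical.

```python
def restore_group_names(column, mapping, longest_first=True):
    """Replace sanitized group names in a column name with the originals."""
    safe_names = list(mapping)
    if longest_first:
        # Longest names first, so a short name cannot consume part of a longer one
        safe_names.sort(key=len, reverse=True)
    for safe_name in safe_names:
        column = column.replace(safe_name, mapping[safe_name])
    return column


# Hypothetical mapping: "Wild.Type" is a substring of "Wild.Type...Drug"
mapping = {
    "Wild.Type": "Wild Type",
    "Wild.Type...Drug": "Wild Type & Drug",
}
column = "Log2fc_(Wild.Type...Drug)v(Wild.Type)"

fixed = restore_group_names(column, mapping)
partial = restore_group_names(column, mapping, longest_first=False)
```

With `longest_first=False` the shorter name is replaced inside the longer one first, so the longer name is no longer found and the column is only partially renamed.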
@@ -722,7 +725,7 @@ reformat_names <- function(colname, group_name_mapping) {
   return(new_colname)
 }

-df_interim <- df_interim %>% dplyr::rename_with( reformat_names, group_name_mapping = design_data$mapping )
+df_interim <- df_interim %>% dplyr::rename_with(reformat_names, .cols = matches('\\.condition|^Genes\\.'), group_name_mapping = design_data$mapping)


 # Concatenate expression values for each sample
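
The `.cols = matches(...)` argument restricts renaming to columns whose names match the regular expression, leaving all other columns untouched. A rough Python equivalent of that selection is sketched below. The column names are hypothetical and `columns_to_rename` is an illustrative helper, not part of the workflow.

```python
import re

# Same pattern as the `.cols = matches(...)` argument in the diff.
# tidyselect's matches() ignores case by default, hence re.IGNORECASE.
RENAME_PATTERN = re.compile(r"\.condition|^Genes\.", re.IGNORECASE)


def columns_to_rename(columns):
    """Return only the column names that the rename step would touch."""
    return [name for name in columns if RENAME_PATTERN.search(name)]


# Hypothetical column names for illustration
columns = ["Genes.ProbeName", "SYMBOL", "P.value_.conditionA.v.conditionB", "ENSEMBL"]
selected = columns_to_rename(columns)
```

Annotation columns such as `SYMBOL` and `ENSEMBL` fall outside the pattern, so a group name that happens to appear in them would not be rewritten.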

Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch/workflow_code/bin/dp_tools__agilent_1_channel/config.yaml

Lines changed: 3 additions & 1 deletion
@@ -75,7 +75,9 @@ Staging:
             Sample name is used as a unique sample identifier during processing
           Example: Atha_Col-0_Root_WT_Ctrl_45min_Rep1_GSM502538

-        - ISA Field Name: Label
+        - ISA Field Name:
+            - Label
+            - Parameter Value[label]
           ISA Table Source: Sample
           Runsheet Column Name: Label
           Processing Usage: >-
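
The diff does not show how dp_tools consumes a list under `ISA Field Name`. One plausible reading, assumed here and not confirmed by the source, is that each listed name is tried in order and the first one present in the ISA table is used. The sketch below illustrates that reading only; `resolve_isa_field` is a hypothetical helper, not a dp_tools function.

```python
def resolve_isa_field(field_names, table_columns):
    """Return the first candidate field name present in the table, else None.

    ASSUMPTION: first-match-wins semantics; dp_tools may behave differently.
    """
    if isinstance(field_names, str):
        field_names = [field_names]  # the single-name form used before this change
    for name in field_names:
        if name in table_columns:
            return name
    return None


# Hypothetical ISA sample table that lacks a plain "Label" column
table_columns = ["Sample Name", "Parameter Value[label]"]
resolved = resolve_isa_field(["Label", "Parameter Value[label]"], table_columns)
```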

Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch/workflow_code/main.nf

Lines changed: 5 additions & 7 deletions
@@ -97,13 +97,11 @@ workflow {
     ch_software_versions = Channel.value(nf_version)
     AGILE1CH.out.versions | map{ it -> it.text } | mix(ch_software_versions) | set{ch_software_versions}
     VV_AGILE1CH.out.versions | map{ it -> it.text } | mix(ch_software_versions) | set{ch_software_versions}
-    ch_software_versions | unique
-        | collectFile(
-            newLine: true,
-            sort: true,
-            cache: false
-        )
-        | GENERATE_SOFTWARE_TABLE
+
+    GENERATE_SOFTWARE_TABLE(
+        ch_software_versions | unique | collectFile(newLine: true, sort: true, cache: false),
+        ch_runsheet | splitCsv(header: true, quote: '"') | first | map{ row -> row['Array Data File Name'] }
+    )

     emit:
         meta = ch_meta
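
The second argument to `GENERATE_SOFTWARE_TABLE` reads the runsheet as CSV, takes the first row, and passes along its `Array Data File Name` value. A minimal Python sketch of the same extraction follows. The runsheet contents are made up for illustration and `first_array_data_file` is a hypothetical helper.

```python
import csv
import io


def first_array_data_file(runsheet_text):
    """Return the 'Array Data File Name' value from the first data row."""
    reader = csv.DictReader(io.StringIO(runsheet_text), quotechar='"')
    first_row = next(reader)
    return first_row["Array Data File Name"]


# Hypothetical runsheet with only the columns needed for the example
runsheet = (
    "Sample Name,Label,Array Data File Name\n"
    '"Sample_A","Cy3","sample_a_raw.txt.gz"\n'
    '"Sample_B","Cy3","sample_b_raw.txt.gz"\n'
)
filename = first_array_data_file(runsheet)
```

Because only the first row is read, this approach assumes every file in a runsheet shares the same compression state.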

Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch/workflow_code/modules/GENERATE_SOFTWARE_TABLE/main.nf

Lines changed: 2 additions & 1 deletion
@@ -5,12 +5,13 @@ process GENERATE_SOFTWARE_TABLE {

   input:
     path("software_versions.yaml")
+    val(filename)

   output:
     path("software_versions_GLmicroarray.md")

   script:
     """
-    SoftwareYamlToMarkdownTable.py software_versions.yaml
+    SoftwareYamlToMarkdownTable.py software_versions.yaml \"$filename\"
     """
 }

Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch/workflow_code/modules/GENERATE_SOFTWARE_TABLE/resources/usr/bin/SoftwareYamlToMarkdownTable.py

Lines changed: 6 additions & 1 deletion
@@ -41,14 +41,19 @@

 @click.command()
 @click.argument("input_yaml", type=click.Path(exists=True))
-def yamlToMarkdown(input_yaml: Path):
+@click.argument("filename")
+def yamlToMarkdown(input_yaml: Path, filename: str):
     """ Using a software versions """
     with open(input_yaml, "r") as f:
         data = yaml.safe_load(f)

     data.extend(ASSUMED_SOFTWARE)
     df = pd.DataFrame(data)

+    # If data files are not compressed, won't use R.utils to unzip them during processing
+    if not filename.endswith('.gz'):
+        AGILENT_SOFTWARE_DPPD.remove('r.utils')
+
     # Filter to direct software used (i.e. exclude dependencies of the software)
     df = df.loc[df["name"].str.lower().isin(AGILENT_SOFTWARE_DPPD)]
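
In Python, `list.remove()` mutates the list in place and raises `ValueError` when the item is absent. A possible alternative, sketched below under the assumption that the software list is a plain list of names, builds a filtered copy instead. The list contents and the helper name `direct_software` are hypothetical; the actual `AGILENT_SOFTWARE_DPPD` definition is not shown in this diff.

```python
def direct_software(software_names, data_filename):
    """Return lower-cased software names, dropping r.utils for uncompressed data."""
    names = [name.lower() for name in software_names]
    if not data_filename.endswith(".gz"):
        # R.utils is only needed to decompress .gz inputs
        names = [name for name in names if name != "r.utils"]
    return names


# Hypothetical stand-in for the module-level software list
BASE_SOFTWARE = ["R", "limma", "R.utils"]

uncompressed = direct_software(BASE_SOFTWARE, "sample_a_raw.txt")
compressed = direct_software(BASE_SOFTWARE, "sample_a_raw.txt.gz")
```

The original list stays unchanged, so the function can be called repeatedly, for example in tests, without side effects.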

@@ -0,0 +1,17 @@
+process GENERATE_PROTOCOL {
+  tag "${ params.gldsAccession }"
+  publishDir "${ params.outputDir }/${ params.gldsAccession }/GeneLab",
+    mode: params.publish_dir_mode
+
+  input:
+    path("software_versions_GLmicroarray.md")
+    val(organism)
+
+  output:
+    path("PROTOCOL_GLmicroarray.txt")
+
+  script:
+    """
+    generate_protocol.sh $workflow.manifest.version \"$organism\"
+    """
+}
@@ -0,0 +1,68 @@
+#!/bin/bash
+set -u
+
+software_versions_file="software_versions_GLmicroarray.md"
+
+# Read the markdown table
+while read -r line; do
+    # Extract program, version, and link
+    program=$(echo "$line" | awk -F'|' '{gsub(/^[[:blank:]]+|[[:blank:]]+$/,"",$1); print $1}')
+    version=$(echo "$line" | awk -F'|' '{gsub(/^[[:blank:]]+|[[:blank:]]+$/,"",$2); print $2}')
+
+    # Skip the header row and rows without version information
+    if [[ $program != "Program" && $version != "Version" && ! -z $version ]]; then
+        # Replace invalid characters in program name with underscores
+        sanitized_program=$(echo "$program" | tr -cd '[:alnum:]_')
+
+        # Create environment variable name
+        env_var_name="${sanitized_program}_VERSION"
+
+        # Set the environment variable
+        export "$env_var_name=$version"
+    fi
+done < <(sed -n '/|/p' "$software_versions_file" | sed 's/^ *|//;s/|$//')
+
+# Print the extracted versions
+env | grep "_VERSION"
28+
# Get organism
29+
organism=$2
30+
31+
# List of organisms
32+
organism_list=("Homo sapiens" "Mus musculus" "Rattus norvegicus" "Drosophila melanogaster" "Caenorhabditis elegans" "Danio rerio" "Saccharomyces cerevisiae")
33+
34+
# Check the value of 'organism' variable and set 'GENE_MAPPING_STEP' accordingly
35+
if [[ $organism == "Arabidopsis thaliana" ]]; then
36+
GENE_MAPPING_STEP="Ensembl gene ID mappings were retrieved for each probe using the Plants Ensembl database ftp server (plants.ensembl.org, release 54)."
37+
elif [[ " ${organism_list[*]} " == *"${organism//\"/}"* ]]; then
38+
GENE_MAPPING_STEP="Ensembl gene ID mappings were retrieved for each probe using biomaRt (version ${biomaRt_VERSION}), Ensembl database (ensembl.org, release 107)."
39+
else
40+
GENE_MAPPING_STEP="TBD"
41+
fi
42+
43+
# Check the value of 'organism' variable and set 'GENE_ANNOTATION_DB' accordingly
44+
if [[ $organism == "Arabidopsis thaliana" ]]; then
45+
GENE_ANNOTATION_DB="org.At.tair.db"
46+
elif [[ $organism == "Homo sapiens" ]]; then
47+
GENE_ANNOTATION_DB="org.Hs.eg.db"
48+
elif [[ $organism == "Mus musculus" ]]; then
49+
GENE_ANNOTATION_DB="org.Mm.eg.db"
50+
elif [[ $organism == "Rattus norvegicus" ]]; then
51+
GENE_ANNOTATION_DB="org.Rn.eg.db"
52+
elif [[ $organism == "Drosophila melanogaster" ]]; then
53+
GENE_ANNOTATION_DB="org.Dm.eg.db"
54+
elif [[ $organism == "Caenorhabditis elegans" ]]; then
55+
GENE_ANNOTATION_DB="org.Ce.eg.db"
56+
elif [[ $organism == "Danio rerio" ]]; then
57+
GENE_ANNOTATION_DB="org.Dr.eg.db"
58+
elif [[ $organism == "Saccharomyces cerevisiae" ]]; then
59+
GENE_ANNOTATION_DB="org.Sc.sgd.db"
60+
else
61+
GENE_ANNOTATION_DB="TBD"
62+
fi
63+
+# Read the template file
+template="Data were processed as described in GL-DPPD-7112 ([https://github.com/nasa/GeneLab_Data_Processing/blob/master/Microarray/Agilent_1-channel/Pipeline_GL-DPPD-7112_Versions/GL-DPPD-7112.md]), using NF_MAAgilent1ch version $1 ([https://github.com/nasa/GeneLab_Data_Processing/tree/NF_MAAgilent1ch_$1/Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch]). In short, a RunSheet containing raw data file location and processing metadata from the study's *ISA.zip file was generated using dp_tools (version ${dp_tools_VERSION}). The raw array data files were loaded into R (version ${R_VERSION}) using limma (version ${limma_VERSION}). Raw data quality assurance density, pseudo image, MA, and foreground-background plots were generated using limma (version ${limma_VERSION}), and boxplots were generated using ggplot2 (version ${ggplot2_VERSION}). The raw intensity data was background corrected and normalized across arrays via the limma (version ${limma_VERSION}) quantile method. Normalized data quality assurance density, pseudo image, and MA plots were generated using limma (version ${limma_VERSION}), and boxplots were generated using ggplot2 (version ${ggplot2_VERSION}). ${GENE_MAPPING_STEP} Differential expression analysis was performed in R (version ${R_VERSION}) using limma (version ${limma_VERSION}); all groups were compared pairwise for each probe to generate a moderated t-statistic and associated p- and adjusted p-value. Gene annotations were assigned using the custom annotation tables generated in-house as detailed in GL-DPPD-7110 ([https://github.com/nasa/GeneLab_Data_Processing/blob/GL_RefAnnotTable_1.0.0/GeneLab_Reference_Annotations/Pipeline_GL-DPPD-7110_Versions/GL-DPPD-7110/GL-DPPD-7110.md]), with STRINGdb (version 2.8.4), PANTHER.db (version 1.0.11), and ${GENE_ANNOTATION_DB} (version 3.15.0)."
+
+# Output the filled template
+echo "$template" > PROTOCOL_GLmicroarray.txt
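
The script's two lookup steps, reading versions from the markdown table and choosing an annotation package by organism, can also be expressed as dictionary operations. The Python sketch below is an illustration under stated assumptions, not a replacement for the shell script: the table layout (a `Program | Version | Relevant Links` header followed by a separator row) and the version numbers in the example are assumed, and the helper names are hypothetical. The organism-to-package pairs are taken from the script.

```python
ANNOTATION_DB = {
    "Arabidopsis thaliana": "org.At.tair.db",
    "Homo sapiens": "org.Hs.eg.db",
    "Mus musculus": "org.Mm.eg.db",
    "Rattus norvegicus": "org.Rn.eg.db",
    "Drosophila melanogaster": "org.Dm.eg.db",
    "Caenorhabditis elegans": "org.Ce.eg.db",
    "Danio rerio": "org.Dr.eg.db",
    "Saccharomyces cerevisiae": "org.Sc.sgd.db",
}


def parse_versions(markdown_table):
    """Map program name to version from a pipe-delimited markdown table."""
    versions = {}
    for line in markdown_table.splitlines():
        if "|" not in line:
            continue
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue
        program, version = cells[0], cells[1]
        # Skip the header row, empty versions, and the |:---| separator row
        if program == "Program" or not version or set(version) <= set(":-"):
            continue
        versions[program] = version
    return versions


def annotation_db(organism):
    """Exact-match lookup; unknown organisms fall back to 'TBD' as in the script."""
    return ANNOTATION_DB.get(organism.strip('"'), "TBD")


# Assumed table layout with made-up version numbers
table = """\
| Program | Version | Relevant Links |
|:--------|:--------|:---------------|
| R | 4.1.3 | https://www.r-project.org/ |
| limma | 3.50.3 | https://bioconductor.org/packages/limma |
"""
versions = parse_versions(table)
```

The dictionary lookup matches organism names exactly, whereas the script's `organism_list` check is a substring match against the joined list, so a partial name such as `Homo` would also pass that check.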

Microarray/Agilent_1-channel/Workflow_Documentation/NF_MAAgilent1ch/workflow_code/nextflow.config

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ manifest {
     mainScript = 'main.nf'
     defaultBranch = 'main'
     nextflowVersion = '>=23.10.1'
-    version = '1.0.3'
+    version = '1.0.4'
 }

 def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss')
