Commit e91d7a5

Fixed typos and no assemblies produced bug
1 parent 311800e commit e91d7a5

5 files changed: +60 -10 lines changed

Metagenomics/Illumina/Workflow_Documentation/NF_MGIllumina-A/README.md

Lines changed: 10 additions & 10 deletions
@@ -115,7 +115,7 @@ nextflow run main.nf --help
 
 <br>
 
-#### 4a. Approach 1: Run slurm jobs in singularity containers with OSD accession as input
+#### 4a. Approach 1: Run slurm jobs in singularity containers with OSD or GLDS accession as input
 
 ```bash
 nextflow run main.nf -resume -profile slurm,singularity --accession OSD-574
@@ -195,30 +195,30 @@ Standard nextflow resource usage logs are also produced as follows:
 For options and detailed help on how to run the post-processing workflow, run the following command:
 
 ```bash
-nextflow run post_processng.nf --help
+nextflow run post_processing.nf --help
 ```
 
 To generate a README file, a protocols file, an md5sums table and a file association table after running the processing workflow successfully, modify and set the parameters in [post_processing.config](workflow_code/post_processing.config), then run the following command:
 
 ```bash
-nextflow -C post_processing.config run post_processng.nf -resume -profile slurm,singularity
+nextflow -C post_processing.config run post_processing.nf -resume -profile slurm,singularity
 ```
 
 The outputs of the run will be in a directory called `Post_Processing` by default and they are as follows:
 
-- Post_processing/FastQC_Outputs/filtered_multiqc_GLmetagenomics_report.zip (Filtered sequence multiqc report with paths purged)
+- Post_processing/FastQC_Outputs/filtered_multiqc_GLmetagenomics_report.zip (Filtered sequence multiqc report with paths purged)
 
-- Post_processing/FastQC_Outputs/raw_multiqc_GLmetagenomics_report.zip (Raw sequence multiqc report with paths purged)
+- Post_processing/FastQC_Outputs/raw_multiqc_GLmetagenomics_report.zip (Raw sequence multiqc report with paths purged)
 
-- Post_processing/<GLDS_accession>_-associated-file-names.tsv (File association table for curation)
+- Post_processing/<GLDS_accession>_-associated-file-names.tsv (File association table for curation)
 
-- Post_processing/<GLDS_accession>_metagenomics-validation.log (Automatic verification and validation log file)
+- Post_processing/<GLDS_accession>_metagenomics-validation.log (Automatic verification and validation log file)
 
-- Post_processing/processed_md5sum_GLmetagenomics.tsv (md5sums for the files to be released on OSDR)
+- Post_processing/processed_md5sum_GLmetagenomics.tsv (md5sums for the files to be released on OSDR)
 
-- Post_processing/processing_info_GLmetagenomics.zip (Zip file containing all files used to run the workflow and required logs with paths purged)
+- Post_processing/processing_info_GLmetagenomics.zip (Zip file containing all files used to run the workflow and required logs with paths purged)
 
-- Post_processing/protocol.txt (File describing the methods used by the workflow)
+- Post_processing/protocol.txt (File describing the methods used by the workflow)
 
 - Post_processing/README_GLmetagenomics.txt (README file listing and describing the outputs of the workflow)
 
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+#!/usr/bin/env bash
+
+# Generate protocol according to a pipeline document
+
+# USAGE:
+# generate_protocol.sh <software_versions> <protocol_id>
+# EXAMPLE
+# generate_protocol.sh ../Metadata/software_versions.txt GL-DPPD-7107-A
+
+FASTQC=`grep -i 'fastqc' $1 | awk '{print $2}' |sed -E 's/v//'`
+MULTIQC=`grep -i 'multiqc' $1 | awk '{print $3}'`
+BBMAP=`grep -i 'bbtools' $1 | awk '{print $2}'`
+HUMANN=`grep -i 'humann' $1 | awk '{print $2}'|sed -E 's/v//'`
+MEGAHIT=`grep -i 'megahit' $1 | awk '{print $2}'|sed -E 's/v//'`
+PRODIGAL=`grep -i 'prodigal' $1 | awk '{print $2}'|sed -E 's/[vV:]//g'`
+CAT=`grep 'CAT' $1 | awk '{print $2}'|sed -E 's/v//'`
+KOFAMSCAN=`grep 'exec_annotation' $1 | awk '{print $2}'`
+BOWTIE2=`grep -i 'bowtie' $1 | awk '{print $3}'`
+SAMTOOLS=`grep -i 'samtools' $1 | awk '{print $2}'`
+METABAT2=`grep -i 'metabat' $1 | awk '{print $2}'`
+BIT=`grep -i 'bioinformatics tools' $1 | awk '{print $3}' | sed 's/v//' | sed -E 's/.+([0-9]+.[0-9]+.[0-9]+).+/\1/'`
+CHECKM=`grep -i 'checkm' $1 | awk '{print $2}' |sed -E 's/v//'`
+GTDBTK=`grep -i '^GTDB' $1 | awk '{print $2}' |sed -E 's/v//' | tail -n1` # If 2 versions are used, choose the second
+
+PROTOCOL_ID=$2
+
+PROTOCOL="Data were processed as described in ${PROTOCOL_ID} (https://github.com/nasa/GeneLab_Data_Processing/blob/master/Metagenomics/Illumina/Pipeline_GL-DPPD-7107_Versions/${PROTOCOL_ID}.md), using workflow NF_MGIllumina v1.0.0 (https://github.com/nasa/GeneLab_Data_Processing/tree/NF_MGIllumina_1.0.0/Metagenomics/Illumina/Workflow_Documentation/NF_MGIllumina). \
+In brief, quality assessment of reads was performed with FastQC v${FASTQC} and reports were summarized with MultiQC v${MULTIQC}. \
+Quality trimming and filtering were performed with bbmap v${BBMAP}. Read-based processing was performed with humann3 v${HUMANN}. \
+Individual samples were assembled with megahit v${MEGAHIT}. Genes were called with prodigal v${PRODIGAL}. \
+Taxonomic classification of genes and contigs was performed with CAT v${CAT}. Functional annotation was done with KOFamScan v${KOFAMSCAN}. \
+Reads were mapped to assemblies with bowtie2 v${BOWTIE2} and coverage information was extracted for reads and contigs with samtools v${SAMTOOLS} and bbmap v${BBMAP}. \
+Binning of contigs was performed with metabat2 v${METABAT2}. Bins were summarized with bit v${BIT} and estimates of quality were generated with checkm v${CHECKM}. \
+High-quality bins (> 90% est. completeness and < 10% est. redundancy) were taxonomically classified with gtdb-tk v${GTDBTK}."
+
+echo ${PROTOCOL}
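
The extraction pattern used throughout the script (grep for a tool name, pick a whitespace-separated field with awk, strip a leading `v` with sed) can be tried in isolation. The sketch below is only an illustration: the contents of the versions file are hypothetical, and the real `software_versions.txt` layout may differ. It also contrasts `head -n2`, which keeps up to two matching lines, with `tail -n1`, which keeps only the last one.

```bash
#!/usr/bin/env bash
# Hypothetical versions file; the real software_versions.txt may be laid out differently.
versions_file=$(mktemp)
cat > "$versions_file" <<'EOF'
FastQC v0.12.1
MultiQC version 1.19
GTDB-Tk v2.3.2
GTDB-Tk v2.4.0
EOF

# Version is the second field; strip the leading "v".
FASTQC=$(grep -i 'fastqc' "$versions_file" | awk '{print $2}' | sed -E 's/^v//')

# Version is the third field because of the extra word "version".
MULTIQC=$(grep -i 'multiqc' "$versions_file" | awk '{print $3}')

# head -n2 keeps both GTDB-Tk lines; tail -n1 keeps only the last one.
GTDBTK_BOTH=$(grep -i '^GTDB' "$versions_file" | awk '{print $2}' | sed -E 's/^v//' | head -n2)
GTDBTK_LAST=$(grep -i '^GTDB' "$versions_file" | awk '{print $2}' | sed -E 's/^v//' | tail -n1)

echo "FastQC=${FASTQC} MultiQC=${MULTIQC} GTDB-Tk=${GTDBTK_LAST}"
rm -f "$versions_file"
```

Which awk field holds the version depends on how each tool prints its version string, so the field numbers would need checking against the actual file.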

Metagenomics/Illumina/Workflow_Documentation/NF_MGIllumina-A/workflow_code/modules/assembly.nf

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ process RENAME_HEADERS {
 output:
 tuple val(sample_id), path("${sample_id}-assembly.fasta"), emit: contigs
 path("versions.txt"), emit: version
+path("Failed-assemblies.tsv"), optional: true, emit: failed_assembly
 script:
 """
 bit-rename-fasta-headers -i ${assembly} \\

Metagenomics/Illumina/Workflow_Documentation/NF_MGIllumina-A/workflow_code/modules/assembly_based_processing.nf

Lines changed: 6 additions & 0 deletions
@@ -48,6 +48,12 @@ workflow assembly_based {
 sample_id, assembly -> file("${assembly}")
 }.collect()
 SUMMARIZE_ASSEMBLIES(assemblies_ch)
+
+// Write failed assemblies to a Failed assemblies file
+failed_assemblies = RENAME_HEADERS.out.failed_assembly
+failed_assemblies
+.map{ it.text }
+.collectFile(name: "${params.assemblies_dir}/Failed-assemblies.tsv", cache: false)
 
 // Map reads to assembly
 MAPPING(assembly_ch.join(filtered_ch))
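
As a rough shell analogy of the added `.map{ it.text }.collectFile(...)` step (this is not Nextflow code; the directory names, sample names, and record contents are hypothetical): each task may or may not leave a `Failed-assemblies.tsv` behind, since the output is declared `optional: true`, and whatever per-sample files exist are concatenated into one table.

```bash
#!/usr/bin/env bash
# Hypothetical task directories standing in for per-sample process outputs.
work_dir=$(mktemp -d)
mkdir -p "$work_dir/task_A" "$work_dir/task_B" "$work_dir/task_C" "$work_dir/Assemblies"

# Only samples A and C failed; task_B produced no Failed-assemblies.tsv at all.
printf 'Sample_A\tNo assembly produced\n' > "$work_dir/task_A/Failed-assemblies.tsv"
printf 'Sample_C\tNo assembly produced\n' > "$work_dir/task_C/Failed-assemblies.tsv"

# Concatenate every per-sample file that exists into a single merged table.
merged="$work_dir/Assemblies/Failed-assemblies.tsv"
: > "$merged"
for task_dir in "$work_dir"/task_*; do
    per_sample="$task_dir/Failed-assemblies.tsv"
    if [ -f "$per_sample" ]; then
        cat "$per_sample" >> "$merged"
    fi
done

merged_rows=$(wc -l < "$merged" | tr -d ' ')
echo "Merged ${merged_rows} failed-assembly record(s) into ${merged}"
```

In the workflow itself the merging is handled by the channel operator rather than an explicit loop, so samples that assembled successfully simply contribute nothing to the channel.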

Metagenomics/Illumina/Workflow_Documentation/NF_MGIllumina-A/workflow_code/nextflow.config

Lines changed: 7 additions & 0 deletions
@@ -327,6 +327,13 @@ process {
 publishDir = [path: params.logs_dir, pattern: "*-assembly.log", mode: params.publishDir_mode]
 }
 
+withName: RENAME_HEADERS{
+
+publishDir = [path: params.assemblies_dir, pattern: "*-assembly.fasta" , mode: params.publishDir_mode]
+
+}
+
+
 withLabel: mapping {
 conda = {params.conda.mapping != null ? params.conda.mapping : "envs/mapping.yaml"}
 cpus = 8
