@@ -29,8 +29,8 @@ Lauren Sanders (OSDR Project Scientist)
29
29
- samtools
30
30
- CAT
31
31
- GTDB-Tk
32
- - HUMAnN3
33
- - MetaPhIAn3
32
+ - HUMAnN
33
+ - MetaPhlAn
34
34
- In [step 14d](#14d-mag-taxonomic-classification), MAG taxonomic classification, added the new `--skip_ani_screen` argument to `gtdbtk classify_wf` to continue classifying genomes as in previous versions of GTDB-Tk, using mash and skani.
35
35
36
36
---
@@ -68,19 +68,19 @@ Lauren Sanders (OSDR Project Scientist)
68
68
| FastQC | 0.12.1 | [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) |
69
69
| MultiQC | 1.19 | [https://multiqc.info/](https://multiqc.info/) |
70
70
| bbduk | 38.86 | [https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/) |
71
- | megahit | 1.2.9 | [https://github.com/voutcn/megahit#megahit](https://github.com/voutcn/megahit#megahit) |
71
+ | MEGAHIT | 1.2.9 | [https://github.com/voutcn/megahit#megahit](https://github.com/voutcn/megahit#megahit) |
72
72
| bit | 1.8.53 | [https://github.com/AstrobioMike/bioinf_tools#bioinformatics-tools-bit](https://github.com/AstrobioMike/bioinf_tools#bioinformatics-tools-bit) |
73
73
| bowtie2 | 2.4.1 | [https://github.com/BenLangmead/bowtie2#overview](https://github.com/BenLangmead/bowtie2#overview) |
74
74
| samtools | 1.20 | [https://github.com/samtools/samtools#samtools](https://github.com/samtools/samtools#samtools) |
75
- | prodigal | 2.6.3 | [https://github.com/hyattpd/Prodigal#prodigal](https://github.com/hyattpd/Prodigal#prodigal) |
75
+ | Prodigal | 2.6.3 | [https://github.com/hyattpd/Prodigal#prodigal](https://github.com/hyattpd/Prodigal#prodigal) |
76
76
| KOFamScan | 1.3.0 | [https://github.com/takaram/kofam_scan#kofamscan](https://github.com/takaram/kofam_scan#kofamscan) |
77
77
| CAT | 5.2.3 | [https://github.com/dutilh/CAT#cat-and-bat](https://github.com/dutilh/CAT#cat-and-bat) |
78
- | Metabat2 | 2.15 | [https://bitbucket.org/berkeleylab/metabat/src/master/](https://bitbucket.org/berkeleylab/metabat/src/master/) |
79
- | checkm | 1.1.3 | [https://github.com/Ecogenomics/CheckM](https://github.com/Ecogenomics/CheckM) |
78
+ | MetaBAT | 2.15 | [https://bitbucket.org/berkeleylab/metabat/src/master/](https://bitbucket.org/berkeleylab/metabat/src/master/) |
79
+ | CheckM | 1.1.3 | [https://github.com/Ecogenomics/CheckM](https://github.com/Ecogenomics/CheckM) |
80
80
| GTDB-Tk | 2.4.0 | [https://github.com/Ecogenomics/GTDBTk](https://github.com/Ecogenomics/GTDBTk) |
81
- | KEGGDecoder | 1.2.2 | [https://github.com/bjtully/BioData/tree/master/KEGGDecoder#kegg-decoder](https://github.com/bjtully/BioData/tree/master/KEGGDecoder#kegg-decoder)
82
- | HUMAnN3 | 3.9 | [https://huttenhower.sph.harvard.edu/humann3/](https://huttenhower.sph.harvard.edu/humann3/) |
83
- | MetaPhlAn3 | 4.1.0 | [https://github.com/biobakery/MetaPhlAn/tree/3.0](https://github.com/biobakery/MetaPhlAn/tree/3.0) |
81
+ | KEGG-Decoder | 1.2.2 | [https://github.com/bjtully/BioData/tree/master/KEGGDecoder#kegg-decoder](https://github.com/bjtully/BioData/tree/master/KEGGDecoder#kegg-decoder)
82
+ | HUMAnN | 3.9 | [https://github.com/biobakery/humann](https://github.com/biobakery/humann) |
83
+ | MetaPhlAn | 4.1.0 | [https://github.com/biobakery/MetaPhlAn](https://github.com/biobakery/MetaPhlAn) |
84
84
85
85
---
86
86
@@ -113,7 +113,7 @@ fastqc -o raw_fastqc_output *raw.fastq.gz
113
113
#### 1a. Compile Raw Data QC
114
114
115
115
```
116
- multiqc -o raw_multiqc_output -n raw_multiqc -z raw_fastqc_output/
116
+ multiqc -o raw_multiqc_output -n raw_multiqc raw_fastqc_output/
117
117
# this is how it's packaged with our workflow outputs
118
118
zip -r raw_multiqc_GLmetagenomics_report.zip raw_multiqc_output
119
119
```
@@ -122,7 +122,6 @@ zip -r raw_multiqc_GLmetagenomics_report.zip raw_multiqc_output
122
122
123
123
* `-o` – the output directory to store results
124
124
* `-n` – the filename prefix of results
125
- * `-z` – specifies to zip the output data directory
126
125
* `raw_fastqc_output/` – the directory holding the output data from the fastqc run, provided as a positional argument
127
126
128
127
**Input data:**
@@ -175,7 +174,7 @@ bbduk.sh in=sample-1-R1-raw.fastq.gz in2=sample-1-R2-raw.fastq.gz out1=sample-1_
175
174
176
175
* `maxns` – sets the maximum number of Ns allowed in a read before it will be filtered out
177
176
178
- * `swift` – tells the program to look for and trim low-complexity adaptase reminants from the Swift1S kit
177
+ * `swift` – tells the program to look for and trim low-complexity adaptase remnants from the Swift1S kit
179
178
180
179
* `> bbduk.log 2>&1` – redirects the stderr and stdout to a log file for saving
181
180
@@ -214,7 +213,7 @@ fastqc -o filtered_fastqc_output/ *filtered.fastq.gz
214
213
215
214
#### 3a. Compile Filtered/Trimmed Data QC
216
215
```
217
- multiqc -o filtered_multiqc_output -n filtered_multiqc -z filtered_fastqc_output/
216
+ multiqc -o filtered_multiqc_output -n filtered_multiqc filtered_fastqc_output/
218
217
# this is how it's packaged with our workflow outputs
219
218
zip -r filtered_multiqc_GLmetagenomics_report.zip filtered_multiqc_output
220
219
```
@@ -223,7 +222,6 @@ zip -r filtered_multiqc_GLmetagenomics_report.zip filtered_multiqc_output
223
222
224
223
* `-o` – the output directory to store results
225
224
* `-n` – the filename prefix of results
226
- * `-z` – specifies to zip the output data directory
227
225
* `filtered_fastqc_output/` – the directory holding the output data from the fastqc run, provided as a positional argument
228
226
229
227
**Input data:**
@@ -244,7 +242,7 @@ zip -r filtered_multiqc_GLmetagenomics_report.zip filtered_multiqc_output
244
242
### 4. Sample assembly
245
243
```
246
244
megahit -1 sample-1_R1_filtered.fastq.gz -2 sample-1_R2_filtered.fastq.gz \
247
- -o sample-1-assembly -t 10 --min-contig-length 500 > sample-1-assembly.log 2>&1
245
+ -o sample-1-assembly -t NumberOfThreads --min-contig-length 500 > sample-1-assembly.log 2>&1
248
246
```
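
`NumberOfThreads` in the command above is a placeholder rather than a literal value. A minimal sketch of one way to fill it in, assuming a Linux or macOS shell; the `NumberOfThreads` shell variable and the fallback value of 1 are illustrative assumptions, not part of the workflow:

```bash
# Assumption: use every available core; lower this on shared machines.
# nproc is GNU coreutils; getconf is a common fallback (e.g., on macOS).
NumberOfThreads=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

echo "Using ${NumberOfThreads} threads"

# The variable can then stand in for the placeholder, e.g.:
# megahit -1 sample-1_R1_filtered.fastq.gz -2 sample-1_R2_filtered.fastq.gz \
#     -o sample-1-assembly -t "${NumberOfThreads}" --min-contig-length 500 > sample-1-assembly.log 2>&1
```

The same variable could be reused wherever `NumberOfThreads` appears in later steps.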
249
247
250
248
**Parameter Definitions:**
@@ -587,8 +585,8 @@ bowtie2-build sample-1-assembly.fasta sample-1-assembly-bt-index
587
585
588
586
#### 9b. Performing mapping, conversion to bam, and sorting
589
587
```
590
- bowtie2 --threads 15 -x sample-1-assembly-bt-index -1 sample-1_R1_filtered.fastq.gz \
591
- -2 sample-1_R2_filtered.fastq.gz 2> sample-1-mapping-info.txt | samtools view -b | samtools sort -@ 15 > sample-1.bam
588
+ bowtie2 --threads NumberOfThreads -x sample-1-assembly-bt-index -1 sample-1_R1_filtered.fastq.gz \
589
+ -2 sample-1_R2_filtered.fastq.gz 2> sample-1-mapping-info.txt | samtools view -b | samtools sort -@ NumberOfThreads > sample-1.bam
592
590
```
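
The mapping command above sends bowtie2's summary (written to stderr) to a text file while the alignments (written to stdout) continue down the pipe to samtools. A small self-contained illustration of that redirection pattern; `fake_mapper` is a hypothetical stand-in for bowtie2, `sort -r` stands in for the samtools steps, and the file names are made up for the example:

```bash
tmpdir=$(mktemp -d)

# Hypothetical stand-in for bowtie2: records go to stdout, the summary to stderr
fake_mapper() {
    printf 'alignment-1\nalignment-2\n'
    echo "summary line written to stderr" >&2
}

# Same shape as the mapping command: 2> captures only the first command's stderr,
# while its stdout is passed through the pipe to the next command
fake_mapper 2> "${tmpdir}/mapping-info.txt" | sort -r > "${tmpdir}/sorted.txt"

piped_lines=$(wc -l < "${tmpdir}/sorted.txt" | tr -d ' ')
logged_lines=$(wc -l < "${tmpdir}/mapping-info.txt" | tr -d ' ')
echo "lines through the pipe: ${piped_lines}; lines in the log: ${logged_lines}"

rm -r "${tmpdir}"
```

Because `2>` is attached to the first command only, any stderr output from the later commands in the pipe is not written to the mapping-info file.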
593
591
594
592
**Parameter Definitions:**
@@ -609,7 +607,7 @@ bowtie2 --threads 15 -x sample-1-assembly-bt-index -1 sample-1_R1_filtered.fastq
609
607
610
608
#### 9c. Indexing
611
609
```
612
- samtools index -@ 15 sample-1.bam
610
+ samtools index -@ NumberOfThreads sample-1.bam
613
611
```
614
612
615
613
**Parameter Definitions:**
@@ -787,7 +785,7 @@ bit-GL-combine-KO-and-tax-tables *-gene-coverage-annotation-and-tax.tsv -o Combi
787
785
```
788
786
jgi_summarize_bam_contig_depths --outputDepth sample-1-metabat-assembly-depth.tsv --percentIdentity 97 --minContigLength 1000 --minContigDepth 1.0 --referenceFasta sample-1-assembly.fasta sample-1.bam
789
787
790
- metabat2 --inFile sample-1-assembly.fasta --outFile sample-1 --abdFile sample-1-metabat-assembly-depth.tsv -t 4
788
+ metabat2 --inFile sample-1-assembly.fasta --outFile sample-1 --abdFile sample-1-metabat-assembly-depth.tsv -t NumberOfThreads
791
789
792
790
mkdir sample-1-bins
793
791
mv sample-1*bin*.fasta sample-1-bins
@@ -979,7 +977,7 @@ KEGG-decoder -v interactive -i MAG-level-KO-annotations_GLmetagenomics.tsv -o MA
979
977
980
978
## Read-based processing
981
979
### 16. Taxonomic and functional profiling
982
- The following uses the `humann3` and `metaphlan3` reference databases downloaded on 26-Sept-2020 as follows:
980
+ The following uses the `humann` and `metaphlan` reference databases downloaded on 13-Jun-2024 as follows:
983
981
984
982
```bash
985
983
humann_databases --download chocophlan full
@@ -988,12 +986,12 @@ humann_databases --download utility_mapping full
988
986
metaphlan --install
989
987
```
990
988
991
- #### 16a. Running humann3 (which also runs metaphlan3)
989
+ #### 16a. Running humann (which also runs metaphlan)
992
990
```bash
993
991
# forward and reverse reads need to be provided combined if paired-end (if not paired-end, single-end reads are provided to the --input argument next)
994
992
cat sample-1_R1_filtered.fastq.gz sample-1_R2_filtered.fastq.gz > sample-1-combined.fastq.gz
995
993
996
- humann --input sample-1-combined.fastq.gz --output sample-1-humann3-out-dir --threads 15 \
994
+ humann --input sample-1-combined.fastq.gz --output sample-1-humann3-out-dir --threads NumberOfThreads \
997
995
--output-basename sample-1 --metaphlan-options " --unknown_estimation --add_viruses \
998
996
--sample_id sample-1"
999
997
```
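
The `cat` step above relies on a property of the gzip format: concatenating gzip files produces a valid gzip file whose decompressed content is the members' contents in order. A small self-contained check of this using throwaway files; the temporary directory, file names, and two toy reads are made up for illustration:

```bash
# Throwaway working directory (hypothetical names, not workflow files)
tmpdir=$(mktemp -d)

# One toy FASTQ record per file
printf '@read1/1\nACGT\n+\nIIII\n' | gzip > "${tmpdir}/R1.fastq.gz"
printf '@read1/2\nTGCA\n+\nIIII\n' | gzip > "${tmpdir}/R2.fastq.gz"

# Same pattern as the combine step used before running humann
cat "${tmpdir}/R1.fastq.gz" "${tmpdir}/R2.fastq.gz" > "${tmpdir}/combined.fastq.gz"

# Each FASTQ record spans 4 lines, so the line count divided by 4 gives the record count
combined_lines=$(gzip -dc "${tmpdir}/combined.fastq.gz" | wc -l | tr -d ' ')
echo "records in combined file: $((combined_lines / 4))"

rm -r "${tmpdir}"
```

A similar line-count comparison between the inputs and the combined file could serve as a quick sanity check before starting a long run.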