Commit 1c0bf53

Merge pull request #104 from sanger-tol/dev
Release 0.5
2 parents 3ba2b04 + 544c135 commit 1c0bf53

35 files changed, with 510 additions and 146 deletions

.github/workflows/branch.yml

Lines changed: 8 additions & 8 deletions
@@ -1,15 +1,15 @@
 name: nf-core branch protection
-# This workflow is triggered on PRs to master branch on the repository
-# It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev`
+# This workflow is triggered on PRs to main branch on the repository
+# It fails when someone tries to make a PR against the nf-core `main` branch instead of `dev`
 on:
   pull_request_target:
-    branches: [master]
+    branches: [main]

 jobs:
   test:
     runs-on: ubuntu-latest
     steps:
-      # PRs to the nf-core repo master branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
+      # PRs to the nf-core repo main branch are only ok if coming from the nf-core repo `dev` or any `patch` branches
       - name: Check PRs
         if: github.repository == 'sanger-tol/blobtoolkit'
         run: |
@@ -22,7 +22,7 @@ jobs:
         uses: mshick/add-pr-comment@v1
         with:
           message: |
-            ## This PR is against the `master` branch :x:
+            ## This PR is against the `main` branch :x:

             * Do not close this PR
             * Click _Edit_ and change the `base` to `dev`
@@ -32,9 +32,9 @@ jobs:

             Hi @${{ github.event.pull_request.user.login }},

-            It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
-            The `master` branch on nf-core repositories should always contain code from the latest release.
-            Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
+            It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `main` branch.
+            The `main` branch on nf-core repositories should always contain code from the latest release.
+            Because of this, PRs to `main` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.

             You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
             Note that even after this, the test will continue to show as failing until you push a new commit.
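
The rule stated in the workflow comments can be sketched outside of GitHub Actions. The Python below is only an illustration, since the body of the `run:` step is not shown in this diff. The function name `pr_to_main_allowed` is hypothetical, and reading "any `patch` branches" as "a head branch named exactly `patch`, from any repository" is an assumption.

```python
def pr_to_main_allowed(head_repo: str, head_branch: str,
                       upstream_repo: str = "sanger-tol/blobtoolkit") -> bool:
    """Return True if a PR targeting `main` would pass the branch check.

    Assumption: a `patch` branch is accepted whatever repository it comes
    from, while `dev` is only accepted from the upstream repository.
    """
    if head_branch == "patch":
        return True
    return head_repo == upstream_repo and head_branch == "dev"


if __name__ == "__main__":
    # A PR from the upstream `dev` branch passes; one from a fork's `dev` does not.
    print(pr_to_main_allowed("sanger-tol/blobtoolkit", "dev"))
    print(pr_to_main_allowed("someone/blobtoolkit", "dev"))
```

Under this reading, any other head branch fails the check and triggers the comment asking the author to retarget the PR to `dev`.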

.github/workflows/sanger_test.yml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ jobs:
         with:
           workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
           access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
-          compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
+          compute_env: ${{ secrets.TOWER_COMPUTE_ENV_LARGE }}
           revision: ${{ env.REVISION }}
           workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ env.REVISION }}
           parameters: |

.github/workflows/sanger_test_full.yml

Lines changed: 5 additions & 1 deletion
@@ -1,6 +1,10 @@
 name: sanger-tol LSF full size tests

 on:
+  push:
+    branches:
+      - main
+      - dev
   workflow_dispatch:
 jobs:
   run-tower:
@@ -22,7 +26,7 @@ jobs:
         with:
           workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
           access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
-          compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
+          compute_env: ${{ secrets.TOWER_COMPUTE_ENV_LARGE }}
           revision: ${{ env.REVISION }}
           workdir: ${{ secrets.TOWER_WORKDIR_PARENT }}/work/${{ github.repository }}/work-${{ env.REVISION }}
           parameters: |

.nf-core.yml

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ lint:
     - .github/ISSUE_TEMPLATE/bug_report.yml
     - .github/PULL_REQUEST_TEMPLATE.md
     - .github/workflows/linting.yml
+    - .github/workflows/branch.yml
   multiqc_config:
     - report_comment
   nextflow_config:

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -3,6 +3,30 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [[0.5.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.5.0)] – Snorlax – [2024-07-31]
+
+General tidy up of the configuration and the pipeline
+
+### Enhancements & fixes
+
+- Increased the resources for blastn
+- Removed some options that were not used or not needed
+- All relevant outputs are now copied to the output directory
+- Fixed some blast parameters to match the behaviour of the Snakemake pipeline
+- Fixed parsing of samplesheets from fetchngs to capture correct data type
+
+### Parameters
+
+| Old parameter   | New parameter |
+| --------------- | ------------- |
+| --taxa_file     |               |
+| --blastp_outext |               |
+| --blastp_cols   |               |
+| --blastx_outext |               |
+| --blastx_cols   |               |
+
+> **NB:** Parameter has been **updated** if both old and new parameter information is present. </br> **NB:** Parameter has been **added** if just the new parameter information is present. </br> **NB:** Parameter has been **removed** if new parameter information isn't present.
+
 ## [[0.4.0](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.4.0)] – Buneary – [2024-04-17]

 The pipeline has now been validated on dozens of genomes, up to 11 Gbp.

README.md

Lines changed: 2 additions & 2 deletions
@@ -20,8 +20,8 @@ It takes a samplesheet of BAM/CRAM/FASTQ/FASTA files as input, calculates genome
 4. Run BUSCO ([`busco`](https://busco.ezlab.org/))
 5. Extract BUSCO genes ([`blobtoolkit/extractbuscos`](https://github.com/blobtoolkit/blobtoolkit))
 6. Run Diamond BLASTp against extracted BUSCO genes ([`diamond/blastp`](https://github.com/bbuchfink/diamond))
-7. Run BLASTn against extracted BUSCO genes ([`blast/blastn`](https://www.ncbi.nlm.nih.gov/books/NBK131777/))
-8. Run BLASTx against extracted BUSCO genes ([`blast/blastx`](https://www.ncbi.nlm.nih.gov/books/NBK131777/))
+7. Run BLASTx against sequences with no hit ([`blast/blastx`](https://www.ncbi.nlm.nih.gov/books/NBK131777/))
+8. Run BLASTn against sequences still with no hit ([`blast/blastn`](https://www.ncbi.nlm.nih.gov/books/NBK131777/))
 9. Count BUSCO genes ([`blobtoolkit/countbuscos`](https://github.com/blobtoolkit/blobtoolkit))
 10. Generate combined sequence stats across various window sizes ([`blobtoolkit/windowstats`](https://github.com/blobtoolkit/blobtoolkit))
 11. Imports analysis results into a BlobDir dataset ([`blobtoolkit/blobdir`](https://github.com/blobtoolkit/blobtoolkit))

conf/base.config

Lines changed: 12 additions & 0 deletions
@@ -104,6 +104,18 @@ process {
         time = { check_max( 3.h * Math.ceil(meta.genome_size/1000000000) * task.attempt, 'time') }
     }

+    withName: "BLAST_BLASTN" {
+
+        // There are blast failures we don't know how to fix. Just ignore for now
+        errorStrategy = { task.exitStatus in ((130..145) + 104) ? (task.attempt == process.maxRetries ? 'ignore' : 'retry') : 'finish' }
+
+        // Most jobs complete quickly but some need a lot longer. For those outliers,
+        // the CPU usage remains usually low, often nearing a single CPU
+        cpus = { check_max( 6 - (task.attempt-1), 'cpus' ) }
+        memory = { check_max( 1.GB * Math.pow(4, task.attempt-1), 'memory' ) }
+        time = { check_max( 10.h * Math.pow(4, task.attempt-1), 'time' ) }
+    }
+
     withName:CUSTOM_DUMPSOFTWAREVERSIONS {
         cache = false
     }
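
The retry behaviour configured for `BLAST_BLASTN` can be traced with a small Python sketch. It mirrors the arithmetic of the closures above but is not part of the pipeline: the function names are hypothetical, and the capping done by `check_max` is deliberately left out, so the values are the requests before any `--max_cpus`, `--max_memory` or `--max_time` limit applies.

```python
# Exit codes that trigger a retry: 130..145 inclusive, plus 104.
RETRYABLE_EXIT_CODES = set(range(130, 146)) | {104}


def blastn_resources(attempt: int) -> dict:
    """Uncapped resource request for a given attempt (1-based)."""
    return {
        "cpus": 6 - (attempt - 1),           # one CPU fewer per retry
        "memory_gb": 1 * 4 ** (attempt - 1),  # 1, 4, 16, ... GB
        "time_h": 10 * 4 ** (attempt - 1),    # 10, 40, 160, ... hours
    }


def error_strategy(exit_status: int, attempt: int, max_retries: int) -> str:
    """Mirror of the errorStrategy closure: retry, then ignore, else finish."""
    if exit_status in RETRYABLE_EXIT_CODES:
        return "ignore" if attempt == max_retries else "retry"
    return "finish"


if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(attempt, blastn_resources(attempt))
```

Each retry trades CPUs for memory and wall time, which matches the comment that the slow outliers tend to use close to a single CPU.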

conf/modules.config

Lines changed: 25 additions & 1 deletion
@@ -48,6 +48,14 @@ process {
         ext.args = { "-ax map-ont -I" + Math.ceil(meta2.genome_size/1e9) + 'G' }
     }

+    withName: "MINIMAP2_.*" {
+        publishDir = [
+            path: { "${params.outdir}/read_mapping/${meta.datatype}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals("versions.yml") ? null : filename }
+        ]
+    }
+
     withName: "SAMTOOLS_VIEW" {
         ext.args = "--output-fmt bam --write-index"
     }
@@ -60,6 +68,22 @@ process {
         ext.args = "--lineage --busco"
     }

+    withName: "PIGZ_COMPRESS" {
+        publishDir = [
+            path: { "${params.outdir}/base_content" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals("versions.yml") ? null : filename.minus("fw_out/") }
+        ]
+    }
+
+    withName: "BLOBTK_DEPTH" {
+        publishDir = [
+            path: { "${params.outdir}/read_mapping/${meta.datatype}" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals("versions.yml") ? null : "${meta.id}.coverage.1k.bed.gz" }
+        ]
+    }
+
     withName: "BUSCO" {
         scratch = true
         ext.args = { 'test' in workflow.profile.tokenize(',') ?
@@ -114,7 +138,7 @@ process {
     }

     withName: "BLAST_BLASTN" {
-        ext.args = "-outfmt '6 qseqid staxids bitscore std' -max_target_seqs 10 -max_hsps 1 -evalue 1.0e-10 -lcase_masking -dust '20 64 1'"
+        ext.args = "-task megablast -outfmt '6 qseqid staxids bitscore std' -max_target_seqs 10 -max_hsps 1 -evalue 1.0e-10 -lcase_masking -dust '20 64 1'"
     }

     withName: "CUSTOM_DUMPSOFTWAREVERSIONS" {
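
The three new `saveAs` closures decide the published name of each output file, and returning `null` means the file is not published. The Python functions below restate that logic as a rough sketch for checking the expected names. The function names and the sample file names are hypothetical, and `str.replace(..., 1)` stands in for Groovy's `minus`, which removes the first occurrence of the given substring.

```python
from typing import Optional


def save_as_minimap2(filename: str) -> Optional[str]:
    """MINIMAP2_.*: publish everything except versions.yml, names unchanged."""
    return None if filename == "versions.yml" else filename


def save_as_pigz(filename: str) -> Optional[str]:
    """PIGZ_COMPRESS: drop the first 'fw_out/' from the published path."""
    if filename == "versions.yml":
        return None
    return filename.replace("fw_out/", "", 1)


def save_as_depth(filename: str, sample_id: str) -> Optional[str]:
    """BLOBTK_DEPTH: rename the output after the sample identifier."""
    if filename == "versions.yml":
        return None
    return f"{sample_id}.coverage.1k.bed.gz"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(save_as_pigz("fw_out/asm_freq_windows.tsv.gz"))
    print(save_as_depth("output.bed.gz", "sample1"))
```

The renamed outputs land under `read_mapping/<datatype>/` and `base_content/` in the output directory, as set by the `path` entries of the corresponding `publishDir` blocks.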

docs/output.md

Lines changed: 55 additions & 8 deletions
@@ -15,6 +15,9 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [BlobDir](#blobdir) - Output files viewable on a [BlobToolKit viewer](https://github.com/blobtoolkit/blobtoolkit)
 - [Static plots](#static-plots) - Static versions of the BlobToolKit plots
 - [BUSCO](#busco) - BUSCO results
+- [Read alignments](#read-alignments) - Aligned reads (optional)
+- [Read coverage](#read-coverage) - Read coverage tracks
+- [Base content](#base-content) - _k_-mer statistics (for k &le; 4)
 - [MultiQC](#multiqc) - Aggregate report describing results from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -26,8 +29,8 @@ The files in the BlobDir dataset which is used to create the online interactive
 <summary>Output files</summary>

 - `blobtoolkit/`
-  - `<accession>/`
-    - `*.json.gz`: files generated from genome and alignment coverage statistics
+  - `<assembly-name>/`
+    - `*.json.gz`: files generated from genome and alignment coverage statistics.

 More information about visualising the data in the [BlobToolKit repository](https://github.com/blobtoolkit/blobtoolkit/tree/main/src/viewer)

@@ -53,12 +56,56 @@ BUSCO results generated by the pipeline (all BUSCO lineages that match the claas
 <details markdown="1">
 <summary>Output files</summary>

-- `blobtoolkit/`
-  - `busco/`
-    - `*.batch_summary.txt`: BUSCO scores as tab-separated files (1 file per lineage).
-    - `*.fasta.txt`: BUSCO scores as formatted text (1 file per lineage).
-    - `*.json`: BUSCO scores as JSON (1 file per lineage).
-    - `*/`: all output BUSCO files, including the coordinate and sequence files of the annotated genes.
+- `busco/`
+  - `<lineage-name>/`
+    - `short_summary.json`: BUSCO scores for that lineage as JSON.
+    - `short_summary.tsv`: BUSCO scores for that lineage as a tab-separated file.
+    - `short_summary.txt`: BUSCO scores for that lineage as formatted text.
+    - `full_table.tsv`: Coordinates of the annotated BUSCO genes as a tab-separated file.
+    - `missing_busco_list.tsv`: List of the BUSCO genes that could not be found.
+    - `*_busco_sequences.tar.gz`: Sequences of the annotated BUSCO genes. 1 _tar_ archive for each of the three annotation levels (`single_copy`, `multi_copy`, `fragmented`), with 1 file per gene.
+    - `hmmer_output.tar.gz`: Archive of the HMMER alignment scores.
+
+</details>
+
+### Read alignments
+
+Read alignments in BAM format -- only if the pipeline is run with `--align`.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `read_mapping/`
+  - `<datatype>/`
+    - `<sample>.bam`: alignments of that sample's reads in BAM format.
+
+</details>
+
+### Read coverage
+
+Read coverage statistics as computed by the pipeline.
+Those files are the raw data used to build the BlobDir.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `read_mapping/`
+  - `<datatype>/`
+    - `<sample>.coverage.1k.bed.gz`: Bedgraph file with the coverage of the alignments of that sample in 1 kbp windows.
+
+</details>
+
+### Base content
+
+_k_-mer statistics.
+Those files are the raw data used to build the BlobDir.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `base_content/`
+  - `<assembly-name>_*nuc_windows.tsv.gz`: Tab-separated files with the counts of every _k_-mer for k &le; 4 in 1 kbp windows. The first three columns correspond to the coordinates (sequence name, start, end), followed by each _k_-mer.
+  - `<assembly-name>_freq_windows.tsv.gz`: Tab-separated files with frequencies derived from the _k_-mer counts.

 </details>
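
To make the "Base content" description concrete, here is a minimal Python sketch of counting _k_-mers in fixed-size windows. It is not the implementation used by the pipeline. It assumes non-overlapping windows, counts only _k_-mers that lie fully inside a window, and does not merge reverse complements. The function name `kmer_counts_in_windows` is hypothetical.

```python
from collections import Counter
from typing import Iterator, Tuple


def kmer_counts_in_windows(seq: str, k: int,
                           window: int = 1000) -> Iterator[Tuple[int, int, Counter]]:
    """Yield (start, end, counts) for consecutive non-overlapping windows.

    Coordinates are 0-based, end-exclusive; the last window may be shorter.
    """
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        chunk = seq[start:end].upper()
        # Slide a k-wide frame over the window and tally each k-mer.
        counts = Counter(chunk[i:i + k] for i in range(len(chunk) - k + 1))
        yield start, end, counts


if __name__ == "__main__":
    # Tiny example with 4 bp windows instead of 1 kbp.
    for start, end, counts in kmer_counts_in_windows("ACGTAC", k=2, window=4):
        print(start, end, dict(counts))
```

Each yielded tuple corresponds to one row of a `*nuc_windows.tsv.gz`-style table: the window coordinates followed by one count per _k_-mer.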
64111

modules.json

Lines changed: 9 additions & 2 deletions
@@ -30,12 +30,14 @@
           "diamond/blastp": {
             "branch": "master",
             "git_sha": "b29f6beb86d1d24d680277fb1a3f4de7b8b8a92c",
-            "installed_by": ["modules"]
+            "installed_by": ["modules"],
+            "patch": "modules/nf-core/diamond/blastp/diamond-blastp.diff"
           },
           "diamond/blastx": {
             "branch": "master",
             "git_sha": "b29f6beb86d1d24d680277fb1a3f4de7b8b8a92c",
-            "installed_by": ["modules"]
+            "installed_by": ["modules"],
+            "patch": "modules/nf-core/diamond/blastx/diamond-blastx.diff"
           },
           "fastawindows": {
             "branch": "master",
@@ -64,6 +66,11 @@
             "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a",
             "installed_by": ["modules"]
           },
+          "pigz/compress": {
+            "branch": "master",
+            "git_sha": "0eab94fc1e48703c1b0a8704bd665f554905c39d",
+            "installed_by": ["modules"]
+          },
           "samtools/fasta": {
             "branch": "master",
             "git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
