Description
Hello!
Thanks a lot for making this tool available. I ran the demo and analyzed the sample dataset with no issues, but when testing charcoal on my own dataset I run into errors that seem to be caused by some genomes not being downloaded from GenBank correctly.
I am running this command:
python -m charcoal run zebrafish-test.conf -j 16
It fails with this error message:
Error in snakemake invocation: Command '['snakemake', '-s',
'/users/tg/Misc/Tool_testing/charcoal/charcoal/Snakefile', '--use-conda',
'-j', '1', '-j', '16', '--configfile', '/users/tg/Misc/Tool_testing/charcoal/charcoal/conf/defaults.conf',
'/users/tg/Misc/Tool_testing/charcoal/charcoal/conf/system.conf',
'zebrafish-test.conf']' returned non-zero exit status 1.
This appears to be caused by a genome file that was not downloaded from GenBank:
ERROR, skch::validateInputFile, Could not open genbank_genomes/GCF_002943105.1_genomic.fna.gz
[Thu Feb 16 11:58:47 2023]
Error in rule mashmap_compare:
jobid: 1373
output: output.zebrafish-test/stage2/MGYG000299400.fna.x.GCF_002943105.1.mashmap.align,
output.zebrafish-test/stage2/MGYG000299400.fna.x.GCF_002943105.1.mashmap.out
conda-env: /users/tg/Misc/Tool_testing/charcoal/.snakemake/conda/d01f2d1356a2c223e7b61208c452d8a0
shell:
mashmap -q zebrafish-genomes/MGYG000299400.fna -r
genbank_genomes/GCF_002943105.1_genomic.fna.gz -o
output.zebrafish-test/stage2/MGYG000299400.fna.x.GCF_002943105.1.mashmap.align
--pi 95 > output.zebrafish-test/stage2/MGYG000299400.fna.x.GCF_002943105.1.mashmap.out
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
I checked the genbank_genomes folder: it contained some genome files, but not this accession (GCF_002943105).
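Instead of inspecting the folder by hand, the missing files can be listed with a short script. This is a minimal sketch, assuming that downloads are stored as `genbank_genomes/<accession>_genomic.fna.gz` (the naming seen in the log above) and that the expected accessions are available as a list; `find_missing_genomes` is a hypothetical helper, not part of charcoal.

```python
from pathlib import Path


def find_missing_genomes(accessions, genome_dir="genbank_genomes"):
    """Return the accessions whose <accession>_genomic.fna.gz file is absent.

    Hypothetical helper: assumes the file naming seen in the charcoal log,
    e.g. genbank_genomes/GCF_002943105.1_genomic.fna.gz.
    """
    genome_dir = Path(genome_dir)
    missing = []
    for acc in accessions:
        path = genome_dir / f"{acc}_genomic.fna.gz"
        if not path.is_file():
            missing.append(acc)
    return missing


if __name__ == "__main__":
    # Accessions taken from the failures reported in this issue.
    wanted = ["GCF_002943105.1", "GCA_000798955.1", "GCF_000820225.1"]
    print(find_missing_genomes(wanted))
```

Running this before restarting the workflow would show all absent genomes at once rather than one failure per rerun.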
I manually downloaded this file from GenBank and reran the snakemake command. It failed twice more (on GCA_000798955.1_genomic.fna.gz and GCF_000820225.1_genomic.fna.gz); I downloaded those files manually as well and reran the command. The workflow then failed on genome GCA_011046675.1, which has been suppressed in GenBank and is not available (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/011/046/675/GCA_011046675.1_ASM1104667v1/assembly_status.txt).
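Suppressed assemblies could be detected before a run by reading the `assembly_status.txt` file that NCBI publishes next to each assembly, like the one linked above. A minimal sketch, assuming the file contains a `status=<value>` line (e.g. `status=suppressed`) and that the FTP directory layout splits the accession digits into groups of three, as in that URL. The helper names are hypothetical, and the code only parses text: fetching the file is left to the caller.

```python
def ftp_accession_dir(accession):
    """Build the NCBI FTP directory that holds the assemblies for an accession.

    e.g. GCA_011046675.1 -> .../genomes/all/GCA/011/046/675/
    The assembly-specific subdirectory (accession plus assembly name, such
    as GCA_011046675.1_ASM1104667v1) still has to be looked up inside it.
    """
    prefix, _, rest = accession.partition("_")
    digits = rest.split(".")[0]
    return (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"{prefix}/{digits[0:3]}/{digits[3:6]}/{digits[6:9]}/"
    )


def parse_assembly_status(text):
    """Return the value of the 'status=' line of an assembly_status.txt, or None.

    Assumes one key=value pair per line.
    """
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "status":
            return value.strip()
    return None


def is_suppressed(text):
    """True if the status file marks the assembly as suppressed."""
    return parse_assembly_status(text) == "suppressed"


if __name__ == "__main__":
    print(ftp_accession_dir("GCA_011046675.1"))
    print(is_suppressed("status=suppressed\n"))
```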
This is what the error looked like for the suppressed genome:
Error in rule download_matching_genomes_one_by_one:
jobid: 0
output: genbank_genomes/GCA_011046675.1_genomic.fna.gz
RuleException:
HTTPError in line 465 of /users/tg/Misc/Tool_testing/charcoal/charcoal/Snakefile:
HTTP Error 404: Not Found
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2357, in run_wrapper
File "/users/tg/Misc/Tool_testing/charcoal/charcoal/Snakefile", line 465, in __rule_download_matching_genomes_one_by_one
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/urllib/request.py", line 214, in urlopen
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/urllib/request.py", line 523, in open
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/urllib/request.py", line 632, in http_response
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/urllib/request.py", line 561, in error
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/urllib/request.py", line 494, in _call_chain
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/urllib/request.py", line 641, in http_error_default
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 574, in _callback
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/concurrent/futures/thread.py", line 58, in run
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 560, in cached_or_run
File "/software/miniconda_py39/envs/charcoal/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 2390, in run_wrapper
Exiting because a job execution failed. Look above for error message
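The traceback shows the download rule calling `urllib.request.urlopen` and the whole workflow stopping on an HTTP 404. As an illustration only (this is not charcoal's Snakefile code), a download step could catch `urllib.error.HTTPError`, report a 404 as "unavailable", and re-raise any other error. `download_genome` and its `opener` parameter are hypothetical; `opener` exists only so the logic can be exercised without network access.

```python
import shutil
import urllib.error
import urllib.request


def download_genome(url, dest, opener=urllib.request.urlopen):
    """Download url to dest; return False instead of raising on HTTP 404.

    Hypothetical helper, not charcoal's implementation. Any other HTTP
    error is re-raised so genuine failures are not hidden.
    """
    try:
        # The opener is called first, so dest is not created if it raises.
        with opener(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # e.g. a suppressed assembly such as GCA_011046675.1
            return False
        raise
    return True
```

Skipping a genome this way only avoids the crash in the download step: downstream rules such as `mashmap_compare` would still expect the file, so an unavailable accession would also need to be dropped from the list of genomes to compare against.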
I also ran charcoal on a small subset of my genomes (the ones that were processed without errors in this initial test); that run completed without errors and generated a report.