dorado crashes when calling sup modifications 4mC_5mC 6mA on 5090 #1507

@Kirk3gaard

Description

Dorado version

1.1.1

Dorado subcommand

Basecaller

The issue

I have now had a few crashes, so here is the report. The basecaller starts up normally and runs for a long time before crashing (about 11 hours after launch in the run logged below). Nothing else is running on the computer.

[2025-09-26 13:50:19.756] [info] Running: "basecaller" "--recursive" "--device" "cuda:all" "sup" "./PBE71308/" "--modified-bases" "4mC_5mC" "6mA" "--resume-from" "PBE71308.dorado1.1.1.bm5.2.0_sup.sim.mod4mC_5mC_6mA.bam"
[2025-09-26 13:50:20.260] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v5.2.0 with httplib
[2025-09-26 13:50:22.560] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v5.2.0_4mC_5mC@v1 with httplib
[2025-09-26 13:50:23.348] [info] - downloading dna_r10.4.1_e8.2_400bps_sup@v5.2.0_6mA@v1 with httplib
[2025-09-26 13:50:24.185] [info] > Creating basecall pipeline
[2025-09-26 13:50:25.687] [info] Using CUDA devices:
[2025-09-26 13:50:25.687] [info] cuda:0 - NVIDIA GeForce RTX 5090
[2025-09-26 13:50:25.687] [info] cuda:1 - NVIDIA GeForce RTX 5090
[2025-09-26 13:50:27.018] [info] Calculating optimized batch size for GPU "NVIDIA GeForce RTX 5090" and model dna_r10.4.1_e8.2_400bps_sup@v5.2.0. Full benchmarking will run for this device, which may take some time.
[2025-09-26 13:50:27.104] [info] Calculating optimized batch size for GPU "NVIDIA GeForce RTX 5090" and model dna_r10.4.1_e8.2_400bps_sup@v5.2.0. Full benchmarking will run for this device, which may take some time.
[2025-09-26 13:50:28.452] [info] cuda:0 using chunk size 12288, batch size 224
[2025-09-26 13:50:28.452] [info] cuda:1 using chunk size 12288, batch size 288
[2025-09-26 13:50:28.603] [info] cuda:0 using chunk size 6144, batch size 448
[2025-09-26 13:50:28.638] [info] cuda:1 using chunk size 6144, batch size 288
[2025-09-26 13:50:28.848] [info] > Inspecting resume file...
[2025-09-26 13:50:28.854] [info] Resuming from file PBE71308.dorado1.1.1.bm5.2.0_sup.sim.mod4mC_5mC_6mA.bam...
[2025-09-26 14:48:59.831] [info] > 12231628 original read ids found in resume file.
Koi RMSNorm residual: failed to set smem size 184324h:16m:12s] Basecalling
[2025-09-27 00:51:55.231] [error] Koi tiled path failed 7
terminate called after throwing an instance of 'std::runtime_error'
what(): Koi convolution (host_window_ntwc_f16) failed with in size 16
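
For reference, the time from launch to the crash can be read off the log timestamps. This is only a minimal sketch: the two timestamps are copied by hand from the `Running:` line and the `[error] Koi tiled path failed 7` line above, and the variable names are illustrative rather than anything dorado provides.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2025-09-26 13:50:19.756".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Copied by hand from the log: the "Running:" line and the first "[error]" line.
started = datetime.strptime("2025-09-26 13:50:19.756", LOG_TIME_FORMAT)
crashed = datetime.strptime("2025-09-27 00:51:55.231", LOG_TIME_FORMAT)

# Split the elapsed wall-clock time into whole hours, minutes and seconds.
elapsed = crashed - started
hours, remainder = divmod(int(elapsed.total_seconds()), 3600)
minutes, seconds = divmod(remainder, 60)

print(f"{hours}h {minutes}m {seconds}s from launch to crash")
```

Note that this interval includes roughly an hour spent inspecting the resume file before basecalling resumed.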

System specifications

Ubuntu 24, 2x RTX 5090, AMD Ryzen Threadripper 7960X (24 cores), 192 GB RAM, NVMe SSD
