Issue checks
- I have read the Dorado Documentation.
- I did not find an existing issue.
Dorado version
1.1.1+e72f1492
Dorado subcommand
Polish
The issue
Dorado polish can sometimes show inconsistent GPU utilisation when processing BAM files that contain many short regions with uneven coverage (in my case >260,000 regions of ~1.5 kbp for amplicon polishing), or BAMs with very different region lengths. I have mostly seen this on our cluster, which has a very fast GPU but sometimes low I/O speeds. In this case the infer threads appear to be faster than the encoder threads, and the trace log shows a queue size of 0:
`[consumer x] Popped data: item.samples.size() = 1, queue size: 0`
Increasing the number of encoding threads doesn't seem to help either.
I've noticed that CPU usage rises after the debug message
`[debug] Starting to encode regions for X windows using X threads`
and then gradually drops until the next batch of windows is encoded. This suggests that the computational load is not evenly distributed among the encoder threads, and that they all wait for an entire batch of windows to finish before pushing it into the infer queue and moving on to the next batch. Providing more encoding threads cannot fix this, because they all wait for the slowest thread to complete.
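For illustration, here is a minimal C++ sketch of the pattern I suspect (this is not Dorado's actual code; `encode_window`, `infer_queue`, the thread/window counts and the per-window costs are all made up): each encoder works through its share of a batch, but nothing reaches the infer queue until every encoder has joined, so a fast consumer drains the queue and then idles until the slowest thread finishes.

```cpp
// Sketch of a batch-barrier encoding stage (hypothetical, not Dorado code).
#include <chrono>
#include <iostream>
#include <mutex>
#include <queue>
#include <random>
#include <thread>
#include <vector>

std::queue<int> infer_queue;  // stands in for the encoded-sample queue
std::mutex queue_mtx;

// Hypothetical per-window encoding; runtime varies with coverage/region length.
int encode_window(int id) {
    static thread_local std::mt19937 rng(id);
    std::uniform_int_distribution<int> cost_ms(5, 200);
    std::this_thread::sleep_for(std::chrono::milliseconds(cost_ms(rng)));
    return id;
}

int main() {
    const int num_threads = 4;
    const int windows_per_batch = 16;

    for (int batch = 0; batch < 3; ++batch) {
        std::vector<std::thread> encoders;
        std::vector<std::vector<int>> results(num_threads);

        // Each encoder works on its share of the current batch...
        for (int t = 0; t < num_threads; ++t) {
            encoders.emplace_back([&, t] {
                for (int w = t; w < windows_per_batch; w += num_threads)
                    results[t].push_back(encode_window(batch * windows_per_batch + w));
            });
        }
        // ...but nothing is pushed until *all* encoders have joined, so a fast
        // consumer (not modelled here) empties the queue and idles meanwhile.
        for (auto& e : encoders) e.join();

        {
            std::lock_guard<std::mutex> lk(queue_mtx);
            for (auto& r : results)
                for (int sample : r) infer_queue.push(sample);
        }
        std::cout << "batch " << batch << " pushed; queue size now "
                  << infer_queue.size() << "\n";
    }
}
```

If the encoders instead pushed each window to the queue as soon as it was encoded (a streaming rather than batch-barrier design), the slowest thread would only delay its own windows and the infer queue would be much less likely to run dry.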
System specifications
Operating System: Ubuntu 22.04.4 LTS
CPU: AMD EPYC 7742 64-Core Processor (64 cores / 64 threads)
RAM: 120 GB
GPU: NVIDIA A100-SXM4-40GB / NVIDIA H100-80GB
Disk: Sometimes very slow