Open
Description
Hi, when I computed the cluster utilisation rate (number of GPUs in use / total number of GPUs) from each job's start time, end time, and GPU count, I found that the utilisation of the Kalos cluster peaks at only around 70%, and there are many periods where less than 40%, or even less than 20%, of the cluster's GPUs are in use. This is surprising and is not the case for Seren. I also noticed that the Seren data has ~800k job records, while Kalos has only ~60k. Does this mean that not all Kalos jobs are recorded, which would explain the apparent severe under-utilisation?
I would sincerely appreciate it if you could help clarify this. Also, thank you so much for sharing this fantastic dataset.
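
For reference, the utilisation computation described above can be sketched roughly as follows. This is a minimal illustration, not the exact script used: the `(start, end, gpus)` tuple layout, the function name `utilisation_timeline`, the toy job list, and the `total_gpus` value are all assumptions made for the example, not fields or figures taken from the dataset.

```python
from collections import defaultdict


def utilisation_timeline(jobs, total_gpus):
    """Return [(time, utilisation)] at every point where GPU usage changes.

    jobs: iterable of (start, end, gpus) tuples (hypothetical layout).
    total_gpus: total number of GPUs in the cluster (assumed known).
    """
    # Sweep-line: +gpus when a job starts, -gpus when it ends.
    deltas = defaultdict(int)
    for start, end, gpus in jobs:
        if end <= start or gpus <= 0:
            continue  # skip malformed or zero-GPU records
        deltas[start] += gpus
        deltas[end] -= gpus

    timeline = []
    in_use = 0
    for t in sorted(deltas):
        in_use += deltas[t]
        timeline.append((t, in_use / total_gpus))
    return timeline


# Toy data for illustration only (not from the Kalos or Seren traces).
jobs = [(0, 10, 4), (5, 15, 2), (12, 20, 8)]
timeline = utilisation_timeline(jobs, total_gpus=16)
peak = max(u for _, u in timeline)
print(timeline)
print("peak utilisation:", peak)
```

With this approach, any job missing from the trace never contributes to `in_use`, so the computed curve would underestimate the real utilisation, which is why the difference in record counts matters.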