Is there any way to enable multiple threads when adding files? #7296
Replies: 3 comments 18 replies
-
You can set the number of threads for hashing with the option
-
I have 4 CPU cores. According to the docs, it should be 2 by default, so there is already more than 1 thread? However, when I checked in top, DVC always took ~100% CPU, which is a single core's worth.
-
@FrostyLeaf could you please run Also, it would be great if you could run https://github.com/iterative/dvc/wiki/Debugging,-Profiling-and-Benchmarking-DVC on a smaller subset of files to see why this is happening.
-
We found that DVC can be insanely slow when adding lots of small files (100k files, each roughly 150 KiB), since it calculates hashes and transfers files with only one thread.
As you can see, it took more than 40 minutes to calculate hashes and transfer files, and eventually... failed.
We also tested this dataset with Git LFS, and it took only 5m10s to finish adding all of the image files. Like Git itself, Git LFS tends to use multiple threads to handle heavy jobs.
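To illustrate what multi-threaded hashing of many small files could look like, here is a minimal sketch. It is not DVC's actual implementation: the function names `md5_of_file` and `hash_files` and the `jobs` parameter are made up for this example, and MD5 is just an assumed choice of hash. Threads can help here because file reads release the GIL, and `hashlib` also releases it while hashing larger buffers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files(paths, jobs=4):
    """Hash many files concurrently; returns a {path: hex digest} dict.

    `jobs` is a hypothetical knob for the number of worker threads.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # pool.map preserves input order, so zip pairs each path with its digest.
        return dict(zip(paths, pool.map(md5_of_file, paths)))
```

Calling `hash_files(list_of_paths, jobs=4)` on a directory of small images would be one way to check whether hashing alone scales with the number of threads on a given machine, separately from the file transfer step.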