
Add pin_memory to DataLoader and update ImageInfo to support #1894


Open
wants to merge 7 commits into
base: sd3

rockerBOO
Contributor

Adds support for using pin_memory with the DataLoader. ImageInfo is updated to pin memory for the relevant tensors. This will probably need some testing, but it is disabled by default.
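As a rough illustration of pinning tensors on a per-image record, here is a minimal sketch. The class name `ImageInfoSketch` and its fields are hypothetical stand-ins and are not taken from this PR's diff. It relies on `torch.Tensor.pin_memory()`, and on the documented DataLoader behavior of calling a `pin_memory()` method on custom batch types when one is defined.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class ImageInfoSketch:
    """Hypothetical stand-in for a per-image record; fields are illustrative only."""

    image_key: str
    latents: Optional[torch.Tensor] = None
    text_encoder_output: Optional[torch.Tensor] = None

    def pin_memory(self) -> "ImageInfoSketch":
        # Page-locking needs an accelerator runtime; leave tensors untouched otherwise.
        if not torch.cuda.is_available():
            return self
        if self.latents is not None:
            self.latents = self.latents.pin_memory()
        if self.text_encoder_output is not None:
            self.text_encoder_output = self.text_encoder_output.pin_memory()
        return self


info = ImageInfoSketch("example", latents=torch.zeros(4, 8, 8))
info = info.pin_memory()
```

Returning `self` keeps the record usable whether or not pinning actually happened, so callers do not need a separate code path for CPU-only machines.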

Host to GPU copies are much faster when they originate from pinned (page-locked) memory. See "Use pinned memory buffers" for more details on when and how to use pinned memory generally.

For data loading, passing pin_memory=True to a DataLoader will automatically put the fetched data Tensors in pinned memory, and thus enables faster data transfer to CUDA-enabled GPUs.

https://pytorch.org/docs/stable/data.html#memory-pinning
https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
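For reference, the DataLoader behavior quoted above can be sketched as follows. The toy dataset, tensor shapes, and the `make_loader` helper are assumptions for illustration only. The only library pieces used are the `pin_memory` argument of `DataLoader` and `Tensor.to(..., non_blocking=True)`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(pin_memory: bool = False, batch_size: int = 4) -> DataLoader:
    # Toy data standing in for real image tensors (hypothetical shapes).
    images = torch.randn(16, 3, 8, 8)
    labels = torch.arange(16)
    dataset = TensorDataset(images, labels)
    # Only request pinning when CUDA is present; the option stays off by default.
    use_pin = pin_memory and torch.cuda.is_available()
    return DataLoader(dataset, batch_size=batch_size, pin_memory=use_pin)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = make_loader(pin_memory=True)

for images, labels in loader:
    # non_blocking=True can only overlap the copy with compute when the source is pinned.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
```

On a machine without CUDA this falls back to an ordinary loader, so the same code runs in both cases.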

@kohya-ss
Owner

Thank you for this! I will check it as soon as possible.

I checked pin_memory before, and it caused a large increase in memory usage in a Windows environment. If you have already tried it in a Windows environment, did it work without any problems?

@rockerBOO
Contributor Author

I don't have access to a Windows environment to test. On Linux it doesn't seem to affect memory usage (or I'm not using it correctly). This would be off by default, so if it does hurt memory usage on Windows, we can add a note in the documentation.

I have seen roughly an 8-10% improvement in epoch speed, but I haven't done enough testing. Larger batch sizes may see higher gains; running with a batch size of 1 or 3 on a 2080 seemed to give the same relative improvement.
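For anyone who wants to check the effect on their own setup, a rough timing sketch might look like this. The `time_epoch` helper, the dataset size, and the tensor shapes are all hypothetical. It makes no claim to reproduce the figures above, and on a CPU-only machine both runs take the same code path.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_epoch(pin_memory: bool, batch_size: int = 4) -> float:
    """Time one pass over a toy dataset, copying each batch to the device."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    dataset = TensorDataset(torch.randn(64, 3, 32, 32))
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        pin_memory=pin_memory and torch.cuda.is_available(),
    )
    start = time.perf_counter()
    for (batch,) in loader:
        batch = batch.to(device, non_blocking=True)
    if device.type == "cuda":
        # Wait for outstanding asynchronous copies before stopping the clock.
        torch.cuda.synchronize()
    return time.perf_counter() - start


pageable = time_epoch(pin_memory=False)
pinned = time_epoch(pin_memory=True)
print(f"pageable: {pageable:.4f}s, pinned: {pinned:.4f}s")
```

A single pass over a tiny dataset is noisy, so several repetitions with a real dataset and batch size would be needed before drawing any conclusion.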

I'm looking at another performance pass to try and find bottlenecks with epoch speed and GPU usage.

@rockerBOO rockerBOO marked this pull request as ready for review June 16, 2025 21:56
@rockerBOO
Contributor Author

I have been using this for the past 6 months and it has worked as expected. I added some tests.

@kohya-ss
Owner

Thank you for the update. Users can choose whether or not to use pin_memory, and I'm sure this PR will be useful for users who can use it. I'll merge it after checking.

@rockerBOO
Contributor Author

This PR is related to huggingface/accelerate#2441, which added the DataLoaderConfig for Accelerator 1.0.0, so it would require updating to that version. I can see whether it could work on the current Accelerator version so that no update is needed.


2 participants