Hello,

I'm looking to process a moderately large amount of data (it fits in host RAM but not in GPU memory). A scan seemed like a promising approach: processing batchwise limits the amount of data resident in GPU RAM at any given time. However, I noticed that with a scan operation, all of the data is transferred to the device immediately rather than on demand.
A minimal example:
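Something along these lines, where `step` and the array shapes stand in for the real computation:

```python
import numpy as np
import jax
import jax.numpy as jnp

# Imagine this is tens of GB: it fits in host RAM but not in GPU memory.
data = np.ones((100_000, 1024), dtype=np.float32)

def step(carry, batch):
    # Stand-in for the real per-batch computation.
    return carry + jnp.sum(batch), None

# lax.scan consumes `data` as a single device array, so the whole
# thing is transferred host->device up front, not one `batch`
# (one row of `data`) at a time.
total, _ = jax.lax.scan(step, jnp.float32(0.0), data)
```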
Is it currently possible, or are there plans for the future, to signal to XLA that it should be a bit less eager about the host->device transfer?
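For comparison, the on-demand pattern can be emulated today with an explicit Python loop that moves one batch at a time (a sketch, reusing `data` and `step` from above; this gives up scan's single fused computation):

```python
# Transfers one row at a time instead of the whole array, at the
# cost of dispatching each step separately from Python.
total = jnp.float32(0.0)
for batch in data:
    batch = jax.device_put(batch)  # host->device for just this batch
    total, _ = step(total, batch)
```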
Replies: 1 comment

Sorry for bumping; just wondering if this is something being considered. Leveraging the new sharding API, solving this would theoretically allow one to efficiently hoist iterative solutions to larger-than-memory problems into XLA.
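To make that idea concrete, here is a rough sketch of the kind of placement the sharding API can now express (assuming a recent JAX with memory kinds; whether scan could then stream per-step slices from host memory is exactly the open question above):

```python
import numpy as np
import jax
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Hypothetical: pin the large array in host memory via memory kinds,
# so that only the current scan slice would ever need device memory.
mesh = Mesh(np.array(jax.devices()), axis_names=("i",))
host = NamedSharding(mesh, PartitionSpec(), memory_kind="pinned_host")
data_on_host = jax.device_put(data, host)
```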