Create dataset without additional copies of data (2.6 - Data sampling with a sliding window) #745
Unanswered
labdmitriy asked this question in Q&A
Replies: 0 comments
Hi @rasbt,
The current implementation of `GPTDatasetV1` uses the `append` method of a Python list after converting each chunk from a list to a tensor. Because `torch.tensor()` always copies data, my understanding is that we use additional RAM/VRAM when constructing the `input_ids` and `target_ids` lists.

What do you think: could we use the `unfold` method of the PyTorch tensor instead, so that we always get a view of the original tensor containing all the required data? Or does this approach have any disadvantages?

Thank you.
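To make the comparison concrete, here is a minimal sketch of the two approaches. It is not the book's actual code: the helper names `windows_with_append` and `windows_with_unfold` are hypothetical, the toy `token_ids` list stands in for real tokenizer output, and the append-based loop is only assumed to mirror the pattern described for `GPTDatasetV1`.

```python
import torch


def windows_with_append(token_ids, max_length, stride):
    """Append-based pattern: every chunk is copied into its own new tensor."""
    input_ids, target_ids = [], []
    for i in range(0, len(token_ids) - max_length, stride):
        # torch.tensor() allocates fresh memory for each chunk
        input_ids.append(torch.tensor(token_ids[i:i + max_length]))
        target_ids.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))
    # Stacked here only so the result is easy to compare with the unfold version
    return torch.stack(input_ids), torch.stack(target_ids)


def windows_with_unfold(token_tensor, max_length, stride):
    """unfold-based pattern: both results are views into token_tensor."""
    # Inputs slide over all tokens except the last one
    inputs = token_tensor[:-1].unfold(0, max_length, stride)
    # Targets are the same windows shifted right by one token
    targets = token_tensor[1:].unfold(0, max_length, stride)
    return inputs, targets


# Toy data standing in for tokenizer output (assumption for illustration)
token_ids = list(range(20))
token_tensor = torch.tensor(token_ids)  # the single copy of the data

a_in, a_tg = windows_with_append(token_ids, max_length=4, stride=4)
u_in, u_tg = windows_with_unfold(token_tensor, max_length=4, stride=4)

# Both approaches yield the same windows
same_windows = torch.equal(a_in, u_in) and torch.equal(a_tg, u_tg)

# The unfold result starts at the same memory address as the source tensor
shares_memory = u_in.data_ptr() == token_tensor.data_ptr()
```

Two caveats may be worth weighing. Each view keeps the whole `token_tensor` alive, and the default `DataLoader` collation stacks samples into a new batch tensor, so a copy still happens at batch time. The saving is in the stored dataset itself, and it is largest when `stride` is smaller than `max_length`, because the append-based version then stores overlapping tokens several times.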