[checkpoint_io] Fix gather_state_dict_fast #6333
Open
🚨 Issue number
fixed #6248
📝 What does this PR do?
This PR fixes 2 issues:
reqs = dist.batch_isend_irecv(ops)
It is not guaranteed that len(ops) == len(reqs) (see the batch_isend_irecv implementation), so a single for loop over zip(reqs, target_metadata) is incorrect: zip stops at the shorter of its inputs, so if fewer requests than ops come back, the remaining target_metadata entries are silently skipped.
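The length mismatch can be illustrated with a minimal, torch-free sketch. This is not the code changed in this PR. FakeRequest, fake_batch_isend_irecv, collect_buggy and collect_fixed are hypothetical stand-ins, and the coalesce flag only models the assumption that a backend may hand back fewer request handles than ops were submitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request handle with a wait() method."""
    waited: bool = False

    def wait(self) -> None:
        self.waited = True


def fake_batch_isend_irecv(ops: List[str], coalesce: bool) -> List[FakeRequest]:
    # Assumption for illustration: when the batch is coalesced, a single
    # handle covers every op, so len(reqs) != len(ops).
    if coalesce:
        return [FakeRequest()]
    return [FakeRequest() for _ in ops]


def collect_buggy(reqs: List[FakeRequest], target_metadata: List[str]) -> List[str]:
    # Pairs each request with one metadata entry; zip truncates to the
    # shorter input, so entries without a matching request are dropped.
    out = []
    for req, meta in zip(reqs, target_metadata):
        req.wait()
        out.append(meta)
    return out


def collect_fixed(reqs: List[FakeRequest], target_metadata: List[str]) -> List[str]:
    # Wait for every request first, then walk the metadata independently.
    for req in reqs:
        req.wait()
    return [meta for meta in target_metadata]


ops = ["recv_0", "recv_1", "recv_2"]
target_metadata = ["param_a", "param_b", "param_c"]

reqs = fake_batch_isend_irecv(ops, coalesce=True)
buggy = collect_buggy(reqs, target_metadata)
fixed = collect_fixed(reqs, target_metadata)
print(len(buggy), len(fixed))
```

Splitting the work into a wait loop followed by a separate loop over target_metadata removes any dependence on how many handles are returned.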
cc @ver217 @kwen2501