Add indirect pointer to barrier support in hl.signal & hl.wait (as_ptrs) #261
base: main
Conversation
)
N = signal_pad_ptrs.size(0)
for i in hl.grid(pad_shape):
    offset = i * 4  # number of bytes per element of each pad (torch.int32)
Calculating offsets to pointers is kinda ugly here, and it breaks the no-index-calculation / pointer-handling abstraction. The tensor metadata (i.e. size, stride, etc.) is lost when the pads are passed in as a tensor of data_ptrs, so the offset calculation needs to be explicit here for indirect barriers.
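The metadata loss can be illustrated with a minimal, self-contained sketch outside the kernel (plain `ctypes` buffers stand in for CUDA signal pads; `load_elem` and `ELEM_SIZE` are illustrative names, not part of the hl API). Once each buffer is reduced to a raw integer address, its dtype and shape no longer travel with it, so the element size has to be re-supplied by hand:

```python
import ctypes

# Four int32 "pads", each collapsed to a bare address -- analogous to
# collecting tensor.data_ptr() values into a tensor of uint64 pointers.
# All size/stride/dtype metadata is gone at this point.
pads = [(ctypes.c_int32 * 4)(*range(4)) for _ in range(4)]
pad_ptrs = [ctypes.addressof(p) for p in pads]

ELEM_SIZE = ctypes.sizeof(ctypes.c_int32)  # 4 bytes, must be known out-of-band

def load_elem(base_ptr: int, i: int) -> int:
    # Explicit byte arithmetic: base + i * sizeof(int32). Nothing about the
    # raw pointer tells us the element size -- the caller has to know it.
    offset = i * ELEM_SIZE
    return ctypes.c_int32.from_address(base_ptr + offset).value

value = load_elem(pad_ptrs[0], 2)  # reads element 2 of the first pad
```

A higher-level abstraction would carry the dtype and strides alongside the pointer so this arithmetic could be derived automatically instead of written per call site.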
signal_pad_list = [
    torch.ones(4, device=DEVICE, dtype=torch.int32) for _ in range(4)
]
signal_pad_ptrs = torch.as_tensor(
    [p.data_ptr() for p in signal_pad_list], device=DEVICE, dtype=torch.uint64
)
What is the motivation for multiple signal pads here?
Could we have a multi-dimensional pad instead, like:
signal_pad_list = torch.ones([4, 4], device=DEVICE, dtype=torch.int32)
and then pass in an index like (multicast_tile, i)?
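The benefit of the single multi-dimensional pad suggested here can be sketched with stdlib `ctypes` (the `addr` helper, `ROWS`/`COLS`, and row-major layout are assumptions for illustration, not the proposed API): because the shape and element size stay attached to one contiguous buffer, the address of element `(tile, i)` is derivable mechanically rather than via hand-written byte offsets over a bag of raw pointers.

```python
import ctypes

# One contiguous [4, 4] int32 buffer, standing in for
# torch.ones([4, 4], dtype=torch.int32). Shape is known, so strides follow.
ROWS, COLS = 4, 4
pad = (ctypes.c_int32 * (ROWS * COLS))(*range(ROWS * COLS))
base = ctypes.addressof(pad)
ELEM = ctypes.sizeof(ctypes.c_int32)  # 4 bytes per int32

def addr(tile: int, i: int) -> int:
    # Row-major addressing derived from shape metadata:
    # element (tile, i) lives (tile * COLS + i) elements past the base.
    return base + (tile * COLS + i) * ELEM

elem = ctypes.c_int32.from_address(addr(2, 3)).value  # element at (2, 3)
```

With a flat list of `data_ptr()` values this derivation is impossible, since each pointer arrives stripped of its shape.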
The motivation is to implement cross-rank sync, i.e. each barrier lives on a different card, and we signal our peers and then wait for signals from our peers to make sure all cards have reached this kernel.
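The signal-then-wait pattern described here can be sketched in a single process, with threads standing in for ranks/cards (a hedged illustration of the protocol only; `rank_barrier`, `pads`, and `NRANKS` are made-up names, not the Helion or distributed API, and a lock replaces the real atomic signal writes):

```python
import threading

NRANKS = 4
# pads[r][s] is rank r's slot for a signal from rank s -- each "rank"
# owns one pad, mirroring one barrier living on each card.
pads = [[0] * NRANKS for _ in range(NRANKS)]
lock = threading.Lock()
reached = []  # ranks that made it through the barrier

def rank_barrier(rank: int) -> None:
    # Phase 1: signal every peer's pad (including our own slot).
    with lock:
        for peer in range(NRANKS):
            pads[peer][rank] = 1
    # Phase 2: spin until every peer has signalled our pad, i.e. all
    # cards have reached this point in the kernel.
    while True:
        with lock:
            if all(pads[rank][s] == 1 for s in range(NRANKS)):
                break
    reached.append(rank)

threads = [threading.Thread(target=rank_barrier, args=(r,)) for r in range(NRANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No rank can exit phase 2 until every rank has completed phase 1, which is exactly the all-cards-arrived guarantee described above.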
See test/test_distributed in #245
Yeah, I guess the tensor of pointers is required in this case -- let's figure out a higher-level abstraction to make this cleaner.
stack-info: PR: #261, branch: joydddd/stack/13
Stacked PRs:
Add indirect pointer to barrier support in hl.signal & hl.wait (as_ptrs)