tensor[tile] when tile size is 1 returns a 1D tensor, instead of a scalar #275
Conversation
stack-info: PR: #275, branch: joydddd/stack/15
Add a test for the fix?
```diff
-if fake_value.size(i) != 1:
-    stride = state.device_function.tensor_stride(fake_value, i).name
-    index_expr.append(f"{idx} * {stride}")
+stride = state.device_function.tensor_stride(fake_value, i).name
+index_expr.append(f"{idx} * {stride}")
```
What is the reason for this change?
something like this:

```python
N = x.size(0)
for tile in hl.tile(N):
    x_tile = x[tile]
```
When block_size=1, the if statement evaluates to False, so the indexing ignores the N dimension and generates:

```python
x_tile = tl.load(tile + tl.zeros([1], ...))
```
I'll add a test case for this.
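For reference, the intended semantics mirror NumPy/PyTorch indexing, where a length-1 slice keeps its dimension while an integer index drops it; a minimal NumPy sketch (analogy only, not Helion/Triton code):

```python
import numpy as np

x = np.arange(8, dtype=np.float32)

# A length-1 slice keeps the dimension: the result is 1-D with shape (1,) ...
t = x[0:1]
assert t.shape == (1,)

# ... while an integer index drops it, yielding a 0-d scalar.
s = x[0]
assert np.ndim(s) == 0
```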
Isn't this checking the tensor size, not the block size?
```python
if block_size == 1:
    extra_body.append(
        statement_from_string(
            f"{index_var} = {offset_var} + tl.zeros([1], {dtype})"
```
Doesn't this do the same thing as arange? I'd expect we would need shape=[], or even just offset_var directly?
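As a NumPy analogy for the shapes involved (a sketch of my understanding, not Triton itself): `zeros([1]) + offset` and `arange(0, 1) + offset` both produce a length-1 vector, whereas shape `[]` would behave like a scalar:

```python
import numpy as np

offset = 5

# zeros([1]) + offset gives a length-1 vector, the same shape as arange(0, 1) + offset ...
z = np.zeros([1], dtype=np.int32) + offset
a = np.arange(0, 1, dtype=np.int32) + offset
assert z.shape == (1,) and a.shape == (1,)
assert z[0] == a[0] == offset

# ... while a 0-d array (shape []) behaves like a scalar.
s = np.zeros([], dtype=np.int32) + offset
assert s.shape == ()
```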
Yes, this does the same thing. We don't need to make this change to fix tile indexing when block_size=1.

However, why does grid_codegen handle block_size == 1 differently, with tl.zeros instead of tl.arange?
The broadcasting behavior for size==1 tensors is intentional. We match numpy/pytorch broadcasting rules: https://numpy.org/devdocs/user/basics.broadcasting.html I added some more tests for this in #285, which we should make sure this doesn't break.
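A minimal NumPy sketch of the broadcasting rule being referenced, where size-1 dimensions stretch to match the other operand:

```python
import numpy as np

a = np.ones((4, 1))  # trailing size-1 dimension broadcasts
b = np.ones(3)

# Shapes (4, 1) and (3,) broadcast together to (4, 3).
c = a + b
assert c.shape == (4, 3)
```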
Stacked PRs:
tensor[tile] when tile size is 1 returns a 1D tensor, instead of a scalar