per_sample_grads regression in 2.8 RC #158787

@mikaylagawarecki

🐛 Describe the bug

Based on pytorch/tutorials CI, vmap appears to have regressed in the 2.8 RC in the per_sample_gradients.py tutorial; see pytorch/tutorials#3460.

I've bisected this to first appear in the 0613 nightly; see https://docs.google.com/document/d/1ngNO8qybkYoYoJnF_14K_pSsnL1zspRp8-MvyWP1b0Y/edit?tab=t.0 for more. I'm currently bisecting to find the commit where it first occurs.

With 2.7, the differences were:

conv1.weight tensor(2.1958e-05, device='cuda:0')
conv1.bias tensor(1.8031e-05, device='cuda:0')
conv2.weight tensor(2.8105e-05, device='cuda:0')
conv2.bias tensor(2.0489e-08, device='cuda:0')
fc1.weight tensor(2.2352e-08, device='cuda:0')
fc1.bias tensor(2.0489e-08, device='cuda:0')
fc2.weight tensor(1.7881e-07, device='cuda:0')
fc2.bias tensor(5.9605e-08, device='cuda:0')

With 2.8, this becomes:

conv1.weight tensor(0.0037, device='cuda:0')
conv1.bias tensor(0.0017, device='cuda:0')
conv2.weight tensor(0.0120, device='cuda:0')
conv2.bias tensor(0.0060, device='cuda:0')
fc1.weight tensor(0.0107, device='cuda:0')
fc1.bias tensor(0.0112, device='cuda:0')
fc2.weight tensor(0.0001, device='cuda:0')
fc2.bias tensor(5.2005e-06, device='cuda:0')
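
For anyone who wants to reproduce this kind of comparison outside the tutorial, here is a minimal sketch of how such per-parameter differences can be computed. It assumes the numbers above are the maximum absolute difference between per-sample gradients from a plain Python loop and from `vmap(grad(...))`; that is a guess, not something stated above. The small MLP, the random data, and the names `compute_loss`, `loop_grads`, and `max_diffs` are hypothetical stand-ins (the tutorial uses a small CNN on CUDA), and this runs on CPU, so it will not necessarily show the regression.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)

# Hypothetical stand-in model and data; the tutorial uses a small CNN instead.
model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4))
data = torch.randn(5, 8)
targets = torch.randint(0, 4, (5,))

# Detached copies of the parameters/buffers for the functional API.
params = {k: v.detach() for k, v in model.named_parameters()}
buffers = {k: v.detach() for k, v in model.named_buffers()}


def compute_loss(params, buffers, sample, target):
    # vmap strips the batch dimension, so add it back for the model call.
    logits = functional_call(model, (params, buffers), (sample.unsqueeze(0),))
    return F.cross_entropy(logits, target.unsqueeze(0))


# Vectorized per-sample gradients: dict of name -> (batch, *param_shape).
vmap_grads = vmap(grad(compute_loss), in_dims=(None, None, 0, 0))(
    params, buffers, data, targets
)


def loop_grads():
    # Reference: one backward pass per sample, stacked per parameter.
    per_sample = []
    for i in range(data.shape[0]):
        loss = F.cross_entropy(model(data[i : i + 1]), targets[i : i + 1])
        per_sample.append(torch.autograd.grad(loss, list(model.parameters())))
    return [torch.stack(g) for g in zip(*per_sample)]


reference = loop_grads()

# Max absolute difference per parameter between the two approaches.
max_diffs = {
    name: (ref - vmap_grads[name]).abs().max()
    for (name, _), ref in zip(model.named_parameters(), reference)
}
for name, diff in max_diffs.items():
    print(name, diff)
```

Running the same script under both versions (and on the same device as the failing CI job) should make it easier to tell whether the larger differences come from vmap itself or from something else in the tutorial.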

Versions

2.8 RC

cc @zou3519 @Chillee @samdow @kshitij12345
