-
Hello!

```python
import torch
from tensordict import TensorDict
from tensordict.nn import TensorDictModule as Mod

mod = Mod(lambda x: (x + 1).mean(), in_keys=["a"], out_keys=["b"])
td = TensorDict(a=torch.randn(10, 11, requires_grad=True), batch_size=[10])

# vmap the module over the batch dimension
vmap_mod = torch.vmap(mod, (0,))
td_out = vmap_mod(td)
print(td_out)

# grad of the scalar output "b" w.r.t. the input tensordict
grad = torch.func.grad(lambda td: mod(td)["b"])
print(grad(td[0])["a"])

# vmap the grad over the batch
vmap_grad = torch.vmap(grad, (0,))
td_out = vmap_grad(td)
print(td_out["a"])
```

So vmapping a grad works (gradding a vmap doesn't, I believe). Is there a version of this script that explains what you're trying to do and where it breaks?
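For reference, the same vmap-over-grad pattern also works on plain tensors, and `argnums` lets you choose which positional argument to differentiate. A minimal sketch (the `energy` function and its names here are hypothetical stand-ins, not the asker's model):

```python
import torch

# Hypothetical per-sample "energy": a scalar function of a distance vector.
def energy(distance, weight):
    return (weight * distance).sum() ** 2

# Differentiate w.r.t. argument 0 (the distance), holding the weight fixed.
denergy_ddistance = torch.func.grad(energy, argnums=0)

distances = torch.randn(8, 3)  # batch of 8 distance vectors
weight = torch.randn(3)

# vmap over the batch dimension of `distances` only; `weight` is shared.
per_sample_grads = torch.vmap(denergy_ddistance, in_dims=(0, None))(distances, weight)
print(per_sample_grads.shape)  # torch.Size([8, 3])
```

`in_dims=(0, None)` is what maps only the batched argument, which is one way around "vmap can only specify the position of arguments".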
-
Hi everyone,
I am trying to move my code to tensordict (td) and vmap along the batch dimension, following https://discuss.pytorch.org/t/how-to-apply-vmap-on-a-heterogeneous-tensor/214109/5. Since vmap does not support batched tensordicts, for now I just use batch_size = 1.
Everything works very well until I need to rewrite the derivative module. I have a td with distances between atoms, I predict the energy with a model, and then I need to compute the force by differentiating the energy w.r.t. the distance.
Before, I used torch.autograd.grad, which was very straightforward. But within vmap it raises: "element 0 of tensors does not require grad and does not have a grad_fn". I find that the BatchedTensor of the energy has a grad_fn, but for the "vmapped" tensor, the underlying real tensor has requires_grad = False. Apparently(?) we cannot combine vmap with torch.autograd.grad, but we can with torch.func.grad, according to https://discuss.pytorch.org/t/use-vmap-and-grad-to-calculate-gradients-for-one-layer-independently-for-each-input-in-batch/187556/2?u=roy-kid and https://discuss.pytorch.org/t/simple-use-case-compete-per-sample-gradient-with-autograd/207317/4?u=roy-kid.
Since torch.func.vmap can only specify the position of arguments, how can I differentiate energy w.r.t. distance? Here is my pseudo code:
So thanks for your help!
Also posted in the forum: https://discuss.pytorch.org/t/combine-vmap-func-grad-with-tensordict/215086?u=roy-kid
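A minimal sketch of the force = -dE/d(distance) pattern with torch.func.grad inside vmap, on plain tensors (the `energy_fn` here is a hypothetical stand-in for the real energy model):

```python
import torch

# Hypothetical stand-in for the energy model: any scalar-valued
# function of the interatomic distances works here.
def energy_fn(distance):
    return distance.pow(2).sum()

# Force is minus the gradient of the energy w.r.t. the distances.
force_fn = lambda d: -torch.func.grad(energy_fn)(d)

distances = torch.randn(4, 10)   # batch of 4 samples, 10 distances each
forces = torch.vmap(force_fn)(distances)
print(forces.shape)              # torch.Size([4, 10])
```

The key point is that the gradient is taken with torch.func.grad (functional, composable) rather than torch.autograd.grad, so it can be wrapped by vmap without hitting the "does not require grad" error.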