Autograd Warning for FBGEMM Operators During Training #294

Description

@mia1460

Hi team,
Thank you for your work on this project! While running the model using the following command from the repository:

CUDA_VISIBLE_DEVICES=0 python3 main.py --gin_config_file=configs/ml-1m/hstu-sampled-softmax-n128-large-final.gin --master_port=12345

I get the following warning from PyTorch:

Skipping init for ....
/path/to/python3.10/site-packages/torch/autograd/graph.py:824: UserWarning: 
fbgemm::dense_to_jagged: an autograd kernel was not registered to the Autograd key(s) 
but we are trying to backprop through it. This may lead to silently incorrect behavior. 
This behavior is deprecated and will be removed in a future version of PyTorch. 
If your operator is differentiable, please ensure you have registered an autograd kernel 
to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). 
If your operator is not differentiable, or to squash this warning and use the previous behavior, 
please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd.

I also get a similar warning for fbgemm::jagged_to_padded_dense.
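
Until the op registrations are handled upstream, I assume I could hide just these two messages on my side with a standard Python warnings filter, something like the sketch below (stdlib only; the regex simply matches the two operator names quoted above). That of course only helps if the warning is actually benign here, which is part of what I am asking.

import warnings

# Sketch: suppress only the two fbgemm autograd UserWarnings quoted above.
# This assumes the warnings are routed through Python's warnings machinery,
# which the torch/autograd/graph.py frame in the traceback suggests they are.
warnings.filterwarnings(
    "ignore",
    message=r"fbgemm::(dense_to_jagged|jagged_to_padded_dense):",
    category=UserWarning,
)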

Environment

  • Python version: 3.10
  • OS: Ubuntu 22.04

After starting the training command, the script produces no progress output for about 20 minutes; the only thing printed to the console is the warning above. I am not sure whether this is expected or a sign of a bottleneck (e.g. data loading, model initialization, or some other blocking step). Any guidance on whether this is expected, or suggestions for adding logging to track progress (for example, something like the sketch below), would be very helpful.
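
What I had in mind is roughly the following (a rough sketch; the real dataloader, model, and optimizer come from main.py in this repo, so the names here are placeholders, with a dummy range so the snippet runs on its own):

import logging
import time

# Timestamped progress logging around setup and the training loop.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger(__name__)

start = time.time()
log.info("setup starting ...")
train_dataloader = range(1000)  # placeholder for the repo's real dataloader
log.info("setup done in %.1fs, entering training loop", time.time() - start)

for step, batch in enumerate(train_dataloader):
    # forward / backward / optimizer step would go here
    if step % 100 == 0:
        log.info("step %d (%.1fs elapsed since start)", step, time.time() - start)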
