Masking issues during training on Aurora #213

@WardLT

Description

I first noticed these errors after the frameworks updates, but I'm not sure whether that's related:

  File "/lus/flare/projects/MOFA/lward/mof-generation-at-scale/mofa/utils/src/lightning.py", line 206, in training_step
    delta_log_px, kl_prior, loss_term_t, loss_term_0, l2_loss, noise_t, noise_0 = self.forward(data, training=True)
  File "/lus/flare/projects/MOFA/lward/mof-generation-at-scale/mofa/utils/src/lightning.py", line 189, in forward
    utils.assert_partial_mean_zero_with_mask(x, node_mask, center_of_mass_mask)
  File "/lus/flare/projects/MOFA/lward/mof-generation-at-scale/mofa/utils/src/utils.py", line 91, in assert_partial_mean_zero_with_mask
    assert_correctly_masked(x, node_mask)
  File "/lus/flare/projects/MOFA/lward/mof-generation-at-scale/mofa/utils/src/utils.py", line 101, in assert_correctly_masked
    assert max_mask < 1e-4, f'Variables not masked properly: {max_mask}'
AssertionError: Variables not masked properly: nan
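For reference, here is a minimal sketch of what a check like `assert_correctly_masked` typically does in EDM/DiffLinker-style code (the actual implementation in `mofa/utils/src/utils.py` may differ), and why a NaN in the coordinates makes it fail even when the mask itself is fine:

```python
import torch

def assert_correctly_masked(x: torch.Tensor, node_mask: torch.Tensor) -> None:
    """Sketch of the failing check: values at masked-out node positions
    should be (numerically) zero. Assumed signature, not the real code."""
    # Largest absolute value at positions the mask zeroes out.
    max_mask = (x * (1 - node_mask)).abs().max().item()
    assert max_mask < 1e-4, f'Variables not masked properly: {max_mask}'

# If any NaN has crept into x (e.g. from a diverging loss or a bad batch),
# it propagates through the product and the max reduction, so max_mask is
# nan; `nan < 1e-4` evaluates to False and the assertion fires with the
# "Variables not masked properly: nan" message seen above.
x = torch.zeros(2, 3, 3)          # hypothetical coords: (batch, nodes, 3)
x[0, 0, 0] = float('nan')         # a single NaN is enough
node_mask = torch.ones(2, 3, 1)   # all nodes "real", mask is valid
```

So the message likely points at NaNs appearing upstream in the model output or input rather than at a genuine masking bug.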

    Labels

    difflinker: Problems relating to DiffLinker codebase
