negative loss and mismatched dimension when load pretrained weights #2

@wg-li

Description

Hello,

I ran into two more problems while running experiments on the Avazu dataset:

  1. During the pretraining step, the loss is always negative, which seems somewhat strange, even though it still decreases.

08/11 02:07:09 PM client generated: 2
08/11 02:07:09 PM Cross-Party Train Epoch 0, training on aligned data, LR: 0.1, sample: 16384
08/11 02:07:10 PM Cross-Party SSL Train Epoch 0, client loss aligned: [-0.16511965772951953, -0.152420010213973]
08/11 02:07:10 PM Local SSL Train Epoch 0, training on local data, sample: 80384
08/11 02:07:22 PM Local SSL Train Epoch 0, client loss local: [-0.5874887084815307, -0.5748279593279881]
08/11 02:07:22 PM Local SSL Train Epoch 0, AGG MODE pma, client loss agg: []
08/11 02:07:24 PM ###### Valid Epoch 0 Start #####
08/11 02:07:24 PM Valid Epoch 0, valid client loss aligned: [-0.3176240861415863, -0.22815129309892654]
08/11 02:07:24 PM Valid Epoch 0, valid client loss local: [-0.22939987406134604, -0.22190943509340286]
08/11 02:07:24 PM Valid Epoch 0, valid client loss regularized: [0.0, 0.0]
08/11 02:07:24 PM Valid Epoch 0, Loss_aligned -0.273 Loss_local -0.226
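For reference, a negative loss would not necessarily indicate a bug if the SSL objective is a SimSiam-style negative cosine similarity (this is only my assumption, based on the loss values falling in [-1, 0]). A minimal sketch of such an objective:

```python
import math

def neg_cosine_loss(p, z):
    """Negative cosine similarity, as used in SimSiam-style SSL objectives.
    The value lies in [-1, 1]; perfectly aligned representations give -1,
    so negative, decreasing losses are the expected behavior."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)

# identical representations reach the minimum of -1.0
print(neg_cosine_loss([3.0, 4.0], [3.0, 4.0]))  # -1.0
```

So if the pretraining loss here is of this form, values like -0.58 moving toward -1 would simply mean the representations are becoming better aligned.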

  2. During the finetune step, it shows the error below:

File "/data/nfs/user/liwg/vfl/fedhssl/FedHSSL/models/model_templates.py", line 206, in load_encoder_cross
self.encoder_cross.load_state_dict(torch.load(load_path, map_location=device))
File "/data/nfs/miniconda/envs/liwg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DNNFM:
size mismatch for embedding_dict.device_ip.weight: copying a param with shape torch.Size([70769, 32]) from checkpoint, the shape in current model is torch.Size([70768, 32]).
size mismatch for embedding_dict.device_model.weight: copying a param with shape torch.Size([3066, 32]) from checkpoint, the shape in current model is torch.Size([3065, 32]).
size mismatch for embedding_dict.C14.weight: copying a param with shape torch.Size([1699, 32]) from checkpoint, the shape in current model is torch.Size([1698, 32]).

Each pretrained encoder_cross embedding table is one row larger than the current model expects.
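My guess (purely an assumption on my part, not confirmed in the code) is that each stage derives the embedding vocabulary size from the data split it loads, so a category value seen only during pretraining shifts the size by one. A hypothetical sketch of how that off-by-one arises:

```python
# Hypothetical illustration: if each stage computes the embedding
# vocabulary size from the split it loads, a value present only in
# the pretraining split makes the checkpoint one row larger.
pretrain_ids = ["a", "b", "c", "d"]   # e.g. device_ip values seen at pretraining
finetune_ids = ["a", "b", "c"]        # one value never appears at finetune time

vocab_pretrain = len(set(pretrain_ids))  # 4 rows in the checkpoint embedding
vocab_finetune = len(set(finetune_ids))  # 3 rows expected by the current model
print(vocab_pretrain, vocab_finetune)    # sizes differ -> load_state_dict fails
```

If that is the cause, computing the vocabulary sizes once over the full dataset (rather than per split) should make the checkpoint and the finetune model agree.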
