Hello,
I ran into two more problems while running experiments on the Avazu dataset:
- During the pretraining step, the loss is always negative, which seems strange, even though it still decreases:
```
08/11 02:07:09 PM client generated: 2
08/11 02:07:09 PM Cross-Party Train Epoch 0, training on aligned data, LR: 0.1, sample: 16384
08/11 02:07:10 PM Cross-Party SSL Train Epoch 0, client loss aligned: [-0.16511965772951953, -0.152420010213973]
08/11 02:07:10 PM Local SSL Train Epoch 0, training on local data, sample: 80384
08/11 02:07:22 PM Local SSL Train Epoch 0, client loss local: [-0.5874887084815307, -0.5748279593279881]
08/11 02:07:22 PM Local SSL Train Epoch 0, AGG MODE pma, client loss agg: []
08/11 02:07:24 PM ###### Valid Epoch 0 Start #####
08/11 02:07:24 PM Valid Epoch 0, valid client loss aligned: [-0.3176240861415863, -0.22815129309892654]
08/11 02:07:24 PM Valid Epoch 0, valid client loss local: [-0.22939987406134604, -0.22190943509340286]
08/11 02:07:24 PM Valid Epoch 0, valid client loss regularized: [0.0, 0.0]
08/11 02:07:24 PM Valid Epoch 0, Loss_aligned -0.273 Loss_local -0.226
```
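(For context: I suspect a negative loss may actually be normal here if the SSL objective is a negative cosine similarity, as in SimSiam/BYOL-style methods, where the loss lives in [-1, 1] and decreasing toward -1 means the representations are aligning. A minimal sketch of that assumption, not the actual FedHSSL criterion:)

```python
import torch
import torch.nn.functional as F

def neg_cosine_loss(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """SimSiam-style loss: negated cosine similarity, bounded in [-1, 1].

    z is detached (stop-gradient), so only the predictor branch p is trained.
    """
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

# Two identical views are perfectly aligned, so the loss is exactly -1:
p = torch.randn(16, 32)
loss = neg_cosine_loss(p, p)
print(loss.item())  # -1.0
```

So if the pretraining criterion is of this form, the negative (and decreasing) values in the log above would be expected behavior rather than a bug.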
- During the finetuning step, I get the error below:
```
File "/data/nfs/user/liwg/vfl/fedhssl/FedHSSL/models/model_templates.py", line 206, in load_encoder_cross
    self.encoder_cross.load_state_dict(torch.load(load_path, map_location=device))
File "/data/nfs/miniconda/envs/liwg/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DNNFM:
    size mismatch for embedding_dict.device_ip.weight: copying a param with shape torch.Size([70769, 32]) from checkpoint, the shape in current model is torch.Size([70768, 32]).
    size mismatch for embedding_dict.device_model.weight: copying a param with shape torch.Size([3066, 32]) from checkpoint, the shape in current model is torch.Size([3065, 32]).
    size mismatch for embedding_dict.C14.weight: copying a param with shape torch.Size([1699, 32]) from checkpoint, the shape in current model is torch.Size([1698, 32]).
```
Each pretrained encoder_cross embedding table has exactly one more row than the corresponding embedding in the finetuning model.
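My guess is an off-by-one in the categorical vocabulary sizes, e.g. the feature dictionary being rebuilt slightly differently between the pretraining and finetuning runs (perhaps an extra unknown/OOV index). A minimal sketch that reproduces the error and one possible workaround, assuming the extra checkpoint row is a trailing index that can safely be dropped (this is my assumption, not something FedHSSL documents):

```python
import torch
import torch.nn as nn

# Hypothetical vocab sizes, matching the device_ip mismatch in the traceback:
ckpt_emb = nn.Embedding(70769, 32)   # table as saved at pretraining time
model_emb = nn.Embedding(70768, 32)  # table as rebuilt at finetuning time

# Strict loading fails with exactly the "size mismatch" RuntimeError above:
try:
    model_emb.load_state_dict(ckpt_emb.state_dict())
except RuntimeError as e:
    print(e)

# Workaround sketch: copy only the overlapping rows of the weight matrix.
state = ckpt_emb.state_dict()
with torch.no_grad():
    model_emb.weight.copy_(state["weight"][: model_emb.num_embeddings])
```

Of course the cleaner fix would be to build the finetuning model with the same vocabulary sizes that were used for pretraining, so the checkpoint loads strictly; the row-slicing above only makes sense if the dropped row corresponds to an index the finetuning data never uses.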