Description
Got this error while using this library to train an embedding model:
  File "/usr/local/lib/python3.8/dist-packages/angle_emb/angle.py", line 986, in on_epoch_end
    corrcoef, accuracy = self.evaluate_fn(self.valid_ds)
  File "/usr/local/lib/python3.8/dist-packages/angle_emb/angle.py", line 1470, in evaluate
    pred = (x_vecs[::2] * x_vecs[1::2]).sum(1)
ValueError: operands could not be broadcast together with shapes (38,384) (37,384)
I confirmed that valid_ds and train_ds were of even length, so ultimately I just modified one line of the evaluate method of the AnglE class. After this line:
x_vecs = l2_normalize(x_vecs)
I added:
if len(x_vecs) % 2 != 0:
    x_vecs = x_vecs[:-1]
Hopefully that doesn't break anything/everything else? Any thoughts on what else might be the source of the issue?
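For reference, here is a minimal sketch that reproduces the shape mismatch outside the library. The shapes in the error add up to 75 rows, so it assumes x_vecs ends up with an odd number of rows at that point. The l2_normalize below is a stand-in I wrote for illustration (assumed to normalize each row to unit length), not the library's own function, and paired_cosine is a hypothetical name:

```python
import numpy as np

def l2_normalize(vecs):
    # Stand-in for the library helper of the same name (assumed: row-wise unit norm).
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.clip(norms, 1e-8, None)

def paired_cosine(x_vecs):
    # Hypothetical helper mirroring the patched lines in evaluate().
    x_vecs = l2_normalize(x_vecs)
    if len(x_vecs) % 2 != 0:
        # Workaround: drop the last row so even/odd slices have equal length.
        x_vecs = x_vecs[:-1]
    return (x_vecs[::2] * x_vecs[1::2]).sum(1)

rng = np.random.default_rng(0)
odd = rng.normal(size=(75, 384))  # 75 rows -> slices of 38 and 37 rows

# Without the workaround, the elementwise product cannot broadcast.
try:
    (odd[::2] * odd[1::2]).sum(1)
    raised = False
except ValueError:
    raised = True

pred = paired_cosine(odd)
print(odd[::2].shape, odd[1::2].shape)
print(raised, pred.shape)
```

One caveat with the workaround: it only keeps the pairs aligned if the extra row really is the last one. If the unpaired vector sits somewhere in the middle, every pair after it would be shifted by one and the scores would be computed on the wrong pairs.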
Also, I attempted to restart training by running the same angle.fit() call as when I first started, but with from_pretrained adjusted to point to the most recent checkpoint:
angle = AnglE.from_pretrained('/checkpoint-1100', max_length=512, pooling_strategy='cls').cuda()
I don't see a resume_from_checkpoint=True argument option anywhere, so it's not clear whether it's aware of how many epochs have already been run, etc.
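In the meantime, a rough way to estimate how far the run had gotten is to work it out from the checkpoint name. This sketch assumes the number in the directory name is the global optimizer step (as in Hugging Face Trainer checkpoints; I have not confirmed this for angle_emb). The helper names are hypothetical, and the dataset size and batch size below are made-up example values, not my actual settings:

```python
import re

def step_from_checkpoint(path):
    # Parse the trailing step number from a path like '/checkpoint-1100'.
    match = re.search(r"checkpoint-(\d+)/?$", path)
    if match is None:
        raise ValueError(f"no step number found in {path!r}")
    return int(match.group(1))

def epochs_completed(step, num_examples, batch_size, grad_accum=1):
    # Assumed: one optimizer step per `grad_accum` batches, partial last batch kept.
    batches_per_epoch = -(-num_examples // batch_size)  # ceiling division
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return step / steps_per_epoch

step = step_from_checkpoint("/checkpoint-1100")
# Example values only: 8800 training examples, batch size 32.
done = epochs_completed(step, num_examples=8800, batch_size=32)
print(step, done)
```

With those example numbers, the checkpoint would correspond to 4 completed epochs, so the restarted fit() call would need its epoch count reduced by hand. This does not restore the optimizer or learning-rate scheduler state, so it is not equivalent to a true resume.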