I tried replicating the zero-shot learning results on CLS, but my results don't match those from the paper. Since the script for predicting labels with LASER doesn't seem to be part of the Multifit repository, I trained LASER on the CLS dataset (only the en and de books for now) by adapting the MLDoc script from the LASER repo to CLS. My fork of LASER with these adjustments is [here](https://github.com/blazejdolicki/LASER). For the time being I have only tested on German books. After some hyperparameter tuning on the English training set, my best setup reaches 82.25% accuracy, compared to 84.15% from the Multifit paper. My hyperparams are:
```
n_epochs=200
lr=0.001
wd=0.0
nhid="10 8"
drop=0.2
seed=1
bsize=12
```
and I'm using the last 10% of the test set as validation.
When I changed them to be closer to Multifit's (n_epochs=8, wd=0.001, bsize=18), accuracy dropped to around 60%.
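For reference, here is a minimal sketch of the classifier head those hyperparameters describe (nhid="10 8", drop=0.2, bsize=12), assuming 1024-dimensional LASER sentence embeddings as input and binary CLS sentiment labels; the function name and the stand-in batch are mine, not from either repo:

```python
import torch
import torch.nn as nn

def build_classifier(emb_dim=1024, nhid=(10, 8), drop=0.2, n_classes=2):
    """Small MLP head on top of precomputed LASER embeddings.

    emb_dim=1024 matches LASER sentence embeddings; n_classes=2 assumes
    the binary sentiment labels of the CLS books task.
    """
    layers = []
    prev = emb_dim
    for h in nhid:
        layers += [nn.Linear(prev, h), nn.ReLU(), nn.Dropout(drop)]
        prev = h
    layers.append(nn.Linear(prev, n_classes))
    return nn.Sequential(*layers)

model = build_classifier()
# lr=0.001, wd=0.0 from the hyperparameters above
opt = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0)

# Stand-in batch of 12 LASER embeddings (bsize=12)
x = torch.randn(12, 1024)
logits = model(x)
```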
Afterwards, I used the best LASER classifier (82.25% accuracy, trained on the English training set) to predict labels for the German books. I then copied the test, training and unsupervised sets in the Multifit repo from the de-books folder into de-books-laser and replaced the ground-truth labels in the training set with the pseudolabels. Finally, I trained the Multifit classifier on those pseudolabels; while my validation accuracy isn't great, it is at least similar, but my test-set accuracy is as low as 70% (compared to 89.60% from the paper and here), as you can see in the attached logs.
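The copy-and-relabel step I describe above can be sketched as follows. This is my own illustration, not a Multifit script; the CSV layout (label in the first column, text after it, no header) and the file name `train.csv` are assumptions to adapt to the actual data format:

```python
import csv
import shutil
from pathlib import Path

def write_pseudolabels(src_dir, dst_dir, pseudolabels, train_file="train.csv"):
    """Copy a CLS language folder (e.g. de-books -> de-books-laser) and
    overwrite the training labels with LASER-predicted pseudolabels.

    Test and unsupervised sets are copied unchanged; only the first
    column of the training CSV is replaced.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    shutil.copytree(src_dir, dst_dir)  # dst_dir must not exist yet
    rows = list(csv.reader((dst_dir / train_file).open()))
    assert len(rows) == len(pseudolabels), "one pseudolabel per training row"
    rows = [[str(lab)] + row[1:] for row, lab in zip(rows, pseudolabels)]
    with (dst_dir / train_file).open("w", newline="") as f:
        csv.writer(f).writerows(rows)
```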
Multifit CLS zero shot terrible results 15.04.2020.txt
I did expect some drop due to the issue explained in #63, but such a big difference shows that the unsupervised set size can't be the only factor degrading the results. Other possible reasons for the drop in performance that come to mind are:
- Did I use different hyperparameters for training LASER than you did when predicting the pseudolabels?
- Did I use a different train-dev split for training LASER than you did when predicting the pseudolabels?
- Was your script loading the LASER model with the fastai library and training the classifier with it, instead of PyTorch?
My fork of multifit is here; I'm using the ulmfit-original-scripts branch.
I would really appreciate a reply :)