@@ -369,7 +369,7 @@ submit_multi_adamw_one_embed.sh
```
| Model name | SST accuracy | QQP accuracy | STS correlation |
| ------------------ | ---------------- | -------------- | ---------------- |
- | Adam new base | 50,3 % | 86,4 % | 84,7 % |
+ | BERT Baseline | 50.3 % | 86.4 % | 84.7 % |
### Non-linear classifier
@@ -380,7 +380,7 @@ submit_multi_adamw_add_layers.sh
| Model name | SST accuracy | QQP accuracy | STS correlation |
| ------------------ | ---------------- | -------------- | ---------------- |
- | Adam additional layer| 50% | 88,4% | 84,4 % |
+ | BERT additional layer | 50 % | 88.4 % | 84.4 % |
### Freeze
@@ -392,7 +392,22 @@ this improves the result a bit more (third row) (apparently one only has to
| Model name | SST accuracy | QQP accuracy | STS correlation |
| ------------------ | ---------------- | -------------- | ---------------- |
- | Adam extra classifier training| 51,6% | 88,5% | 84,3 % |
+ | BERT extra classifier training | 51.6 % | 88.5 % | 84.3 % |
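+
+ If "extra classifier training" is read as first fitting the task heads on a frozen encoder and only then fine-tuning end to end, a minimal sketch could look as follows; `model.bert` and `train_phase` are hypothetical stand-ins, not the repo's actual names.
+
+ ```python
+ import torch.nn as nn
+
+ def set_encoder_trainable(model: nn.Module, trainable: bool) -> None:
+     # Toggle gradients for every parameter of the shared BERT encoder.
+     for param in model.bert.parameters():
+         param.requires_grad = trainable
+
+ # Phase 1: train only the classifier heads on a frozen encoder.
+ #   set_encoder_trainable(model, False); train_phase(model, epochs=1)
+ # Phase 2: unfreeze and fine-tune the whole model as usual.
+ #   set_encoder_trainable(model, True); train_phase(model, epochs=10)
+ ```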
+
+ ### SMART
+
+ Using the standard SMART parameters:
+ ```
+ python -u multitask_classifier.py --use_gpu --option finetune --optimizer "adamw" --epochs 10 --one_embed True --add_layers True --comment adam_add_layers_one_embed_smart --smart --batch_size 64 --lr 1e-5
+ ```
+ TensorBoard: Sep03_11-23-24_bert_smart
+
+ | Model name | SST accuracy | QQP accuracy | STS correlation |
+ | ------------------ | ---------------- | -------------- | ---------------- |
+ | BERT-SMART | 51.6 % | 88.8 % | 43.8 % |
+
+ The poor STS correlation arises because SMART uses an MSE loss to compute its adversarial loss; we have not changed this yet.
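+
+ As a minimal illustration of where that MSE enters, a sketch of a SMART-style smoothness term is given below; `predict_fn` (mapping embeddings to STS scores) and the hyperparameter values are assumptions, not our exact implementation.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def smart_regularizer(predict_fn, embeds, clean_logits, step_size=1e-3, noise_var=1e-5):
+     # Start from a small random perturbation of the input embeddings.
+     noise = (torch.randn_like(embeds) * noise_var ** 0.5).requires_grad_()
+     adv_loss = F.mse_loss(predict_fn(embeds + noise), clean_logits.detach())
+     # One ascent step towards the worst-case perturbation.
+     grad, = torch.autograd.grad(adv_loss, noise)
+     noise = (noise + step_size * grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)).detach()
+     # Smoothness penalty: MSE between clean and adversarial predictions.
+     # This penalizes absolute prediction shifts only, which does not align
+     # well with the ranking-based STS correlation metric.
+     return F.mse_loss(predict_fn(embeds + noise), clean_logits.detach())
+ ```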
### Tuning SMART
We did another Optuna SMART run for base BERT.
@@ -412,12 +427,18 @@ python -u optuna_smart.py --use_gpu --batch_size 50 --objective para --one_embed
| BERT-QQP | 67.00 | 1.83e-7 | 0.0014 | 2.32e-6 | L2 |
| sBERT-STS | 49.64 | 4.38e-7 | 0.0024 | 1.67e-5 | L2 |
| BERT-STS | 27.29 | 6.65e-6 | 0.0002 | 7.84e-6 | L1 |
+
+ The poor STS numbers here have the same cause: SMART's MSE-based adversarial loss, which we have not changed yet.
+
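+
+ For reference, a minimal sketch of how such an Optuna search over the SMART hyperparameters can be wired up; the parameter names, search ranges, and the `train_and_eval` stub are illustrative assumptions, not the actual contents of optuna_smart.py.
+
+ ```python
+ import optuna
+
+ def train_and_eval(epsilon, step_size, noise_var, norm):
+     # Stand-in for a full SMART training run that returns the dev metric.
+     return 0.0
+
+ def objective(trial: optuna.Trial) -> float:
+     epsilon = trial.suggest_float("epsilon", 1e-7, 1e-5, log=True)
+     step_size = trial.suggest_float("step_size", 1e-4, 1e-2, log=True)
+     noise_var = trial.suggest_float("noise_var", 1e-6, 1e-4, log=True)
+     norm = trial.suggest_categorical("norm", ["L1", "L2"])
+     return train_and_eval(epsilon, step_size, noise_var, norm)
+
+ study = optuna.create_study(direction="maximize")
+ study.optimize(objective, n_trials=50)
+ print(study.best_params)
+ ```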
### Final model
We combined several of the improvements above in the final model.
```
python -u multitask_classifier.py --use_gpu --option finetune --optimizer "adamw" --epochs 10 --one_embed True --add_layers True --comment adam_add_layers_one_embed --batch_size 64 --lr 1e-5
```
+ TensorBoard: Sep03_11-24-17_bert_final
+
| Model name | SST accuracy | QQP accuracy | STS correlation |
| ------------------ | ---------------- | -------------- | ---------------- |
| BERT-Final | 51.2 % | 88.8 % | 82.2 % |
@@ -477,6 +498,10 @@ Our model achieves the following performance:
| sBERT-CenterMatrixSelfAttentionWithSparsemax | 39.1% | 75.6% | 40.4% |
| sBERT-CenterMatrixLinearSelfAttention | 42.4% | 76.2% | 42.4% |
| sBERT-CenterMatrixLinearSelfAttentionWithSparsemax | 39.7% | 76.4% | 39.2% |
+ | BERT Baseline | 50.3 % | 86.4 % | 84.7 % |
+ | BERT-SMART | 51.6 % | 88.8 % | 43.8 % |
+ | BERT additional layer | 50 % | 88.4 % | 84.4 % |
+ | BERT extra classifier training | 51.6 % | 88.5 % | 84.3 % |
| BERT-Final | 51.2 % | 88.8 % | 82.2 % |
[Leaderboard](https://docs.google.com/spreadsheets/d/1Bq21J3AnxyHJ9Wb9Ik9OXvtX6O4L2UdVX9Y9sBg7v8M/edit#gid=0)