Commit f82edef

update

1 parent c1702cc


README.md

Lines changed: 28 additions & 3 deletions
@@ -369,7 +369,7 @@ submit_multi_adamw_one_embed.sh
 ```
 | Model name | SST accuracy | QQP accuracy | STS correlation |
 | ------------------ | --------------- | -------------- | ---------------- |
-| Adam new base | 50.3 % | 86.4 % | 84.7 % |
+| BERT Baseline | 50.3 % | 86.4 % | 84.7 % |
 
 ### non-linear classifier
 
@@ -380,7 +380,7 @@ submit_multi_adamw_add_layers.sh
 
 | Model name | SST accuracy | QQP accuracy | STS correlation |
 | ------------------ | --------------- | -------------- | ---------------- |
-| Adam additional layer | 50 % | 88.4 % | 84.4 % |
+| BERT additional layer | 50 % | 88.4 % | 84.4 % |
 
 
 ### freeze
@@ -392,7 +392,22 @@ this improves the result a bit more (third row) (apparently one only has to
 
 | Model name | SST accuracy | QQP accuracy | STS correlation |
 | ------------------ | --------------- | -------------- | ---------------- |
-| Adam extra classifier training | 51.6 % | 88.5 % | 84.3 % |
+| BERT extra classifier training | 51.6 % | 88.5 % | 84.3 % |
+
+### SMART
+
+Using standard SMART parameters:
+```
+python -u multitask_classifier.py --use_gpu --option finetune --optimizer "adamw" --epochs 10 --one_embed True --add_layers True --comment adam_add_layers_one_embed_smart --smart --batch_size 64 --lr 1e-5
+```
+Tensorboard: Sep03_11-23-24_bert_smart
+
+| Model name | SST accuracy | QQP accuracy | STS correlation |
+| ------------------ | --------------- | -------------- | ---------------- |
+| BERT-SMART | 51.6 % | 88.8 % | 43.8 % |
+
+The poor STS correlation is because SMART uses an MSE loss to compute its adversarial loss.
+We have not changed this yet.
 
 ### Tuning SMART
 We did another Optuna SMART run for base BERT.
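For reference, a minimal sketch of the SMART smoothness regularizer mentioned above: perturb the input embeddings, then penalize how far the model's output moves. This assumes a differentiable PyTorch `model_fn` mapping embeddings to predictions; names and defaults are illustrative, not this repo's actual code. The `regression=True` branch is the MSE penalty blamed above for the low STS correlation.

```
import torch
import torch.nn.functional as F

def _divergence(adv_out, clean_out, regression):
    # Regression (STS): squared difference between clean and perturbed outputs.
    if regression:
        return F.mse_loss(adv_out, clean_out)
    # Classification (SST, QQP): symmetric KL between the two distributions.
    p = F.log_softmax(adv_out, dim=-1)
    q = F.log_softmax(clean_out, dim=-1)
    return (F.kl_div(p, q.exp(), reduction="batchmean")
            + F.kl_div(q, p.exp(), reduction="batchmean"))

def smart_regularizer(model_fn, embeds, regression=True, eps=1e-5, step_size=1e-3):
    clean_out = model_fn(embeds).detach()
    # Start from small random noise on the embeddings ...
    noise = (torch.randn_like(embeds) * eps).requires_grad_()
    # ... and take one gradient-ascent step toward the worst-case perturbation.
    grad, = torch.autograd.grad(
        _divergence(model_fn(embeds + noise), clean_out, regression), noise)
    noise = (noise + step_size * grad / (grad.norm() + 1e-12)).detach()
    # Penalty: how much the output moves under the adversarial perturbation.
    return _divergence(model_fn(embeds + noise), clean_out, regression)
```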
@@ -412,12 +427,18 @@ python -u optuna_smart.py --use_gpu --batch_size 50 --objective para --one_embed
 | BERT-QQP | 67.00 | 1.83e-7 | 0.0014 | 2.32e-6 | L2 |
 | sBERT-STS | 49.64 | 4.38e-7 | 0.0024 | 1.67e-5 | L2 |
 | BERT-STS | 27.29 | 6.65e-6 | 0.0002 | 7.84e-6 | L1 |
+
+As noted above, the poor STS correlation comes from SMART's MSE-based adversarial loss.
+We have not changed this yet.
+
 ### Final model
 We combined some of our results in the final model.
 
 ```
 python -u multitask_classifier.py --use_gpu --option finetune --optimizer "adamw" --epochs 10 --one_embed True --add_layers True --comment adam_add_layers_one_embed --batch_size 64 --lr 1e-5
 ```
+Tensorboard: Sep03_11-24-17_bert_final
+
 | Model name | SST accuracy | QQP accuracy | STS correlation |
 | ------------------ | --------------- | -------------- | ---------------- |
 | BERT-Final | 51.2 % | 88.8 % | 82.2 % |
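For reference, a minimal sketch of how an Optuna search over SMART's hyperparameters could be wired up. `train_and_eval` is a stubbed placeholder, and the parameter names and ranges are assumptions chosen to bracket the magnitudes in the table above, not what optuna_smart.py actually does.

```
import random
import optuna

def train_and_eval(**hparams):
    # Placeholder: the real routine would fine-tune BERT with SMART using
    # these hyperparameters and return the dev-set metric for the task.
    return random.random()

def objective(trial):
    # Log-uniform ranges over several orders of magnitude.
    lr = trial.suggest_float("lr", 1e-7, 1e-4, log=True)
    eps = trial.suggest_float("eps", 1e-7, 1e-3, log=True)
    step_size = trial.suggest_float("step_size", 1e-4, 1e-2, log=True)
    norm = trial.suggest_categorical("norm", ["L1", "L2"])
    return train_and_eval(lr=lr, eps=eps, step_size=step_size, norm=norm)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```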
@@ -477,6 +498,10 @@ Our model achieves the following performance:
 | sBERT-CenterMatrixSelfAttentionWithSparsemax | 39.1% | 75.6% | 40.4% |
 | sBERT-CenterMatrixLinearSelfAttention | 42.4% | 76.2% | 42.4% |
 | sBERT-CenterMatrixLinearSelfAttentionWithSparsemax | 39.7% | 76.4% | 39.2% |
+| BERT Baseline | 50.3 % | 86.4 % | 84.7 % |
+| BERT-SMART | 51.6 % | 88.8 % | 43.8 % |
+| BERT additional layer | 50 % | 88.4 % | 84.4 % |
+| BERT extra classifier training | 51.6 % | 88.5 % | 84.3 % |
 | BERT-Final | 51.2 % | 88.8 % | 82.2 % |
 
 [Leaderboard](https://docs.google.com/spreadsheets/d/1Bq21J3AnxyHJ9Wb9Ik9OXvtX6O4L2UdVX9Y9sBg7v8M/edit#gid=0)
