Commit 773bce1 ("added contribution")

1 parent 7f73d88

1 file changed: README.md (11 additions, 39 deletions)
```diff
@@ -206,30 +206,12 @@ Those two adjustments of the datasets worked out and improved the performance on
 The following results were obtained:
 | Model name | SST accuracy | QQP accuracy | STS correlation |
 | ------------------ | ---------------- | -------------- | --------------- |
-| Sophia Tuned standard lr | 47.6% | 78.8% | 36.7% |
+| Sophia Tuned standard lr | 78.8% | 47.6% | 36.7% |
 | Sophia balanced data | 81.8% | 47.8% | 45.5% |
 
-Use the same command as in the Tuning Sophia section and add the argument ```--para_sep True --weights True``` to reproduce the results.
-
-That approach could improve the performance on the paraphrasing task by ..., but we lost a few percentage points on the other tasks. We conclude that, on the one hand, training on the QQP dataset first actually helps to gain more information from this huge dataset, but on the other hand the three tasks seem to conflict with each other.
-
-#### Tackle imbalanced data
-The distribution of the classes in the SST dataset is not equal (class one contains more than twice as many samples as class zero). As the confusion matrix of our model, which was trained as in the previous section, shows, many data points from class zero are falsely predicted to be in class one (the same problem occurs with classes five and four).
-
-<img src="confusion_matrix.png" alt="confusion matrix" width="300" height="300">
-
-To balance the QQP and SST training sets we add weights to our cross-entropy loss function such that a training sample from a small class is assigned a higher weight. This resulted in the following performance:
-| Model name | SST accuracy | QQP accuracy | STS correlation |
-| ------------------ | ---------------- | -------------- | --------------- |
-| Sophia Baseline (Finn) | 45% | 77.8% | 32% |
-| Sophia balanced data | 47.8% | 81.8% | 45.5% |
-
-Use the same command as in the previous section and add the argument ```--para_sep True --weights True``` to reproduce the results.
-
-With this approach we could improve the performance on the SST dataset compared to the last section by ... .
+Use the same command as in the Tuning Sophia section (with standard learning rate and no dropout) and add the argument ```--para_sep True --weights True``` to reproduce the results.
 
 ### AdamW
-...
 
 #### Additional layers
 Another problem we observed earlier was that the tasks contradict each other, i.e. with separate QQP training the paraphrasing accuracy increased but the other two accuracies decreased. We try to resolve these conflicts by adding a simple neural network with one hidden layer as the classifier for each task instead of only a linear classifier. The idea is that each task gets more parameters to adjust which are not influenced by the other tasks. As activation function in the neural network we tested ReLU and tanh layers between the hidden layer and the output; ReLU performed better. Furthermore, we tried to freeze the BERT parameters in the last training epochs and only train the classifier parameters. This improved the performance, especially on the SST dataset.
```
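The removed "Tackle imbalanced data" passage above balances the SST (and QQP) training data by weighting the cross-entropy loss so that samples from small classes count more. The exact weighting scheme lives in multitask_classifier.py and is not shown in this diff; the snippet below is only a minimal PyTorch sketch of one common variant (inverse-frequency class weights), with the label tensor and all names made up for illustration.

```python
import torch
from torch import nn

# Made-up SST-style labels; in the repo these would come from the SST train loader.
train_labels = torch.tensor([1, 1, 1, 0, 2, 1, 4, 3, 1, 0])
num_classes = 5

# Inverse-frequency weights: the rarer a class, the larger its weight in the loss.
counts = torch.bincount(train_labels, minlength=num_classes).float()
weights = counts.sum() / (num_classes * counts.clamp(min=1))

# Class-weighted cross-entropy; the repo's --weights option applies a similar idea
# (its exact weights may differ).
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, num_classes)           # stand-in for the classifier output
targets = torch.randint(0, num_classes, (8,))  # stand-in for the batch labels
loss = criterion(logits, targets)
```

With the weight argument set, a misclassified sample from a rare class (e.g. SST class zero) contributes more to the loss than one from the majority class, which matches the confusion-matrix observation above.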
```diff
@@ -239,8 +221,13 @@ Another problem we earlier observed was that the task contradict each other, i.e
 | Adam additional layer | 50% | 88.4% | 84.4% |
 | Adam extra classifier training | 51.6% | 88.5% | 84.3% |
 
-Use the same command as in the previous section and add the argument ```--para_sep True --weights True --add_layers True``` to reproduce the results.
+Run the following command for the Adam baseline:
+```
+python -u multitask_classifier.py --use_gpu --option finetune --optimizer "adamw" --epochs 4 --one_embed True --freeze_bert True --add_layers True --filepath final_freeze
+```
+To use the non-linear classifier with ReLU activation add the argument ```--add_layers```, and to freeze the BERT parameters in the last epochs add the argument ```--freeze_bert```.
 
+We also tested some dropout and weight decay values, but those couldn't improve the performance. Furthermore, the weighted loss function, which improved the model's performance with the Sophia optimizer, didn't help here.
 ### SMART
 
 #### Implementation
```
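The "Additional layers" change replaces each task's single linear classifier with a small network that has one hidden layer (ReLU between hidden layer and output worked better than tanh) and optionally freezes the BERT encoder for the final epochs via ```--freeze_bert```. The real heads are defined in multitask_classifier.py; the sketch below only illustrates the idea, and the hidden size, the head input sizes, and all names are assumptions made here.

```python
import torch
from torch import nn

class TaskHead(nn.Module):
    """One-hidden-layer classifier head, one instance per task instead of a single linear layer."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),  # ReLU outperformed tanh in the experiments described above
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Separate heads give each task parameters that the other tasks cannot disturb.
bert_hidden = 768
sst_head = TaskHead(bert_hidden, 512, 5)        # 5 sentiment classes
para_head = TaskHead(2 * bert_hidden, 512, 1)   # assumes the two sentence embeddings are concatenated
sts_head = TaskHead(2 * bert_hidden, 512, 1)    # same assumption for the similarity score

def freeze_encoder(encoder: nn.Module) -> None:
    """Freeze the shared BERT encoder so only the task heads keep training (the --freeze_bert idea)."""
    for param in encoder.parameters():
        param.requires_grad = False
```

Freezing the encoder in the last epochs leaves the gradient updates to the three heads only, which the section above reports as helping SST accuracy in particular.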
```diff
@@ -510,7 +497,8 @@ Our model achieves the following performance:
 >📋 Include a table of results from your paper, and link back to the leaderboard for clarity and context. If your main result is a figure, include that figure and link to the command or notebook to reproduce it.
 
 ## Future work
-- Since the para dataset is huge compared to the sst and sts datasets, which leads to overfitting, enlarging the sst and sts datasets should reduce the possibility of overfitting. This could be achieved by generating more (true) data from the sst and sts datasets, which is possible by adding another additional task, see issue #60 for more details.
+- Since the para dataset is huge compared to the sst and sts datasets, which leads to overfitting, enlarging the sst and sts datasets should reduce the possibility of overfitting.
+This could be achieved by generating more (true) data from the sst and sts datasets, which is possible by adding another additional task.
 - give other losses different weights.
 - with or without combined losses.
 - maybe based on dev_acc performance in the previous epoch.
```
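The last three future-work bullets suggest weighting the individual task losses, possibly based on each task's dev accuracy from the previous epoch. This is not implemented in the commit; the snippet below is one speculative reading of the idea, with every name and the weighting rule invented here (tasks that did worse on dev get a larger share of the combined loss).

```python
from typing import Dict

import torch

def combine_losses(losses: Dict[str, torch.Tensor],
                   prev_dev_acc: Dict[str, float]) -> torch.Tensor:
    """Weight each task loss by (1 - previous dev accuracy), normalised to sum to 1."""
    raw = {task: 1.0 - prev_dev_acc[task] for task in losses}
    total = sum(raw.values())
    weights = {task: raw[task] / total for task in raw}
    return sum(weights[task] * losses[task] for task in losses)

# Made-up numbers in the spirit of the tables above; in training the losses
# would come from the model's forward pass on each task's batch.
losses = {
    "sst": torch.tensor(1.2, requires_grad=True),
    "para": torch.tensor(0.4, requires_grad=True),
    "sts": torch.tensor(0.7, requires_grad=True),
}
prev_dev_acc = {"sst": 0.50, "para": 0.88, "sts": 0.84}
combined = combine_losses(losses, prev_dev_acc)
combined.backward()
```

Whether the losses are merged like this or kept separate per task ("with or without combined losses") would be part of the same experiment.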
```diff
@@ -524,22 +512,6 @@ Lübbers, Christopher L.: Part 1 complete; Part 2: sBERT, Tensorboard (metrics +
 
 Niegsch, Lukas*: Generalisations on Custom Attention, Split and reordered batches,
 
-Schmidt, Finn Paul: sBert ultitzask training, Sophia dropout layers, Sophia seperated paraphrasing training, Sophia weighted loss, Optuna study on the dropout and hyperparameters, BERT baseline adam, BERT additional layers, error_analysis
-
-
-## Submit commands
-
-To train the Sophia base model with optimised parameters run:
-```
-python -u multitask_classifier.py --use_gpu --option finetune --epochs 10 --comment "_sophia-chris_opt2" --batch_size 64 --optimizer "sophiag" --weight_decay_para 0.1267 --weight_decay_sst 0.2302 --weight_decay_sts 0.1384 --rho_para 0.0417 --rho_sst 0.0449 --rho_sts 0.0315 --lr_para 1e-5 --lr_sst 1e-5 --lr_sts 1e-5
-```
+Schmidt, Finn Paul: sBERT multi_task training, Sophia dropout layers, Sophia separated paraphrasing training, Sophia weighted loss, Optuna study on the dropout and hyperparameters, BERT baseline adam, BERT additional layers, error_analysis
 
-To train the Sophia model with dropout layers and optimized hyperparameters run:
-```
-python -u multitask_classifier.py --use_gpu --option finetune --optimizer "sophiag" --epochs 10 --hidden_dropout_prob_para 0.15 --hidden_dropout_prob_sst 0.052 --hidden_dropout_prob_sts 0.22 --lr_para 1.8e-05 --lr_sst 5.6e-06 --lr_sts 1.1e-05 --weight_decay_para 0.038 --weight_decay_sst 0.17 --weight_decay_sts 0.22 --comment individual_dropout
-```
 
-To train the Sophia model with weighted loss and separate paraphrasing training run:
-```
-python -u multitask_classifier.py --use_gpu --option finetune --optimizer "sophiag" --epochs 5 --weights True --para_sep True --hidden_dropout_prob_para 0 --hidden_dropout_prob_sst 0 --hidden_dropout_prob_sts 0 --lr_para 1e-05 --lr_sst 1e-05 --lr_sts 1e-05 --batch_size 64 --optimizer "sophiag" --weight_decay_para 0.1267 --weight_decay_sst 0.2302 --weight_decay_sts 0.1384 --rho_para 0.0417 --rho_sst 0.0449 --rho_sts 0.0315 --comment weighted_loss_without_dropout
-```
```
