
Commit e9b5ebc

Update README.md
1 parent bd6af50 commit e9b5ebc

File tree

1 file changed: +2 -0 lines changed


README.md

Lines changed: 2 additions & 0 deletions
@@ -331,6 +331,7 @@ $$attention(Q, K, V) = softmax\left(\frac{QWK^T}{\sqrt{d_j}}\right)V$$
For ideas 1 and 3 we recover the original self-attention with a specific choice of parameters, and we later found a paper describing the second idea. The goal was for the model to keep using the original parameters while gaining more freedom to manipulate them, by adding a few extra parameters inside all of the BERT layers. We later realized that all 3 ideas can be combined, resulting in 8 different models (1 baseline + 7 extra):

| Model name | SST accuracy | QQP accuracy | STS correlation |
+| -------------------------- | ------------ | ------------ | --------------- |
| sBERT-BertSelfAttention (baseline) | 44.6% | 77.2% | 48.3% |
| sBERT-LinearSelfAttention | 40.5% | 75.6% | 37.8% |
| sBERT-NoBiasLinearSelfAttention | 40.5% | 75.6% | 37.8% |
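
To make the idea concrete, here is a minimal PyTorch sketch of an attention variant in the spirit of the formula above, softmax(QWK^T / sqrt(d))V: an extra per-head matrix `W` (and optional bias) is applied to the queries before the dot product, so that `W = I` with zero bias recovers the original self-attention. The class name, the exact placement of `W`, and the initialization are assumptions for illustration, not the repository's actual `LinearSelfAttention` implementation.

```python
# Minimal sketch, assuming the extra parameters are a per-head matrix W (and
# optional bias) applied to the queries before the dot product, as suggested by
# the formula softmax(Q W K^T / sqrt(d)) V above.  Names and placement of W are
# illustrative, not the repository's actual code.
import math
import torch
import torch.nn as nn


class LinearSelfAttention(nn.Module):
    """Self-attention scoring softmax(Q W K^T / sqrt(d)) V.

    With W initialized to the identity (and the bias to zero) the layer starts
    out identical to the original scaled dot-product self-attention, so the
    extra parameters only add freedom on top of the baseline behaviour.
    """

    def __init__(self, hidden_size: int, num_heads: int, bias: bool = True):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        # Extra parameters inside every layer: one small matrix (and bias) per head.
        self.W = nn.Parameter(torch.eye(self.head_dim).repeat(num_heads, 1, 1))
        self.b = nn.Parameter(torch.zeros(num_heads, 1, self.head_dim)) if bias else None

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        batch, seq_len, _ = x.shape
        return x.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, hidden_states, attention_mask=None):
        q = self._split_heads(self.query(hidden_states))
        k = self._split_heads(self.key(hidden_states))
        v = self._split_heads(self.value(hidden_states))

        q = torch.matmul(q, self.W)          # extra learnable map on the queries
        if self.b is not None:               # the "NoBias" variant drops this term
            q = q + self.b

        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        if attention_mask is not None:
            scores = scores + attention_mask  # additive mask, as in BERT-style models
        probs = torch.softmax(scores, dim=-1)

        context = torch.matmul(probs, v)     # (batch, heads, seq, head_dim)
        batch, heads, seq_len, head_dim = context.shape
        return context.transpose(1, 2).reshape(batch, seq_len, heads * head_dim)
```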
@@ -359,6 +360,7 @@ Furthermore, all 3 datasets are learned one after another. This means that the g
Lastly, we tried training the batches for the last 3 steps in a round-robin fashion (sts, para, sst, sts, para, sst, ...); a sketch of such a schedule is shown after the table below.

| Model name | SST accuracy | QQP accuracy | STS correlation |
+| -------------------------- | ------------ | ------------ | --------------- |
| sBERT-BertSelfAttention (baseline) | 44.6% | 77.2% | 48.3% |
| sBERT-ReorderedTraining (BertSelfAttention) | 45.9% | 79.3% | 49.8% |
| sBERT-RoundRobinTraining (BertSelfAttention) | 45.5% | 77.5% | 50.3% |
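
As a rough illustration of the round-robin schedule mentioned above, the sketch below cycles through the three task dataloaders one batch at a time until every task is exhausted. The function signature, the dataloader names, and the `compute_task_loss` helper are hypothetical placeholders, not the repository's actual training loop.

```python
# Minimal sketch of a round-robin schedule (sts, para, sst, sts, para, sst, ...).
# `compute_task_loss` and the dataloader arguments are placeholders for illustration.
def train_round_robin(model, optimizer, sts_loader, para_loader, sst_loader,
                      compute_task_loss, num_epochs=1, device="cpu"):
    tasks = [("sts", sts_loader), ("para", para_loader), ("sst", sst_loader)]
    for _ in range(num_epochs):
        # One iterator per task; cycle sts -> para -> sst until all are exhausted.
        active = [(name, iter(loader)) for name, loader in tasks]
        while active:
            still_active = []
            for name, batches in active:
                try:
                    batch = next(batches)
                except StopIteration:
                    continue                 # this task is done for the epoch
                optimizer.zero_grad()
                loss = compute_task_loss(model, name, batch, device)
                loss.backward()
                optimizer.step()
                still_active.append((name, batches))
            active = still_active
```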
