Hi, thank you for the amazing work with NBoost.
My question is regarding TinyBERT. According to the accompanying blog post, TinyBERT is obtained via knowledge distillation from a larger BERT model that is pre-trained on MS-MARCO.
How does this approach compare with training a TinyBERT architecture from scratch on the MS-MARCO dataset?
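For context, here is a minimal sketch of the distillation objective I understand the blog post to describe, contrasted with the plain cross-entropy loss that training from scratch would use. This is purely illustrative, not NBoost's actual code; the temperature `T` and mixing weight `alpha` are assumed hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target loss: KL divergence between the temperature-scaled
    # teacher and student distributions (the distillation signal the
    # student gets from the larger BERT teacher).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target loss: ordinary cross-entropy against the MS-MARCO
    # labels, i.e. the only signal available when training from scratch.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In other words, my question is whether the soft-target term above buys TinyBERT anything measurable over using the hard-target term alone.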