Training data used for xlm-roberta-base-squad2 #5439
Answered by tholor
shalakasatheesh asked this question in Questions
Hello, could you please tell me which German-language datasets were used to train the QA model xlm-roberta-base-squad2? In addition, I was wondering whether the model used for fine-tuning is xlm-roberta-base? Best wishes
Answered by tholor on Jul 31, 2023
Shalaka, you can find some of these details at https://huggingface.co/deepset/xlm-roberta-base-squad2
@shalakasatheesh It was a long time ago, but I'm pretty sure we just took the "xlm-roberta-base" model, fine-tuned it on the English SQuAD 2.0 dataset, and evaluated it on German MLQA and German XQuAD (as described in the HF model card). So the model has only seen multilingual data at pre-training time.
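For reference, here is a minimal sketch of querying the fine-tuned model via the Hugging Face question-answering pipeline; the question and context strings are made-up examples, not part of any training or evaluation set.

```python
from transformers import pipeline

# Load the fine-tuned QA model from the model card linked above.
qa = pipeline("question-answering", model="deepset/xlm-roberta-base-squad2")

# Thanks to the multilingual pre-training, the model can answer questions
# in German even though fine-tuning used only English SQuAD 2.0.
# The question/context below are illustrative only.
result = qa(
    question="Worauf wurde das Modell feinabgestimmt?",
    context="Das Modell xlm-roberta-base wurde auf dem englischen SQuAD-2.0-Datensatz feinabgestimmt.",
)
print(result)  # dict with 'score', 'start', 'end', and 'answer'
```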