Replies: 5 comments 1 reply
-
It certainly is, but IMHO, the cost-benefit doesn't make sense. Are you experiencing some issues with SentenceTransformersRanker? Do you find that the ranking is not as good as you expected? I would love to hear more, mainly because we are introducing a top p sampler which I find to be much better in these use cases. |
-
@vblagoje thank you for your reply. Curious to know more about the top-p sampler. |
-
@vibha0411 We don't offer SentenceTransformersRanker fine-tuning and will likely remove the remaining model training code. Model fine-tuning should be done separately, outside Haystack, using specialized libraries like accelerate and others. Have a look at TopPSampling here. The planned release is 1.15 (in roughly two weeks). LMK if you have any other questions. |
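For context, top-p (nucleus) sampling over retrieval scores keeps only the smallest set of top-ranked documents whose softmax-normalized scores reach a cumulative probability threshold. This is an illustrative sketch of that idea in plain Python, not Haystack's actual TopPSampling implementation; the function name and default threshold are assumptions:

```python
import math

def top_p_filter(scored_docs, top_p=0.95):
    """Keep the smallest set of documents whose softmax-normalized
    scores have cumulative probability >= top_p.

    scored_docs: list of (doc, raw_score) tuples.
    """
    # Sort by raw relevance score, highest first.
    ranked = sorted(scored_docs, key=lambda d: d[1], reverse=True)
    # Softmax over scores (subtract the max for numerical stability).
    max_s = max(s for _, s in ranked)
    exps = [math.exp(s - max_s) for _, s in ranked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Accumulate probability mass until the threshold is reached.
    kept, cum = [], 0.0
    for (doc, _), p in zip(ranked, probs):
        kept.append(doc)
        cum += p
        if cum >= top_p:
            break
    return kept

docs = [("d1", 9.0), ("d2", 8.5), ("d3", 1.0), ("d4", 0.5)]
print(top_p_filter(docs, top_p=0.9))  # the two confidently-scored docs survive
```

The appeal over a fixed top-k cutoff is that the number of documents kept adapts to the score distribution: a single dominant document can pass alone, while a flat distribution keeps more candidates.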
-
@vblagoje, I am interested to know your thoughts on this |
-
I need ranker fine-tuning too, because the current ranker model doesn't support lengthy text. I need it to rerank large chunks of text from retriever results in an LFQA-with-OpenAI-prompt use case. The alternative I found is Longformer, but I think it would need to be fine-tuned on the downstream task. This is the error I ran into:

```
Exception: Exception while running node 'Ranker': The size of tensor a (1045) must match the size of tensor b (512) at non-singleton dimension 1
```
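That error is the usual symptom of passing the cross-encoder a passage longer than its 512-token maximum sequence length (1045 tokens here). One common workaround is to split long passages into overlapping windows and rerank the windows instead. A minimal sketch, using whitespace splitting as a stand-in for the model's real subword tokenizer (in practice you would budget well below 512 words, since subword tokenization inflates the count):

```python
def split_into_windows(text, max_tokens=512, stride=256):
    """Split text into overlapping word windows so each piece
    fits within the model's maximum sequence length.

    NOTE: whitespace splitting only approximates a real subword
    tokenizer; this is an illustrative sketch, not Haystack code.
    """
    words = text.split()
    windows = []
    start = 0
    while start < len(words):
        windows.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # last window reached the end of the text
        start += stride  # overlap keeps context across boundaries
    return windows

# A 1045-word passage (the length in the error above) becomes
# several overlapping windows that each fit the 512 limit.
chunks = split_into_windows("word " * 1045, max_tokens=512, stride=256)
```

After reranking, you can score the original document by its best window's score, which is a common heuristic for long-document reranking.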
-
Is it possible to fine-tune SentenceTransformersRanker on custom data? If so, what format should the data be in?
I think the base model is trained on MS MARCO (https://microsoft.github.io/msmarco/Datasets.html)?
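On the data format: cross-encoder rankers of this kind are typically trained on (query, passage, label) examples in the MS MARCO style, where label 1.0 marks a relevant passage and 0.0 an irrelevant one. A minimal sketch of assembling such examples; the plain-tuple layout is an assumption for illustration (libraries like sentence-transformers wrap the same fields in their own `InputExample` type):

```python
def build_ranking_examples(query, positives, negatives):
    """Build (query, passage, label) training examples in the
    MS MARCO style: label 1.0 = relevant, 0.0 = not relevant."""
    examples = []
    for passage in positives:
        examples.append((query, passage, 1.0))
    for passage in negatives:
        examples.append((query, passage, 0.0))
    return examples

pairs = build_ranking_examples(
    "what is haystack",
    positives=["Haystack is an open-source NLP framework."],
    negatives=["A haystack is a pile of hay in a field."],
)
```

Hard negatives (passages retrieved by BM25 or a dense retriever but judged irrelevant) generally train a much better reranker than random negatives.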