Intuition about the Reader score #3277

FabianHertwig · 2022-09-26T09:17:50Z

FabianHertwig
Sep 26, 2022

Hi, in our case, the score is often around 0.005 and I would like to understand what that means.

I found out that the Reader outputs vectors of logits for each token of the input document, predicting whether that token is the start or the end of the answer. The best valid start and end logits are then summed and passed through the expit function in these lines of code: float(expit(np.asarray(score) / 8)). But what was the training target for the Reader? Is it a positive number for "this token is the start/end of the answer" and a negative number otherwise? So if the score is 0.005, does that mean the summed logits were about -5.3 * 8 (since expit(-5.3) ≈ 0.005), and that the Reader is pretty sure this is not the start and end of the answer?
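The arithmetic in the question can be checked with a short sketch. It assumes only the older scaling quoted above, expit(score / 8); the helper names `legacy_score` and `raw_score_for` are made up for illustration and are not part of Haystack.

```python
import math

def expit(x):
    """Logistic sigmoid, the same function as scipy.special.expit."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Inverse of expit."""
    return math.log(p / (1.0 - p))

def legacy_score(raw_score):
    """Hypothetical helper: the older scaling quoted in the question."""
    return expit(raw_score / 8)

def raw_score_for(score):
    """Hypothetical helper: the summed start/end logit that would yield `score`."""
    return 8 * logit(score)

print(legacy_score(-5.3 * 8))   # roughly 0.005
print(raw_score_for(0.005))     # roughly -42.3
```

Under that formula, a score of 0.005 corresponds to a summed logit of about -42, which matches the estimate in the question.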

Resources:
There was some talk about the score in #193, there is some documentation here: https://haystack.deepset.ai/pipeline_nodes/reader#confidence-scores and this blogpost also explains a bit: https://www.deepset.ai/blog/modern-question-answering-systems-explained

Answered by sjrl


sjrl · 2022-09-27T19:21:34Z

sjrl
Sep 27, 2022
Maintainer

Hi @FabianHertwig, just to confirm: I believe the Reader score is calculated separately and is set in the results at this line:

haystack/haystack/nodes/reader/farm.py

Line 1159 in e2e6887

score=ans.confidence if self.use_confidence_scores else ans.score,

I also noticed that the link "in these lines of code" points to an older version of Haystack. Could you let me know which version of Haystack you are using?

Also, the discussion in #193 is out of date. By default, the answer score is no longer the raw logit but the confidence score explained in the blog post you linked.

The code for calculating the confidence score is a bit buried, but it can be found here:

haystack/haystack/modeling/model/prediction_head.py

Lines 524 to 525 in e2e6887

    
    start_matrix_softmax_start = torch.softmax(start_matrix[:, 0], dim=-1).cpu().numpy()
    end_matrix_softmax_end = torch.softmax(end_matrix[0, :], dim=-1).cpu().numpy()

and here

haystack/haystack/modeling/model/prediction_head.py

Lines 537 to 541 in e2e6887

    
    confidence = (
        (start_matrix_softmax_start[start_idx] + end_matrix_softmax_end[end_idx]) / 2
        if score > -500
        else np.exp(score / 10)  # disqualify answers according to scores in logits_to_preds()
    )

The basic idea is that the confidence score comes from a softmax over the raw logits the model produces for all potential answers within a single Haystack Document: it is the average of the softmax probability at the answer's start index and the softmax probability at its end index. As mentioned in the blog post, a low Reader confidence score can still occur, even if the answer is correct, when the data being fed to the model differs strongly from the training data. In that scenario, the raw logits for all possible answers in one Document are closely grouped together rather than spread apart. This doesn't mean the model can't produce very good results; it just means the model does not identify the top answer as being much more likely than the other candidates.
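To make the "closely grouped logits" point concrete, here is a minimal sketch that mirrors the expression quoted above using only the standard library. It is a simplification and not the Haystack implementation: the start and end logits are plain 1-D lists instead of slices of `start_matrix` and `end_matrix`, the `confidence` helper is hypothetical, and all logit values are invented for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(start_logits, end_logits, start_idx, end_idx, score):
    """Hypothetical helper mirroring the quoted expression:
    mean of the start and end softmax probabilities, with the
    exp(score / 10) fallback for disqualified answers."""
    if score > -500:
        return (softmax(start_logits)[start_idx] + softmax(end_logits)[end_idx]) / 2
    return math.exp(score / 10)

# Invented logits: one candidate clearly ahead vs. all candidates close together.
separated = [9.0, 1.0, 0.5, 0.0]
grouped = [2.1, 2.0, 1.9, 2.0]

conf_separated = confidence(separated, separated, 0, 0, score=18.0)
conf_grouped = confidence(grouped, grouped, 0, 0, score=4.2)

print(conf_separated)  # close to 1: the top candidate dominates the softmax
print(conf_grouped)    # far below 1, although index 0 is still the top candidate
```

In both cases the same candidate ranks first; only the spread of the logits changes, and with it the confidence value.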

I hope this helps! Please let me know if you have any questions.


anakin87 · 2022-10-04T14:11:17Z

anakin87
Oct 4, 2022
Maintainer

Hello @sjrl,
I think the score is an important piece of information and a distinctive feature of Haystack.
However, to understand how it is calculated, you currently need to examine the code, read that blog post, and dig through GitHub issues and discussions...

So I propose including a better and deeper explanation in the documentation...

Sorry if I'm a little rude. I really appreciate your efforts to make Haystack better every day. 🤗

1 reply

agnieszka-m Oct 4, 2022
Collaborator

Thanks for pointing it out @anakin87. I'll have a closer look at it and see how we can improve the docs. :)
