spanruler + spancat inconsistencies #13200
Maybe I am misinterpreting the documentation, but I am seeing some odd behavior with span_ruler + spancat that I would like to understand better. My understanding was that entity ruler + ner could improve ner's training, and that the same could be said for span_ruler + spancat. When I have Pipeline: ['tok2vec', 'spancat'], I get a score of 0.70 with my current training data set:
When I have Pipeline: ['tok2vec', 'span_ruler', 'spancat'], I get similar F/P/R scores but a score of 0.35 using the same training/dev data set.
In experimenting with the trained model that had span_ruler before spancat in the pipeline, it seems the model is removing spans that should be set by the span_ruler (it almost seems like the patterns are not working at all). If I remove span_ruler from the trained pipeline and re-add it at the end, the patterns work as expected. I was hoping my patterns could improve spancat's training, but they seem to have the opposite effect. If I remove span_ruler altogether, the predictions from spancat seem better than when span_ruler (ahead of spancat) is in the pipeline. Here is my current configuration:
Replies: 1 comment 4 replies
Remove the score weights for span_ruler:
[training.score_weights]
spans_sc_f = 1.0
spans_sc_p = 0.0
spans_sc_r = 0.0
The score weights settings aren't adjusted automatically when you change the ...
Try setting all of the spans_ruler_* scores to null instead of removing them? There will be minor differences in the results with the added span_ruler, but in all the examples this looks like a math/scoring problem and not an actual training problem. (There is no key "ruler", so a 0.0 score is getting averaged in.)

I would also add that I'd have to double-check exactly how the scoring works if the span_ruler and spancat both add the same span more than once. spancat will overwrite the entire spans_key, but I think that span_ruler would extend the existing spans using the default settings.
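
To make the "0.0 score is getting averaged in" point concrete, here is a minimal arithmetic sketch in plain Python. It is not spaCy's actual scoring code. The function name weighted_score, the equal 0.5/0.5 weights, and the 0.70 F-score are assumptions chosen to mirror the numbers in the question, and skipping a None weight stands in for setting a score weight to null in the config.

```python
from typing import Dict, Optional


def weighted_score(
    scores: Dict[str, float], weights: Dict[str, Optional[float]]
) -> float:
    """Weighted sum of score components; weights set to None are skipped."""
    total = 0.0
    for key, weight in weights.items():
        if weight is None:
            # Corresponds to a weight set to null: the component is ignored.
            continue
        # A score missing from the evaluation results counts as 0.0.
        total += weight * scores.get(key, 0.0)
    return total


# Hypothetical dev-set results: spancat scores 0.70, while the span ruler's
# key has no gold annotations and therefore scores 0.0.
scores = {"spans_sc_f": 0.70, "spans_ruler_f": 0.0}

# Assumed weights when both components contribute equally.
equal_weights = {"spans_sc_f": 0.5, "spans_ruler_f": 0.5}

# Weights with the ruler score excluded (None plays the role of null).
spancat_only = {"spans_sc_f": 1.0, "spans_ruler_f": None}

halved = weighted_score(scores, equal_weights)
restored = weighted_score(scores, spancat_only)
print(round(halved, 2), round(restored, 2))  # 0.35 0.7
```

Under these assumptions the model's spancat performance is identical in both cases. Only the combined number reported during training changes, which is consistent with similar F/P/R values appearing alongside a halved score.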