I may be missing something, but it appears that `string_match_all` evaluates only recall, not precision.
So in the multi-key setting, a strategy for achieving a perfect score is to just output all values all the time.
For example, the following returns a score of 100%.
```python
def string_match_all(preds, refs):
    score = sum([sum([1.0 if r.lower() in pred.lower() else 0.0 for r in ref]) / len(ref) for pred, ref in zip(preds, refs)]) / len(preds) * 100
    return round(score, 2)

preds = [
    "a b c d e f g h i j k l m n o p q r s t u v w x y z",
    "a b c d e f g h i j k l m n o p q r s t u v w x y z",
    "a b c d e f g h i j k l m n o p q r s t u v w x y z",
]
refs = [
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["m", "n", "o"],
]
string_match_all(preds, refs)  # returns 100.0
```
This metric should balance recall and precision, no? Seems like reporting either F1-score or exact match would be more appropriate. But perhaps I'm missing something.
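To make the suggestion concrete, here is a rough sketch of what an F1-style variant might look like. It is not from the repository: the name `string_match_f1` is hypothetical, and it assumes a prediction can be split on whitespace into candidate values, which real free-form model output would not satisfy without some extraction step.

```python
def string_match_f1(preds, refs):
    """Hypothetical F1 variant of string_match_all (not part of the repo)."""
    scores = []
    for pred, ref in zip(preds, refs):
        # Assumption: each whitespace-separated token is one predicted value.
        pred_values = {tok.lower() for tok in pred.split()}
        ref_values = {r.lower() for r in ref}
        hits = len(pred_values & ref_values)
        # Precision penalizes extra values; recall penalizes missing ones.
        precision = hits / len(pred_values) if pred_values else 0.0
        recall = hits / len(ref_values) if ref_values else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return round(sum(scores) / len(preds) * 100, 2)


# Same "output everything" strategy as in the example above.
preds = ["a b c d e f g h i j k l m n o p q r s t u v w x y z"] * 3
refs = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]

print(string_match_f1(preds, refs))                      # 20.69
print(string_match_f1(["a b c"], [["a", "b", "c"]]))     # 100.0
```

Under these assumptions, outputting every value drops from 100% to about 20.69% (precision 3/26, recall 1), while an answer containing exactly the reference values still scores 100%.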
Thanks!