string_match_all only evaluates recall but not precision #95

@seyuboglu

I may be missing something, but it appears that string_match_all evaluates only recall, not precision.
So in the multi-key setting, a model can achieve a perfect score by simply outputting every possible value every time.

For example, the following returns a score of 100.0, even though each prediction lists all 26 letters and only 3 of them are correct.

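def string_match_all(preds, refs):
    # Per example: fraction of reference strings that occur as a case-insensitive
    # substring of the prediction. Extra content in the prediction is never penalized.
    score = sum([sum([1.0 if r.lower() in pred.lower() else 0.0 for r in ref]) / len(ref) for pred, ref in zip(preds, refs)]) / len(preds) * 100
    return round(score, 2)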

preds = [
    "a b c d e f g h i j k l m n o p q r s t u v w x y z",
    "a b c d e f g h i j k l m n o p q r s t u v w x y z",
    "a b c d e f g h i j k l m n o p q r s t u v w x y z",
]
refs = [
    ["a", "b", "c"],
    ["x", "y", "z"],
    ["m", "n", "o"],
]

print(string_match_all(preds, refs))  # 100.0

This metric should balance recall and precision, no? It seems like reporting either F1 score or exact match would be more appropriate. But perhaps I'm missing something.
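To make the suggestion concrete, here is a rough sketch of what an F1-style variant might look like. The name string_match_f1 is hypothetical and not something in the repo. It also assumes that a prediction can be split into candidate values on whitespace, which may not hold for real model outputs, so treat it as an illustration rather than a proposed patch.

```python
def string_match_f1(preds, refs):
    """Hypothetical F1-style score; not part of the existing metric code."""
    scores = []
    for pred, ref in zip(preds, refs):
        # Assumption: predicted values are whitespace-separated tokens.
        pred_items = set(pred.lower().split())
        ref_items = {r.lower() for r in ref}
        hits = len(pred_items & ref_items)
        if hits == 0:
            scores.append(0.0)
            continue
        precision = hits / len(pred_items)  # penalizes extra outputs
        recall = hits / len(ref_items)      # rewards found references
        scores.append(2 * precision * recall / (precision + recall))
    return round(sum(scores) / len(scores) * 100, 2)


alphabet = "a b c d e f g h i j k l m n o p q r s t u v w x y z"
print(string_match_f1([alphabet], [["a", "b", "c"]]))  # 20.69
print(string_match_f1(["a b c"], [["a", "b", "c"]]))   # 100.0
```

With this kind of scoring, the "output everything" strategy drops from 100 to about 21 on the example above, while an exact answer still scores 100.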

Thanks!
