Description
It would be great if the functions had an argument to exclude identity matches/scores from the returned output, something like include_identity=False or only_fuzzy=True. Fuzzy matching is typically used to find inexact (rather than exact) matches, since exact matches can be found via equality checks.
A use case is fuzzy-matching a list of strings against itself. Suppose that for each input string we want to find the best-matching string other than itself. Currently, one has to remove the (single) input string from the list of choices before calling extract(). But with multi-input calls to extract() (see #188) that is no longer possible. Alternatively, if the input string occurs only once in the choices list, one must take the second-best match returned by extract(). Or, if one is interested in the entire similarity matrix, one needs to set the elements corresponding to identity matches to some number < 0 before, e.g., applying np.(arg)max to find the (index of the) maximum similarity score.
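To make the self-matching use case concrete, here is a minimal sketch of the behaviour such an argument could provide. It uses difflib from the Python standard library as a stand-in scorer so that it runs without dependencies; it is not the library's scorer. The names similarity and best_non_identity_matches are hypothetical and not part of any existing API, and "identity match" is assumed here to mean "same position in the list".

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (assumption: not the library's scorer)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def best_non_identity_matches(strings):
    """For each string, find the best-scoring other entry in the same list.

    Returns a list of (match, score, index) tuples, one per input string.
    An entry is None if the list holds no other string to compare against.
    """
    results = []
    for i, query in enumerate(strings):
        best = None
        for j, choice in enumerate(strings):
            if i == j:
                # Skip the identity match (assumed: same position in the list).
                continue
            score = similarity(query, choice)
            if best is None or score > best[1]:
                best = (choice, score, j)
        results.append(best)
    return results


names = ["apple", "apples", "banana", "bananas"]
matches = best_non_identity_matches(names)
for name, match in zip(names, matches):
    print(name, "->", match)
```

Skipping by position rather than by string equality means that a duplicate of the input string elsewhere in the list would still be returned as a match; whether the proposed argument should also exclude such duplicates is an open design question.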
Having a dedicated argument that takes care of excluding identity matches under the hood of the process module functions may help improve convenience/user-friendliness :)