Add option to exclude identity matches #405

@silenceOfTheLambda

It would be great to have an argument in the process module functions to exclude identity matches/scores from the returned output, something like include_identity=False or only_fuzzy=True. Fuzzy matching is usually meant to find inexact matches, since exact matches can be found with plain equality checks.

A use case is fuzzy-matching a list of strings against itself. Suppose that for each input string we want to find the best-matching string other than itself. Currently, one has to remove the (single) input string from the list of choices before calling extract(), but with multi-input calls to extract() (see #188) that is no longer possible. Alternatively, if the input string occurs only once in the choices list, one must take the second-best match returned by extract(). Or, if one is interested in the entire similarity matrix, one needs to set the elements corresponding to identity matches to some number < 0 before, e.g., applying np.max or np.argmax to find the maximum similarity score or its index.
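
To make the last workaround concrete, here is a minimal sketch of the "mask the identity entry with a negative score" idea. It uses difflib from the Python standard library as a stand-in scorer so that it runs without extra dependencies; it is not this library's scorer, and the helper names similarity and best_other_match are hypothetical. It excludes a string's own position in the list, which is only one possible reading of "identity match".

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (assumption: not the library's scorer)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def best_other_match(strings: list[str]) -> list[tuple[str, float, int]]:
    """For each string, return (best match, score, index), ignoring its own position."""
    results = []
    for i, query in enumerate(strings):
        # Score the query against every choice, but give the self-comparison
        # a score of -1 so it can never be selected as the maximum.
        scores = [
            -1.0 if j == i else similarity(query, choice)
            for j, choice in enumerate(strings)
        ]
        # Index of the highest remaining score (a plain-Python argmax).
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append((strings[best], scores[best], best))
    return results


names = ["apple", "apples", "banana", "bananas"]
matches = best_other_match(names)
for name, (match, score, _) in zip(names, matches):
    print(f"{name!r} -> {match!r} ({score:.1f})")
```

This loop has to be written by hand around the matching call, which is exactly the bookkeeping a dedicated argument would hide. Note that with fewer than two strings the sketch returns the masked entry itself with a score of -1.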

Having a dedicated argument that excludes identity matches under the hood of the process module functions would improve convenience and user-friendliness :)
