
Label assertion and mapping in Machine #5054

@gf712

Description

Currently, several classification algorithms each check whether their input Labels are valid, e.g. that the class labels are contiguous, [0, 1, ..., n_classes-1]. This leads to a lot of duplicated code.
These checks should instead be done once, by the Machine base class, when training is performed. The Machine will then store a mapping from any input Labels to an internal encoding. For example, a binary classification task would map {10, 20} -> {-1, +1} using a BinaryLabelEncoder class, and a MulticlassLabelsEncoder class would do the same for multiclass tasks. The correctly encoded Labels are then dispatched to the train_machine method. When apply is called, the LabelEncoder maps the returned Labels back to the space of the user's input Labels.
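To make the proposed mapping concrete, here is a minimal standalone C++ sketch of the binary case. Only the class name `BinaryLabelEncoder` comes from this issue; the `fit`/`transform`/`inverse_transform` methods, the use of plain `std::vector<double>` instead of Shogun's Labels classes, and the rule that the smaller value maps to -1 are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <vector>

// Hypothetical encoder for the binary case (illustration only, not Shogun's API).
// Any two distinct user label values are mapped onto the internal encoding {-1, +1}.
class BinaryLabelEncoder {
public:
    // Learn the mapping. Assumption: the smaller value becomes -1 and the larger +1.
    void fit(const std::vector<double>& labels) {
        std::set<double> unique(labels.begin(), labels.end());
        if (unique.size() != 2)  // e.g. {-1, 0, 1} is rejected: not a binary problem
            throw std::invalid_argument("binary labels must contain exactly two distinct values");
        negative_ = *unique.begin();
        positive_ = *unique.rbegin();
        assert(negative_ < positive_);  // std::set iterates in ascending order
        fitted_ = true;
    }

    // User label space -> internal {-1, +1} encoding (done before train_machine).
    std::vector<double> transform(const std::vector<double>& labels) const {
        require_fitted();
        std::vector<double> out;
        out.reserve(labels.size());
        for (double y : labels) {
            if (y == negative_) out.push_back(-1.0);
            else if (y == positive_) out.push_back(+1.0);
            else throw std::invalid_argument("label was not seen during fit");
        }
        return out;
    }

    // Internal {-1, +1} encoding -> user label space (done on the output of apply).
    std::vector<double> inverse_transform(const std::vector<double>& encoded) const {
        require_fitted();
        std::vector<double> out;
        out.reserve(encoded.size());
        for (double e : encoded) {
            if (e != -1.0 && e != +1.0)
                throw std::invalid_argument("encoded binary labels must be -1 or +1");
            out.push_back(e < 0 ? negative_ : positive_);
        }
        return out;
    }

private:
    void require_fitted() const {
        if (!fitted_) throw std::logic_error("fit() must be called first");
    }

    double negative_ = 0.0;
    double positive_ = 0.0;
    bool fitted_ = false;
};
```

With this sketch, fitting on {10, 20} and transforming {10, 20, 20} yields {-1, +1, +1}, and `inverse_transform` maps {+1, -1} back to {20, 10}. Fitting on {-1, 0, 1} throws, which is the kind of validity check the issue asks the encoders to perform.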

The tasks (in order):

  • Write a LabelEncoder base class and the corresponding BinaryLabelEncoder and MulticlassLabelsEncoder derived classes. These should also check that the Labels are valid, e.g. {-1, 0, 1} cannot be transformed to BinaryLabels. (Add label encoder #5067)
  • Add a LabelEncoder as a member of the Machine class.
  • In train, fit the LabelEncoder and transform the input Labels; in apply, perform the inverse transform on the output.
  • Remove the label checks from Machine subclasses, since algorithms are now guaranteed to receive a valid Labels representation.
  • xvalidation would use its own mapping and pass it on to each fold's Machine, so that the same mapping is kept across all folds.
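The task list above can be sketched end to end as follows. This is a hypothetical, self-contained C++ illustration, not Shogun's actual API: the `LabelEncoder` interface, `set_label_encoder`, the toy `MajorityVoteMachine`, the label-only `apply(n)` signature (features are left out for brevity) and defaulting to the multiclass encoder are all assumptions. It shows the Machine base class fitting the encoder and transforming labels in `train`, subclasses only ever seeing labels in [0, n_classes-1], `apply` mapping predictions back, and a pre-fitted encoder being shared so that every fold uses the same mapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical encoder interface (illustration only, not Shogun's API).
class LabelEncoder {
public:
    virtual ~LabelEncoder() = default;
    virtual void fit(const std::vector<double>& labels) = 0;
    virtual std::vector<double> transform(const std::vector<double>& labels) const = 0;
    virtual std::vector<double> inverse_transform(const std::vector<double>& encoded) const = 0;
    bool is_fitted() const { return fitted_; }
protected:
    bool fitted_ = false;
};

// Maps the sorted distinct user labels onto the contiguous range [0, n_classes-1].
class MulticlassLabelsEncoder : public LabelEncoder {
public:
    void fit(const std::vector<double>& labels) override {
        std::set<double> unique(labels.begin(), labels.end());  // sorted, distinct
        if (unique.size() < 2)
            throw std::invalid_argument("multiclass labels need at least two classes");
        classes_.assign(unique.begin(), unique.end());
        fitted_ = true;
    }
    std::vector<double> transform(const std::vector<double>& labels) const override {
        std::vector<double> out;
        for (double y : labels) {
            // classes_ is sorted, so a binary search yields the class index.
            auto it = std::lower_bound(classes_.begin(), classes_.end(), y);
            if (it == classes_.end() || *it != y)
                throw std::invalid_argument("label was not seen during fit");
            out.push_back(static_cast<double>(it - classes_.begin()));
        }
        return out;
    }
    std::vector<double> inverse_transform(const std::vector<double>& encoded) const override {
        std::vector<double> out;
        for (double e : encoded) {
            // Reject negative, out-of-range, non-integer and NaN class indices.
            if (e < 0 || e >= static_cast<double>(classes_.size()) || e != std::floor(e))
                throw std::out_of_range("invalid encoded class index");
            out.push_back(classes_[static_cast<std::size_t>(e)]);
        }
        return out;
    }
private:
    std::vector<double> classes_;
};

// Hypothetical base class: encoding happens here once, so subclasses need no label checks.
class Machine {
public:
    virtual ~Machine() = default;
    // Lets a caller such as cross-validation share one fitted encoder across folds.
    void set_label_encoder(std::shared_ptr<LabelEncoder> enc) { encoder_ = std::move(enc); }
    void train(const std::vector<double>& labels) {
        if (!encoder_) encoder_ = std::make_shared<MulticlassLabelsEncoder>();
        if (!encoder_->is_fitted()) encoder_->fit(labels);  // keep a supplied mapping as-is
        assert(encoder_->is_fitted());                       // fit() either succeeds or throws
        train_machine(encoder_->transform(labels));
    }
    std::vector<double> apply(std::size_t n) const {
        if (!encoder_ || !encoder_->is_fitted())
            throw std::logic_error("train() must be called before apply()");
        return encoder_->inverse_transform(apply_machine(n));  // back to the user's label space
    }
protected:  // subclasses only ever see labels in [0, n_classes-1]
    virtual void train_machine(const std::vector<double>& encoded) = 0;
    virtual std::vector<double> apply_machine(std::size_t n) const = 0;
private:
    std::shared_ptr<LabelEncoder> encoder_;
};

// Toy subclass: predicts the most frequent encoded class (ties go to the lowest index).
class MajorityVoteMachine : public Machine {
protected:
    void train_machine(const std::vector<double>& encoded) override {
        std::map<double, std::size_t> counts;
        for (double c : encoded) ++counts[c];
        std::size_t best = 0;
        for (const auto& kv : counts)
            if (kv.second > best) { best = kv.second; majority_ = kv.first; }
    }
    std::vector<double> apply_machine(std::size_t n) const override {
        return std::vector<double>(n, majority_);
    }
private:
    double majority_ = 0.0;
};
```

For example, training on {7, 3, 7, 5} encodes the labels as {2, 0, 2, 1}, and `apply(2)` returns {7, 7}. For cross-validation, one `MulticlassLabelsEncoder` would be fitted on the full label vector and handed to each fold's Machine through `set_label_encoder`; because `train` skips fitting when the encoder is already fitted, a fold that contains only some of the classes still uses the global mapping.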

Most of this code already exists, but it is spread across the code base.
