Description
Currently some classification algorithms check whether the input Labels are valid, e.g. that the class labels are contiguous integers [0, 1, ..., n_classes-1], which leads to a lot of duplicated code.
These checks should be done by the `Machine` base class when training is performed. The `Machine` will then store the mapping from any Label input to an internal encoding, e.g. a binary classification task would map {10,20} -> {-1,+1} using a `BinaryLabelEncoder` class, and similarly there would be a `MulticlassLabelsEncoder` class for multiclass tasks. The properly encoded Labels are then dispatched to the `train_machine` method. When `apply` is called, the returned Labels are mapped back to the user's input Label space using the `LabelEncoder`.
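To make the proposed mapping concrete, here is a minimal, language-agnostic sketch in Python. It is not the actual implementation: the method names `fit`, `transform` and `inverse_transform`, the helper `_internal_values`, and the choice to map the smaller user label to -1 are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class LabelEncoder(ABC):
    """Hypothetical base class: maps user labels to an internal encoding and back."""

    def __init__(self):
        self._to_internal = None  # user label -> internal label
        self._to_user = None      # internal label -> user label

    @abstractmethod
    def _internal_values(self, classes):
        """Return the internal labels for the sorted unique user labels."""

    def fit(self, labels):
        # Assumption: classes are ordered by sorting the unique user labels.
        classes = sorted(set(labels))
        internal = self._internal_values(classes)
        self._to_internal = dict(zip(classes, internal))
        self._to_user = dict(zip(internal, classes))
        return self

    def transform(self, labels):
        if self._to_internal is None:
            raise RuntimeError("fit must be called before transform")
        unseen = [y for y in labels if y not in self._to_internal]
        if unseen:
            raise ValueError(f"labels not seen during fit: {unseen}")
        return [self._to_internal[y] for y in labels]

    def inverse_transform(self, encoded):
        if self._to_user is None:
            raise RuntimeError("fit must be called before inverse_transform")
        return [self._to_user[y] for y in encoded]


class BinaryLabelEncoder(LabelEncoder):
    def _internal_values(self, classes):
        # Validity check: exactly two distinct labels, so {-1, 0, 1} is rejected.
        if len(classes) != 2:
            raise ValueError(f"expected 2 distinct labels, got {len(classes)}")
        return [-1, +1]


class MulticlassLabelsEncoder(LabelEncoder):
    def _internal_values(self, classes):
        # Internal encoding is the contiguous range 0 .. n_classes-1.
        return list(range(len(classes)))


encoder = BinaryLabelEncoder().fit([10, 20, 20, 10])
encoded = encoder.transform([10, 20, 20, 10])   # [-1, 1, 1, -1]
decoded = encoder.inverse_transform(encoded)    # [10, 20, 20, 10]
```

Putting the validity check inside `fit` means every algorithm gets it for free, which is the point of moving the checks out of the individual classifiers.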
The tasks (in order):
- write a `LabelEncoder` base class and respective `BinaryLabelEncoder` and `MulticlassLabelsEncoder` derived classes. These should also check that the Labels are valid, e.g. {-1, 0, 1} cannot be transformed to BinaryLabels. (Add label encoder #5067)
- add `LabelEncoder` as a `Machine` class member
- fit the `LabelEncoder` and transform the input in `train`, then perform the inverse operation in `apply`
- remove label checks from `Machine` subclasses, since algorithms are now guaranteed to receive a valid Label representation
- xvalidation would use its own mapping that it passes on to each fold's `Machine` in order to keep the same mapping across folds
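
The `train`/`apply` wiring and the cross-validation step could look roughly like the following Python sketch. Everything here is hypothetical and only illustrates the control flow: `SortedLabelEncoder` is a stand-in for the real encoder, `apply_machine` is an assumed counterpart to `train_machine`, and `MajorityClassMachine` is a toy algorithm.

```python
class SortedLabelEncoder:
    """Hypothetical stand-in encoder: sorted unique labels -> 0 .. n_classes-1."""

    def fit(self, labels):
        self.classes = sorted(set(labels))
        self.index = {c: i for i, c in enumerate(self.classes)}
        return self

    def transform(self, labels):
        return [self.index[y] for y in labels]

    def inverse_transform(self, encoded):
        return [self.classes[i] for i in encoded]


class Machine:
    def __init__(self, encoder=None):
        # A pre-fitted encoder may be injected, e.g. by cross-validation.
        self.encoder = encoder
        self._encoder_is_shared = encoder is not None

    def train(self, features, labels):
        if not self._encoder_is_shared:
            self.encoder = SortedLabelEncoder().fit(labels)
        # Subclasses only ever see the internal encoding.
        self.train_machine(features, self.encoder.transform(labels))
        return self

    def apply(self, features):
        # Map internal predictions back to the user's label space.
        return self.encoder.inverse_transform(self.apply_machine(features))

    def train_machine(self, features, encoded_labels):
        raise NotImplementedError

    def apply_machine(self, features):
        raise NotImplementedError


class MajorityClassMachine(Machine):
    """Toy algorithm: predicts the most frequent encoded label; no label checks."""

    def train_machine(self, features, encoded_labels):
        self.majority = max(set(encoded_labels), key=encoded_labels.count)

    def apply_machine(self, features):
        return [self.majority] * len(features)


features = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [10, 10, 20, 30, 30, 20]

# Cross-validation fits one mapping on all labels and hands it to every fold.
shared = SortedLabelEncoder().fit(labels)
folds = [([0, 1, 2], [3, 4, 5]), ([3, 4, 5], [0, 1, 2])]  # (train, test) indices
predictions = []
for train_idx, test_idx in folds:
    machine = MajorityClassMachine(encoder=shared)
    machine.train([features[i] for i in train_idx], [labels[i] for i in train_idx])
    predictions.append(machine.apply([features[i] for i in test_idx]))
```

In the second fold the training labels are only {20, 30}. An encoder fitted per fold would map them to {0, 1}, whereas the shared encoder keeps 30 at index 2, so the internal labels mean the same thing in every fold.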
Most of this code already exists, but it is spread around the code base.