At the moment, the dataset looks like:
This is not good! And that's what we have after down-sampling the `null` labels (i.e. labels that can't be classified into one of the categories in https://allcontributors.org/docs/en/emoji-key), which are ≈ 16.61% of the whole dataset (ideally they would be fewer than `business`, ..., `userTesting` combined).
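For reference, a minimal sketch of how the per-category share can be checked against that criterion. The data below is a toy example, not the real dataset, and `rare` is a hypothetical stand-in for the `business`, ..., `userTesting` group (only the two categories named here are included).

```python
from collections import Counter

def category_shares(categories):
    """Return {category: fraction of the dataset} for a list of category names."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

# Toy data for illustration only (NOT the real dataset).
toy = ["null"] * 4 + ["bug"] * 10 + ["code"] * 8 + ["business"] + ["userTesting"]
shares = category_shares(toy)

# Hypothetical subset of the rare categories; the full list is not spelled out here.
rare = ["business", "userTesting"]
rare_share = sum(shares.get(cat, 0.0) for cat in rare)

# The criterion from above: `null` should ideally be smaller than the rare group combined.
null_is_small_enough = shares["null"] < rare_share
print(f"null: {shares['null']:.2%}, rare combined: {rare_share:.2%}, ok: {null_is_small_enough}")
```

With this toy data the check fails, which mirrors the situation described: `null` outweighs the rare categories combined.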
Down-sampling the `null` labels further would be an option; however, most of the ones left seem (fairly) widely used.
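As a rough illustration of what further down-sampling would involve, here is a sketch that randomly drops rows of one category until it is at most a target share of the result. It assumes rows are `(label, category)` tuples; the function name, the row shape and the `target_share` parameter are all hypothetical and not taken from the project. Note that it drops rows at random, so it ignores the concern above that the remaining `null` labels are widely used.

```python
import random

def downsample(rows, category, target_share, seed=0):
    """Randomly drop rows of `category` so it makes up at most `target_share` of the result.

    `rows` is a list of (label, category) tuples; `target_share` must be in [0, 1).
    """
    keep = [r for r in rows if r[1] != category]
    pool = [r for r in rows if r[1] == category]
    # Largest k such that k / (len(keep) + k) <= target_share.
    max_k = int(target_share * len(keep) / (1 - target_share))
    k = min(len(pool), max_k)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return keep + rng.sample(pool, k)

# Toy data for illustration only (NOT the real dataset).
rows = [(f"n{i}", "null") for i in range(50)] + [(f"b{i}", "bug") for i in range(50)]
out = downsample(rows, "null", target_share=0.1)
n_null = sum(1 for _, cat in out if cat == "null")
```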
So the remaining option is to level up the other categories by adding more labels for them, especially labels that can be found in GitHub/GitLab/Bitbucket repos alone.