Repository for the paper "Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants".
In particular, you will find code to reproduce the paper's experiments. If you want to use our proposed methods, please use the following updated repository : https://github.com/artefactory/mgs-grf
If you want to reproduce our paper experiments:
- the notebooks here and here reproduce the experiments
- this code contains the implementation of the protocols used for the numerical experiments of our article.
In order to use our MGS strategy:
The data sets used for our article should be downloaded into the data/externals folder. The data sets are available at the following addresses :
- Pima
- Phoneme : https://github.com/jbrownlee/Datasets/blob/master/phoneme.csv
- Abalone : https://archive.ics.uci.edu/dataset/1/abalone
- Wine : https://archive.ics.uci.edu/dataset/186/wine+quality
- Haberman : https://archive.ics.uci.edu/dataset/43/haberman+s+survival
- Yeast : https://archive.ics.uci.edu/dataset/110/yeast
- Vehicle : https://archive.ics.uci.edu/dataset/149/statlog+vehicle+silhouettes
- Ionosphere : https://archive.ics.uci.edu/dataset/52/ionosphere
- Breast cancer Wisconsin : https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original
- CreditCard : https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
- MagicTel : https://www.openml.org/d/44125
- California : https://www.openml.org/d/44090
- House_16H : https://openml.org/d/821
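As a minimal sketch of the download step described above, the following Python helper fetches one file into data/externals. It is an illustration under assumptions, not part of this repository's code: the function names `github_blob_to_raw` and `download` are hypothetical, and the local file name expected by the notebooks is not stated here, so `phoneme.csv` is only a guess. The one non-obvious point it covers is that a github.com `/blob/` link (such as the Phoneme one) points to an HTML page, so it is rewritten to its raw-file equivalent before downloading.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Folder named in this README; everything else below is a hypothetical helper.
DATA_DIR = Path("data/externals")


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com '/blob/' page URL into its raw-file URL.

    URLs that are not GitHub blob links are returned unchanged.
    """
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        return url
    owner_repo, path = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{path}"


def download(url: str, filename: str, data_dir: Path = DATA_DIR) -> Path:
    """Download `url` to `data_dir/filename`, skipping files already present."""
    data_dir.mkdir(parents=True, exist_ok=True)
    target = data_dir / filename
    if not target.exists():
        urlretrieve(github_blob_to_raw(url), target)
    return target
```

For example, `download("https://github.com/jbrownlee/Datasets/blob/master/phoneme.csv", "phoneme.csv")` would place the Phoneme file in data/externals. The UCI, Kaggle and OpenML links above are dataset landing pages rather than direct file links, so those data sets still need to be retrieved from their respective pages.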
This work was done through a partnership between Artefact Research Center and the Laboratoire de Probabilités Statistiques et Modélisation (LPSM) of Sorbonne University.
If you find the code useful, please consider citing us :
@article{sakho2024we,
title={Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants},
author={Sakho, Abdoulaye and Malherbe, Emmanuel and Scornet, Erwan},
journal={arXiv preprint arXiv:2402.03819},
year={2024}
}