This is the official implementation of the paper "DynED: Dynamic Ensemble Diversification in Data Stream Classification", published at CIKM 2023.
DynED is a novel ensemble construction and maintenance approach for data stream classification that dynamically balances the diversity and prediction accuracy of its components. The core challenge in data stream environments is handling disruptive changes in the data distribution, known as concept drift. DynED addresses this by using the Maximal Marginal Relevance (MMR) concept to dynamically adjust the ensemble's diversity—increasing it to adapt during concept drifts and decreasing it to maximize accuracy in stable periods.
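For reference, the standard MMR criterion builds a selection S greedily. At each step it picks, from the candidates C not yet selected, the one that best trades relevance against redundancy with S. Written for ensemble components, it could look like the formula below. Here a component's recent accuracy stands in for relevance, a similarity between components' predictions stands in for redundancy, and the max term is taken as 0 while S is empty. These choices, and the orientation of λ, are assumptions. In this sketch a larger λ favors diversity, which matches how the README describes λ as a diversity parameter. The paper's adapted MMR may weight the terms differently.

```latex
c^{*} = \operatorname*{arg\,max}_{c \in C \setminus S}
  \Big[ (1-\lambda)\,\mathrm{acc}(c) - \lambda \max_{s \in S} \mathrm{sim}(c, s) \Big]
```

The step is repeated, adding c* to S each time, until S reaches the target ensemble size.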
Authors: Soheil Abadifard, Sepehr Bakhshi, Sanaz Gheibuni, and Fazli Can.
- Dynamic Diversity Adjustment: DynED dynamically adjusts its diversity parameter based on the intensity of accuracy changes, allowing it to adapt to severe drifts without manual tuning.
- MMR-based Component Selection: Utilizes a modified Maximal Marginal Relevance (MMR) method to prune redundant or ineffective components, ensuring the ensemble is both diverse and accurate.
- Concept Drift Adaptation: Specifically designed to handle the challenges of concept drift in evolving data streams, outperforming baseline methods in various drift scenarios.
- Proven Performance: Experimental results on 15 datasets show that DynED achieves a higher average mean accuracy compared to five state-of-the-art baselines.
How it Works
DynED's architecture is built on a three-stage process to construct and maintain the ensemble structure, as illustrated in the paper.
Stage 1: Prediction & Training
- The set of active components predicts the label of new data samples using majority voting.
- These components are then trained on the new samples in an online fashion.
Stage 2: Drift Detection & Adaptation
- The ADWIN drift detector is used to monitor the predictions for concept drift.
- If a drift is detected, a new component is trained on the most recent data and added to a "reserved pool" of components.
- The diversity parameter (λ) is dynamically updated based on the rate of change in the ensemble's accuracy, preparing it for the selection stage.
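The README says that λ follows the rate of change of the ensemble's accuracy, but it does not give the formula. The rule below is therefore a hypothetical sketch. λ grows linearly with the absolute change in accuracy between two evaluation windows and is clamped to [lam_min, lam_max]. The function name, the linear form, the default values, and the orientation (larger λ means more diversity) are all assumptions, not DynED's actual update. The drift signal itself comes from ADWIN in DynED. scikit-multiflow provides an ADWIN detector in skmultiflow.drift_detection, but this sketch does not use it.

```python
def update_lambda(prev_acc, curr_acc, lam_min=0.1, lam_max=0.9, sensitivity=5.0):
    """Hypothetical diversity update: bigger accuracy swings -> larger lambda.

    prev_acc, curr_acc: ensemble accuracy on the previous and current window (0..1).
    A sharp change (e.g. a drop right after a concept drift) pushes lambda toward
    lam_max, favoring diverse components; stable accuracy keeps it near lam_min,
    favoring accurate ones.
    """
    change = abs(curr_acc - prev_acc)
    lam = lam_min + sensitivity * change
    return max(lam_min, min(lam_max, lam))  # clamp to [lam_min, lam_max]


print(update_lambda(0.90, 0.90))  # stable period -> 0.1
print(update_lambda(0.90, 0.60))  # large drop -> clamped to 0.9
```

Using the absolute change lets a sudden recovery raise λ as well as a sudden drop does. Whether DynED treats the two cases alike is not stated here.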
Stage 3: Component Selection
- When triggered, this stage combines the active and reserved components and prunes the pool to a maximum size.
- Components are first clustered into two groups based on their prediction errors on recent data.
- Finally, the adapted MMR method is used to select a new set of high-performing, diverse components to become the active ensemble.
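The two building blocks of this stage can be sketched in plain Python. Each component is represented by its error vector on recent samples (1 = wrong, 0 = right). A minimal 2-means clustering then splits the components into two groups. A greedy MMR pass picks k components that trade accuracy against redundancy, where similarity is the fraction of recent samples on which two components are both right or both wrong. Several choices here are assumptions: the similarity measure, the k-means initialization, the orientation of λ (larger λ favors diversity), and the pool contents. How DynED combines the two clusters with MMR is described in the paper and is not reproduced here.

```python
def error_vector(preds, labels):
    """1 where a component was wrong on a recent sample, 0 where it was right."""
    return [int(p != y) for p, y in zip(preds, labels)]


def similarity(a, b):
    """Fraction of samples on which two components were both right or both wrong."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sq_dist(v, center):
    return sum((x - c) ** 2 for x, c in zip(v, center))


def two_means(vectors, iters=10):
    """Minimal 2-means (Lloyd's algorithm) on error vectors; returns a cluster id per vector."""
    first = vectors[0]
    farthest = min(vectors, key=lambda v: similarity(v, first))  # least similar to the first
    centers = [list(map(float, first)), list(map(float, farthest))]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min((0, 1), key=lambda k: sq_dist(v, centers[k])) for v in vectors]
        for k in (0, 1):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def mmr_select(accs, vectors, k, lam):
    """Greedy MMR: (1 - lam) * accuracy - lam * max similarity to already-selected ones."""
    remaining, selected = list(range(len(accs))), []
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((similarity(vectors[i], vectors[j]) for j in selected), default=0.0)
            return (1 - lam) * accs[i] - lam * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 1, 0, 1, 0, 1]
preds = {  # hypothetical predictions of four components on six recent samples
    "A": [0, 1, 0, 1, 0, 0],
    "B": [0, 1, 0, 1, 0, 0],  # exact copy of A
    "C": [1, 0, 0, 1, 0, 1],
    "D": [1, 0, 1, 0, 1, 0],
}
names = list(preds)
vecs = [error_vector(preds[n], labels) for n in names]
accs = [1 - sum(v) / len(v) for v in vecs]
print(two_means(vecs))                                     # [0, 0, 0, 1]
print([names[i] for i in mmr_select(accs, vecs, 2, 0.0)])  # ['A', 'B']
print([names[i] for i in mmr_select(accs, vecs, 2, 0.5)])  # ['A', 'C']
```

With λ = 0 the selection is purely accuracy-driven and keeps the redundant copy B. With λ = 0.5 it swaps B for the less accurate but differently erring C, which is the kind of trade-off the diversity parameter is meant to control.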
- Prerequisites: DynED requires Python 3.8.
- Dependencies: The core dependency is the scikit-multiflow library. The last release of scikit-multiflow is compatible only with specific versions of NumPy and pandas, so please follow its official installation instructions:
  pip install -U scikit-multiflow
  For more details, visit the scikit-multiflow documentation.
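A quick check like the one below can confirm the environment before you run DynED.py. It is a convenience sketch and not part of the repository. It reports the running Python version against the 3.8 requirement stated above, treating that requirement as "exactly 3.8", which is an assumption. It also reports whether skmultiflow (the import name of scikit-multiflow), NumPy, and pandas can be found, without actually importing them.

```python
import importlib.util
import sys


def check_environment(required=(3, 8), packages=("skmultiflow", "numpy", "pandas")):
    """Report whether the interpreter matches `required` and which packages are importable."""
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": tuple(sys.version_info[:2]) == tuple(required),
    }
    for name in packages:
        # find_spec locates a top-level package without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```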
- Get the Data: The datasets used for evaluation are available in the /Dataset folder.
- Configure the Script:
  - Open the DynED.py file.
  - Locate the following line:
    stream = FileStream("Put the full address and name of the dataset here.")
  - Replace the placeholder string with the path to the dataset file you want to evaluate, for example:
    stream = FileStream("Dataset/poker.arff")
- Execute the Script:
  - Run the Python script from your terminal:
    python DynED.py
  - The script will output the final mean accuracy of the DynED model on the provided dataset.
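Instead of typing the path by hand, you can use a small helper to list the datasets in the Dataset folder and check that the chosen file exists before it is passed to FileStream. The helper names below are hypothetical and not part of DynED.py. The folder name Dataset and the .arff pattern come from the example above; other files in the folder may use a different extension.

```python
from pathlib import Path


def list_datasets(root="Dataset", pattern="*.arff"):
    """Names of dataset files under `root` (empty list if the folder is missing)."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(p.name for p in root_path.glob(pattern))


def dataset_path(name, root="Dataset"):
    """Path string suitable for FileStream, or None if the file does not exist."""
    path = Path(root) / name
    return str(path) if path.is_file() else None
```

In DynED.py, the placeholder line could then become stream = FileStream(dataset_path("poker.arff")), after checking that the helper did not return None. If it did, list_datasets() shows which files are actually available.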
The paper evaluates DynED against five state-of-the-art baselines: LevBag, SAM-KNN, ARF, SRP, and KUE.
The scripts used to run these baseline experiments using the MOA framework can be found in the /scripts directory. Please see the README.md within that directory for more information.
This study is partially supported by TÜBİTAK grant no. 122E271.
If you use this work in your research, please cite the original paper:
@inproceedings{abadifard2023dyned,
author = {Abadifard, Soheil and Bakhshi, Sepehr and Gheibuni, Sanaz and Can, Fazli},
title = {DynED: Dynamic Ensemble Diversification in Data Stream Classification},
year = {2023},
isbn = {9798400701245},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3583780.3615266},
doi = {10.1145/3583780.3615266},
booktitle = {Proceedings of the 32nd ACM International Conference on Information and Knowledge Management},
  pages = {3707--3711},
numpages = {5},
keywords = {ensemble pruning, maximal marginal relevance, data stream classification, concept drift, diversity adjustment, ensemble learning},
location = {Birmingham, United Kingdom},
series = {CIKM '23}
}