DynED: Dynamic Ensemble Diversification in Data Stream Classification

This is the official implementation of the paper "DynED: Dynamic Ensemble Diversification in Data Stream Classification", published at CIKM 2023.

DynED is a novel ensemble construction and maintenance approach for data stream classification that dynamically balances the diversity and prediction accuracy of its components. The core challenge in data stream environments is handling disruptive changes in the data distribution, known as concept drift. DynED addresses this by using the Maximal Marginal Relevance (MMR) concept to dynamically adjust the ensemble's diversity—increasing it to adapt during concept drifts and decreasing it to maximize accuracy in stable periods.
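For reference, the classic MMR criterion greedily picks the candidate that best balances relevance against redundancy with the items already chosen. The formula below applies it to ensemble components. It is one plausible reading, not the paper's exact notation, and it makes three assumptions. Relevance is taken to be a component's recent accuracy acc(c). Redundancy is its highest similarity sim(c, s) to a component s already in the selected set S, where C is the candidate pool. Finally, λ ∈ [0, 1] is written as the weight on the diversity term, so a larger λ yields a more diverse ensemble. The inner max is taken as 0 while S is empty, so the first pick is simply the most accurate candidate.

```latex
c^{*} = \arg\max_{c \in C \setminus S} \Big[ (1 - \lambda)\,\mathrm{acc}(c) \;-\; \lambda \max_{s \in S} \mathrm{sim}(c, s) \Big]
```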

Authors: Soheil Abadifard, Sepehr Bakhshi, Sanaz Gheibuni, and Fazli Can.

Key Features

Dynamic Diversity Adjustment: DynED dynamically adjusts its diversity parameter based on the intensity of accuracy changes, allowing it to adapt to severe drifts without manual tuning.
MMR-based Component Selection: Utilizes a modified Maximal Marginal Relevance (MMR) method to prune redundant or ineffective components, ensuring the ensemble is both diverse and accurate.
Concept Drift Adaptation: Specifically designed to handle the challenges of concept drift in evolving data streams, outperforming baseline methods in various drift scenarios.
Proven Performance: Experimental results on 15 datasets show that DynED achieves higher average mean accuracy than five state-of-the-art baselines.

How it Works

DynED's architecture is built on a three-stage process to construct and maintain the ensemble structure, as illustrated in the paper.

Stage 1: Prediction & Training
- The set of active components predicts the label of new data samples using majority voting.
- These components are then trained on the new samples in an online fashion.
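Stage 1 can be sketched in a few lines. This is an illustration, not code from DynED.py. The component interface (`predict_one`, `learn_one`) and the toy `MajorityClassLearner` are hypothetical stand-ins for real online classifiers. Ties in the vote go to the label that appears first, which is one arbitrary but deterministic choice.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(votes: List[Hashable]) -> Hashable:
    """Return the most common label; ties go to the label that appears first."""
    return Counter(votes).most_common(1)[0][0]


class MajorityClassLearner:
    """Toy online component (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self, default: Hashable) -> None:
        self.counts: Counter = Counter()
        self.default = default

    def predict_one(self, x: Sequence[float]) -> Hashable:
        return self.counts.most_common(1)[0][0] if self.counts else self.default

    def learn_one(self, x: Sequence[float], y: Hashable) -> None:
        self.counts[y] += 1


def predict_then_train(active: list, x: Sequence[float], y: Hashable) -> Hashable:
    """Stage 1 sketch: the active components vote on x, then each one trains on (x, y)."""
    y_hat = majority_vote([c.predict_one(x) for c in active])
    for c in active:
        c.learn_one(x, y)  # online update with the freshly labeled sample
    return y_hat


if __name__ == "__main__":
    ensemble = [MajorityClassLearner("a"), MajorityClassLearner("b"), MajorityClassLearner("b")]
    print(predict_then_train(ensemble, [0.0], "a"))  # votes a, b, b -> prints b
    print(predict_then_train(ensemble, [0.0], "a"))  # all components have now seen "a" -> prints a
```

The prediction is made before training, so the label of the current sample never influences its own vote.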
Stage 2: Drift Detection & Adaptation
- The ADWIN drift detector is used to monitor the predictions for concept drift.
- If a drift is detected, a new component is trained on the most recent data and added to a "reserved pool" of components.
- The diversity parameter (λ) is dynamically updated based on the rate of change in the ensemble's accuracy, preparing it for the selection stage.
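ADWIN itself ships with scikit-multiflow as `skmultiflow.drift_detection.ADWIN`: values are fed in with `add_element` and a drift is reported by `detected_change()`. The sketch below leaves detection out and only illustrates what happens around it. The README does not give the exact λ formula, so `update_lambda` is an invented rule that only shows the intended direction: a sharper accuracy drop raises λ, and stable periods let it decay. The constants `gain` and `decay`, and the `train_new` factory, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], object]


def update_lambda(lam: float, prev_acc: float, curr_acc: float,
                  gain: float = 2.0, decay: float = 0.05) -> float:
    """Hypothetical λ update: grow with the size of an accuracy drop, decay otherwise.

    prev_acc and curr_acc are ensemble accuracies on two consecutive windows;
    gain and decay are illustrative constants, and the result is clipped to [0, 1].
    """
    drop = prev_acc - curr_acc  # positive when accuracy fell
    if drop > 0:
        lam += gain * drop      # sharper drop -> larger push toward diversity
    else:
        lam -= decay            # stable or improving -> drift back toward accuracy
    return min(1.0, max(0.0, lam))


def on_drift(reserve: List[object], train_new: Callable[[List[Sample]], object],
             recent: List[Sample]) -> None:
    """On a detected drift, train a fresh component on recent data and park it in the reserve pool.

    train_new is a hypothetical factory that builds and trains one component.
    """
    reserve.append(train_new(recent))
```

Making the λ change proportional to the drop is what lets a severe drift push the ensemble toward diversity faster than a mild one.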
Stage 3: Component Selection
- When triggered, this stage combines the active and reserved components and prunes the pool to a maximum size.
- Components are first clustered into two groups based on their prediction errors on recent data.
- Finally, the adapted MMR method is used to select a new set of high-performing, diverse components to become the active ensemble.
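The selection step can be sketched as a greedy MMR loop. This is not DynED's code, and it rests on several assumptions. Similarity is taken to be prediction agreement on the recent window. Pruning is assumed to keep the most accurate candidates, and λ is assumed to weight the diversity term. The two-group clustering step is left out entirely. The names `mmr_select`, `k`, and `max_pool` are hypothetical.

```python
from typing import Hashable, List, Sequence


def accuracy(preds: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Share of the recent window that a component labeled correctly."""
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Share of the recent window on which two components predicted the same label."""
    return sum(p == q for p, q in zip(a, b)) / len(a)


def mmr_select(preds: List[Sequence[Hashable]], y: Sequence[Hashable],
               k: int, lam: float, max_pool: int) -> List[int]:
    """Greedy MMR selection sketch over a candidate pool.

    preds[i] holds candidate i's predictions on the recent window whose true labels are y.
    Returns the indices of at most k selected candidates, in selection order.
    """
    acc = [accuracy(p, y) for p in preds]
    # Assumed pruning rule: keep the max_pool most accurate candidates (stable on ties).
    pool = sorted(range(len(preds)), key=lambda i: acc[i], reverse=True)[:max_pool]
    selected: List[int] = []

    def score(i: int) -> float:
        # Redundancy = highest agreement with any component already selected (0 if none yet).
        redundancy = max((agreement(preds[i], preds[j]) for j in selected), default=0.0)
        return (1 - lam) * acc[i] - lam * redundancy

    while pool and len(selected) < k:
        best = max(pool, key=score)  # first candidate wins ties
        selected.append(best)
        pool.remove(best)
    return selected
```

With `lam = 0` the loop reduces to picking the most accurate candidates. Raising `lam` makes near-duplicates of already chosen components score lower, so a less redundant candidate of similar accuracy is preferred.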

Installation

Prerequisites: DynED requires Python 3.8.
Dependencies: The core dependency is the scikit-multiflow library. Its latest release works only with specific versions of NumPy and pandas, so follow its official installation instructions.
```
pip install -U scikit-multiflow
```
For more details, visit the scikit-multiflow documentation.
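The pip package is named scikit-multiflow but is imported as `skmultiflow`. A quick check like the one below can confirm the interpreter and packages before you run DynED. It is a sketch and is not part of the repository. The function name is hypothetical, and it reads the requirement strictly as Python 3.8 exactly, which is an assumption.

```python
import importlib.util
import sys


def check_environment(required=(3, 8), packages=("skmultiflow", "numpy", "pandas")):
    """Report the interpreter version and whether each package is importable."""
    version = tuple(sys.version_info[:2])
    report = {"python": version, "python_ok": version == tuple(required)}
    for name in packages:
        # find_spec returns None when the module cannot be located
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```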

How to Run

Get the Data: The datasets used for evaluation are available in the /dataset folder.
Configure the Script:
- Open the DynED.py file.
- Locate the following line:
```
stream = FileStream("Put the full address and name of the dataset here.")
```
- Replace the placeholder string with the path to the dataset file you want to evaluate (a relative path is resolved against the directory you run the script from). For example:
```
stream = FileStream("dataset/poker.arff")
```
Execute the Script:
- Run the Python script from your terminal:
```
python DynED.py
```
- The script will output the final mean accuracy of the DynED model on the provided dataset.
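Mean accuracy on a stream is commonly measured prequentially: each sample is first used for testing and then for training. The sketch below shows that loop with plain callables. We assume, rather than know, that DynED.py computes its reported figure this way. The `iter_stream` adapter relies on the `has_more_samples()` and `next_sample()` methods of scikit-multiflow streams such as `FileStream`. Both function names are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Iterator, Sequence, Tuple


def iter_stream(stream) -> Iterator[Tuple[Sequence[float], Hashable]]:
    """Adapt a scikit-multiflow stream to (x, y) pairs, one sample at a time."""
    while stream.has_more_samples():
        X, y = stream.next_sample()  # assumed: X has one row and y one label (batch_size=1)
        yield X[0], y[0]


def prequential_accuracy(samples: Iterable[Tuple[Sequence[float], Hashable]],
                         predict: Callable[[Sequence[float]], Hashable],
                         learn: Callable[[Sequence[float], Hashable], None]) -> float:
    """Interleaved test-then-train evaluation; returns mean accuracy (0.0 for an empty stream)."""
    correct = seen = 0
    for x, y in samples:
        correct += int(predict(x) == y)  # test on the sample first ...
        learn(x, y)                      # ... then train on it
        seen += 1
    return correct / seen if seen else 0.0
```

Testing before training keeps every prediction honest: the model has never seen the label it is being scored on.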

Baseline Experiments

The paper evaluates DynED against five state-of-the-art baselines: LevBag, SAM-KNN, ARF, SRP, and KUE.

The scripts used to run these baseline experiments with the MOA framework are in the /scripts directory. See the README.md in that directory for details.

Acknowledgments

This study is partially supported by TÜBİTAK grant no. 122E271.

Citation

If you use this work in your research, please cite the original paper:

```
@inproceedings{abadifard2023dyned,
  author = {Abadifard, Soheil and Bakhshi, Sepehr and Gheibuni, Sanaz and Can, Fazli},
  title = {DynED: Dynamic Ensemble Diversification in Data Stream Classification},
  year = {2023},
  isbn = {9798400701245},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3583780.3615266},
  doi = {10.1145/3583780.3615266},
  booktitle = {Proceedings of the 32nd ACM International Conference on Information and Knowledge Management},
  pages = {3707–3711},
  numpages = {5},
  keywords = {ensemble pruning, maximal marginal relevance, data stream classification, concept drift, diversity adjustment, ensemble learning},
  location = {Birmingham, United Kingdom},
  series = {CIKM '23}
}
```
