soheilabadifard/DynED


DynED: Dynamic Ensemble Diversification in Data Stream Classification

This is the official implementation of the paper "DynED: Dynamic Ensemble Diversification in Data Stream Classification", published at CIKM 2023.

DynED is a novel ensemble construction and maintenance approach for data stream classification that dynamically balances the diversity and prediction accuracy of its components. The core challenge in data stream environments is handling disruptive changes in the data distribution, known as concept drift. DynED addresses this by using the Maximal Marginal Relevance (MMR) concept to dynamically adjust the ensemble's diversity: it increases diversity to adapt during concept drift and decreases it to maximize accuracy in stable periods.

Authors: Soheil Abadifard, Sepehr Bakhshi, Sanaz Gheibuni, and Fazli Can.


Key Features

  • Dynamic Diversity Adjustment: DynED dynamically adjusts its diversity parameter based on the intensity of accuracy changes, allowing it to adapt to severe drifts without manual tuning.
  • MMR-based Component Selection: Utilizes a modified Maximal Marginal Relevance (MMR) method to prune redundant or ineffective components, ensuring the ensemble is both diverse and accurate.
  • Concept Drift Adaptation: Specifically designed to handle the challenges of concept drift in evolving data streams, outperforming baseline methods in various drift scenarios.
  • Proven Performance: Experimental results on 15 datasets show that DynED achieves a higher average mean accuracy than five state-of-the-art baselines.
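
For readers unfamiliar with MMR, the classical criterion from information retrieval is shown below as background. It is the standard textbook form, not the modified version used by DynED, whose exact definition is given in the paper. Here R is the candidate set, S the set of items already selected, Q the query, and λ ∈ [0, 1] the trade-off weight.

```latex
\mathrm{MMR} = \arg\max_{D_i \in R \setminus S}
  \Bigl[ \lambda \, \mathrm{Sim}_1(D_i, Q)
       - (1 - \lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \Bigr]
```

In this classical form, λ = 1 ranks purely by relevance and λ = 0 purely by dissimilarity to what has already been chosen. One plausible reading for an ensemble setting, stated here as an assumption, is that component accuracy plays the role of relevance and similarity between components' predictions plays the role of redundancy.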

How it Works

DynED's architecture is built on a three-stage process to construct and maintain the ensemble structure, as illustrated in the paper.

  1. Stage 1: Prediction & Training

    • The set of active components predicts the label of new data samples using majority voting.
    • These components are then trained on the new samples in an online fashion.
  2. Stage 2: Drift Detection & Adaptation

    • The ADWIN drift detector is used to monitor the predictions for concept drift.
    • If a drift is detected, a new component is trained on the most recent data and added to a "reserved pool" of components.
    • The diversity parameter (λ) is dynamically updated based on the rate of change in the ensemble's accuracy, preparing it for the selection stage.
  3. Stage 3: Component Selection

    • When triggered, this stage combines the active and reserved components and prunes the pool to a maximum size.
    • Components are first clustered into two groups based on their prediction errors on recent data.
    • Finally, the adapted MMR method is used to select a new set of high-performing, diverse components to become the active ensemble.
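
The selection idea in Stage 3 can be sketched as a greedy MMR-style loop. This is a minimal illustration under stated assumptions, not the code in DynED.py: the function names `disagreement` and `mmr_select` are hypothetical, accuracy on recent samples stands in for relevance, prediction agreement stands in for similarity, the clustering step is omitted, and `lam` follows the classical MMR convention (larger values favour accuracy), which may differ from how the paper parameterises λ.

```python
from typing import List, Sequence


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of samples on which two prediction vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mmr_select(preds: List[Sequence[int]], y_true: Sequence[int],
               k: int, lam: float) -> List[int]:
    """Greedily pick up to k component indices with an MMR-style score.

    preds[i] holds component i's predictions on recent samples (assumption).
    """
    if k <= 0 or not preds:
        return []
    # Relevance term: accuracy of each component on the recent samples.
    acc = [sum(p == t for p, t in zip(pred, y_true)) / len(y_true)
           for pred in preds]
    # Seed the selection with the most accurate component.
    selected = [max(range(len(preds)), key=acc.__getitem__)]
    while len(selected) < min(k, len(preds)):
        best, best_score = None, float("-inf")
        for i in range(len(preds)):
            if i in selected:
                continue
            # Redundancy term: highest agreement with any selected component.
            max_sim = max(1.0 - disagreement(preds[i], preds[j])
                          for j in selected)
            score = lam * acc[i] - (1.0 - lam) * max_sim
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

With `lam=1.0` the sketch ranks the remaining components purely by accuracy; with `lam=0.0` it picks the component that agrees least with those already selected.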

Installation

  1. Prerequisites: DynED requires Python 3.8.

  2. Dependencies: The core dependency is the scikit-multiflow library. The latest release of scikit-multiflow is compatible only with specific versions of NumPy and pandas. Please follow their official installation instructions.

    pip install -U scikit-multiflow

    For more details, visit the scikit-multiflow documentation.
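
After installing, a quick environment check can save debugging time. The snippet below is a convenience sketch, not part of the repository: the helper names `python_matches` and `installed_version` are hypothetical, and it only reports what is installed rather than validating version compatibility.

```python
import importlib
import sys
from typing import Optional, Tuple


def python_matches(required: Tuple[int, int] = (3, 8)) -> bool:
    """True when the running interpreter has the given major.minor version."""
    return sys.version_info[:2] == tuple(required)


def installed_version(module_name: str) -> Optional[str]:
    """Return a module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A failed import is reported as None, whatever the underlying cause.
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    print("Python 3.8:", python_matches())
    for name in ("skmultiflow", "numpy", "pandas"):
        print(name, installed_version(name))
```

A result of `None` for `skmultiflow` means the import failed, which is the cue to revisit the installation instructions above.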

How to Run

  1. Get the Data: The datasets used for evaluation are available in the /Dataset folder.
  2. Configure the Script:
    • Open the DynED.py file.
    • Locate the following line:
      stream = FileStream("Put the full address and name of the dataset here.")
    • Replace the placeholder string with the full path to the dataset file you wish to evaluate. For example:
      stream = FileStream("Dataset/poker.arff")
  3. Execute the Script:
    • Run the Python script from your terminal:
      python DynED.py
    • The script will output the final mean accuracy of the DynED model on the provided dataset.
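
To make the reported number concrete, the sketch below shows a generic test-then-train loop with majority voting, matching the predict-then-train order described in Stage 1. It is an illustration only and does not reproduce DynED.py: `FeatureMeanLearner`, `majority_vote`, and `prequential_accuracy` are hypothetical names, and the toy learner stands in for the real ensemble components.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


class FeatureMeanLearner:
    """Toy online learner: nearest class mean on a single feature."""

    def __init__(self, feature: int) -> None:
        self.feature = feature
        self.sums: Dict[int, float] = {}   # label -> running sum of the feature
        self.counts: Dict[int, int] = {}   # label -> number of samples seen

    def predict(self, x: Sequence[float]) -> int:
        if not self.counts:
            return 0  # arbitrary default before any training
        v = x[self.feature]
        return min(self.counts,
                   key=lambda c: abs(v - self.sums[c] / self.counts[c]))

    def partial_fit(self, x: Sequence[float], y: int) -> None:
        self.sums[y] = self.sums.get(y, 0.0) + x[self.feature]
        self.counts[y] = self.counts.get(y, 0) + 1


def majority_vote(votes: List[int]) -> int:
    """Most frequent label; ties go to the label encountered first."""
    return Counter(votes).most_common(1)[0][0]


def prequential_accuracy(stream: Iterable[Tuple[Sequence[float], int]],
                         learners: List[FeatureMeanLearner]) -> float:
    """Mean accuracy when every sample is predicted first, then trained on."""
    correct = total = 0
    for x, y in stream:
        y_hat = majority_vote([m.predict(x) for m in learners])  # test
        correct += int(y_hat == y)
        total += 1
        for m in learners:                                       # then train
            m.partial_fit(x, y)
    return correct / total if total else 0.0
```

Because each sample is scored before the components see its label, the resulting mean accuracy reflects performance on unseen data throughout the stream.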

Baseline Experiments

The paper evaluates DynED against five state-of-the-art baselines: LevBag, SAM-KNN, ARF, SRP, and KUE.

The scripts used to run these baseline experiments using the MOA framework can be found in the /scripts directory. Please see the README.md within that directory for more information.

Acknowledgments

This study is partially supported by TÜBİTAK grant no. 122E271.

Citation

If you use this work in your research, please cite the original paper:

@inproceedings{abadifard2023dyned,
  author = {Abadifard, Soheil and Bakhshi, Sepehr and Gheibuni, Sanaz and Can, Fazli},
  title = {DynED: Dynamic Ensemble Diversification in Data Stream Classification},
  year = {2023},
  isbn = {9798400701245},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3583780.3615266},
  doi = {10.1145/3583780.3615266},
  booktitle = {Proceedings of the 32nd ACM International Conference on Information and Knowledge Management},
  pages = {3707--3711},
  numpages = {5},
  keywords = {ensemble pruning, maximal marginal relevance, data stream classification, concept drift, diversity adjustment, ensemble learning},
  location = {Birmingham, United Kingdom},
  series = {CIKM '23}
}
