k-DBCV is an efficient Python implementation of the density-based clustering validation (DBCV) score proposed by Moulavi et al. (2014). The implementation uses a k-dimensional tree (k-d tree) to calculate intercluster distances efficiently, which improves performance compared with previous implementations.
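SciPy's `cKDTree` illustrates how a k-d tree avoids building a dense distance matrix. This is a simplified sketch, not k-DBCV's own code. It uses plain Euclidean distance, whereas DBCV itself works with mutual reachability distances. The function name `min_intercluster_distance` is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def min_intercluster_distance(cluster_a: np.ndarray, cluster_b: np.ndarray) -> float:
    """Smallest Euclidean distance between any point of cluster_a and any point of cluster_b.

    Hypothetical helper: builds a k-d tree on cluster_b and queries the nearest
    neighbour of every point in cluster_a, so the full len(a) x len(b)
    distance matrix is never materialised.
    """
    tree = cKDTree(cluster_b)
    # For k=1, query returns one distance (and index) per row of cluster_a
    distances, _ = tree.query(cluster_a, k=1)
    return float(distances.min())


a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[4.0, 0.0], [10.0, 0.0]])
print(min_intercluster_distance(a, b))  # 3.0
```

Each nearest-neighbour query costs roughly logarithmic time in the size of `cluster_b` for low-dimensional data. This is why tree-based lookups scale better than computing every pairwise distance.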
To use k-DBCV for choosing the parameters of commonly used density-based clustering algorithms (DBSCAN, HDBSCAN, OPTICS), we recommend our DBOpt library: https://github.com/Kaufman-Lab-Columbia/DBOpt
k-DBCV requires:
- SciPy
- NumPy
k-DBCV can be installed via pip:
pip install kDBCV
The examples below use the following libraries to simulate clustering scenarios:
- scikit-learn
- ClustSim
For visualization:
- matplotlib
A half-moons dataset simulated with scikit-learn is shown:
DBCV_Score(X,labels)
Output: 0.5068928345037831
A larger dataset of clusters simulated with Clust_Sim-SMLM is shown:
score = DBCV_score(X,labels)
Output: 0.6171526846848352
k-DBCV can also extract individual cluster scores, where each cluster is assigned a score without considering noise: Individual cluster score = (separation - sparseness) / max(separation, sparseness)
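The per-cluster formula can be written out directly. This sketch also shows how Moulavi et al. (2014) combine cluster scores into the overall DBCV score: a sum weighted by cluster size, divided by the total number of points, noise included. Separation and sparseness are treated here as already-computed positive numbers. The function names are hypothetical and are not part of k-DBCV's API.

```python
import numpy as np


def cluster_validity(separation: float, sparseness: float) -> float:
    """Individual cluster score: (separation - sparseness) / max(separation, sparseness).

    For positive inputs this lies in [-1, 1]; values near 1 indicate a dense,
    well-separated cluster.
    """
    return (separation - sparseness) / max(separation, sparseness)


def overall_dbcv(cluster_sizes, validities, n_total_points: int) -> float:
    """Size-weighted sum of cluster validities, following Moulavi et al. (2014).

    n_total_points counts noise points too, so labelling points as noise
    lowers the overall score even though noise is ignored per cluster.
    """
    sizes = np.asarray(cluster_sizes, dtype=float)
    scores = np.asarray(validities, dtype=float)
    return float(np.sum(sizes / n_total_points * scores))


print(cluster_validity(2.0, 1.0))  # 0.5
print(cluster_validity(1.0, 2.0))  # -0.5
# Two clusters of 10 and 30 points plus 10 noise points (50 points in total)
print(overall_dbcv([10, 30], [0.5, -0.5], 50))
```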
By default, ind_clust_scores is set to False. Set it to True to also return the individual cluster scores:
score, ind_clust_score_array = DBCV_Score(X,labels, ind_clust_scores = True)
Individual cluster scores are displayed by color below:
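To color points by their cluster's score, each point needs its cluster's value. This is a minimal sketch that rests on two assumptions not stated here. First, `ind_clust_score_array` is assumed to hold one score per cluster, ordered by ascending cluster label. Second, noise is assumed to be labelled -1, following the scikit-learn convention. Verify both against k-DBCV's actual output. The helper `scores_per_point` is hypothetical.

```python
import numpy as np


def scores_per_point(labels, cluster_scores, noise_label=-1):
    """Map one score per cluster onto every point; noise points get NaN.

    ASSUMPTION: cluster_scores[i] belongs to the i-th non-noise label in
    ascending order, and noise is marked with noise_label.
    """
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels[labels != noise_label])
    lookup = dict(zip(cluster_ids.tolist(), cluster_scores))
    # Labels without an entry (i.e. noise) fall back to NaN
    return np.array([lookup.get(lab, np.nan) for lab in labels.tolist()], dtype=float)


point_scores = scores_per_point([0, 0, 1, -1], [0.5, 0.2])
print(point_scores)  # [0.5 0.5 0.2 nan]
# With matplotlib, e.g.: plt.scatter(X[:, 0], X[:, 1], c=point_scores, cmap="viridis")
```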
A memory cutoff prevents attempts to score clusters that would exceed the available memory. Set this cutoff according to the machine being used; the default maximum is 25.0 GB. If the cutoff would be exceeded, the score is returned as -1 along with an error message. To suppress these error messages, set batch_mode = True (default is False).
score = DBCV_score(X,labels, memory_cutoff = 25.0)
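This sketch illustrates the cutoff behavior described above. It does not describe how k-DBCV actually estimates memory. It assumes memory use is dominated by a dense n x n float64 matrix for the cluster being scored, with 1 GB taken as 1e9 bytes. The helper names `exceeds_memory_cutoff` and `guarded_score` are hypothetical. Only the `memory_cutoff` and `batch_mode` parameter names and the -1 return value come from the text above.

```python
from typing import Callable


def exceeds_memory_cutoff(n_points: int, memory_cutoff: float = 25.0,
                          bytes_per_value: int = 8) -> bool:
    """Hypothetical estimate: would a dense n_points x n_points matrix of
    float64 values need more than memory_cutoff gigabytes (1 GB = 1e9 bytes)?"""
    estimated_gb = n_points ** 2 * bytes_per_value / 1e9
    return estimated_gb > memory_cutoff


def guarded_score(n_points: int, score_fn: Callable[[], float],
                  memory_cutoff: float = 25.0, batch_mode: bool = False) -> float:
    """Return -1 instead of scoring when the estimate exceeds the cutoff."""
    if exceeds_memory_cutoff(n_points, memory_cutoff):
        if not batch_mode:  # batch_mode silences the message, as described above
            print(f"Memory cutoff of {memory_cutoff} GB would be exceeded; returning -1")
        return -1.0
    return score_fn()


print(guarded_score(10_000, lambda: 0.5))                    # ~0.8 GB estimate, scores normally
print(guarded_score(100_000, lambda: 0.5, batch_mode=True))  # ~80 GB estimate, returns -1.0
```

Under this assumption, memory grows quadratically with cluster size. Ten times more points needs about one hundred times more memory, which is why a cutoff tuned to the machine matters.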
Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A. & Sander, J. Density-based clustering validation. SIAM Int. Conf. Data Min. 2014, SDM 2014 2, 839–847 (2014)
Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)
k-DBCV is licensed under the MIT License. See the LICENSE file for more information.
In addition to citing Moulavi et al., if you use this repository, please cite the following (currently a preprint):
Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)