Kaufman-Lab-Columbia/k-DBCV


k-DBCV

k-DBCV is an efficient Python implementation of the density-based clustering validation (DBCV) score proposed by Moulavi et al. (2014). The implementation uses a k-dimensional tree to calculate intercluster distances efficiently, which improves performance compared with previous implementations.
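As a rough illustration of why a k-d tree helps, the sketch below finds the smallest Euclidean distance between two clusters with SciPy's cKDTree instead of building a full pairwise distance matrix. It is not k-DBCV's internal code: DBCV works with mutual reachability distances rather than plain Euclidean distances, and the function name min_intercluster_distance is hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def min_intercluster_distance(cluster_a, cluster_b):
    """Smallest Euclidean distance between any point in A and any point in B.

    Hypothetical helper for illustration only; not part of k-DBCV.
    """
    # Build the tree once on one cluster ...
    tree = cKDTree(cluster_b)
    # ... then ask for the single nearest neighbour of every point in the other.
    distances, _ = tree.query(cluster_a, k=1)
    return float(distances.min())


# Two small, well-separated clusters on the x-axis.
a = np.array([[0.0, 0.0], [1.0, 0.0]])
b = np.array([[4.0, 0.0], [5.0, 0.0]])
d = min_intercluster_distance(a, b)  # closest pair is (1, 0) and (4, 0)
```

Querying the tree avoids storing every pairwise distance, which is where the savings come from when clusters are large.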

To use k-DBCV for choosing the parameters of commonly used density-based clustering algorithms (DBSCAN, HDBSCAN, OPTICS), we recommend our DBOpt library: https://github.com/Kaufman-Lab-Columbia/DBOpt

Getting Started

Dependencies

  • SciPy
  • NumPy

Installation

k-DBCV can be installed via pip:

pip install kDBCV

Usage

The clustering scenarios scored below are generated with the following libraries:

  • scikit-learn
  • ClustSim

For visualization:

  • matplotlib

DBCV Score

Simple Scenario

A half-moons dataset simulated with scikit-learn is scored:

score = DBCV_score(X,labels)

Output: 0.5068928345037831
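The inputs X and labels are not defined in this README. One way they might be prepared is sketched below, assuming a half-moons dataset from scikit-learn clustered with DBSCAN. The sample size, noise level, and DBSCAN parameters are illustrative assumptions, not the settings behind the score reported above.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Simulate a two-dimensional half-moons dataset (all parameters are assumptions).
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Cluster with DBSCAN; scikit-learn marks noise points with the label -1.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# X (n_samples x n_features) and labels (one integer per point)
# can now be passed to the scoring call shown above.
```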

Scenario II

A larger dataset of clusters simulated with Clust_Sim-SMLM is shown:

score = DBCV_score(X,labels)

Output: 0.6171526846848352

Extracting Individual Cluster Scores

k-DBCV can also return a score for each individual cluster. Each cluster is scored without consideration for noise: Individual Cluster Score = (separation - sparseness) / max(separation, sparseness)
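The formula above can be written as a small function. This is a minimal sketch for intuition, assuming separation and sparseness are non-negative distances that have already been computed. The name individual_cluster_score and the zero-denominator guard are assumptions, not part of k-DBCV.

```python
def individual_cluster_score(separation, sparseness):
    """(separation - sparseness) / max(separation, sparseness).

    Hypothetical helper; the result lies in [-1, 1] for non-negative inputs.
    """
    denominator = max(separation, sparseness)
    if denominator == 0:
        # Assumption: treat the degenerate case as a neutral score.
        return 0.0
    return (separation - sparseness) / denominator


# A cluster that is far from its neighbours relative to its internal spread.
well_separated = individual_cluster_score(separation=2.0, sparseness=0.5)

# A cluster whose internal spread exceeds its distance to other clusters.
poorly_separated = individual_cluster_score(separation=0.5, sparseness=2.0)
```

Scores near 1 indicate a compact, well-separated cluster; negative scores indicate a cluster that is sparser internally than it is distant from its neighbours.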

By default, ind_clust_scores is set to False. Set it to True to return the array of individual cluster scores along with the overall score:

score, ind_clust_score_array = DBCV_score(X,labels, ind_clust_scores = True)

Individual cluster scores are displayed by color below:

Memory cutoff

A memory cutoff prevents attempts to score clusterings that would exceed available memory. Set the cutoff according to the machine being used; the default is a maximum of 25.0 GB. If the cutoff would be exceeded, the function returns a score of -1 and prints an error message. To suppress these error messages, set batch_mode = True (the default is False).

score = DBCV_score(X,labels, memory_cutoff = 25.0)
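The behaviour described above can be pictured with the following sketch. It assumes memory use is dominated by a dense n x n matrix of 8-byte floats, which is a guess. How k-DBCV actually estimates memory is not stated here, and the names estimated_matrix_gb and guarded_score are hypothetical.

```python
def estimated_matrix_gb(n_points, bytes_per_value=8):
    """Approximate size in GB of a dense n x n matrix (assumed cost model)."""
    return n_points * n_points * bytes_per_value / 1e9


def guarded_score(n_points, score_fn, memory_cutoff=25.0, batch_mode=False):
    """Run score_fn only if the estimated memory fits under the cutoff.

    Returns -1 when the cutoff would be exceeded, mirroring the behaviour
    described above; batch_mode=True suppresses the message.
    """
    needed = estimated_matrix_gb(n_points)
    if needed > memory_cutoff:
        if not batch_mode:
            print(f"Estimated {needed:.1f} GB exceeds cutoff of {memory_cutoff} GB")
        return -1
    return score_fn()


# 100,000 points would need about 80 GB under this model, so scoring is skipped.
skipped = guarded_score(100_000, lambda: 0.5, batch_mode=True)

# 1,000 points need about 0.008 GB, so the scoring function runs.
scored = guarded_score(1_000, lambda: 0.5)
```

Returning -1 rather than raising an exception lets a parameter sweep continue past clusterings that are too large to score.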

Relevant Citations

Density Based Cluster Validation

Moulavi, D., Jaskowiak, P. A., Campello, R. J. G. B., Zimek, A. & Sander, J. Density-based clustering validation. SIAM Int. Conf. Data Min. 2014, SDM 2014 2, 839–847 (2014)

k-DBCV implementation

Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)

License

k-DBCV is licensed under the MIT License. See the LICENSE file for more information.

Referencing

In addition to citing Moulavi et al., if you use this repository, please cite with the following (currently in preprint):

Hammer, J. L., Devanny, A. J. & Kaufman, L. J. Density-based optimization for unbiased, reproducible clustering applied to single molecule localization microscopy. Preprint at https://www.biorxiv.org/content/10.1101/2024.11.01.621498v1 (2024)


Contact

kaufmangroup.rubylab@gmail.com
