Clustering Model for ordinal data

Personal Information

Name: Sai Krishna Kalyan
Email: krishnakalyan3@gmail.com
Telephone: +1-581-681-3826
Github: krishnakalyan3
University: Lumière University Lyon 2
Course: Masters in Data Mining and Knowledge Management
Expected Graduation date: June 2017
Time Zone: EDT (GMT -04:00)

Deliverables

At the end of this project, we will have

Faster execution of the authors' existing implementation
User-friendly R package
Shiny Interface

Summary

I would be working with the authors (who are also my professors) on their implementation of a clustering model for ordinal data. A summary of their work is presented below.

This algorithm relies on the latent block model embedding a probability distribution specific to ordinal data (the so-called BOS or Binary Ordinal Search distribution). Model inference relies on a Stochastic EM algorithm coupled with a Gibbs sampler, and the ICL-BIC criterion is used for selecting the number of co-clusters (or blocks). The main advantage of this ordinal dedicated co-clustering model is its parsimony, the interpretability of the co-cluster parameters (mode, precision) and the possibility to take into account missing data. Numerical experiments on simulated data show the efficiency of the inference strategy, and real data analyses illustrate the interest of the proposed procedure.
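
To make the role of the two parameters (mode, precision) concrete, the sketch below simulates draws from a BOS-like distribution. It follows one reading of the stochastic binary search described in [1]: at each step a break point is drawn uniformly in the current interval, and the next interval is either the one closest to the mode (with probability equal to the precision) or a random one chosen in proportion to its size. The function name `rbos`, the argument names and the handling of empty intervals are assumptions made for illustration; the paper remains the authoritative definition.

```r
# Draw one value from a BOS-like distribution on 1..m (illustrative sketch).
# mu: mode, prec: precision in [0, 1]. Names are hypothetical.
rbos <- function(m, mu, prec) {
  lo <- 1L
  hi <- as.integer(m)
  for (step in seq_len(m - 1L)) {
    if (lo == hi) break                      # interval already a single level
    y <- sample(lo:hi, 1L)                   # break point, uniform in the interval
    # Candidate sub-intervals: below y, exactly y, above y (may be empty).
    cand <- list(c(lo, y - 1L), c(y, y), c(y + 1L, hi))
    size <- vapply(cand, function(e) max(e[2] - e[1] + 1L, 0L), integer(1))
    if (runif(1) < prec) {
      # Accurate comparison: keep the non-empty interval closest to mu.
      dist <- vapply(cand, function(e) {
        if (e[2] < e[1]) Inf
        else if (mu < e[1]) e[1] - mu
        else if (mu > e[2]) mu - e[2]
        else 0
      }, numeric(1))
      pick <- which.min(dist)
    } else {
      # Blind comparison: pick an interval with probability proportional to size.
      pick <- sample.int(3L, 1L, prob = size)
    }
    lo <- cand[[pick]][1]
    hi <- cand[[pick]][2]
  }
  lo
}

set.seed(123)
# Higher precision concentrates the draws around the mode.
print(table(factor(replicate(1000, rbos(5, mu = 2, prec = 0.8)), levels = 1:5)))
print(table(factor(replicate(1000, rbos(5, mu = 2, prec = 0.2)), levels = 1:5)))
```

Under this reading, a precision of 1 always returns the mode and a precision of 0 gives a uniform draw over the levels, which is what makes the pair (mode, precision) easy to interpret for each co-cluster.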

Implementation

The implementation language will be R. Core parts of the work will be done in R (including RcppArmadillo and Shiny). Benchmarks will be run on several datasets. Documentation and examples will be added to the user guide and the API reference.

Project Milestones

Phase 1 (May 10 to June 30)

Optimise execution time by refactoring the code with the RcppArmadillo package. A preliminary round of tests should first identify the most computationally expensive part of the inference algorithm.

Experiment with RcppArmadillo
Benchmark the existing code on different data sets
Refactor the code with RcppArmadillo
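
The benchmarking step could start as simply as the sketch below, which uses only base R. The two functions are hypothetical stand-ins, not the authors' code: one mimics a loop-heavy step over an ordinal data matrix and the other is a vectorised rewrite. Timing both on data sets of increasing size shows how a hot spot can be located and measured before deciding whether it is worth moving to RcppArmadillo.

```r
# Hypothetical stand-in for a loop-heavy step of the inference code:
# count, for each row, how often each of the m ordinal levels occurs.
count_levels_loop <- function(x, m) {
  counts <- matrix(0L, nrow = nrow(x), ncol = m)
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(ncol(x))) {
      counts[i, x[i, j]] <- counts[i, x[i, j]] + 1L
    }
  }
  counts
}

# Vectorised equivalent, the kind of rewrite to try before moving to C++.
count_levels_vec <- function(x, m) {
  t(apply(x, 1, tabulate, nbins = m))
}

set.seed(1)
m <- 5
sizes <- c(100, 400, 1600)  # number of rows in each simulated data set

# Elapsed time (seconds) of both versions for each data size.
timings <- sapply(sizes, function(n) {
  x <- matrix(sample.int(m, n * 20, replace = TRUE), nrow = n)
  c(loop = system.time(count_levels_loop(x, m))[["elapsed"]],
    vec  = system.time(count_levels_vec(x, m))[["elapsed"]])
})
colnames(timings) <- sizes
print(timings)
```

For the real code, `Rprof()` from the utils package or a dedicated benchmarking package would give a finer breakdown; `system.time()` is used here only to keep the sketch dependency-free.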

Phase 2 (June 30 to July 28)

Bundle the code into an R package. The package should be easy to use for non-specialists, fast, and provide useful output and graphical representations of the results.

API documentation
Write examples
Create a user-friendly R package
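
As an illustration of the documentation style this phase aims for, here is a small helper documented with roxygen2 comments. The function `cluster_modes` and its arguments are hypothetical and not part of the existing code; the sketch only shows how a parameter description, a return value and a runnable example could sit next to each exported function.

```r
#' Modal rating per cluster and item
#'
#' @param x Matrix of ordinal responses coded 1..m (rows = observations).
#' @param cluster Vector of cluster labels, one per row of `x`.
#' @param m Number of ordinal levels.
#' @return An integer matrix with one row per cluster and one column per item.
#' @examples
#' x <- matrix(c(1, 1, 3, 3, 2, 2, 5, 5), nrow = 4)
#' cluster_modes(x, c(1, 1, 2, 2), m = 5)
#' @export
cluster_modes <- function(x, cluster, m) {
  stopifnot(is.matrix(x), length(cluster) == nrow(x))
  labels <- sort(unique(cluster))
  modes <- matrix(NA_integer_, nrow = length(labels), ncol = ncol(x),
                  dimnames = list(as.character(labels), colnames(x)))
  for (i in seq_along(labels)) {
    block <- x[cluster == labels[i], , drop = FALSE]
    # Most frequent level in each column of this cluster (first one on ties).
    modes[i, ] <- apply(block, 2, function(col) which.max(tabulate(col, nbins = m)))
  }
  modes
}

x <- matrix(c(1, 1, 3, 3, 2, 2, 5, 5), nrow = 4)
print(cluster_modes(x, c(1, 1, 2, 2), m = 5))
```

Keeping the example inside the roxygen block means it is checked every time the package is built, so the documentation and the code cannot drift apart unnoticed.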

Final (July 28 to August 21)

The results should be presented through a Shiny interface in which the user can explore the solution space by changing the number of clusters.

Design UI
Server implementation
Deploy Application to server
Buffer time
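
A minimal sketch of such an interface is given below, assuming the shiny package is installed. The data are simulated, and k-means on the rows is used purely as a placeholder for the co-clustering inference, which the real application would call instead. The names `ratings` and `cluster_rows` are hypothetical.

```r
# Hypothetical stand-in data: 60 respondents rating 8 items on a 1-5 scale.
set.seed(42)
ratings <- matrix(sample.int(5, 60 * 8, replace = TRUE), nrow = 60)

# Placeholder for the real inference call: k-means on rows only.
# The actual package would run the ordinal co-clustering model here.
cluster_rows <- function(x, k) {
  storage.mode(x) <- "double"
  kmeans(x, centers = k, nstart = 5)$cluster
}

# Build and launch the app only in an interactive session with shiny available.
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  ui <- shiny::fluidPage(
    shiny::titlePanel("Ordinal clustering explorer (sketch)"),
    shiny::sliderInput("k", "Number of row clusters",
                       min = 2, max = 6, value = 3, step = 1),
    shiny::plotOutput("heatmap")
  )

  server <- function(input, output) {
    output$heatmap <- shiny::renderPlot({
      cl <- cluster_rows(ratings, input$k)
      # Reorder rows by cluster so that blocks become visible.
      image(t(ratings[order(cl), ]),
            xlab = "Items", ylab = "Respondents (sorted by cluster)")
    })
  }

  shiny::runApp(shiny::shinyApp(ui, server))
}
```

Moving the slider re-runs the clustering with the new number of clusters and redraws the heatmap, which is the interaction the final phase is meant to deliver for both row and column clusters.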

I am fully available during the summer to work 40 hours per week.

Mentors

Please get in touch with Julien JACQUES and Christophe BIERNACKI for this project.

References

[1] C. Biernacki and J. Jacques (2016), Model-based clustering of multivariate ordinal data relying on a stochastic binary search algorithm, Statistics and Computing, 26(5), 929-943.
