
nitesh2104/Unsupervised-Learning


Unsupervised Learning and Dimensionality Reduction

In this project, we explore unsupervised learning algorithms and apply dimensionality reduction to obtain a smaller set of features that retains most of the information.

---How to Run---

  • The code for each part is in its own directory (KMeans, PCA, ICA, etc.)
  • From the repository root, run: jupyter-lab
  • Once the browser opens, open the notebook and run all cells
  • Note: np.random.seed(0) is already set so that repeated runs produce consistent output

Algorithms

  • k-means clustering
  • Expectation Maximization
  • PCA
  • ICA
  • Randomized Projections
  • IPCA

Datasets

  • Phone Price Prediction
  • Salary Prediction

Steps and Guidelines

  • Run the clustering algorithms on the datasets.
  • Apply the dimensionality reduction algorithms to the two datasets.
  • Reproduce the clustering experiments on the data projected by each dimensionality reduction algorithm.
  • Apply the dimensionality reduction algorithms to one of the datasets and rerun the neural network learner on the newly projected data.
  • Apply the clustering algorithms to the same dataset to which the dimensionality reduction algorithms were just applied, treating the clusters as if they were new features. In other words, treat the clustering algorithms as dimensionality reduction algorithms. Again, rerun the neural network learner on the newly projected data.
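The last step, using cluster assignments as features, can be sketched in plain NumPy. This is a minimal sketch, not the project's notebook code: it assumes a numeric feature matrix `X`, uses PCA (via SVD) as the reduction and Lloyd's k-means as the clusterer, and one-hot encodes each point's cluster so it can be fed to the neural network. The helper names (`pca_project`, `kmeans`, `cluster_features`) and the random stand-in data are hypothetical.

```python
import numpy as np


def pca_project(X, n_components):
    """Center X and project it onto its top principal components (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centroids; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def cluster_features(X, k, seed=0):
    """Hypothetical helper: one-hot encode each point's k-means cluster as new features."""
    labels, _ = kmeans(X, k, seed=seed)
    return np.eye(k)[labels]


# Random stand-in for one of the project's datasets (hypothetical)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

X_pca = pca_project(X, n_components=3)        # dimensionality reduction
X_clusters = cluster_features(X_pca, k=4)     # clusters used as the new features
X_nn = np.hstack([X_pca, X_clusters])         # alternative: append them to the projection
print(X_pca.shape, X_clusters.shape, X_nn.shape)  # (200, 3) (200, 4) (200, 7)
```

Replacing the inputs with `X_clusters` follows the "clusters as dimensionality reduction" framing literally; stacking them next to the projected features is a design choice worth comparing when the network is rerun.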

Requirements

  • A discussion of the datasets and why they're interesting. If we're reusing the datasets from before, at least briefly remind the reader what they are so they don't have to revisit the old assignment write-up; if we aren't, that's a lot of work from assignment 1 to recreate.
  • Explanations of the methods: for example, how was k chosen?
  • A description of the kinds of clusters obtained.
  • Analyses of the results.
  • A description of how the data looks in the new spaces created by the various algorithms. For PCA, what is the distribution of eigenvalues? For ICA, how kurtotic are the distributions, and do the projection axes seem to capture anything "meaningful"? Assuming only k projections are generated (i.e., dimensionality reduction), how well do the randomized projections reconstruct the data? How well does PCA? How much variation appears when RP is re-run several times? (It goes without saying that RP should be run many times to see what happens.)
  • When the clustering experiments are reproduced on the datasets projected onto the new spaces created by ICA, PCA, and RP, are the clusters the same as before or different? Why or why not?
  • When the neural network algorithms were re-run, were there any differences in performance? In speed? Anything at all?
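For the question of how k was chosen, one common heuristic is the elbow method: run k-means for a range of k and look for the point where the within-cluster sum of squares stops dropping sharply. The sketch below is an assumption about how this could be done, not the method used in the notebooks; it implements k-means with k-means++ seeding and several restarts in plain NumPy, and the three-blob toy data and the function names are hypothetical.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: pick each new centroid with probability proportional to d^2."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from each point to its nearest already-chosen centroid
        d2 = ((X[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centroids)


def kmeans_inertia(X, k, n_init=10, n_iter=100, seed=0):
    """Best within-cluster sum of squared distances over n_init k-means++ restarts."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        centroids = kmeans_pp_init(X, k, rng)
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best


# Hypothetical toy data: three well-separated Gaussian blobs in 2-D
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [8, 0], [0, 8]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])

inertias = {k: kmeans_inertia(X, k) for k in range(1, 7)}
for k, v in inertias.items():
    print(k, round(float(v), 1))
```

On data like this the curve should bend at k = 3. Silhouette scores, or an information criterion such as BIC for the EM/Gaussian-mixture models, are common alternatives when the elbow is not clear.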
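The PCA eigenvalue and reconstruction questions can be answered with a few lines of linear algebra. This is a minimal sketch under stated assumptions, not the project's code: it takes a numeric matrix `X`, gets PCA eigenvalues from the covariance matrix, and compares rank-k reconstruction error against a Gaussian random projection scaled by 1/sqrt(k) that is mapped back with the Moore–Penrose pseudo-inverse (the notebooks may use scikit-learn's random projection classes instead). The function names and the synthetic low-rank data are hypothetical. Each RP run gets its own explicit seed, because a single fixed global seed would make every rerun identical.

```python
import numpy as np


def pca_reconstruction(X, k):
    """Return PCA eigenvalues (descending) and the MSE of a rank-k reconstruction."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # eigh returns the covariance eigenvalues in ascending order, so reverse them
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, W = eigvals[order], eigvecs[:, order[:k]]  # W: (d, k), orthonormal columns
    X_hat = Xc @ W @ W.T + mu
    return eigvals, np.mean((X - X_hat) ** 2)


def rp_reconstruction(X, k, seed):
    """MSE after a Gaussian random projection to k dims and a pseudo-inverse map back."""
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    Xc = X - mu
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)  # (d, k) random projection matrix
    Z = Xc @ R                                         # projected data, (n, k)
    X_hat = Z @ np.linalg.pinv(R) + mu                 # least-squares reconstruction
    return np.mean((X - X_hat) ** 2)


# Hypothetical data: 8 features driven by 3 latent factors plus small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(300, 8))

k = 3
eigvals, pca_err = pca_reconstruction(X, k)
rp_errs = [rp_reconstruction(X, k, seed=s) for s in range(10)]
print("explained variance ratio:", np.round(eigvals / eigvals.sum(), 3))
print(f"PCA MSE: {pca_err:.4f}")
print(f"RP MSE over 10 seeds: mean {np.mean(rp_errs):.4f}, std {np.std(rp_errs):.4f}")
```

Because PCA gives the best rank-k orthogonal projection in the squared-error sense, its error is a lower bound for every RP run; the spread of `rp_errs` is one way to quantify the run-to-run variation the requirements ask about.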
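For the ICA kurtosis question, the excess kurtosis of each recovered component can be computed directly. This sketch assumes `S` holds the ICA output with one component per column (for example, the array returned by scikit-learn's `FastICA.fit_transform`); here `S` is replaced by hypothetical synthetic columns whose excess kurtosis is known (Gaussian 0, uniform -1.2, Laplace +3), so the function can be checked.

```python
import numpy as np


def excess_kurtosis(S):
    """Excess kurtosis of each column: E[(x - mu)^4] / sigma^4 - 3 (0 for a Gaussian)."""
    Z = (S - S.mean(axis=0)) / S.std(axis=0)
    return (Z ** 4).mean(axis=0) - 3.0


# Hypothetical stand-in for ICA components with known kurtosis
rng = np.random.default_rng(0)
n = 200_000
S = np.column_stack([
    rng.normal(size=n),          # Gaussian: about 0
    rng.uniform(-1, 1, size=n),  # uniform (sub-Gaussian): about -1.2
    rng.laplace(size=n),         # Laplace (super-Gaussian): about +3
])
print(np.round(excess_kurtosis(S), 2))
```

Components with excess kurtosis near 0 are close to Gaussian, which is what ICA tries to move away from; strongly positive or negative values suggest the axis picked up a genuinely non-Gaussian source. `scipy.stats.kurtosis` with its defaults computes the same quantity.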
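To judge whether the clusters found after projection match the original ones, a score that ignores how clusters are numbered is needed, since k-means and EM assign arbitrary label ids. The adjusted Rand index (ARI) is one such score; using it here is an assumption, not something the project states. The sketch below implements ARI in NumPy (scikit-learn ships the same metric as `sklearn.metrics.adjusted_rand_score`), and the example labelings are hypothetical.

```python
import numpy as np


def comb2(x):
    """Number of unordered pairs, n choose 2, elementwise."""
    return x * (x - 1) / 2.0


def adjusted_rand_index(a, b):
    """ARI between two labelings of the same points: 1 = same partition, ~0 = chance level.

    Undefined (0/0) when both labelings put every point into one cluster.
    """
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    # Contingency table: how many points fall in cluster i of `a` and cluster j of `b`
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(table, (a_idx, b_idx), 1)
    sum_ij = comb2(table).sum()
    sum_a = comb2(table.sum(axis=1)).sum()
    sum_b = comb2(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(len(a))
    max_index = (sum_a + sum_b) / 2.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical labelings of 9 points: original space vs. a projected space
labels_original = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labels_projected = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])  # same partition, renamed ids
labels_mixed = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])      # every original cluster split up
print(adjusted_rand_index(labels_original, labels_projected))  # 1.0
print(adjusted_rand_index(labels_original, labels_mixed))
```

A value near 1 means the projection preserved the cluster structure; values near 0 (or negative) mean the clusters in the new space are essentially unrelated to the original ones.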
