Hyperspectral Imaging is a powerful technique that captures and analyzes a wide spectrum of light across hundreds of narrow bands. This technology allows for detailed material identification, pattern recognition, and classification across diverse fields, including environmental monitoring, medical imaging, agriculture, art conservation, and more.
This repository demonstrates the application of machine learning algorithms, particularly unsupervised learning techniques, to analyze and interpret hyperspectral datasets. The methods presented provide insights into spectral patterns, material properties, and class segmentation.
📌 1. Project Introduction
This repository implements both unsupervised and supervised machine learning algorithms on hyperspectral datasets (Pavia, Salinas, and Onera Satellite datasets) to perform clustering, segmentation, and classification:
- K-Means Clustering
- K-Subspaces (KSS) Clustering
- K-Affine Spaces (KAS) Clustering
- Spectral Clustering (Thresholded Subspace Clustering)
- Mean-, Subspace-, and Affine Space-Based Classification
📊 2. Implemented Algorithms
K-Means clustering is one of the most widely used clustering algorithms due to its simplicity. It groups data points into $K$ clusters by minimizing the within-cluster variance. Each cluster is represented by a centroid; the algorithm iteratively assigns each data point to the cluster with the nearest centroid (based on Euclidean distance) and then updates each centroid as the mean of the points assigned to it. This process repeats until the centroids stabilize or a stopping criterion is met.
The objective being minimized is

$$\min_{C_1, \dots, C_K} \sum_{k=1}^{K} \sum_{y_i \in C_k} \left\| y_i - \mu_k \right\|_2^2$$

Where:
- $K$: Number of clusters.
- $y_i$: Data point at index $i$ in the dataset.
- $C_k$: Set of points assigned to cluster $k$.
- $\mu_k$: Centroid of cluster $k$.
- $\| \cdot \|_2$: Euclidean norm (distance).
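For orientation, here is a minimal sketch of the K-Means loop in Julia (matching the repository's Pluto/Julia setting). It is illustrative rather than the repository's implementation; the function name, the columns-as-pixels layout, and the random initialization are assumptions.

```julia
using LinearAlgebra, Statistics

# Minimal K-Means sketch. Y is an L×N matrix (L spectral bands, N pixels);
# K is the number of clusters.
function kmeans_sketch(Y::AbstractMatrix, K::Int; iters::Int=100)
    N = size(Y, 2)
    μ = Y[:, rand(1:N, K)]          # initialize centroids from random data points
    labels = zeros(Int, N)
    for _ in 1:iters
        # Assignment step: each point goes to the nearest centroid (Euclidean distance)
        for i in 1:N
            labels[i] = argmin([norm(Y[:, i] - μ[:, k]) for k in 1:K])
        end
        # Update step: each centroid becomes the mean of its assigned points
        for k in 1:K
            members = findall(==(k), labels)
            isempty(members) || (μ[:, k] = vec(mean(Y[:, members], dims=2)))
        end
    end
    return labels, μ
end
```

In practice, hyperspectral pixels are reshaped from an $H \times W \times L$ cube into such an $L \times N$ matrix before clustering, and the loop is usually stopped early once assignments stop changing.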
K-Subspaces (KSS) clustering extends K-Means to address the challenges of clustering high-dimensional data. Instead of relying solely on Euclidean distance to centroids, KSS assumes the data lies in a union of low-dimensional subspaces. Clustering proceeds through an iterative process in which the subspace bases are updated and data points are assigned based on their projection onto the closest subspace.
The objective being minimized is

$$\min_{\{C_k\}, \{U_k\}} \sum_{k=1}^{K} \sum_{y_i \in C_k} \left\| y_i - U_k U_k^{\top} y_i \right\|_2^2$$

Where:
- $K$: Number of clusters.
- $y_i$: Data point at index $i$ in the dataset.
- $C_k$: Set of points assigned to cluster $k$.
- $U_k$: Subspace basis for cluster $k$.
- $\| \cdot \|_2$: Euclidean norm (distance).
Pluto Notebook - KSS Implementation on Pavia Dataset
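A minimal KSS sketch follows (illustrative, not the Pluto notebook's code; the random initialization and the fixed common dimension `d` are assumptions):

```julia
using LinearAlgebra

# Minimal K-Subspaces sketch. Each cluster k keeps a d-dimensional orthonormal
# basis U[k]; points are assigned by the projection residual ‖y − U_k U_kᵀ y‖₂.
function kss_sketch(Y::AbstractMatrix, K::Int, d::Int; iters::Int=50)
    L, N = size(Y)
    labels = rand(1:K, N)                            # random initial assignment
    U = [Matrix(qr(randn(L, d)).Q) for _ in 1:K]     # random orthonormal bases
    for _ in 1:iters
        # Update step: basis = top-d left singular vectors of each cluster's data
        for k in 1:K
            members = findall(==(k), labels)
            length(members) >= d && (U[k] = svd(Y[:, members]).U[:, 1:d])
        end
        # Assignment step: smallest residual after projecting onto each subspace
        for i in 1:N
            y = Y[:, i]
            labels[i] = argmin([norm(y - U[k] * (U[k]' * y)) for k in 1:K])
        end
    end
    return labels, U
end
```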
K-Affine Spaces (KAS) clustering, like KSS, assumes the data lies in a union of low-dimensional spaces, but each is an affine space formed from a set of basis vectors and a bias vector rather than a linear subspace. Clustering is accomplished through an iterative process in which the basis and the bias vector of each affine space are updated, and data points are assigned based on their projection onto the closest affine space.
The objective being minimized is

$$\min_{\{C_k\}, \{U_k\}, \{\mu_k\}} \sum_{k=1}^{K} \sum_{y_i \in C_k} \left\| (y_i - \mu_k) - U_k U_k^{\top} (y_i - \mu_k) \right\|_2^2$$

Where:
- $K$: Number of clusters.
- $y_i$: Data point at index $i$ in the dataset.
- $\mu_k$: Centroid (bias vector) of cluster $k$.
- $C_k$: Set of points assigned to cluster $k$.
- $U_k$: Affine space basis for cluster $k$.
- $\| \cdot \|_2$: Euclidean norm (distance).
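A minimal KAS sketch, assuming the same data layout as above (illustrative; the bias vector is taken as the cluster mean and the basis from the SVD of the centered points):

```julia
using LinearAlgebra, Statistics

# Minimal K-Affine-spaces sketch. Each cluster k is the pair (μ[k], U[k]);
# residuals are computed on the centered point z = y − μ_k.
function kas_sketch(Y::AbstractMatrix, K::Int, d::Int; iters::Int=50)
    L, N = size(Y)
    labels = rand(1:K, N)
    μ = [zeros(L) for _ in 1:K]
    U = [Matrix(qr(randn(L, d)).Q) for _ in 1:K]
    for _ in 1:iters
        # Update step: bias vector = cluster mean; basis from the centered SVD
        for k in 1:K
            members = findall(==(k), labels)
            if length(members) > d
                Yk = Y[:, members]
                μ[k] = vec(mean(Yk, dims=2))
                U[k] = svd(Yk .- μ[k]).U[:, 1:d]
            end
        end
        # Assignment step: closest affine space by centered projection residual
        for i in 1:N
            res = map(1:K) do k
                z = Y[:, i] - μ[k]
                norm(z - U[k] * (U[k]' * z))
            end
            labels[i] = argmin(res)
        end
    end
    return labels, μ, U
end
```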
In the Thresholded Subspace Clustering (TSC) algorithm, data points are treated as nodes in a graph, which is then partitioned using techniques from spectral graph theory. The algorithm is built around three matrices: the adjacency matrix ($A$), the degree matrix ($D$), and the normalized graph Laplacian formed from them:

$$L = I - D^{-1/2} A D^{-1/2}$$

where:
- $I$: Identity matrix of size $MN \times MN$.
- $MN$: Total number of data points.
- $A = Z + Z^{\top}$, where $Z$ is a thresholded version of $C$ and $$C_{ij} = \exp \left[ -2 \cdot \arccos \left( \frac{\mathbf{y}_i^{\top} \mathbf{y}_j}{\|\mathbf{y}_i\|_2 \cdot \|\mathbf{y}_j\|_2} \right) \right], \quad \text{for } i,j = 1, \dots, MN.$$
- $D = \text{diag}(d)$, where $d_i = \sum_{j=1}^{MN} A_{ij}$ for $i = 1, \dots, MN$.
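To make the construction concrete, here is a sketch of building $C$, $Z$, $A$, $D$, and the Laplacian (illustrative; the thresholding rule of keeping the $q$ largest similarities per row is an assumption, and the repository may threshold differently):

```julia
using LinearAlgebra

# Sketch of the TSC graph construction. Columns of Y are the MN data points;
# q is an assumed parameter: how many similarities per row survive thresholding.
function tsc_graph(Y::AbstractMatrix, q::Int)
    MN = size(Y, 2)
    Yn = Y ./ mapslices(norm, Y, dims=1)      # unit-normalize each column
    G = clamp.(Yn' * Yn, -1.0, 1.0)           # cosine of the angle between points
    C = exp.(-2 .* acos.(G))                  # C_ij = exp[−2·arccos(yᵢᵀyⱼ/‖yᵢ‖‖yⱼ‖)]
    C[diagind(C)] .= 0                        # ignore self-similarity
    Z = zeros(MN, MN)
    for i in 1:MN                             # threshold: keep q largest entries per row
        keep = partialsortperm(C[i, :], 1:q, rev=true)
        Z[i, keep] = C[i, keep]
    end
    A = Z + Z'                                # symmetric adjacency matrix
    d = vec(sum(A, dims=2))                   # degrees d_i = Σ_j A_ij
    Lap = I - Diagonal(1 ./ sqrt.(d)) * A * Diagonal(1 ./ sqrt.(d))
    return A, Diagonal(d), Lap
end
```

Clusters are then read off by running K-Means on the leading eigenvectors of the Laplacian, as in standard spectral clustering.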
Mean-based classification assigns each data point to the class with the nearest centroid, where each centroid ($\boldsymbol{\mu}_k$) corresponds to a class and is computed as the mean of the training data points in that class:

$$\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N_k} y_n^{(k)}, \quad \text{for } k = 1, \dots, K$$

where:
- $K$: Number of classes.
- $N_k$: Number of data points in class $k$.
- $y_n^{(k)}$: $n^{\text{th}}$ training data point in class $k$.
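A minimal sketch of this classifier (illustrative; the per-class training matrices and function name are assumptions):

```julia
using LinearAlgebra, Statistics

# Nearest-centroid classification sketch. `train` holds one L×N_k matrix per
# class; the test point y is assigned to the class with the nearest mean.
function mean_classify(train::Vector{<:AbstractMatrix}, y::AbstractVector)
    μ = [vec(mean(Yk, dims=2)) for Yk in train]      # class centroids μ_k
    return argmin([norm(y - μk) for μk in μ])        # index of the nearest centroid
end
```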
Subspace-based classification assigns each data point to the class whose low-dimensional subspace yields the closest projection, where each subspace corresponds to a class. The subspace basis ($\mathbf{U}_k^{dim_k}$) for each class is obtained from the truncated SVD of that class's training data matrix:

$$\mathbf{U}_k^{dim_k} = \mathbf{\hat{U}}[:, 1:dim_k] \quad \text{where } \mathcal{Y}_k = \mathbf{\hat{U}}\mathbf{\hat{\Sigma}}\mathbf{\hat{V}}^{\top} \text{ is an SVD, } \quad \text{for } k = 1, \dots, K$$

where:
- $\mathbf{U}_k^{dim_k}$: Subspace basis corresponding to class $k$ with dimension $dim_k$.
- $\mathcal{Y}_k \in \mathbb{R}^{L \times N_k}$: Data matrix for the selected $N_k$ training data points in class $k$; $L$ is the dimension of the original feature space.
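A minimal sketch of this classifier (illustrative; per-class dimensions are passed in as a vector `dims`, an assumed interface):

```julia
using LinearAlgebra

# Subspace-based classification sketch. Each class keeps the top dim_k left
# singular vectors of its training matrix 𝒴_k; the test point y goes to the
# class with the smallest projection residual ‖y − U_k U_kᵀ y‖₂.
function subspace_classify(train::Vector{<:AbstractMatrix}, dims::Vector{Int}, y::AbstractVector)
    U = [svd(Yk).U[:, 1:d] for (Yk, d) in zip(train, dims)]   # bases U_k^{dim_k}
    return argmin([norm(y - Uk * (Uk' * y)) for Uk in U])
end
```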
Affine space-based classification assigns each data point to the class whose affine space yields the closest projection, where each class is represented by an affine space formed from a set of basis vectors ($\mathbf{U}_k^{dim_k}$) and a bias vector ($\boldsymbol{\mu}_k$):

$$\boldsymbol{\mu}_k = \frac{1}{N_k} \sum_{n=1}^{N_k} y_n^{(k)} \in \mathbb{R}^{L}; \quad \mathbf{U}_k^{dim_k} = \mathbf{\hat{U}}[:, 1:dim_k] \quad \text{where } \mathcal{Y}_k - \boldsymbol{\mu}_k \mathbf{1}_{N_k}^{\top} = \mathbf{\hat{U}}\mathbf{\hat{\Sigma}}\mathbf{\hat{V}}^{\top} \text{ is an SVD, } \quad \text{for } k = 1, \dots, K$$

where:
- $\mathbf{U}_k^{dim_k}$: Affine space basis corresponding to class $k$ with dimension $dim_k$.
- $\boldsymbol{\mu}_k$: Center of the affine space, i.e., the mean of the data points in class $k$.
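A minimal sketch of this classifier, mirroring the subspace version but with mean-centering (illustrative, under the same assumed interface):

```julia
using LinearAlgebra, Statistics

# Affine-space classification sketch. Each class is the pair (μ_k, U_k): the
# class mean plus the top dim_k left singular vectors of the mean-centered
# training matrix 𝒴_k − μ_k 1ᵀ.
function affine_classify(train::Vector{<:AbstractMatrix}, dims::Vector{Int}, y::AbstractVector)
    residuals = map(zip(train, dims)) do (Yk, d)
        μk = vec(mean(Yk, dims=2))            # center of the affine space
        Uk = svd(Yk .- μk).U[:, 1:d]          # basis of the centered class data
        z = y - μk
        norm(z - Uk * (Uk' * z))              # residual onto the affine space
    end
    return argmin(residuals)
end
```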