Author: Thavindu Liyanage
Student ID: W1899297 / 20211175
This project explores and evaluates the effectiveness of three AI-based clustering techniques—K-Means, DBSCAN, and Self-Organizing Maps (SOM)—for forming optimized stock portfolios using daily returns of S&P 500 stocks. The goal is to determine the most effective method for:
- Grouping similar-performing stocks
- Managing outliers
- Enhancing portfolio performance with machine learning
Machine learning is transforming how financial analysts and investors construct and optimize portfolios. Traditional methods like K-Means are commonly used, but this study emphasizes DBSCAN as a superior alternative due to its ability to:
- Identify noise/outliers
- Form arbitrarily-shaped clusters
- Eliminate the need to predefine the number of clusters
Additionally, Artificial Neural Networks (ANNs)—specifically Self-Organizing Maps (SOMs)—are evaluated for their non-linear pattern recognition capabilities in complex financial datasets.
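To make the SOM idea concrete, here is a minimal from-scratch sketch in NumPy. It is not the MiniSom implementation the project uses; the function names `train_som` and `best_matching_unit`, the 3x3 grid, the linear decay schedules, and the synthetic two-group data are all assumptions chosen for illustration.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)

def train_som(data, grid=(3, 3), n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a tiny SOM with linearly decaying learning rate and neighbourhood."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Initialise node weights from randomly chosen samples (illustrative choice).
    picks = rng.integers(0, len(data), size=rows * cols)
    weights = data[picks].astype(float).reshape(rows, cols, -1)
    # Grid coordinates of every node, shape (rows, cols, 2).
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 1e-3
        x = data[rng.integers(len(data))]
        bmu = best_matching_unit(weights, x)
        # Gaussian neighbourhood around the winning node, measured on the grid.
        grid_dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
        influence = np.exp(-grid_dist2 / (2.0 * sigma**2))
        # Pull each node toward the sample, scaled by its neighbourhood weight.
        weights += lr * influence[..., None] * (x - weights)
    return weights

# Synthetic stand-in for two groups of stocks in a 2-D feature space.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
group_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
data = np.vstack([group_a, group_b])

weights = train_som(data)
bmu_a = best_matching_unit(weights, np.array([0.0, 0.0]))
bmu_b = best_matching_unit(weights, np.array([10.0, 10.0]))
print("Node for group A:", bmu_a, "| node for group B:", bmu_b)
```

Each sample is assigned to its best-matching node, so stocks that land on the same or neighbouring nodes can be read as one group. Samples that sit far from every node's weight vector are candidates for outliers.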
- Data period: 2013–2023
- Garman-Klass Volatility – Robust estimator for stock price volatility
- RSI (Relative Strength Index) – Measures price momentum
- Bollinger Bands – Volatility-based envelope of moving averages
- ATR (Average True Range) – Measures market volatility
- MACD (Moving Average Convergence Divergence) – Trend-following momentum indicator
These indicators form the foundation for clustering and analysis.
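As a rough illustration of two of these indicators, the sketch below computes a per-day Garman-Klass variance estimate and a simple RSI with pandas. The column names (`Open`, `High`, `Low`, `Close`), the 14-day window, and the use of simple rolling means for RSI (rather than Wilder's exponential smoothing) are assumptions, not details confirmed by the project.

```python
import numpy as np
import pandas as pd

def garman_klass_variance(df: pd.DataFrame) -> pd.Series:
    """Per-day Garman-Klass variance estimate from OHLC prices.

    Take the square root of its rolling mean to get a volatility figure.
    """
    log_hl = np.log(df["High"] / df["Low"])
    log_co = np.log(df["Close"] / df["Open"])
    return 0.5 * log_hl**2 - (2.0 * np.log(2.0) - 1.0) * log_co**2

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    """RSI from simple rolling means of gains and losses (assumed variant)."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = delta.clip(upper=0).abs().rolling(window).mean()
    rs = gain / loss
    return 100.0 - 100.0 / (1.0 + rs)

# Tiny made-up example: one OHLC bar and a steadily rising price series.
bar = pd.DataFrame({"Open": [1.0], "High": [np.e], "Low": [1.0], "Close": [1.0]})
gk = garman_klass_variance(bar)
rising_rsi = rsi(pd.Series(np.arange(1.0, 31.0)))
print(gk.iloc[0], rising_rsi.iloc[-1])
```

A series that only rises has no losses in the window, so its RSI saturates at 100. The first `window` values are `NaN` because the rolling mean has not yet seen enough price changes.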
Clustering Model | Pros | Cons | Best Use Case |
---|---|---|---|
K-Means | Fast, scalable, easy to implement | Assumes spherical clusters, ignores outliers | Works best with well-defined, spherical clusters |
DBSCAN | Handles noise and arbitrary shapes, no need to predefine cluster count | Sensitive to `eps` and `min_samples`, struggles with varying densities | Ideal for irregular cluster shapes and noisy data |
SOM (Self-Organizing Maps) | Handles non-linear patterns, detects outliers, reduces dimensionality | Computationally expensive, requires large datasets, sensitive to hyperparameters | Suitable for complex, high-dimensional datasets |
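The outlier trade-off in the table can be seen on a small synthetic example. The sketch below is not the project's code: the two-moons data, the four hand-placed outliers, and the values `eps=0.2` and `min_samples=5` are assumptions picked for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two crescent-shaped groups plus four isolated points far from both.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
outliers = np.array([[5.0, 5.0], [-5.0, 5.0], [5.0, -5.0], [-5.0, -5.0]])
data = np.vstack([X, outliers])

# K-Means must assign every point to one of the k clusters.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)

# DBSCAN labels points without enough close neighbours as noise (-1).
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(data)

print("K-Means labels for the outliers:", km_labels[-4:])
print("DBSCAN labels for the outliers: ", db_labels[-4:])
print("DBSCAN noise points in total:   ", int((db_labels == -1).sum()))
```

K-Means folds the isolated points into whichever centroid is nearest, which also shifts that centroid. DBSCAN leaves them unassigned, at the cost of having to choose `eps` and `min_samples` sensibly for the data's scale.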
- Python
- NumPy, Pandas, Matplotlib, Seaborn
- scikit-learn
- yfinance
- MiniSom (for SOM implementation)
- Draw.io (for system diagrams)
- Data Collection: Gathered daily returns, technical indicators, and volatility metrics for all S&P 500 companies.
- Preprocessing: Cleaned, normalized, and transformed data for clustering models.
- Modeling:
  - Applied K-Means and evaluated with Elbow/Silhouette methods.
  - Applied DBSCAN using domain-optimized parameters.
  - Trained Self-Organizing Map to visualize and detect hidden stock group patterns.
- Evaluation: Compared clustering results using visualizations, interpretability, noise handling, and application to portfolio diversification.
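The normalization and K-Means selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `pick_k_by_silhouette`, the candidate range of 2 to 6 clusters, and the synthetic three-group feature matrix are hypothetical stand-ins for the project's real indicator data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

def pick_k_by_silhouette(features, k_values=range(2, 7), seed=0):
    """Standardize features, then score K-Means for each candidate k."""
    scaled = StandardScaler().fit_transform(features)
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)
    # The k with the highest mean silhouette is taken as the best fit.
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in: 150 "stocks" in three well-separated groups.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

best_k, scores = pick_k_by_silhouette(features)
print("Best k:", best_k)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
```

Standardizing first keeps indicators on different scales (for example RSI in 0 to 100 versus small volatility values) from dominating the distance calculation. On real stock features the silhouette curve is usually much flatter than on this toy data, so it is worth reading alongside the elbow plot rather than on its own.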