A comprehensive Java implementation of various machine learning algorithms for handwritten digit recognition, developed as coursework for CST 3170.
This project implements and compares 15+ different machine learning classifiers on a handwritten digit recognition task using 8x8 grayscale images. The implementation demonstrates fundamental ML concepts including:
- Instance-based learning (k-NN variants)
- Neural networks (MLP, Perceptron)
- Support Vector Machines (Linear, Kernel-based)
- Tree-based methods (Decision Trees, Random Forest, Gradient Boosting)
- Ensemble methods (Voting, Hybrid classifiers)
- Pure Java Implementation: All algorithms implemented from scratch without external ML libraries
- Comprehensive Evaluation: 2-fold cross-validation with detailed performance metrics
- Modular Architecture: Clean interface-based design for easy extension
- Automated Experiments: One-click script to run all experiments
- Result Logging: Automatic saving of results with timestamps
- 1-Nearest Neighbor: Basic nearest neighbor classifier
- k-Nearest Neighbors: Configurable k parameter (k=3,5,7)
- Weighted k-NN: Distance-weighted voting mechanism
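As an illustration of the weighted-voting idea, here is a minimal sketch (class and method names are hypothetical; the project's actual logic lives in `WeightedKNearestNeighborsClassifier.java`):

```java
// Hypothetical sketch of inverse-distance weighted voting over the k nearest
// neighbors. A plain majority vote here would pick class 7; weighting by
// closeness picks class 3 instead.
class WeightedVoteSketch {
    static int vote(double[] neighborDistances, int[] neighborLabels, int numClasses) {
        double[] classWeights = new double[numClasses];
        for (int i = 0; i < neighborDistances.length; i++) {
            // Closer neighbors count for more; the epsilon avoids division by zero.
            classWeights[neighborLabels[i]] += 1.0 / (neighborDistances[i] + 1e-9);
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (classWeights[c] > classWeights[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        // One very close neighbor of class 3, two distant neighbors of class 7.
        System.out.println(vote(new double[]{0.5, 4.0, 5.0}, new int[]{3, 7, 7}, 10)); // 3
    }
}
```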
- Multi-Layer Perceptron (MLP) (see the sketch after this list):
  - Single hidden layer with ReLU activation
  - Softmax output layer
  - Backpropagation training
- Multi-class Perceptron: Direct multi-class classification
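For intuition, a minimal sketch of an MLP forward pass of this shape, assuming a weights-as-2D-arrays layout (the actual training code, including backpropagation, is in `MLPClassifier.java`):

```java
// Hypothetical sketch of the forward pass: hidden = ReLU(W1*x + b1),
// output = softmax(W2*hidden + b2). Array layouts are assumptions.
class MlpForwardSketch {
    static double[] forward(double[] x, double[][] w1, double[] b1,
                            double[][] w2, double[] b2) {
        // Hidden layer with ReLU activation.
        double[] h = new double[w1.length];
        for (int j = 0; j < w1.length; j++) {
            double z = b1[j];
            for (int i = 0; i < x.length; i++) z += w1[j][i] * x[i];
            h[j] = Math.max(0.0, z);
        }
        // Output layer: raw scores, then a numerically stable softmax.
        double[] out = new double[w2.length];
        double max = Double.NEGATIVE_INFINITY;
        for (int k = 0; k < w2.length; k++) {
            double z = b2[k];
            for (int j = 0; j < h.length; j++) z += w2[k][j] * h[j];
            out[k] = z;
            max = Math.max(max, z);
        }
        double sum = 0.0;
        for (int k = 0; k < out.length; k++) {
            out[k] = Math.exp(out[k] - max);
            sum += out[k];
        }
        for (int k = 0; k < out.length; k++) out[k] /= sum;
        return out; // class probabilities; the prediction is the argmax
    }
}
```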
- Linear SVM: One-vs-All approach for multi-class
- SVM with Centroid Features: Enhanced feature space
- RBF Kernel SVM: Non-linear classification using Radial Basis Function
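The RBF kernel itself is compact enough to sketch directly; `gamma` is a hypothetical name for the kernel-width parameter (the project's version lives in `RBFKernel.java`):

```java
// Hypothetical sketch of the RBF kernel: K(a, b) = exp(-gamma * ||a - b||^2).
// gamma controls the kernel width; larger values make the kernel more local.
class RbfKernelSketch {
    static double rbf(int[] a, int[] b, double gamma) {
        double squaredDistance = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            squaredDistance += d * d;
        }
        return Math.exp(-gamma * squaredDistance);
    }
}
```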
- Decision Tree: Entropy-based splitting criteria
- Random Forest: Ensemble of 10 trees with bootstrap sampling and feature randomization (see the sketch after this list)
- Gradient Boosted Trees: 50 trees with softmax for multi-class
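A sketch of the bootstrap-sampling step that tree ensembles typically use, assuming the forest samples indices rather than rows (the project's version is in `RandomForestClassifier.java`):

```java
import java.util.Random;

// Hypothetical sketch of bootstrap sampling: each tree trains on n examples
// drawn with replacement from the n-example training set. Sampling indices
// keeps features and labels aligned.
class BootstrapSketch {
    static int[] bootstrapIndices(int n, Random rng) {
        int[] indices = new int[n];
        for (int i = 0; i < n; i++) {
            indices[i] = rng.nextInt(n); // with replacement: duplicates expected
        }
        return indices;
    }
}
```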
- Voting Classifier: Combines NN, MLP, and Weighted k-NN
- Hybrid Classifier: Switches between NN and MLP based on distance threshold
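A minimal sketch of the switching rule, assuming the project's `Classifier` interface and a hypothetical `distanceThreshold` field (the real logic is in `HybridClassifier.java`):

```java
// Minimal sketch of the hybrid rule. It assumes the project's Classifier
// interface (src/Classifier.java); the field names and the way the nearest
// distance is obtained are hypothetical.
class HybridSketch {
    Classifier nearestNeighbor; // e.g. a NearestNeighborClassifier
    Classifier mlp;             // e.g. an MLPClassifier
    double distanceThreshold;   // hypothetical tuning parameter

    int predict(int[] features, double distanceToNearestTrainingPoint) {
        // Close to a known example: trust the instance-based prediction.
        // Far from all training data: fall back to the learned model.
        return distanceToNearestTrainingPoint <= distanceThreshold
                ? nearestNeighbor.predict(features)
                : mlp.predict(features);
    }
}
```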
The project uses two datasets (`dataSet1.csv` and `dataSet2.csv`) containing:
- 64 features: Pixel values from 8x8 grayscale images
- 10 classes: Digits 0-9
- Format: CSV with features in columns 1-64, labels in column 65
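For reference, a minimal sketch of parsing one such file; the names here are hypothetical, and the project's actual loading code is in `DataLoader.java`:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading one dataset file: 64 pixel features per row,
// with the class label (0-9) in the final column.
class CsvLoadSketch {
    static List<int[]> loadRows(String path) throws IOException {
        List<int[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                int[] row = new int[65]; // indices 0-63: features, index 64: label
                for (int i = 0; i < row.length; i++) {
                    row[i] = Integer.parseInt(parts[i].trim());
                }
                rows.add(row);
            }
        }
        return rows;
    }
}
```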
- Java Development Kit (JDK) 8 or higher
- Command line terminal (bash/cmd)
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/Machine-Learning-Course-Work-CST-3170.git
   cd Machine-Learning-Course-Work-CST-3170
   ```

2. Ensure the datasets are in the `datasets/` directory.

Run all experiments with the script for your platform:

```bash
# Windows
run_experiments.bat

# Linux/Mac
./run_experiments.sh
```
The script will:

- Create the necessary directories (`out/`, `results/`)
- Compile all Java files
- Run all experiments
- Save results to `results/experiment_results_[timestamp].txt`
Alternatively, compile and run manually:

```bash
# Compile
javac -d out src/*.java

# Run
cd out
java Main
```
Project layout:

```
Machine-Learning-Course-Work-CST-3170/
├── src/                          # Source code
│   ├── Classifier.java           # Base interface
│   ├── Main.java                 # Experiment runner
│   ├── DataLoader.java           # CSV data loading
│   ├── Utils.java                # Utility functions
│   ├── DistanceCalculator.java   # Distance metrics
│   ├── NearestNeighborClassifier.java
│   ├── KNearestNeighborsClassifier.java
│   ├── WeightedKNearestNeighborsClassifier.java
│   ├── MLPClassifier.java
│   ├── MulticlassPerceptronClassifier.java
│   ├── LinearSVMClassifier.java
│   ├── MulticlassSVMClassifier.java
│   ├── KernelSVMClassifier.java
│   ├── MulticlassKernelSVMClassifier.java
│   ├── LinearKernel.java
│   ├── RBFKernel.java
│   ├── DecisionTreeClassifier.java
│   ├── SimpleDecisionTreeClassifier.java
│   ├── RandomForestClassifier.java
│   ├── GradientBoostedTreesClassifier.java
│   ├── MultiClassGradientBoostedTreesClassifier.java
│   ├── SimpleVotingClassifier.java
│   └── HybridClassifier.java
├── datasets/                     # Data files
│   ├── dataSet1.csv
│   └── dataSet2.csv
├── results/                      # Output directory (created on run)
├── run_experiments.sh            # Linux/Mac runner
├── run_experiments.bat           # Windows runner
└── README.md                     # This file
```
Each classifier is evaluated on:
- Accuracy: Percentage of correct predictions
- Training Time: Time to train the model (ms)
- Evaluation Time: Time to make predictions (ms)
- Confusion Matrix: Detailed classification results
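Accuracy can be recovered from the confusion matrix as the diagonal (correct) count over the total; a minimal sketch with hypothetical names:

```java
// Hypothetical sketch: overall accuracy is the diagonal of the confusion
// matrix (rows = actual class, columns = predicted class) over the total.
class ConfusionMatrixSketch {
    static double accuracyPercent(int[][] confusion) {
        long correct = 0, total = 0;
        for (int actual = 0; actual < confusion.length; actual++) {
            for (int predicted = 0; predicted < confusion[actual].length; predicted++) {
                total += confusion[actual][predicted];
                if (actual == predicted) {
                    correct += confusion[actual][predicted];
                }
            }
        }
        return 100.0 * correct / total;
    }
}
```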
Results are automatically ranked by accuracy with a summary showing:
- Best and worst performers
- Performance gap analysis
- Fastest training algorithm
Example output:

```
========== NEURAL NETWORK CLASSIFIERS ==========

--- MLP (100 hidden units) ---
Fold 1:
  Training...
  Training time: 2341 ms
  Evaluating...
  Accuracy: 97.85%
  Evaluation time: 45 ms

Average Results:
  Average Accuracy: 97.32%
  Average Training Time: 2298 ms
  Average Evaluation Time: 42 ms
```
To add a new classifier:

1. Create a new class implementing the `Classifier` interface:
```java
public class MyClassifier implements Classifier {
    @Override
    public void train(int[][] features, int[] labels) {
        // Training logic
    }

    @Override
    public int predict(int[] features) {
        // Prediction logic
        return predictedLabel;
    }
}
```
2. Add an experiment in `Main.java`:

```java
runExperiment("My Classifier",
        (features, labels, numClasses) -> new MyClassifier());
```
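With this in place, `runExperiment` presumably runs the new classifier through the same 2-fold cross-validation, timing, and result-logging pipeline as the built-in experiments.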
- Centroid Features: Distances to class centroids added as features
- Min-Max Scaling: Normalization support in Utils class
- Euclidean Distance: Standard L2 distance
- ReLU Activation: For hidden layers
- Softmax Output: For probability distribution
- Cross-Entropy Loss: For multi-class classification
- Mini-batch Training: Efficient gradient updates
- Entropy-based Splitting: Information gain criteria (see the sketch after this list)
- Bootstrap Sampling: For Random Forest
- Feature Randomization: Reduces overfitting
- Gradient Boosting: Sequential error correction
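As a sketch of the entropy computation behind information-gain splits (hypothetical names; the project's version lives in the decision-tree classes):

```java
// Hypothetical sketch of the entropy measure behind information-gain splits:
// H = -sum over classes of p_c * log2(p_c), from the class counts at a node.
class EntropySketch {
    static double entropy(int[] classCounts) {
        int total = 0;
        for (int count : classCounts) total += count;
        double h = 0.0;
        for (int count : classCounts) {
            if (count == 0) continue; // 0 * log(0) is treated as 0
            double p = (double) count / total;
            h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }
    // Information gain of a candidate split = parent entropy minus the
    // size-weighted average entropy of the resulting child nodes.
}
```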
This is a coursework project, but suggestions and improvements are welcome:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is developed for educational purposes as part of CST 3170 coursework.
- Course: CST 3170 - Machine Learning
- Dataset: Simplified handwritten digit recognition (8x8 pixels)
- Implementation: All algorithms implemented from scratch for learning purposes