This project explores the intuition behind neural networks for multiclass classification using only NumPy, without high-level frameworks like TensorFlow or PyTorch. The goal is to classify the three Iris species—Setosa, Versicolor, and Virginica—based on petal and sepal measurements. We built a single-layer neural network using softmax activation, cross-entropy loss, and gradient descent to optimize model parameters.
- Softmax activation for multi-class classification.
- Cross-entropy loss function for model optimization.
- Gradient descent with backpropagation to update model parameters.
- Vectorization and broadcasting for computational efficiency.
- Decision boundary visualization to analyze model predictions.
The dataset consists of 150 samples, each with four numerical features:
- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
Each sample belongs to one of three classes:
- Setosa (0)
- Versicolor (1)
- Virginica (2)
These features are collected in a matrix $X \in \mathbb{R}^{m \times n}$, where $m = 150$ is the number of samples and $n = 4$ is the number of features.
# Load the dataset using sklearn
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
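As a quick check that the loaded arrays match the description above:

print(X.shape)  # (150, 4) – 150 samples, four measurements each
print(y.shape)  # (150,)   – integer class labels 0, 1, 2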
Since we are dealing with a multi-class classification problem, we convert categorical labels into one-hot encoded vectors.
import numpy as np
m, K = y.shape[0], 3 # 3 classes
y_one_hot = np.zeros((m, K))
y_one_hot[np.arange(m), y] = 1
This transforms each label into a vector where only the corresponding class index is set to 1.
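For instance, the first samples in the dataset are Setosa (label 0), so their one-hot rows are `[1, 0, 0]`:

print(y[:3])          # [0 0 0]
print(y_one_hot[:3])  # [[1. 0. 0.]
                      #  [1. 0. 0.]
                      #  [1. 0. 0.]]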
We use a single-layer feed-forward neural network with softmax activation.
Since we have three distinct classes, we use the softmax function instead of the sigmoid function. The softmax function is given by:

$$\mathrm{softmax}(z)_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad k = 1, \dots, K,$$

where $z_k$ is the raw score (logit) for class $k$ and $K = 3$ is the number of classes.
This function converts the raw scores into a probability distribution over the three classes.
# Compute softmax activation (X: (m, n) feature matrix, W: (n, K) weights, b: (K,) bias)
Z = np.dot(X, W) + b                                   # raw class scores (logits), shape (m, K)
numerator = np.exp(Z)
denominator = np.sum(numerator, axis=1, keepdims=True)
y_hat = numerator / denominator                        # class probabilities, shape (m, K)
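Each row of `y_hat` sums to 1, so it can be read as a probability distribution over the three species. (In practice the row-wise maximum of `Z` is often subtracted before exponentiating to avoid overflow in `np.exp`; for the small Iris logits this is not strictly necessary.)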
The loss function quantifies the difference between the predicted probabilities and the true labels.
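For one-hot targets this is the average cross-entropy over the $m$ training samples,

$$J(W, b) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_{ik}\,\log \hat{y}_{ik},$$

where $y_{ik}$ is the one-hot label and $\hat{y}_{ik}$ the predicted probability for sample $i$ and class $k$. Only the probability assigned to the true class contributes: a confident correct prediction ($\hat{y} \approx 0.9$) costs about $-\log 0.9 \approx 0.105$, while an almost-certain miss ($\hat{y} \approx 0.01$) costs $-\log 0.01 \approx 4.6$.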
# Compute the cross-entropy cost
loss = np.sum(y_one_hot * np.log(y_hat), axis=1)  # log-probability of the true class, per sample
total_cost = - (1/m) * np.sum(loss)               # negative mean log-likelihood
Using backpropagation, we compute the gradients of the cost with respect to the weights and bias, and we update the parameters iteratively with gradient descent.
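Written out, the gradients and the gradient-descent update implemented below are

$$\frac{\partial J}{\partial W} = \frac{1}{m}\,X^{\top}(\hat{Y} - Y), \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i),$$

$$W \leftarrow W - \alpha\,\frac{\partial J}{\partial W}, \qquad b \leftarrow b - \alpha\,\frac{\partial J}{\partial b},$$

where $\hat{Y}$ and $Y$ are the $(m \times K)$ matrices of predicted probabilities and one-hot labels, and $\alpha$ is the learning rate (`lr` in the code).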
Note: This image only illustrates how training updates the parameters through backpropagation. It is not representative of the single-layer feed-forward neural network we have built.
# Compute gradients of the cost with respect to the weights and bias
W_grad = np.dot(X.T, (y_hat - y_one_hot)) / m     # shape (n, K)
b_grad = np.sum((y_hat - y_one_hot), axis=0) / m  # shape (K,)
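The simple error term $\hat{Y} - Y$ is a consequence of pairing softmax with cross-entropy: when the loss is differentiated with respect to the raw scores $Z$, the softmax Jacobian and the logarithm cancel, leaving only the difference between predicted probabilities and one-hot labels. This is why no explicit softmax derivative appears in the code.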
We train the model using gradient descent over multiple iterations.
# Training loop: p_model, compute_cost and compute_gradients implement the steps shown above;
# W, b, the learning rate lr and the iteration count iters are set beforehand
costs = []
for i in range(iters):
    y_hat = p_model(X, W, b)                        # forward pass: softmax probabilities
    cost = compute_cost(y, y_hat)                   # cross-entropy cost
    W_grad, b_grad = compute_gradients(X, y, W, b)  # backward pass
    W -= lr * W_grad                                # gradient descent update
    b -= lr * b_grad
    if i % 100 == 0:
        costs.append(cost)
        print(f"Cost after iteration {i}: {cost:.4f}")
We tested three feature sets:
| Feature Set        | Accuracy |
|--------------------|----------|
| Petal Measurements | 96%      |
| Sepal Measurements | 75%      |
| Both Features      | 98%      |
- Petal measurements alone perform better than sepal measurements alone.
- Using both features gives the highest accuracy (98%).
- The decision boundary was influenced by the number of training iterations and learning rate.
To visualize how the model classifies new data, we plot the decision boundary.
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Here X holds the two features the model was trained on (e.g. the petal measurements),
# and W_trained, b_trained are the learned parameters.
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))

# Predict a class for every grid point and colour the regions accordingly
y_pred = p_model(np.c_[xx.ravel(), yy.ravel()], W_trained, b_trained)
y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
plt.contourf(xx, yy, y_pred, alpha=0.3, cmap=ListedColormap(['lightgreen', 'pink', 'coral']))
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.title("Decision Boundary")
plt.show()
- Petal-only features: The model forms well-defined decision regions due to the strong separability of the classes.
- Sepal-only features: The model performs poorly and is not able to form well-defined boundaries.
- Petal measurements provide a stronger predictive signal than sepal measurements.
- Gradient descent, softmax activation, and cross-entropy loss optimize the model effectively.
- Vectorization and broadcasting improve computational efficiency.
- For the petal features, the decision boundaries improve with more training iterations and proper hyperparameter tuning.
This project serves as a minimal yet powerful demonstration of how a neural network can be implemented from scratch, reinforcing the mathematical intuition behind classification tasks.
- Iris Dataset - UCI Machine Learning Repository
- Softmax Regression - Stanford CS229
- Medium Article by Srija Neogi - Exploring Multi-Class Classification using Deep Learning
- Medium Article by LM Po - Backpropagation: The Backbone of Neural Network Training (Back Propagation Image)
This coursework was completed under the guidance of Ms. Tatiana Bubba (Mathematics Professor).