
Exploring the Intuition of Neural Networks on a Classification Problem Using Only NumPy

Overview

This project explores the intuition behind neural networks for multiclass classification using only NumPy, without high-level frameworks such as TensorFlow or PyTorch. The goal is to classify the three Iris species (Setosa, Versicolor, and Virginica) from their petal and sepal measurements. We build a single-layer neural network with softmax activation and cross-entropy loss, and optimize its parameters with gradient descent.


Key Features:

  • Softmax activation for multi-class classification.
  • Cross-entropy loss function for model optimization.
  • Gradient descent with backpropagation to update model parameters.
  • Vectorization and broadcasting for computational efficiency.
  • Decision boundary visualization to analyze model predictions.

Dataset

The dataset consists of 150 samples, each with four numerical features:

  • Sepal Length
  • Sepal Width
  • Petal Length
  • Petal Width

Each sample belongs to one of three classes:

  • Setosa (0)
  • Versicolor (1)
  • Virginica (2)

These features are represented as $X$ in matrix form:

$$ X \in \mathbb{R}^{m \times n_x} $$

where $m = 150$ (50 samples per species) and $n_x = 4$ (features per sample).

# Load the dataset using sklearn
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
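
A quick shape check confirms the dimensions described above:

# 150 samples with 4 features each; integer labels 0-2
print(X.shape, y.shape)  # (150, 4) (150,)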

One-Hot Encoding

Since we are dealing with a multi-class classification problem, we convert categorical labels into one-hot encoded vectors.

import numpy as np

m, K = y.shape[0], 3  # 3 classes

# Build the one-hot matrix: row i gets a 1 in column y[i]
y_one_hot = np.zeros((m, K))
y_one_hot[np.arange(m), y] = 1

This transforms each label into a vector in which only the entry at the class index is set to 1; for example, the label 2 (Virginica) becomes $[0, 0, 1]$.

Model Architecture

We use a single-layer feed-forward neural network with softmax activation.

1. Softmax Function

Since we have three distinct classes, we use the softmax function instead of the sigmoid function. The softmax function is given by:

$$ g_k(\boldsymbol{t}) = \frac{e^{t_k}}{\sum_{j=1}^{K} e^{t_j}} $$

where $\boldsymbol{t} = (t_k)_{k=1}^K$ represents the unnormalized class scores.

This function converts the raw scores into probabilities that sum to one.

# Compute softmax activation (X: (m, n_x), W: (n_x, K), b: (K,))
Z = np.dot(X, W) + b                           # (m, K) unnormalized class scores
Z = Z - Z.max(axis=1, keepdims=True)           # shift scores for numerical stability
numerator = np.exp(Z)
denominator = np.sum(numerator, axis=1, keepdims=True)
y_hat = numerator / denominator                # (m, K) class probabilities
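
As a quick sanity check (an addition, not part of the original snippets), each row of y_hat should be a valid probability distribution:

# Each row of probabilities should sum to 1
assert np.allclose(y_hat.sum(axis=1), 1.0)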

2. Cross-Entropy Loss Function

The loss function quantifies the difference between predicted and true labels:

$$ \mathcal{J}(\boldsymbol{W},\boldsymbol{b}) = -\frac{1}{m} \sum_{i=1}^m \sum_{j=1}^K \mathbf{y}_j^{(i)} \log(\widehat{y}^{(i)}_j) $$

# Compute loss: per-sample cross-entropy, then average over all m samples
loss = np.sum(y_one_hot * np.log(y_hat + 1e-12), axis=1)  # small epsilon guards against log(0)
total_cost = -(1 / m) * np.sum(loss)

3. Backpropagation and Gradient Descent for Optimization

Using backpropagation, we compute the gradients for the weight and bias updates. Because the derivative of the cross-entropy loss composed with softmax reduces to the error term $\widehat{\boldsymbol{Y}} - \mathbf{Y}$, the gradients take a compact form:

$$ \nabla_{\boldsymbol{W}} \mathcal{J}(\boldsymbol{W},\boldsymbol{b}) = \frac{1}{m} X^\top (\widehat{\boldsymbol{Y}} - \mathbf{Y}) $$

$$ \nabla_{\boldsymbol{b}} \mathcal{J}(\boldsymbol{W},\boldsymbol{b}) = \frac{1}{m} \sum_{i=1}^{m} (\widehat{\boldsymbol{y}}^{(i)} - \mathbf{y}^{(i)}) $$

We update parameters iteratively using:

$$ \boldsymbol{W} := \boldsymbol{W} - \alpha \nabla_{\boldsymbol{W}} \mathcal{J}(\boldsymbol{W},\boldsymbol{b}) $$

$$ \boldsymbol{b} := \boldsymbol{b} - \alpha \nabla_{\boldsymbol{b}} \mathcal{J}(\boldsymbol{W},\boldsymbol{b}) $$

Figure: schematic of how training updates the parameters through backpropagation (illustrative only; not representative of the single-layer network built here).

# Compute gradients (y_hat, y_one_hot: (m, K); X: (m, n_x))
W_grad = np.dot(X.T, y_hat - y_one_hot) / m     # (n_x, K)
b_grad = np.sum(y_hat - y_one_hot, axis=0) / m  # (K,)

Training the Model

We train the model using gradient descent over multiple iterations.
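
The loop below calls three helper functions and uses hyperparameters that the snippets above do not define. A minimal sketch of how they could wrap the earlier pieces follows; the function names match the loop, but their exact signatures, the initialization, and the hyperparameter values are illustrative assumptions.

# Hypothetical helpers wrapping the softmax, cost, and gradient snippets above
def p_model(X, W, b):
    # Forward pass: softmax class probabilities, (m, n_x) -> (m, K)
    Z = np.dot(X, W) + b
    Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

def compute_cost(y_one_hot, y_hat):
    # Mean cross-entropy over the m samples
    return -np.sum(y_one_hot * np.log(y_hat + 1e-12)) / y_one_hot.shape[0]

def compute_gradients(X, y_one_hot, y_hat):
    # Gradients of the cost with respect to W and b
    m = X.shape[0]
    W_grad = np.dot(X.T, y_hat - y_one_hot) / m     # (n_x, K)
    b_grad = np.sum(y_hat - y_one_hot, axis=0) / m  # (K,)
    return W_grad, b_grad

# Illustrative initialization and hyperparameters
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(X.shape[1], 3))  # (n_x, K)
b = np.zeros(3)                                   # (K,)
lr, iters = 0.1, 1000                             # learning rate and iteration count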

# Training loop (uses the helpers sketched above)
costs = []
for i in range(iters):
    y_hat = p_model(X, W, b)                                 # forward pass
    cost = compute_cost(y_one_hot, y_hat)                    # cross-entropy cost
    W_grad, b_grad = compute_gradients(X, y_one_hot, y_hat)  # backward pass
    W -= lr * W_grad                                         # gradient descent step
    b -= lr * b_grad

    if i % 100 == 0:
        costs.append(cost)
        print(f"Cost after iteration {i}: {cost:.4f}")

Model Evaluation & Results

We tested three feature sets:

Feature Set               Accuracy
Petal measurements        96%
Sepal measurements        75%
All four (petal + sepal)  98%

Observations:

  • Petal measurements alone outperform sepal measurements alone.
  • Using all four features (petal + sepal) gives the highest accuracy (98%).
  • The resulting decision boundary depends on the number of training iterations and the learning rate.
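
For reference, the accuracies in the table above can be computed from the trained parameters; a minimal sketch, assuming the p_model helper and the trained W and b from the training section:

# Accuracy: fraction of samples whose most probable class matches the true label
y_pred = np.argmax(p_model(X, W, b), axis=1)
accuracy = np.mean(y_pred == y)
print(f"Training accuracy: {accuracy:.0%}")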

Decision Boundary Visualization

To visualize how the model classifies new data, we plot the decision boundary.

import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Note: X here holds only the two plotted features (e.g., petal length and width),
# and W_trained, b_trained come from a model trained on those two features.
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))

# Classify every point on the grid and color the plane by predicted class
y_pred = p_model(np.c_[xx.ravel(), yy.ravel()], W_trained, b_trained)
y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)

plt.contourf(xx, yy, y_pred, alpha=0.3, cmap=ListedColormap(['lightgreen', 'pink', 'coral']))
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.title("Decision Boundary")
plt.show()

Figures: decision boundary plots for the petal-only and sepal-only feature sets.

Decision Boundary Analysis:

  • Petal-only features: the model forms well-defined decision regions thanks to the strong separability of the classes.
  • Sepal-only features: the model performs poorly and fails to form well-defined boundaries.

Conclusion

  • Petal measurements provide a stronger predictive signal than sepal measurements.
  • Gradient descent, softmax activation, and cross-entropy loss optimize the model effectively.
  • Vectorization and broadcasting improve computational efficiency.
  • For the petal measurements, decision boundaries improve with more training iterations and appropriate hyperparameter tuning.

This project serves as a minimal yet complete demonstration of how a neural network can be implemented from scratch, reinforcing the mathematical intuition behind multiclass classification.




Credits & Acknowledgments

This coursework was completed under the guidance of Ms. Tatiana Bubba (Mathematics Professor).
