Here is a simple neural network architecture to recognise hand-written digits, based on the famous MNIST dataset.
The training images are 28 x 28 pixels, 784 pixels in total. Since the images are greyscale, each pixel takes a value from 0 to 255, where 0 is black and 255 is white.
This is a simple neural network with only 2 layers (the input layer does no computation, so it is not counted):
- Input Layer: 784 nodes, one per pixel
- Hidden Layer: 10 units, ReLU activation function
- Output Layer: 10 output units (each representing 1 digit), Softmax activation function
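Before training, the parameters need to be initialised. Below is a minimal NumPy sketch under a few stated assumptions: the data is already loaded as an array `X` of shape `(784, m)` with one flattened image per column, pixels are scaled from [0, 255] down to [0, 1] (a common preprocessing step, not mandated above), and the 0.01 weight scale is just a conventional default:

```python
import numpy as np

def init_params():
    # Shapes follow the 784 -> 10 -> 10 architecture described above.
    # Small random weights break symmetry; biases can start at zero.
    W1 = np.random.randn(10, 784) * 0.01
    b1 = np.zeros((10, 1))
    W2 = np.random.randn(10, 10) * 0.01
    b2 = np.zeros((10, 1))
    return W1, b1, W2, b2

# Assumed preprocessing: scale pixels from [0, 255] to [0, 1].
# X = X / 255.0
```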
Forward propagation proceeds as follows (a NumPy sketch of these steps follows the list):

- $A^{[0]} = X$ is our input layer; there is no processing here, it is just the 784 pixels.
- $Z^{[1]} = W^{[1]} A^{[0]} + b^{[1]}$ computes the linear transformation for the hidden layer, introducing the weights $W^{[1]}$ and biases $b^{[1]}$.
- $A^{[1]} = g(Z^{[1]}) = ReLU(Z^{[1]})$ applies the ReLU activation function to introduce non-linearity.
- $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$ computes the linear transformation for the output layer, applying the output layer's weights and biases to the hidden layer's activations.
- $A^{[2]} = softmax(Z^{[2]})$ applies the softmax activation function to produce class probabilities.
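A minimal sketch of this forward pass in NumPy, using the shapes from the initialisation sketch above; subtracting the column-wise max inside softmax is a standard numerical-stability trick rather than part of the equations themselves:

```python
def relu(Z):
    return np.maximum(Z, 0)

def softmax(Z):
    # Subtract the per-column max for numerical stability.
    expZ = np.exp(Z - Z.max(axis=0, keepdims=True))
    return expZ / expZ.sum(axis=0, keepdims=True)

def forward(W1, b1, W2, b2, X):
    Z1 = W1 @ X + b1   # Z[1] = W[1] A[0] + b[1]
    A1 = relu(Z1)      # A[1] = ReLU(Z[1])
    Z2 = W2 @ A1 + b2  # Z[2] = W[2] A[1] + b[2]
    A2 = softmax(Z2)   # A[2] = softmax(Z[2])
    return Z1, A1, Z2, A2
```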
Backpropagation then computes the gradients, working backwards from the output (see the sketch after this list). Here $Y$ is the one-hot encoded label matrix and $m$ is the number of training examples:

- $dZ^{[2]} = A^{[2]} - Y$ calculates the derivative of the cost with respect to $Z^{[2]}$.
- $dW^{[2]} = \frac{1}{m} dZ^{[2]} A^{[1]T}$ computes the gradient of the weights for layer 2.
- $db^{[2]} = \frac{1}{m} \sum dZ^{[2]}$ computes the gradient of the bias for layer 2.
- $dZ^{[1]} = W^{[2]T} dZ^{[2]} \odot g'(Z^{[1]})$ calculates the derivative of the cost with respect to $Z^{[1]}$ using the chain rule, where $\odot$ is the element-wise product.
- $dW^{[1]} = \frac{1}{m} dZ^{[1]} X^T$ computes the gradient of the weights for layer 1.
- $db^{[1]} = \frac{1}{m} \sum dZ^{[1]}$ calculates the gradient of the bias for layer 1.
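A matching sketch of the backward pass; `Y` is assumed to be one-hot encoded with shape `(10, m)`, and the $\frac{1}{m} \sum$ terms become sums across the m example columns:

```python
def relu_deriv(Z):
    # g'(Z) for ReLU: 1 where Z > 0, else 0.
    return (Z > 0).astype(float)

def backward(Z1, A1, A2, W2, X, Y):
    m = X.shape[1]
    dZ2 = A2 - Y                                    # dZ[2] = A[2] - Y
    dW2 = (1 / m) * dZ2 @ A1.T                      # dW[2]
    db2 = (1 / m) * dZ2.sum(axis=1, keepdims=True)  # db[2]
    dZ1 = (W2.T @ dZ2) * relu_deriv(Z1)             # chain rule, element-wise
    dW1 = (1 / m) * dZ1 @ X.T                       # dW[1]
    db1 = (1 / m) * dZ1.sum(axis=1, keepdims=True)  # db[1]
    return dW1, db1, dW2, db2
```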
Finally, gradient descent updates the parameters (a full training-loop sketch follows the list):

- $W^{[1]} = W^{[1]} - \alpha \, dW^{[1]}$ updates the weights for layer 1 using the learning rate $\alpha$ and the gradient $dW^{[1]}$.
- $b^{[1]} = b^{[1]} - \alpha \, db^{[1]}$ updates the bias for layer 1.
- $W^{[2]} = W^{[2]} - \alpha \, dW^{[2]}$ updates the weights for layer 2.
- $b^{[2]} = b^{[2]} - \alpha \, db^{[2]}$ updates the bias for layer 2.
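Tying the pieces together, here is a sketch of the update rule and a plain batch gradient-descent loop built on the functions above; the learning rate of 0.1 and the iteration count are illustrative guesses, not values taken from the text:

```python
def update(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # One gradient-descent step: theta = theta - alpha * d_theta.
    W1 -= alpha * dW1
    b1 -= alpha * db1
    W2 -= alpha * dW2
    b2 -= alpha * db2
    return W1, b1, W2, b2

def gradient_descent(X, Y, alpha=0.1, iters=500):
    W1, b1, W2, b2 = init_params()
    for _ in range(iters):
        Z1, A1, Z2, A2 = forward(W1, b1, W2, b2, X)
        grads = backward(Z1, A1, A2, W2, X, Y)
        W1, b1, W2, b2 = update(W1, b1, W2, b2, *grads, alpha)
    return W1, b1, W2, b2
```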
Overall, this is a pretty simple architecture that achieved 90% accuracy. There is clear room for improvement: the current architecture is a plain 2-layer network (784 -> 10 -> 10), so we could consider adding more layers or increasing the number of neurons in the hidden layer to capture more complex patterns. Beyond that, adding batch normalization after the ReLU activation could improve training stability, and experimenting with other activation functions such as ELU or LeakyReLU in the hidden layer could also boost performance.
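As an illustration of that last suggestion, LeakyReLU is a near drop-in replacement for ReLU in the sketches above; only the activation and its derivative change. The 0.01 negative slope is the conventional default, not a tuned value:

```python
def leaky_relu(Z, slope=0.01):
    # Like ReLU, but keeps a small gradient for negative inputs,
    # which helps avoid "dead" hidden units.
    return np.where(Z > 0, Z, slope * Z)

def leaky_relu_deriv(Z, slope=0.01):
    # Used in place of relu_deriv during backpropagation.
    return np.where(Z > 0, 1.0, slope)
```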






