micrograd-np: A Tiny Autograd Engine with NumPy

A minimal, educational deep learning library inspired by micrograd, but fully vectorized.

This is a lightweight automatic differentiation engine built on top of NumPy, designed like a tiny version of PyTorch. It supports:

  • Scalar-to-matrix autodiff with a simple Value class

  • Vectorized operations with full NumPy broadcasting (+, *, summation)

  • Neural network layers (Linear, ReLU, Sigmoid) built from scratch

  • Binary cross-entropy loss and gradient backpropagation

    Scalar Value object                 Vectorized Value object
    Value(1) + 2        # o/p -> 3      Value(1) + 2        # o/p -> 3
    Value([1,2,3]) + 2  # ❌            Value([1,2,3]) + 2  # ✅ o/p -> [3,4,5]
    Value([1,2,3]) * 2  # ❌            Value([1,2,3]) * 2  # ✅ o/p -> [2,4,6]

    and so on, with automatic differentiation.
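
The table above is the whole idea in miniature: store the data as a NumPy array and let NumPy's elementwise semantics provide broadcasting. A minimal sketch of just the forward-pass side (graph bookkeeping and gradients omitted; the class in this repo carries more state than this):

    import numpy as np

    class Value:
        def __init__(self, data):
            self.data = np.asarray(data, dtype=float)  # scalar or ndarray
            self.grad = np.zeros_like(self.data)       # same shape as data

        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            return Value(self.data + other.data)       # NumPy broadcasting

        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            return Value(self.data * other.data)

        def __repr__(self):
            return f"Value({self.data})"

    print(Value([1, 2, 3]) + 2)  # Value([3. 4. 5.])
    print(Value([1, 2, 3]) * 2)  # Value([2. 4. 6.])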

I trained a model to classify whether a given image is a cat or not, using a four-layer neural network built entirely on the Micrograd-np engine. I deliberately overfitted the model; see the results in the image below, where TL is the true label and MP is the model-predicted label. It classifies all the cat images correctly.
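
For a sense of scale, here is a plain-NumPy, architecture-only sketch of the kind of four-layer network described above. The layer widths and input size are illustrative assumptions, not the repo's actual configuration, and the repo builds the real model from its own Linear, ReLU and Sigmoid classes on top of the Value engine rather than from raw NumPy:

    import numpy as np

    rng = np.random.default_rng(0)

    def linear(n_in, n_out):
        # Returns a closure computing W @ x + b for randomly initialised W, b.
        W = rng.standard_normal((n_out, n_in)) * 0.01
        b = np.zeros((n_out, 1))
        return lambda x: W @ x + b

    relu    = lambda x: np.maximum(0.0, x)
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    # Four layers; the widths (12288 -> 20 -> 7 -> 5 -> 1) are assumptions.
    l1, l2, l3, l4 = linear(12288, 20), linear(20, 7), linear(7, 5), linear(5, 1)

    def forward(x):                          # x: (features, batch)
        return sigmoid(l4(relu(l3(relu(l2(relu(l1(x))))))))

    x = rng.standard_normal((12288, 4))      # 4 fake "images"
    print(forward(x).shape)                  # (1, 4): one cat probability each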

Improvements needed

  • Broadcasting: when you train with an input of shape (20, 209), the Value object broadcasts its data to (20, 209); when you then pass a test set of shape (20, 10), it raises an error because the shape was fixed to 209 columns during training. This needs to be fixed; one common approach is sketched after this list.

  • Atomic functions for log, exp, etc.
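
One common way autograd engines avoid baking a training-time shape into the graph is to let NumPy broadcast freely in the forward pass and, in the backward pass, sum the incoming gradient back down to each operand's original shape. A sketch of that helper (this is a standard fix, not the code currently in this repo):

    import numpy as np

    def unbroadcast(grad, shape):
        # Reduce `grad` to `shape` by summing over the axes that
        # broadcasting expanded.
        while grad.ndim > len(shape):          # extra leading axes
            grad = grad.sum(axis=0)
        for axis, size in enumerate(shape):    # axes that were size 1
            if size == 1 and grad.shape[axis] != 1:
                grad = grad.sum(axis=axis, keepdims=True)
        return grad

    g = np.ones((20, 209))                     # gradient of a broadcast result
    print(unbroadcast(g, (20, 1)).shape)       # (20, 1)  bias-style operand
    print(unbroadcast(g, (209,)).shape)        # (209,)   row-vector operand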

When you call loss.backward(), what happens behind the scenes? For every forward pass, the engine builds a directed acyclic graph (DAG) of the operations applied to the input, and the loss node sits on top of this DAG. When you call loss.backward(), it first builds a list of all the nodes in the DAG in reversed topological order. It then calls each node's _backward() (private) function, which computes the node's local gradient, multiplies it by the gradient flowing in from the node's output, and adds the result to the node's own gradient. This is repeated for every element of the reversed topological list. Finally, each parameter is updated from its gradient by passing all parameters to the optimizer via model.parameters().
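
In code, the walk described above looks roughly like this. It is a sketch of the classic scalar micrograd backward(); the vectorized version in this repo follows the same shape, just with array-valued grads:

    def backward(self):
        # Build a topological ordering of the DAG rooted at this (loss) node.
        topo, visited = [], set()

        def build(node):
            if node not in visited:
                visited.add(node)
                for child in node._prev:   # nodes that fed into this one
                    build(child)
                topo.append(node)

        build(self)

        # Seed d(loss)/d(loss) = 1, then apply the chain rule node by node,
        # visiting the list in reverse so a node runs after everything it feeds.
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()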

This is how backpropagation works when you call loss.backward() and then optimizer.step(). That is the autograd engine, the backbone of modern neural network libraries; PyTorch's Autograd engine is the full-scale counterpart.

Micrograd is a replica of the Autograd engine, containing the Value class that supports all these features. It was originally developed by Andrej Karpathy ♥.

Learnings

  • How the gradients of the loss with respect to each parameter of a neural net are computed.
  • How a DAG is built, a reversed topological order derived, and each parameter's gradient calculated from it.
  • Getting all of the model's parameters via .parameters() and updating their values (a sketch follows this list).
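
The last point in code form: a hedged sketch of a plain-SGD update over model.parameters(). The names follow this README; the repo's exact optimizer API may differ:

    def zero_grad(parameters):
        for p in parameters:          # every Value returned by model.parameters()
            p.grad = 0 * p.grad       # reset before the next backward pass

    def sgd_step(parameters, lr=0.01):
        for p in parameters:
            p.data -= lr * p.grad     # move against the gradient

    # One training step (assumed names):
    # zero_grad(model.parameters())
    # loss = binary_cross_entropy(model(x), y)
    # loss.backward()
    # sgd_step(model.parameters())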

Structure of Value Class

It can perform the following operations:

>>> Value(1.0) + Value(2.0)
Value(3.0)

>>> Value(2.0) * Value(3.0)
Value(6.0)

>>> Value(2.9).tanh()
Value(0.9939632...)
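
The tanh example relies on the operation knowing its own derivative. A sketch of how the scalar version wires that up, in the same method-in-isolation style as the snippets further below (the vectorized class would use np.tanh instead):

    import math

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), "tanh")

        def _backward():
            # d/dx tanh(x) = 1 - tanh(x)^2, scaled by the gradient flowing in
            self.grad += (1 - t**2) * out.grad
        out._backward = _backward
        return out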

Value Class

Attributes

data

  • Data of the object.

grad

  • Contains the gradient.
  • By default it is 0.0.

Functions

__repr__

  • Prints the formatted data of the Value object instead of the object's memory address.
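
For example (a sketch; the exact format string in this repo may differ):

    def __repr__(self):
        return f"Value(data={self.data}, grad={self.grad})"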

Operations

  • add
  • multiply
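
Both follow the same pattern as the methods shown below: compute the forward result, then attach a _backward closure that applies the chain rule. A sketch of the scalar case (the vectorized version additionally has to reduce broadcast gradients back to the operands' shapes):

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")

        def _backward():
            # d(out)/d(self) = other.data, d(out)/d(other) = self.data
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out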

Negative (-a)

   def __neg__(self):
        return self * -1

Negating the value.

Eg:

a      # 10
(-a)   # -10

Subtraction (a-b)

   def __sub__(self, other):
        return self + (-other) # (-other) is the negative operation

Subtraction reuses the addition operation: the other value is negated (-other) and then added, which yields the subtracted result. There is no need for a separate differentiation method ._backward() here, because we are reusing the + operation and it already has its own ._backward().

Eg:

a = 10
b = 5

a + (-b) # 5
a - b    # 5

Pow ($ x^y $)

   def __pow__(self, other):
        assert isinstance(other, (int, float))
        out = Value(self.data**other, (self, ), f"**{other}")

        def _backward():
                self.grad += (other * self.data**(other-1)) * out.grad
        out._backward = _backward
        return out

The pow function is overridden as a helper for divide. Only an int or float is accepted as the exponent. The derivative of the power function is $ nx^{n-1} $, i.e. the power rule.

Eg:

a = Value(10)
a**2 # Value(data=100)
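
Division then reuses pow (mirroring the original micrograd), so, like subtraction, it needs no _backward() of its own:

    def __truediv__(self, other):
        return self * other**-1   # a / b == a * b**-1; gradients come from * and **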

Important Clarifications

Q1. Why accumulate the gradients? Because a node can feed into more than one downstream operation, its gradient receives a contribution from every path through the DAG, and the multivariate chain rule says those contributions must be summed. That is why every _backward() uses += rather than =.
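
A tiny example with the scalar Value class (the same effect shows up whenever a parameter is reused across the graph):

    a = Value(2.0)
    b = a + a          # `a` is used by two paths into `b`
    b.backward()
    print(a.grad)      # 2.0 -- each path contributes 1 and += sums them;
                       # with plain `=` the result would wrongly be 1.0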
