This project demonstrates how to train a Linear Regression model using both Batch Gradient Descent (GD) and Stochastic Gradient Descent (SGD). The implementation includes Python code, a dataset, and detailed visualizations that illustrate convergence behavior, performance comparison, and optimization dynamics.
Ideal for beginners and intermediate learners looking to understand the foundations of machine learning optimization algorithms.
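As a minimal illustration of the two optimizers this project compares (a sketch on a synthetic 1-D dataset with hand-picked hyperparameters, not the repository's actual script), the snippet below fits $y \approx wx + b$ with both batch GD and SGD:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 200)
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, 200)    # synthetic line with noise

def batch_gd(X, y, lr=0.1, epochs=500):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        err = (w * X + b) - y                  # residuals over the whole dataset
        w -= lr * (err * X).mean()             # gradient of the MSE w.r.t. w
        b -= lr * err.mean()                   # gradient of the MSE w.r.t. b
    return w, b

def sgd(X, y, lr=0.05, epochs=20):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):      # one update per sample, shuffled each epoch
            err = (w * X[i] + b) - y[i]
            w -= lr * err * X[i]
            b -= lr * err
    return w, b

print("batch GD:", batch_gd(X, y))             # both should land near w = 3.0, b = 0.5
print("SGD     :", sgd(X, y))
```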
This repository provides a comprehensive explanation of Artificial Neural Networks (ANNs), focusing on the perceptron and multilayer perceptron (MLP) architectures, and the Gradient Descent algorithm for training. The content is based on the Decreasing-Gradient.pdf document.
The human brain processes information in a highly complex, nonlinear, and parallel way, which is fundamentally different from conventional digital computers. For example, tasks such as visual recognition (e.g., recognizing a familiar face in an unfamiliar scene) are performed by the brain in milliseconds, while much simpler tasks can take a conventional computer days to complete.
At birth, a child's brain already has considerable structure and the ability to develop its own rules through experience. ANNs are computational machines designed to model or simulate the way the brain performs specific tasks or functions of interest.
- McCulloch & Pitts (1943): Introduced the first neural network models.
- Hebb (1949): Developed the basic model of self-organization.
- Rosenblatt (1958): Introduced the perceptron, a supervised learning model.
- Hopfield (1982) and Rumelhart, Hinton & Williams (1986): Revived the field with symmetric networks for optimization and the backpropagation method.
Each artificial neuron receives input signals $X_1, X_2, ..., X_p$ (binary or real values), each multiplied by a weight $w_1, w_2, ..., w_p$ (real values). The neuron computes a weighted sum (activity level):
$a = w_1 X_1 + w_2 X_2 + \cdots + w_p X_p$

The output is produced by a threshold (step) activation with threshold $t$:

$y = \begin{cases} 1, & \text{if } a \geq t \\ 0, & \text{if } a < t \end{cases}$
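A minimal Python sketch of this threshold unit (the inputs, weights, and threshold below are illustrative values, not taken from the document):

```python
def threshold_neuron(x, w, t):
    """Weighted sum followed by a hard threshold: y = 1 if a >= t else 0."""
    a = sum(wi * xi for wi, xi in zip(w, x))   # activity level a = w1*X1 + ... + wp*Xp
    return 1 if a >= t else 0

# With weights (1, 1) and threshold 1.5 the unit behaves like a logical AND
print(threshold_neuron([1, 1], [1.0, 1.0], 1.5))  # -> 1
print(threshold_neuron([1, 0], [1.0, 1.0], 1.5))  # -> 0
```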
- Adaptability through learning
- Ability to operate with partial knowledge
- Fault tolerance
- Generalization
- Contextual information processing
- Input-output mapping
- Pattern classification
- Clustering/categorization
- Function approximation
- Prediction
- Optimization
- Content-addressable memory
- Control systems
ANNs operate in two main phases:
- Training Phase: The network learns by adjusting its free parameters (weights) to perform a specific function.
- Application Phase: The trained network is used for its intended purpose (e.g., pattern or image classification).
- Stimulation by the environment (input).
- Modification of free parameters (weights) as a result.
- The network responds differently due to internal changes.
Learning is governed by a set of pre-established rules (learning algorithm) and a learning paradigm (model).
The output of neuron $k$ at iteration $n$ is $y_k(n)$, and the error relative to the desired response $d_k(n)$ is $e_k(n) = d_k(n) - y_k(n)$.
The goal is to minimize the cost function (performance index):
$E(n) = \frac{1}{2} e_k^2(n)$
Weights are updated in the direction of the negative gradient:
$w_{kj}(n+1) = w_{kj}(n) - \eta \frac{\partial E(n)}{\partial w_{kj}}$, which for this cost reduces to the error-correction rule $\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$.
The perceptron, proposed by Rosenblatt (1958), is the simplest type of ANN. It uses supervised learning and error correction to adjust the weight vector. For a perceptron with two inputs and a bias:
- The bias allows the threshold value in the activation function to be set, and is updated like any other weight.
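A short sketch of this error-correction rule for a two-input perceptron with a bias; the AND dataset, learning rate, and epoch count are assumptions chosen for illustration:

```python
import numpy as np

# Logical AND training set; the leading 1 in each row is the bias input X_0
data = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
targets = np.array([0, 0, 0, 1])

w = np.zeros(3)      # [bias weight, w_1, w_2]
eta = 0.1            # learning rate

for epoch in range(20):
    for x, d in zip(data, targets):
        y = 1 if w @ x >= 0 else 0      # step activation
        w += eta * (d - y) * x          # error correction; the bias is updated like any weight

print("learned weights:", w)
```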
- Nonlinearities are inherent in most real-world problems.
- Incorporated through nonlinear activation functions (e.g., sigmoid, tanh) and multiple layers.
- MLPs use sigmoid functions in hidden layers and linear functions in the output layer.
- Composed of neurons with nonlinear activation functions in intermediate (hidden) layers.
- Only the output layer receives a desired output during training.
- The error for hidden layers is estimated by the effect they cause on the output error (backpropagation).
A two-layer perceptron (an MLP with one hidden layer and one output layer) can approximate any continuous function, linear or not (Cybenko, 1989).
- Layer 1 (Hidden/Intermediate): Each neuron contributes lines (hyperplanes) to form surfaces in input space, "linearizing" the features.
- Layer 2 (Output): Neurons combine these lines to form convex regions, enabling complex decision boundaries.
Number of Neurons:
- The generalization capacity of the network increases with the number of neurons.
- Empirically, 3–5 neurons per layer strike a good balance between modeling power and computational cost.
Layer Types:
- Input Layer: Receives input patterns.
- Hidden Layer(s): Main processing; feature extraction.
- Output Layer: Produces the final result.
- Neuron Activation: $a = \sum_{i=1}^{p} w_i X_i$
- Output: $y = f(a)$, where $f$ is the activation function (e.g., sigmoid, tanh)
- Error Calculation: $e_k(n) = d_k(n) - y_k(n)$
- Cost Function (Mean Squared Error): $E(n) = \frac{1}{2} e_k^2(n)$
- Weight Update (Gradient Descent): $w_{kj}(n+1) = w_{kj}(n) - \eta \frac{\partial E(n)}{\partial w_{kj}}$
- Backpropagation for Output Layer: $\delta^{(2)}(t) = (d(t) - y(t)) \cdot f'^{(2)}(u^{(2)})$
- Backpropagation for Hidden Layer: $\delta^{(1)}_j(t) = \left( \sum_k \delta^{(2)}_k w^{(2)}_{kj} \right) \cdot f'^{(1)}(u^{(1)}_j)$
- Initialize the learning rate $\eta$ and the weight matrix $w$ with random values.
- Present an input to the first layer.
- Each neuron in layer $i$ computes its output, which is passed to the next layer.
- The final output is compared to the desired output.
- The error for each output neuron is calculated.
Example Calculation:
For input values:
- $X_0 = 1$
- $X_1 = 0.43$
- $X_2 = 0.78$
And example weights:
- $w^{(1)}_{00} = 0.45$
- $w^{(1)}_{01} = 0.89$
- etc.
Compute the activations and outputs of each layer using an activation function (e.g., tanh):
- Compute the pre-activation (input to each hidden neuron): $u^{(1)}_j = \sum_i w^{(1)}_{ji} X_i$
- Compute the activation (output of each hidden neuron): $y^{(1)}_j = \tanh(u^{(1)}_j)$
- Compute the output-layer pre-activation: $u^{(2)} = \sum_j y^{(1)}_j w^{(2)}_j$
- Output of the network: $y^{(2)} = \tanh(u^{(2)})$
- Calculate the error: $e = d - y^{(2)}$, $E = \frac{1}{2} e^2$
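A worked version of this forward pass in Python. Only $X_0, X_1, X_2$ and the two weights $w^{(1)}_{00}, w^{(1)}_{01}$ come from the document; the remaining weights and the desired output $d$ are placeholder values added so the sketch runs end to end:

```python
import numpy as np

X = np.array([1.0, 0.43, 0.78])           # X_0 (bias), X_1, X_2

# Hidden-layer weights W1[j][i]; only W1[0][0] = 0.45 and W1[0][1] = 0.89 are from
# the document, the rest are illustrative placeholders.
W1 = np.array([[0.45, 0.89, 0.30],
               [0.10, 0.20, 0.40]])
W2 = np.array([0.50, -0.30])              # output-layer weights (placeholders)
d = 1.0                                   # assumed desired output

u1 = W1 @ X                               # u^(1)_j = sum_i w^(1)_ji * X_i
y1 = np.tanh(u1)                          # y^(1)_j = tanh(u^(1)_j)
u2 = W2 @ y1                              # u^(2) = sum_j y^(1)_j * w^(2)_j
y2 = np.tanh(u2)                          # network output y^(2)

e = d - y2                                # error
E = 0.5 * e ** 2                          # squared-error cost
print(f"y2 = {y2:.4f}, e = {e:.4f}, E = {E:.4f}")
```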
- Start from the output layer.
- Each node adjusts its weights to reduce its error.
- For hidden layers, the error is determined by the weighted errors of the next layer (chain rule).
- Output-layer weight update: $w^{(2)}(t+1) = w^{(2)}(t) + \eta\, \delta^{(2)}(t)\, y^{(1)}(t)$, where $\delta^{(2)}(t) = (d(t) - y(t)) \cdot f'^{(2)}(u^{(2)})$
- Hidden-layer delta: $\delta^{(1)}_j(t) = \left( \sum_k \delta^{(2)}_k w^{(2)}_{kj} \right) \cdot f'^{(1)}(u^{(1)}_j)$
- Initialize all weights randomly.
- Present an input vector $X$.
- Compute the outputs of the first (hidden) layer: $u^{(1)}_j = \sum_i w^{(1)}_{ji} X_i$, $y^{(1)}_j = \tanh(u^{(1)}_j)$
- Compute the output of the second (output) layer: $u^{(2)} = \sum_j y^{(1)}_j w^{(2)}_j$, $y^{(2)} = \tanh(u^{(2)})$
- Calculate the error: $e = d - y^{(2)}$, $E = \frac{1}{2} e^2$
- Backward phase:
  - Compute $\delta^{(2)}$ and update the output weights.
  - Compute $\delta^{(1)}$ for each hidden neuron and update the hidden weights.
- Repeat for each input pattern until the error is sufficiently small.
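Putting the forward and backward phases together, here is a compact NumPy sketch of the algorithm above for one hidden layer of tanh units; the toy XOR dataset, network size, learning rate, and epoch count are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)  # bias + 2 inputs
D = np.array([-1.0, 1.0, 1.0, -1.0])      # XOR targets mapped to the tanh range

n_hidden = 4
W1 = rng.normal(0, 0.5, (n_hidden, 3))    # hidden weights w^(1)_ji
W2 = rng.normal(0, 0.5, n_hidden)         # output weights w^(2)_j
eta = 0.1

for epoch in range(10000):
    for x, d in zip(X, D):
        # forward phase
        u1 = W1 @ x
        y1 = np.tanh(u1)
        u2 = W2 @ y1
        y2 = np.tanh(u2)

        # backward phase: tanh'(u) = 1 - tanh(u)^2
        delta2 = (d - y2) * (1 - y2 ** 2)          # delta^(2) = e * f'(u^(2))
        delta1 = (delta2 * W2) * (1 - y1 ** 2)     # delta^(1)_j = (sum_k delta^(2)_k w_kj) * f'(u^(1)_j)

        # weight updates (online gradient descent)
        W2 += eta * delta2 * y1
        W1 += eta * np.outer(delta1, x)

outputs = [float(np.tanh(W2 @ np.tanh(W1 @ x))) for x in X]
print(np.round(outputs, 2))                # typically close to [-1, 1, 1, -1]
```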
- Theoretical Power: Two-layer MLPs can approximate any continuous function (universal approximation theorem).
- Practical Simplicity: Most real-world problems rarely require more than two layers.
- Cost-Benefit: 3–5 neurons per layer often provide sufficient capacity for generalization without excessive computational cost.
In gradient descent training, the algorithm updates the weights to reduce the error by following the negative gradient of the cost function. However, the cost function may have multiple local minima.
- Local Minimum: A point where the cost function is lower than at all nearby points but is not the lowest value globally.
- Gradient descent can get "stuck" in a local minimum (or on a flat plateau), preventing the network from reaching the best possible solution.
- Techniques such as random restarts, momentum, or advanced optimization algorithms help mitigate this problem.
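One of the mitigations listed above, momentum, adds a fraction of the previous weight change to the current update so the search can roll through shallow minima and plateaus. A minimal sketch (the momentum coefficient `alpha` and the demo cost function are assumptions):

```python
def momentum_step(w, grad, velocity, eta=0.01, alpha=0.9):
    """One gradient-descent update with a momentum term.

    The velocity keeps a decaying sum of past updates, which smooths the
    trajectory compared with plain gradient descent."""
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity

# Usage on E(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w, v = 0.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, 2 * (w - 3), v)
print(round(w, 2))   # settles at the minimum w = 3
```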
Artificial Neural Networks, especially perceptrons and MLPs, are widely used in various domains due to their adaptability and ability to model complex nonlinear relationships.
- Ability to learn from examples and generalize to unseen data.
- Fault tolerance and robustness to noisy inputs.
- Flexibility to model complex, nonlinear functions.
- Parallel processing capability.
- Training can be computationally expensive, especially for large networks.
- Susceptible to getting stuck in local minima.
- Requires careful tuning of hyperparameters (learning rate, number of neurons, layers).
- Lack of interpretability compared to simpler models.
The learning rate $\eta$ controls the size of each weight update:
- If $\eta$ is too large, training may overshoot minima and fail to converge.
- If $\eta$ is too small, training will be very slow and may get stuck in local minima.
- Adaptive learning-rate methods (e.g., learning rate decay, the Adam optimizer) can improve convergence.
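The effect of $\eta$ can be seen on a one-dimensional quadratic cost $E(w) = w^2$; the specific step sizes below are illustrative choices, not recommendations:

```python
def gd(eta, steps=30, w0=5.0):
    """Plain gradient descent on E(w) = w^2 (gradient 2w); returns the final w."""
    w = w0
    for _ in range(steps):
        w -= eta * 2 * w
    return w

for eta in (0.01, 0.1, 1.1):
    print(f"eta = {eta}: final w = {gd(eta):.3e}")
# eta = 0.01 shrinks w only slowly, eta = 0.1 converges quickly,
# and eta = 1.1 makes |w| grow at every step (divergence).
```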
While the document mentions sigmoid and tanh, it is useful to note:
- ReLU (Rectified Linear Unit): Widely used in modern neural networks for faster convergence and to mitigate vanishing-gradient problems.
- Softmax: Commonly used in output layers for multi-class classification problems.
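Minimal NumPy definitions of these two functions, included here only as an illustration:

```python
import numpy as np

def relu(x):
    """ReLU: passes positive values through and clips negative values to zero."""
    return np.maximum(0.0, x)

def softmax(z):
    """Softmax: converts a score vector into probabilities that sum to 1.
    Subtracting the maximum keeps the exponentials numerically stable."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(relu(np.array([-2.0, 0.5, 3.0])))     # [0.  0.5 3. ]
print(softmax(np.array([2.0, 1.0, 0.1])))   # three probabilities summing to 1
```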
- Neural networks with too many parameters can overfit training data, performing poorly on unseen data.
- Techniques such as early stopping, dropout, and L2 regularization help improve generalization.
- The document discusses iterative weight updates per sample (online/stochastic gradient descent).
- In practice, batch or mini-batch gradient descent is often used for computational efficiency and stability.
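A sketch of a mini-batch loop for the same kind of linear model used in the introduction; the batch size, learning rate, and synthetic data are assumed values:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, 200)
y = 3.0 * X + 0.5 + rng.normal(0, 0.1, 200)

w, b, eta, batch_size = 0.0, 0.0, 0.1, 32
for epoch in range(50):
    order = rng.permutation(len(X))                 # reshuffle samples each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]       # indices of one mini-batch
        err = (w * X[idx] + b) - y[idx]
        w -= eta * (err * X[idx]).mean()            # gradient averaged over the batch
        b -= eta * err.mean()

print(f"w = {w:.2f}, b = {b:.2f}")                  # should be close to 3.0 and 0.5
```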
- Data preprocessing (normalization, encoding) is crucial for effective training.
- Initialization of weights affects convergence speed and final performance.
- Monitoring training with validation sets helps detect overfitting.
- Content derived from Decreasing-Gradient.pdf.
- Classic works by McCulloch & Pitts, Hebb, Rosenblatt, Hopfield, Rumelhart, Hinton & Williams, and Cybenko.
- NVIDIA Building a Brain Course
- Neuralearn Courses
- Andson Ribeiro
- Fabiana Campanari
- Leonardo XF
- Pedro Vyctor Almeida
United by Vision • Guided by Jah • Strength in Unity
- Application of MPC controls with descending gradient and PI in a TAB converter used in electric vehicle powertrains, by Atílio Caliari de Lima, PhD.
Feel Free to Reach Out:
- Email Me
- My Contacts Hub
Back to Top
Copyright 2025 Mindful-AI-Assistants. Code released under the MIT license.