This project implements, visualizes, and compares three common variants of the Gradient Descent (GD) optimization algorithm:
- Batch Gradient Descent (BGD)
- Stochastic Gradient Descent (SGD)
- Mini-Batch Gradient Descent (MBGD)
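
All three variants apply the same basic update rule and differ only in how much of the dataset is used to estimate the gradient at each step (shown here for reference, with $\eta$ the learning rate and $B$ the set of samples used for one update):

$$
\theta \leftarrow \theta - \eta \, \nabla_\theta J_B(\theta), \qquad J_B(\theta) = \frac{1}{|B|} \sum_{i \in B} \ell\big(f_\theta(x_i),\, y_i\big)
$$

where $|B| = N$ (the full dataset) for BGD, $|B| = 1$ for SGD, and $1 < |B| < N$ for MBGD.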
The comparison is performed on two distinct optimization problems to highlight the strengths and weaknesses of each method under different conditions:
- 🎯 Problem 1: Simple Linear Regression
  - Goal: Find the best-fit line (`y = wx + b`) for a synthetically generated dataset with noise.
  - Nature: A convex optimization problem where the objective is to minimize the Mean Squared Error (MSE) loss function. This serves as a baseline for understanding the fundamental behavior of the optimizers on a well-behaved surface.
  - Parameters: The slope `w` and the y-intercept `b` of the line.
  - Expected Outcome: All methods should converge close to the true parameters used to generate the data. We'll analyze convergence speed (per epoch and per update), path stability, and final accuracy.
- ⛰️ Problem 2: Rosenbrock Function Optimization
  - Goal: Find the minimum of the challenging non-convex Rosenbrock function, often referred to as the "banana function" due to its curved valley shape.
  - Nature: A non-convex optimization problem famous for its narrow, parabolic valley leading to the minimum, which is located at `(w, b) = (1, 1)`. Optimizing this function tests the algorithms' ability to navigate difficult terrain with potentially poor gradient directions.
  - Parameters: The two input variables `w` and `b` of the Rosenbrock function.
  - Expected Outcome: Convergence towards `(1, 1)` can be slow, especially along the valley floor. We expect to observe characteristic behaviors such as zig-zagging, sensitivity to the learning rate, and differences in how well each method follows the curved valley.
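
For reference, the two objectives can be written as follows (using the conventional MSE definition, where $N$ is the number of samples in the regression dataset; the project may scale the loss slightly differently):

$$
J_{\mathrm{MSE}}(w, b) = \frac{1}{N} \sum_{i=1}^{N} \big(y_i - (w x_i + b)\big)^2, \qquad f_{\mathrm{Rosenbrock}}(w, b) = (1 - w)^2 + 100 \, (b - w^2)^2
$$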
The primary objective is to gain a visual and numerical understanding of the operational dynamics, convergence speed, computational efficiency, and optimization paths taken by each algorithm across these distinct scenarios.
The project unfolded through the following key stages:
- Synthetic Dataset Creation (Linear Regression): 📊
  - Generated `X` and `y` data based on the linear relationship `y = 2x + 5`, incorporating random noise to simulate real-world data.
  - Defined the cost function: Mean Squared Error (MSE).
  - Derived and implemented the function to calculate the MSE gradients with respect to the model parameters (`w` and `b`).
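
  A minimal sketch of this stage, assuming NumPy (the seed, sample count, and helper names such as `mse_gradients` are illustrative, not necessarily the project's actual choices):

  ```python
  import numpy as np

  rng = np.random.default_rng(42)              # illustrative seed
  X = rng.uniform(0, 10, size=200)             # 200 samples over an assumed range
  y = 2 * X + 5 + rng.normal(0, 1, size=200)   # y = 2x + 5 plus Gaussian noise

  def mse(w, b, X, y):
      """Mean Squared Error of the line y_hat = w*x + b."""
      return np.mean((y - (w * X + b)) ** 2)

  def mse_gradients(w, b, X, y):
      """Partial derivatives of the MSE with respect to w and b."""
      error = (w * X + b) - y
      dw = 2 * np.mean(error * X)   # dMSE/dw
      db = 2 * np.mean(error)       # dMSE/db
      return dw, db
  ```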
- Implementation of GD Variants (Linear Regression): ⚙️
  - BGD: Computed the gradient using the entire dataset in each epoch, performing one parameter update per epoch.
  - SGD: Computed the gradient using only one randomly selected data sample at each step, resulting in frequent updates (equal to the number of samples per epoch).
  - MBGD: Computed the gradient using a small batch (mini-batch) of data, updating parameters once per mini-batch.
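
  A condensed sketch of the three update schemes, reusing the `mse_gradients` helper from the Stage 1 sketch (the learning rate, epoch count, and batch size are assumed values):

  ```python
  import numpy as np
  # mse_gradients(w, b, X, y) is the helper sketched under Stage 1.

  def run_bgd(X, y, lr=0.01, epochs=100):
      # One gradient over the full dataset -> one update per epoch.
      w, b = 0.0, 0.0
      for _ in range(epochs):
          dw, db = mse_gradients(w, b, X, y)
          w, b = w - lr * dw, b - lr * db
      return w, b

  def run_sgd(X, y, lr=0.01, epochs=100):
      # One update per randomly selected sample -> len(X) updates per epoch.
      w, b = 0.0, 0.0
      for _ in range(epochs):
          for i in np.random.permutation(len(X)):
              dw, db = mse_gradients(w, b, X[i:i + 1], y[i:i + 1])
              w, b = w - lr * dw, b - lr * db
      return w, b

  def run_mbgd(X, y, lr=0.01, epochs=100, batch_size=16):
      # One update per mini-batch of `batch_size` samples.
      w, b = 0.0, 0.0
      for _ in range(epochs):
          idx = np.random.permutation(len(X))
          for start in range(0, len(X), batch_size):
              batch = idx[start:start + batch_size]
              dw, db = mse_gradients(w, b, X[batch], y[batch])
              w, b = w - lr * dw, b - lr * db
      return w, b
  ```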
- Analysis & Visualization (Linear Regression): 🔍📈
  - Metric Tracking: Recorded the cost function value (Loss) per epoch, the total number of parameter updates, and the trajectory of the weights (`w`, `b`) in the parameter space.
  - Execution Time Measurement: Timed the execution of each optimization algorithm. ⏱️
  - Plotting: Generated comparative plots:
    - Loss vs. Epoch (comparing convergence over time).
    - Loss vs. Number of Updates (comparing computational efficiency).
    - Loss function contour plot overlaid with the weight trajectories (visualizing the search path).
    - Final learned regression lines for each method alongside the true line and the original data points.
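
  As one example of these plots, a hedged sketch of the contour-plus-trajectory figure; it assumes the `mse` helper and data from the Stage 1 sketch, plus a recorded list of `(w, b)` pairs named `trajectory` (all illustrative names):

  ```python
  import numpy as np
  import matplotlib.pyplot as plt

  # Assumes `mse`, `X`, `y` from the Stage 1 sketch and a list of (w, b)
  # pairs named `trajectory`, appended to after every parameter update.
  w_grid, b_grid = np.meshgrid(np.linspace(0, 4, 100), np.linspace(3, 7, 100))
  loss_grid = np.array([[mse(wi, bi, X, y) for wi in w_grid[0]]
                        for bi in b_grid[:, 0]])

  plt.contour(w_grid, b_grid, loss_grid, levels=30)
  path = np.array(trajectory)
  plt.plot(path[:, 0], path[:, 1], marker=".", label="optimizer path")
  plt.xlabel("w")
  plt.ylabel("b")
  plt.legend()
  plt.title("Loss contours and weight trajectory")
  plt.show()
  ```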
- Introducing a New Challenge: The Rosenbrock Function: 🍌⛰️
  - Objective Function Definition: Switched the target to the famous Rosenbrock function `f(w, b) = (1 - w)^2 + 100 * (b - w^2)^2`, known for its challenging narrow, curved valley with a minimum at `(1, 1)`.
  - Rosenbrock Gradient Calculation: Derived and implemented the partial derivatives of the function with respect to `w` and `b`.
  - Algorithm Adaptation: Adjusted training parameters (significantly smaller learning rate, potentially more epochs) and adapted the SGD/MBGD loops to simulate frequent updates (as there's no data to sample from, the difference lies purely in update frequency).
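
  A minimal sketch of the function and its analytic gradient (helper names are illustrative):

  ```python
  def rosenbrock(w, b):
      """f(w, b) = (1 - w)^2 + 100 * (b - w^2)^2, with its minimum at (1, 1)."""
      return (1 - w) ** 2 + 100 * (b - w ** 2) ** 2

  def rosenbrock_gradients(w, b):
      """Partial derivatives of the Rosenbrock function w.r.t. w and b."""
      dw = -2 * (1 - w) - 400 * w * (b - w ** 2)
      db = 200 * (b - w ** 2)
      return dw, db
  ```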
- Execution & Analysis (Rosenbrock): 🔬📉
  - Re-ran the three optimization methods on the Rosenbrock function, starting from a defined initial point (e.g., `(-1.5, -1.0)`).
  - Tracked and compared metrics: Rosenbrock function value (instead of Loss), update counts, execution time, and parameter trajectories.
  - Generated similar plots:
    - Function value vs. Epoch.
    - Function value vs. Number of Updates.
    - Contour plot of the Rosenbrock function showing the optimization paths (highly illustrative for this function, revealing the valley-following challenge).
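
  For illustration, a plain gradient-descent loop on the Rosenbrock function from the starting point mentioned above, reusing the Stage 4 helpers; the learning rate and iteration count are assumed values and typically need tuning, and convergence toward `(1, 1)` remains slow:

  ```python
  # Plain gradient descent on the Rosenbrock function; reuses the
  # rosenbrock / rosenbrock_gradients helpers sketched under Stage 4.
  w, b = -1.5, -1.0             # starting point used in the write-up
  lr, iterations = 1e-3, 20000  # assumed values; progress along the valley is slow

  history = [(w, b)]
  for _ in range(iterations):
      dw, db = rosenbrock_gradients(w, b)
      w, b = w - lr * dw, b - lr * db
      history.append((w, b))

  print(f"final (w, b) = ({w:.4f}, {b:.4f}), f = {rosenbrock(w, b):.6f}")
  ```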
- Final Comparison & Conclusion: ✨
  - Summarized the performance of each algorithm on both problems.
  - Discussed the trade-offs of each method (speed, stability, learning rate sensitivity, convergence quality).
The code is organized into two scripts:
- `linear_regression_gd.py` (or similar): Contains the code for the linear regression problem (Stages 1-3 and the final line plot).
- `rosenbrock_optimization.py` (or similar): Contains the code dedicated to optimizing the Rosenbrock function (Stages 4-5).
Requirements:
- Python 3.x 🐍
- NumPy: `pip install numpy`
- Matplotlib: `pip install matplotlib`
To run the notebook:
- Open the `Mountaineers_in_Search_of_the_Global_Minimum.ipynb` notebook in Google Colab.
  - You can open it directly from GitHub (if hosted there) using the Colab integration, or download the `.ipynb` file and upload it via `File -> Upload notebook...`.
- Run the cells sequentially.
  - Use the `Runtime -> Run all` option, or execute each cell individually using `Shift + Enter`.
- The notebook includes explanatory text, code blocks, numerical outputs, and generated plots inline.
The scripts will print numerical results (final parameter values, execution times, update counts) to the console and display the comparative plots.
Expected results:
- Linear Regression: All three methods should converge close to the true parameters (`w = 2`, `b = 5`). SGD and MBGD typically show faster initial progress per epoch (due to more updates) but exhibit noisier trajectories. BGD follows a smoother path but can be slower per epoch on large datasets. Overall runtime comparisons vary with implementation details and dataset size. The final learned lines should be very close to the true line.
- Rosenbrock Function: This function highlights the optimizers' challenges. BGD may progress very slowly along the narrow valley or exhibit significant zig-zagging. SGD and MBGD (whose higher update frequency simulates the effect of stochasticity in other contexts) may navigate the curve slightly better, but all methods converge slowly towards `(1, 1)` and require careful tuning of the learning rate. The contour plot visualizes this behavior clearly.
This project provides a practical demonstration of the differences between various gradient descent flavors and how the characteristics of the objective function landscape influence their performance.
Author: Sayyed Hossein Hosseini DolatAbadi