# 🎋🌿 **Production-Ready Instruction Fine-Tuning of Meta LLaMA 3.2 3B Instruct Project** 🌿🎉

## **Problem Statement**
---

### **My Role as a Developer** 🎋

As a developer, I am responsible for delivering an instruction fine-tuned **LLaMA 3.2 3B** model that aligns with the defined **Key Performance Indicator (KPI)** objectives and ensures exceptional performance for Kannada-speaking users.

- I will **instruct fine-tune** the model using the high-quality **Kannada dataset** from **Hugging Face** (`charanhu/kannada-instruct-dataset-390k`).

- To address the constraints of **limited GPU resources**, I will implement **QLoRA-based 4-bit precision quantization** using **Unsloth**, which involves:
  - First **quantizing the model** to 4-bit precision to reduce computational overhead.
  - Adding **LoRA (Low-Rank Adaptation) layers** to fine-tune the model efficiently within **Google Colab**, ensuring optimal resource utilization without compromising performance.

*Note: The fine-tuning code is fully modular, but I used **Google Colab** for training. If you have a high-end machine, make sure you execute the **pipeline** in a modular fashion.*

## Fine-tuning Pipeline 💥
---
### Installing the required libraries
* Unsloth can be tricky to install, so execute these code cells one by one, in sequence, to avoid problems.

```bash
# Run this first (cell 1)
!python -m pip install --upgrade pip
!pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install xformers[torch2]  # Install xformers built for PyTorch 2.x
!pip install "unsloth[colab] @ git+https://github.com/unslothai/unsloth.git"
!pip install "git+https://github.com/huggingface/transformers.git"
!pip install trl
!pip install boto3
```

```bash
# Run this cell (cell 2)
!pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118  # Upgrade PyTorch to a compatible version
!pip install xformers  # Install xformers after upgrading PyTorch
```

```bash
# cell 3
!pip uninstall torch torchvision torchaudio -y  # Uninstall existing PyTorch, torchvision, and torchaudio
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118  # Install PyTorch, torchvision, and torchaudio with CUDA 11.8
```

```bash
# cell 4
!pip uninstall xformers -y
!pip install xformers[torch2]  # Install xformers built for PyTorch 2.x
```

### Importing Necessary Libraries

<img width="656" alt="Importing Necessary Libraries" src="https://github.com/user-attachments/assets/dfb4fdee-0513-4202-b5d1-167e15689354">
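
The cell above is a screenshot, so here is a rough text equivalent of the imports, inferred from the steps that follow rather than copied from the notebook:

```python
# Inferred imports for this pipeline; adjust to match your notebook.
import torch
from unsloth import FastLanguageModel   # 4-bit loading + LoRA helpers
from datasets import load_dataset       # pulls the Kannada dataset
from trl import SFTTrainer              # supervised fine-tuning loop
from transformers import TrainingArguments
import boto3                             # S3 upload at the end of the pipeline
```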

### Loading the Model

<img width="640" alt="Loading  Model" src="https://github.com/user-attachments/assets/89013450-1bb1-4a29-9ad4-2a620004064e">
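
A minimal sketch of this step with Unsloth; the checkpoint name and context length below are assumptions, not values read from the screenshot:

```python
from unsloth import FastLanguageModel

max_seq_length = 2048  # assumed context length

# Load the base model in 4-bit precision (QLoRA-style) via Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",  # assumed checkpoint name
    max_seq_length=max_seq_length,
    dtype=None,          # auto-detect float16 / bfloat16
    load_in_4bit=True,   # 4-bit quantization to fit Colab GPUs
)
```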

### Applying LoRA layers

<img width="620" alt="Applying  Lora" src="https://github.com/user-attachments/assets/062a2115-d24d-4ede-9c83-2fc9665cdaa1">
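
The LoRA step would look roughly like this; the rank, alpha, and target modules are common Unsloth defaults rather than confirmed values from the screenshot:

```python
# Wrap the quantized base model with trainable LoRA adapters.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                         # LoRA rank (assumed)
    lora_alpha=16,
    lora_dropout=0,
    target_modules=[              # attention + MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",  # memory-efficient backprop
    random_state=3407,
)
```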

### Data Preparation

<img width="920" alt="Dataset Preparation" src="https://github.com/user-attachments/assets/869f6569-df05-455f-bd7e-ba71dc036593">
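
Loading the dataset itself is a one-liner; the split and column names are assumptions worth checking against the dataset card:

```python
from datasets import load_dataset

# Pull the 390k-example Kannada instruction dataset from the Hugging Face Hub.
dataset = load_dataset("charanhu/kannada-instruct-dataset-390k", split="train")
print(dataset)  # inspect the columns (e.g. instruction / input / output)
```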

### Data Formatting (what the model expects for instruction tuning)

<img width="920" alt="Prompt Formatting" src="https://github.com/user-attachments/assets/58f7c5cf-945a-43d7-a9cf-670eee3261e6">
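
As a sketch, an Alpaca-style template like the one below turns each row into a single training string; the `instruction` / `input` / `output` field names are assumptions about the dataset schema:

```python
# Assumed Alpaca-style template; the screenshot may use different wording.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # appended so the model learns when to stop

def formatting_prompts_func(examples):
    texts = [
        alpaca_prompt.format(ins, inp, out) + EOS_TOKEN
        for ins, inp, out in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)
```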

### Training Configurations

<img width="614" alt="Training Configuration" src="https://github.com/user-attachments/assets/956acc04-ac6f-497b-9c12-9cc33b70301b">
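
A plausible configuration for a single Colab GPU looks like the following; every hyperparameter here is an assumption to tune, and the `SFTTrainer` keyword arguments assume the trl version used by the Unsloth Colab notebooks:

```python
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",        # column produced by the formatting step
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,   # effective batch size of 8
        warmup_steps=5,
        max_steps=60,                    # raise for a full training run
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",              # 8-bit optimizer saves VRAM
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
```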

### Model Training

<img width="856" alt="Model  Training" src="https://github.com/user-attachments/assets/075ee343-8412-4ad4-bb4b-dd569663c4fd">
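
Training then comes down to one call, and `train()` returns stats worth logging:

```python
# Kick off fine-tuning; returns a TrainOutput with loss/runtime metrics.
trainer_stats = trainer.train()
print(trainer_stats.metrics)
```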

### Inference

<img width="713" alt="Inference  1" src="https://github.com/user-attachments/assets/189c2d17-9026-4cb3-bdfb-95435b075fae">

<img width="901" alt="Inference 2" src="https://github.com/user-attachments/assets/ea31462b-9e1c-4575-9120-5390cfbc23e2">
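
In sketch form, inference switches Unsloth into generation mode and decodes the output; the Kannada prompt is purely illustrative:

```python
# Switch Unsloth into fast inference mode before generating.
FastLanguageModel.for_inference(model)

inputs = tokenizer(
    [alpaca_prompt.format(
        "ಭಾರತದ ರಾಜಧಾನಿ ಯಾವುದು?",  # "What is the capital of India?" (example)
        "",   # no additional input
        "",   # response left blank for the model to fill in
    )],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```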

### Saving the Model & Tokenizer

<img width="453" alt="Saving the model and tokenizer" src="https://github.com/user-attachments/assets/f6eb0858-f51e-452d-a65b-83945537e487">
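
Saving comes down to two calls; the directory name is a placeholder:

```python
# Persist only the LoRA adapter weights plus the tokenizer files.
model.save_pretrained("lora_model")       # placeholder path
tokenizer.save_pretrained("lora_model")
```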

### Merging the base model & fine-tuned LoRA layers

<img width="557" alt="Merge base model and finetuned layers" src="https://github.com/user-attachments/assets/15d66a2b-dfb9-471c-8fe0-9b13640d45e4">
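
Unsloth provides a helper that folds the adapters back into the base weights; the output path here is a placeholder, and `merged_16bit` is one of several save methods:

```python
# Fold the LoRA deltas back into the base weights and save a standalone model.
model.save_pretrained_merged(
    "merged_model",              # placeholder output directory
    tokenizer,
    save_method="merged_16bit",  # full fp16 weights; "lora" keeps adapters only
)
```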

### Pushing Model & Tokenizer to S3 Bucket

<img width="399" alt="Pushing to s3 1" src="https://github.com/user-attachments/assets/06948b95-59a6-4ad5-b530-90e075cc88f9">

<img width="527" alt="Pushing to s3 2" src="https://github.com/user-attachments/assets/2d944deb-b2f1-475a-834e-d462bb08fffb">

<img width="505" alt="Pushing to s3 3" src="https://github.com/user-attachments/assets/7fd11f13-57f2-43b0-b3e9-918e89b91b12">
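
With boto3 this is a walk over the merged directory; the bucket name and key prefix are placeholders, and AWS credentials are assumed to come from the environment:

```python
import os
import boto3

s3 = boto3.client("s3")
BUCKET = "my-model-artifacts"          # placeholder bucket name
PREFIX = "kannada-llama-3.2-3b"        # placeholder key prefix

# Upload every file under merged_model/ to s3://BUCKET/PREFIX/...
for root, _, files in os.walk("merged_model"):
    for name in files:
        local_path = os.path.join(root, name)
        key = f"{PREFIX}/{os.path.relpath(local_path, 'merged_model')}"
        s3.upload_file(local_path, BUCKET, key)
        print(f"uploaded s3://{BUCKET}/{key}")
```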

---

## Ok, so now let's talk about the Deployment/Inference Pipeline 🚀

*This is a diagram of how the pipeline will look:*