This project fine-tunes the Gemma-7b model using LoRA (Low-Rank Adaptation) on a customer support tweet dataset from Hugging Face to generate automated support replies efficiently.
**Goal:** Train a memory-efficient LLM that generates accurate, context-aware customer support responses.
**Tech stack:**
- Transformers
- PEFT for LoRA
- Hugging Face Datasets
- PyTorch
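These typically install with `pip install transformers peft datasets torch`; `bitsandbytes` and `accelerate` (assumed here, not listed above) are also needed for the 8-bit loading used below.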
**Step 1: Load Dataset & Tokenizer**
- Used the `mo-customer-support-tweets-945k` dataset.
- Loaded the tokenizer and added a `[PAD]` token.
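A minimal sketch of this step, assuming the dataset id resolves on the Hub as written (the full path may carry an org prefix) and that the base model is `google/gemma-7b`:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# The dataset id is copied from the step above; the Hub path may need an org prefix.
dataset = load_dataset("mo-customer-support-tweets-945k")

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
# Register a dedicated padding token, per the step above.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
```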
**Step 2: Tokenize Dataset**
- Tokenized both `input` (customer inquiries) and `output` (responses).
- Used padding and truncation.
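A sketch of the tokenization pass. The column names `input` and `output` come from the step above; `max_length=256` is an assumed value:

```python
def tokenize(batch):
    # Tokenize customer inquiries as inputs and responses as labels.
    model_inputs = tokenizer(batch["input"], padding="max_length",
                             truncation=True, max_length=256)
    labels = tokenizer(batch["output"], padding="max_length",
                       truncation=True, max_length=256)
    # Mask pad positions in the labels so they don't contribute to the loss.
    model_inputs["labels"] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in seq]
        for seq in labels["input_ids"]
    ]
    return model_inputs

tokenized = dataset.map(tokenize, batched=True)
```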
**Step 3: Load & Prepare Gemma Model**
- Loaded the model in 8-bit with `load_in_8bit=True`.
- Configured LoRA (rank=16, alpha=32, dropout=0.1).
- Applied LoRA to the attention modules (`q_proj`, `v_proj`).
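A sketch of the 8-bit load plus LoRA wiring, using the hyperparameters listed above. The embedding resize and `prepare_model_for_kbit_training` call are assumed housekeeping, not stated in the original:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 8-bit loading via bitsandbytes keeps the 7B model within a single-GPU budget.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b", load_in_8bit=True, device_map="auto"
)
model.resize_token_embeddings(len(tokenizer))  # account for the added [PAD] token
model = prepare_model_for_kbit_training(model)

# LoRA hyperparameters from the step above; only q_proj/v_proj get adapters.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```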
**Step 4: Training Setup**
- Defined `TrainingArguments` (batch size, epochs, learning rate, fp16).
- Used `Trainer` and `DataCollatorForSeq2Seq`.
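A sketch of the training setup. The README names the knobs but not their values, so the batch size, learning rate, and `output_dir` below are placeholders; epochs, gradient accumulation, and fp16 come from the steps here:

```python
from transformers import Trainer, TrainingArguments, DataCollatorForSeq2Seq

training_args = TrainingArguments(
    output_dir="gemma-support-lora",   # placeholder directory name
    per_device_train_batch_size=4,     # assumed value
    gradient_accumulation_steps=4,     # assumed value
    num_train_epochs=3,
    learning_rate=2e-4,                # assumed value
    fp16=True,
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],  # assumes the dataset has a "train" split
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
```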
**Step 5: Train & Save**
- Trained for 3 epochs with gradient accumulation.
- Saved both the model and tokenizer for inference.
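The final step, continuing the sketch. With PEFT, `save_pretrained` on the adapted model writes only the LoRA adapter weights:

```python
trainer.train()

# Persist artifacts for inference; the directory name is a placeholder.
model.save_pretrained("gemma-support-lora")      # LoRA adapter weights
tokenizer.save_pretrained("gemma-support-lora")  # tokenizer incl. the [PAD] token
```

At inference time the adapter can be re-attached to the base model with `peft.PeftModel.from_pretrained`.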