Instruction-tuned LoRA adapter for LLaMA 3 8B, trained with Unsloth using QLoRA and Alpaca-style prompts.

This repo hosts the training, evaluation, and inference pipeline for `Cre4T3Tiv3/unsloth-llama3-alpaca-lora`, a 4-bit QLoRA adapter trained on:

- `yahma/alpaca-cleaned`
- 30+ grounded examples of QLoRA reasoning (added to mitigate hallucinations)
- Base Model: `unsloth/llama-3-8b-bnb-4bit`
- Adapter Format: LoRA (merged post-training)
- Training Framework: Unsloth + Hugging Face PEFT
- Training Infra: A100 (40GB), 4-bit quantization
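For reference, the Alpaca-style prompt format looks roughly like the sketch below. It mirrors the prompt used in the inference example later in this README; the exact training template (including any optional input field) may differ.

```python
# Illustrative Alpaca-style prompt template (assumption: matches the prompt
# used in the inference example below; the exact training template may differ).
ALPACA_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:"

prompt = ALPACA_TEMPLATE.format(instruction="Explain LoRA fine-tuning in simple terms.")
print(prompt)
```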
This adapter is purpose-built for:
- Instruction-following LLM tasks
- Low-resource, local inference (4-bit, merged LoRA)
- Agentic tools and CLI assistants
- Educational demos (fine-tuning, PEFT, Unsloth)
- Quick deployment in QLoRA-aware stacks
- Trained on ~2K samples + 3 custom prompts
- Single-run fine-tune only
- Not optimized for >2K context
- 4-bit quantization may reduce fidelity
- Hallucinations possible; not production-ready for critical workflows
- Previously hallucinated QLoRA terms now corrected; tested via eval script
- Still not production-grade for factual QA or critical domains
This repo includes an `eval_adapter.py` script that:

- Checks for hallucination patterns (e.g., false QLoRA definitions)
- Computes keyword overlap per instruction (≥4/6 threshold)
- Outputs a JSON summary (`eval_results.json`) with full logs

Run `make eval` to validate adapter behavior.
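The keyword-overlap check works roughly like the sketch below (illustrative only; the function name and keyword set are hypothetical, and `eval_adapter.py` may implement it differently):

```python
# Hypothetical sketch of the >=4/6 keyword-overlap check described above.
def keyword_overlap_passes(response: str, expected_keywords: list[str], threshold: int = 4) -> bool:
    """Return True if at least `threshold` expected keywords appear (case-insensitive)."""
    hits = sum(1 for kw in expected_keywords if kw.lower() in response.lower())
    return hits >= threshold

# Example: a grounded QLoRA answer should mention most of these terms.
expected = ["4-bit", "quantization", "LoRA", "adapter", "frozen", "memory"]  # assumed keyword set
answer = "QLoRA keeps the 4-bit quantized base model frozen and trains small LoRA adapter weights."
print(keyword_overlap_passes(answer, expected))  # True: 4 of 6 keywords present
```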
| Parameter | Value |
|---|---|
| Base Model | `unsloth/llama-3-8b-bnb-4bit` |
| Adapter Format | LoRA (merged) |
| LoRA r | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |
| Epochs | 2 |
| Examples | ~2K (alpaca-cleaned + grounded) |
| Precision | 4-bit (bnb) |
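These hyperparameters map onto a LoRA setup roughly like the following minimal sketch (not the exact training script; `target_modules` and `max_seq_length` are assumptions):

```python
# Minimal sketch of a matching Unsloth LoRA setup (not the exact training script).
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,   # assumption; the adapter is not optimized for >2K context
    load_in_4bit=True,     # 4-bit (bnb) precision, as in the table
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[       # assumed: the usual LLaMA attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```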
```bash
make install   # Create .venv and install with uv
make train     # Train LoRA adapter
make eval      # Evaluate output quality
make run       # Run quick inference
```
```bash
export HUGGINGFACE_TOKEN=hf_xxx
make login
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

BASE = "unsloth/llama-3-8b-bnb-4bit"
ADAPTER = "Cre4T3Tiv3/unsloth-llama3-alpaca-lora"

# Load the 4-bit base model (requires bitsandbytes), apply the LoRA adapter, and merge for inference
base_model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto", load_in_4bit=True)
model = PeftModel.from_pretrained(base_model, ADAPTER).merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained(ADAPTER)

# Alpaca-style prompt
prompt = "### Instruction:\nExplain LoRA fine-tuning in simple terms.\n\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
🖥 Try the model live via Hugging Face Spaces:
- 📦 Model Hub
- 🧪 Demo Space
- 🧰 Source Code
- 💼 ByteStack Labs
Built with ❤️ by @Cre4T3Tiv3 at ByteStack Labs
If you use this adapter or its training methodology, please consider citing:
```bibtex
@software{unsloth-llama3-alpaca-lora,
  author = {Jesse Moses (Cre4T3Tiv3)},
  title  = {Unsloth LoRA Adapter for LLaMA 3 (8B)},
  year   = {2025},
  url    = {https://huggingface.co/Cre4T3Tiv3/unsloth-llama3-alpaca-lora},
}
```
Apache 2.0