This project demonstrates how to fine-tune the lightweight Qwen3-0.6B language model using the Unsloth library. It combines Chain-of-Thought (CoT) reasoning and chat-style datasets to improve multi-turn reasoning and conversational ability, using efficient LoRA adapters and 4-bit quantization.
- ✅ Based on Qwen3-0.6B – a fast, efficient LLM from Alibaba
- ✅ Uses Unsloth for blazing-fast fine-tuning and LoRA support
- ✅ Incorporates reasoning and non-reasoning data for balanced learning
- ✅ Supports thinking-mode inference using `<think>` tags
- ✅ Designed for low-resource environments with 4-bit quantization
- ✅ Modular, extensible, and easy to adapt to your own datasets
Install required libraries:
```bash
pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
pip install --no-deps unsloth
```
| Dataset | Type | Description |
|---|---|---|
| `unsloth/OpenMathReasoning-mini` | Reasoning | Math problems with Chain-of-Thought answers |
| `mlabonne/FineTome-100k` | Chat/Non-CoT | Standard ShareGPT-style conversations |
You can easily swap in your own datasets using the standardized format.
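For instance, a simple question/answer dataset can be mapped into the ShareGPT-style `conversations` format before training. This is a minimal sketch; the input field names (`question`, `answer`) are hypothetical and should be adapted to your own dataset's schema.

```python
# Sketch: convert a question/answer row into a ShareGPT-style
# conversation. Field names "question" and "answer" are placeholders
# for whatever your dataset actually uses.
def to_sharegpt(example):
    return {
        "conversations": [
            {"from": "human", "value": example["question"]},
            {"from": "gpt", "value": example["answer"]},
        ]
    }

row = {"question": "What is 2 + 2?", "answer": "4"}
print(to_sharegpt(row))
```

A function like this can be applied over a whole dataset with `datasets.Dataset.map`.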
- Load and quantize the model
- Add LoRA adapters
- Prepare reasoning and non-reasoning data
- Balance the dataset with a configurable `chat_percentage`
- Tokenize and format inputs using Unsloth’s chat template
- Train using TRL’s `SFTTrainer`
- Test inference with and without `<think>` tags
- Save LoRA adapters locally or push to the Hugging Face Hub
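The balancing step can be sketched roughly as follows: keep all reasoning rows and sample enough chat rows so that chat makes up about `chat_percentage` of the combined dataset. This is an illustrative sketch only; the actual script may sample differently.

```python
import random

# Sketch of chat_percentage balancing: keep every reasoning row and
# sample chat rows so they form roughly chat_percentage of the mix.
def mix_datasets(reasoning_rows, chat_rows, chat_percentage=0.25, seed=3407):
    random.seed(seed)
    # Number of chat rows needed so chat is chat_percentage of the total.
    n_chat = round(len(reasoning_rows) * chat_percentage / (1.0 - chat_percentage))
    n_chat = min(n_chat, len(chat_rows))
    mixed = reasoning_rows + random.sample(chat_rows, n_chat)
    random.shuffle(mixed)
    return mixed

reasoning = [{"text": f"cot-{i}"} for i in range(75)]
chat = [{"text": f"chat-{i}"} for i in range(100)]
mixed = mix_datasets(reasoning, chat, chat_percentage=0.25)
print(len(mixed))  # 75 reasoning + 25 chat = 100 rows
```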
```python
messages = [{"role": "user", "content": "Solve (x + 2)^2 = 0."}]
text_input = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Set to False for a non-CoT response
)
```
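After generation, the reasoning trace can be separated from the final answer. A minimal sketch, assuming the decoded output wraps its reasoning in a `<think>...</think>` block (as thinking-mode Qwen3 responses do):

```python
import re

# Sketch: split a decoded response into its reasoning trace and the
# final answer, assuming the reasoning is wrapped in <think> tags.
def split_thinking(response: str):
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()  # No thinking block found
    thinking = match.group(1).strip()
    answer = response[match.end():].strip()
    return thinking, answer

output = "<think>\n(x + 2)^2 = 0 => x + 2 = 0 => x = -2\n</think>\nx = -2"
thinking, answer = split_thinking(output)
print(answer)  # x = -2
```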
After training:
```python
model.save_pretrained("qwen3_0.6b_reasoning_chat_lora")
tokenizer.save_pretrained("qwen3_0.6b_reasoning_chat_lora")
```
Optionally, push to the 🤗 Hub:
```python
model.push_to_hub("your-username/qwen3_0.6b_reasoning_chat_lora", token="your_token")
tokenizer.push_to_hub("your-username/qwen3_0.6b_reasoning_chat_lora", token="your_token")
```
You can tweak training and adapter parameters in the script:
```python
# LoRA config
r = 32
lora_alpha = 32
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

# SFT config
batch_size = 2
gradient_accumulation = 4
learning_rate = 2e-4
max_steps = 30
```
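Note that with these defaults the optimizer sees an effective batch size of `batch_size * gradient_accumulation` sequences per update step:

```python
# Effective batch size per optimizer step: the per-device batch size
# multiplied by the number of gradient accumulation steps.
batch_size = 2
gradient_accumulation = 4
effective_batch_size = batch_size * gradient_accumulation
print(effective_batch_size)  # 8
```

Raising `gradient_accumulation` is a memory-friendly way to increase the effective batch size on small GPUs.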
With thinking enabled (`<think>`):

```
<think>
Let’s solve (x + 2)^2 = 0.
Take the square root of both sides:
x + 2 = 0
=> x = -2
</think>
x = -2
```

Without thinking:

```
x = -2
```
```
unsloth
transformers
trl
datasets
bitsandbytes
xformers
peft
torch
sentencepiece
```
- Unsloth for efficient fine-tuning
- Alibaba Qwen team for the model
- mlabonne and others for open datasets
This project follows the Apache 2.0 License, as per the base model and Unsloth.