This project provides tools for fine-tuning models for AO (llama.cpp). As an example, it fine-tunes the Phi-3-Mini-4k-Instruct model on Sam Williams' tweets, aiming to provide an AI assistant of Sam Williams.
- GPU server with CUDA support and at least 32 GB of VRAM
- Python 3.10 or higher
- Git 2.34 or higher (submodules are required)
- Hugging Face CLI
https://huggingface.co/docs/huggingface_hub/en/guides/cli
- Sign up and get an access token, then log in:
https://huggingface.co/docs/hub/en/security-tokens
huggingface-cli login
- [Optional] Agree to the model's terms of use, for example Phi-3:
https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
- Clone the repository and initialize submodules:
git clone https://github.com/apuslabs/aimodel-finetune-tools.git && cd aimodel-finetune-tools && git submodule update --init --recursive
- Create a virtual environment and activate it:
python3 -m venv .venv && source .venv/bin/activate
- Install the required packages:
pip install -r requirements.txt
We will use the Phi-3-Mini-4k-Instruct model for fine-tuning. Here are the steps to fine-tune the model (a minimal sketch of this flow follows the list):
- Load the dataset
- Split the dataset into training and validation sets
- Load the Phi-3-Mini-4k-Instruct model from Hugging Face
- Define the model and tokenizer
- Define the training arguments and start the fine-tuning process
- Save the fine-tuned model to the specified directory
- Evaluate the fine-tuned model
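The fine-tune.py script implements these steps. The following is a minimal sketch of that flow using the Hugging Face datasets and transformers APIs, assuming a JSON Q&A dataset with question/answer fields; the field names, hyperparameters, and the tokenize_function body are illustrative assumptions, not the exact contents of fine-tune.py.

```python
# Minimal sketch of the fine-tuning flow; field names and hyperparameters
# are assumptions, not the exact contents of fine-tune.py.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "microsoft/Phi-3-mini-4k-instruct"

# Load the Q&A dataset and split it into training and validation sets
dataset = load_dataset("json", data_files="./datasets/finetune-qa.json", split="train")
dataset = dataset.train_test_split(test_size=0.1)

# Load the model and tokenizer from Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    # Assumes each record has "question" and "answer" fields; adjust to your schema
    texts = [q + "\n" + a for q, a in zip(examples["question"], examples["answer"])]
    return tokenizer(texts, truncation=True, max_length=512)

tokenized = dataset.map(tokenize_function, batched=True,
                        remove_columns=dataset["train"].column_names)

# Causal-LM collator copies input_ids into labels so the Trainer can compute loss
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Define the training arguments and start the fine-tuning process
args = TrainingArguments(output_dir="./models/v3", num_train_epochs=3,
                         per_device_train_batch_size=2, logging_steps=10)
trainer = Trainer(model=model, args=args, data_collator=collator,
                  train_dataset=tokenized["train"], eval_dataset=tokenized["test"])
trainer.train()

# Save the fine-tuned model and tokenizer, then evaluate on the validation split
trainer.save_model("./models/v3")
tokenizer.save_pretrained("./models/v3")
print(trainer.evaluate())
```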
Fine-tune
python fine-tune.py --dataset ./datasets/finetune-qa.json --model_name microsoft/Phi-3-mini-4k-instruct --output_dir ./models/v3
Evaluate
python evaluate.py --model_dir ./models/v3
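Evaluation metrics are handled by evaluate.py. As a quick manual sanity check, you can also load the fine-tuned checkpoint directly and generate an answer; the sample question and generation settings below are arbitrary and not tied to what evaluate.py actually measures.

```python
# Quick manual check of the fine-tuned checkpoint (separate from evaluate.py);
# the sample question and generation settings are arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./models/v3"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("What is AO?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```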
To fine-tune based on other pre-trained models, follow these additional steps:
- Use a different model loader depending on the model's architecture and configuration; for Phi-3 it is transformers.AutoModelForCausalLM.from_pretrained (see the sketch after this list)
- Adjust the tokenizer settings to match the new model's requirements, ensuring compatibility with the fine-tuning process.
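For instance, a hypothetical swap to another causal-LM checkpoint (the model name below is only an example) changes just the loader call and the tokenizer setup:

```python
# Hypothetical example of swapping the base model; the checkpoint name is
# illustrative, and any causal-LM architecture supported by transformers works.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Many chat models ship without a pad token; reuse EOS so padded batches work
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```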
To fine-tune on other datasets, prepare the dataset in the required format and ensure it aligns with the model's input specifications. Adjust the dataset loading and preprocessing steps to fit the new dataset structure, and modify the tokenize_function in fine-tune.py accordingly.
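As an illustration, here is a hypothetical tokenize_function for a dataset with prompt/completion fields, rendered through the model's chat template; the field names are assumptions, so adapt them to your dataset's schema (it assumes tokenizer is already loaded as above).

```python
# Hypothetical tokenize_function for a dataset with "prompt"/"completion" fields;
# adjust the field names to match your dataset's schema.
def tokenize_function(examples):
    texts = []
    for prompt, completion in zip(examples["prompt"], examples["completion"]):
        messages = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
        # Render the conversation with the model's chat template
        texts.append(tokenizer.apply_chat_template(messages, tokenize=False))
    return tokenizer(texts, truncation=True, max_length=512)
```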
If your GPU server has limited VRAM, consider using mixed precision training or reducing the batch size to fit the model within the available memory.
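For example, TrainingArguments like the following (values are illustrative, not tuned) enable fp16 mixed precision, shrink the per-device batch size, and use gradient accumulation and checkpointing to reduce memory pressure:

```python
from transformers import TrainingArguments

# Illustrative memory-saving settings for a VRAM-constrained GPU
args = TrainingArguments(
    output_dir="./models/v3",
    fp16=True,                      # mixed precision training
    per_device_train_batch_size=1,  # smaller batch to fit in memory
    gradient_accumulation_steps=8,  # keep an effective batch size of 8
    gradient_checkpointing=True,    # trade extra compute for lower memory
)
```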
AO uses llama.cpp as the backend for its AI functionality, which stores and serves models in the GGUF format. Ensure compatibility with the latest version of llama.cpp to leverage the full capabilities of the fine-tuned model.
Convert to GGUF
python llama.cpp/convert_hf_to_gguf.py models/v3 --outfile gguf/phi3-4k-v0.1.gguf --outtype q8_0
- Upload the fine-tuned model to Arweave via ArDrive
- Get the data transaction ID (Tx ID) from the Arweave upload transaction
- Use Llama.load('/data/<your model data tx id>') to load the fine-tuned model and start the inference process