Liked our work? Give us a ⭐!
This repository contains easy-to-use, easy-to-understand code for fine-tuning VLMs (Vision-Language Models).
With VLMs, you can ask questions about an image and receive answers. Here we work with images containing visual data representations, such as graphs and charts, using HuggingFaceM4/ChartQA as the dataset.
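For reference, loading the dataset looks roughly like this. A minimal sketch: the split and column names (`query`, `label`) follow the dataset card on the Hub, so double-check them against the notebook.

```python
from datasets import load_dataset

# Load ChartQA from the Hugging Face Hub.
dataset = load_dataset("HuggingFaceM4/ChartQA")

# Each example pairs a chart image with a question ("query")
# and its answer ("label") -- column names per the dataset card.
print(dataset["train"][0])
```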
In this case, we fine-tune Qwen/Qwen2-VL-7B-Instruct using a LoRA adapter and 4-bit quantization. Refer to the fine-tune-vlms-qwen.ipynb file.
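The setup looks roughly like the sketch below. This is an illustration, not the notebook's exact code: the LoRA hyperparameters (`r`, `lora_alpha`, `target_modules`) are assumptions chosen for the example, so see the notebook for the values actually used.

```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization config (NF4 weights, bfloat16 compute).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the quantized base model and its processor.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Attach a LoRA adapter to the attention projections
# (hyperparameters here are illustrative, not the notebook's).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

With this setup, only the small LoRA matrices are trained while the 4-bit base weights stay frozen, which is what makes fine-tuning a 7B model feasible on a single consumer GPU.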