LLaVA: Large Language and Vision Assistant. From data to deployment.
Please note that this is only supported on Linux systems.
1. Clone repository
git clone https://github.com/Lornatang/llava.git
cd llava
2. Install Package
conda create -n llava python=3.11.13 -y
conda activate llava
pip3 install --upgrade pip
pip3 install "torch==2.7.*" torchvision --index-url https://download.pytorch.org/whl/cu128
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.2/flash_attn-2.8.2+cu12torch2.7cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
pip3 install ./flash_attn-2.8.2+cu12torch2.7cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
pip3 install -e .
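After installation, a quick sanity check confirms that PyTorch sees the GPU and that the flash-attn wheel imports cleanly. This is a minimal sketch; the printed versions will differ depending on your environment.

import torch
import flash_attn

# Expect CUDA to be available and flash-attn to import without errors.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("flash-attn:", flash_attn.__version__)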
Training is divided into two stages. The first stage, pre-training, achieves basic alignment of cross-modal features by mapping visual features into the language model's embedding space. The second stage builds on that alignment and fine-tunes the model end to end, so it learns to follow diverse visual-language instructions and generate appropriate responses.
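Conceptually, the two stages differ in which parameters receive gradients. The sketch below is illustrative only; the module names (vision_tower, mm_projector, language_model) are hypothetical placeholders and not this repository's API. Stage 1 trains only the vision-to-language projector, while stage 2 also unfreezes the language model.

import torch.nn as nn

def set_stage(model: nn.Module, stage: int) -> None:
    # Hypothetical submodules: vision_tower (ViT), mm_projector (MLP), language_model (LLM).
    for p in model.vision_tower.parameters():
        p.requires_grad = False            # vision encoder stays frozen in both stages
    for p in model.mm_projector.parameters():
        p.requires_grad = True             # projector is trained in both stages
    for p in model.language_model.parameters():
        p.requires_grad = (stage == 2)     # LLM is unfrozen only for instruction tuning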
The LAION-CC-SBU subset with BLIP captions (558K samples) is used for pre-training.
The official mixed instruction-tuning annotations and image sets listed below are required for fine-tuning; please download all of them.
- llava_v1_5_mix665k.json
- coco train2017
- gqa images
- ocr_vqa images
- textvqa train_val_images
- vg part1
- vg part2
Please place the downloaded files into the datasets directory according to the following layout; a quick verification sketch follows the tree.
- datasets
  - llava_pretrain
    - blip_laion_cc_sbu_558k
      - blip_laion_cc_sbu_558k.json
      - blip_laion_cc_sbu_558k_meta.json
      - images
  - llava_finetune
    - llava_v1_5_mix665k.json
    - coco
      - train2017
    - gqa
      - images
    - ocr_vqa
      - images
    - textvqa
      - train_images
    - vg
      - VG_100K
      - VG_100K_2
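A short check like the one below can confirm the layout before launching training. This is a sketch; adjust datasets_root if you keep the data elsewhere.

from pathlib import Path

datasets_root = Path("./datasets")
expected = [
    "llava_pretrain/blip_laion_cc_sbu_558k/blip_laion_cc_sbu_558k.json",
    "llava_pretrain/blip_laion_cc_sbu_558k/images",
    "llava_finetune/llava_v1_5_mix665k.json",
    "llava_finetune/coco/train2017",
    "llava_finetune/gqa/images",
    "llava_finetune/ocr_vqa/images",
    "llava_finetune/textvqa/train_images",
    "llava_finetune/vg/VG_100K",
    "llava_finetune/vg/VG_100K_2",
]
# Report anything missing relative to the layout above.
missing = [p for p in expected if not (datasets_root / p).exists()]
print("All paths present." if not missing else f"Missing: {missing}")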
You can try different combinations of vision-language architectures, as shown below.
The following examples use lmsys/vicuna-13b-v1.5 as the LLM and openai/clip-vit-large-patch14-336 as the ViT, but other LLMs and ViTs can be substituted.
hf download openai/clip-vit-large-patch14-336 --local-dir ./results/pretrained_models/openai/clip-vit-large-patch14-336
hf download lmsys/vicuna-13b-v1.5 --local-dir ./results/pretrained_models/lmsys/vicuna-13b-v1.5
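To confirm the downloads are usable, you can load both checkpoints from the local directories with transformers. This is a minimal sketch: it instantiates only the vision tower, image processor, and tokenizer, not the full 13B language model, which needs substantial memory.

from transformers import AutoTokenizer, CLIPImageProcessor, CLIPVisionModel

vit_dir = "./results/pretrained_models/openai/clip-vit-large-patch14-336"
llm_dir = "./results/pretrained_models/lmsys/vicuna-13b-v1.5"

# Vision tower: image processor + ViT weights.
image_processor = CLIPImageProcessor.from_pretrained(vit_dir)
vision_tower = CLIPVisionModel.from_pretrained(vit_dir)

# Language model: loading the tokenizer is enough to verify the download.
tokenizer = AutoTokenizer.from_pretrained(llm_dir, use_fast=False)
print("ViT hidden size:", vision_tower.config.hidden_size, "| vocab size:", len(tokenizer))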
# Stage1: Pretrain (Visual Feature Alignment).
bash ./tools/stage1_pretrain.sh
# Stage2: Full Finetune (Instruction Tuning with Image-Text Pairs)
bash ./tools/stage2_finetune.sh
- LLaVA: Provides the original implementation on which this project is based. Thanks.
- LLaVA-NeXT: Provides many methods that benefited the implementation of this project.
- Vicuna: Provides the instruction-tuned LLM used as the language backbone.
- Qwen: Provides an LLM that is easy to fine-tune and delivers excellent performance.
@misc{liu2024llavanext,
title={LLaVA-NeXT: Improved reasoning, OCR, and world knowledge},
url={https://llava-vl.github.io/blog/2024-01-30-llava-next/},
author={Liu, Haotian and Li, Chunyuan and Li, Yuheng and Li, Bo and Zhang, Yuanhan and Shen, Sheng and Lee, Yong Jae},
month={January},
year={2024}
}
@misc{liu2023improvedllava,
title={Improved Baselines with Visual Instruction Tuning},
author={Liu, Haotian and Li, Chunyuan and Li, Yuheng and Lee, Yong Jae},
publisher={arXiv:2310.03744},
year={2023},
}
@misc{liu2023llava,
title={Visual Instruction Tuning},
author={Liu, Haotian and Li, Chunyuan and Wu, Qingyang and Lee, Yong Jae},
publisher={NeurIPS},
year={2023},
}