SJTU-TES/repro-Blip2

Blip2

BLIP-2 (Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models) beats Flamingo on zero-shot VQAv2 (65.0 vs. 56.3) and establishes a new state of the art in zero-shot captioning (121.6 CIDEr on NoCaps vs. the previous best of 113.2). Equipped with powerful LLMs (e.g. OPT, FlanT5), BLIP-2 also unlocks new zero-shot instructed vision-to-language generation capabilities for a variety of interesting applications!

1. Installation

Please check that your NVCC and CUDA versions are at least 11.7:

nvcc -V
nvidia-smi
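If you want to check the requirement programmatically, here is a small sketch (not part of the repo) that parses the output of `nvcc -V` and compares it against the 11.7 minimum; the function names are illustrative.

```python
import re

# Minimum CUDA toolkit version required by the pinned torch wheels below.
MIN_CUDA = (11, 7)

def parse_nvcc_version(nvcc_output: str) -> tuple:
    """Extract the (major, minor) CUDA version from `nvcc -V` output.

    `nvcc -V` prints a line such as:
    "Cuda compilation tools, release 11.7, V11.7.99"
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no CUDA release version found in nvcc output")
    return int(match.group(1)), int(match.group(2))

def cuda_is_supported(nvcc_output: str) -> bool:
    """True if the detected toolkit meets the minimum version."""
    return parse_nvcc_version(nvcc_output) >= MIN_CUDA
```

To use it, capture the command output, e.g. `cuda_is_supported(subprocess.check_output(["nvcc", "-V"], text=True))`.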

Environment setup:

# create a new conda environment
conda create --name blip2 python=3.9
conda activate blip2

# Install the following packages in order
pip install peft==0.9.0
pip install Pillow==10.3.0
pip install Requests==2.31.0
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install tqdm==4.66.2
pip install transformers==4.39.0
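After installing, you can verify that the pinned versions above ended up in the environment. This is an optional sketch (not part of the repo); `check_pins` is a name chosen here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions from the install commands above. The torch wheel carries
# the +cu117 local version suffix.
PINS = {
    "peft": "0.9.0",
    "Pillow": "10.3.0",
    "requests": "2.31.0",
    "torch": "1.13.1+cu117",
    "tqdm": "4.66.2",
    "transformers": "4.39.0",
}

def check_pins(pins, get_version=version):
    """Compare installed distributions against pinned versions.

    Returns (missing, mismatched): package names that are not installed,
    and (name, expected, installed) triples where versions differ.
    """
    missing, mismatched = [], []
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            missing.append(name)
            continue
        if installed != expected:
            mismatched.append((name, expected, installed))
    return missing, mismatched

if __name__ == "__main__":
    missing, mismatched = check_pins(PINS)
    print("missing:", missing)
    print("mismatched:", mismatched)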

2. How to use

2.1 Download Pre-trained Files

Download Blip2's pre-trained files here.

2.2 Just try it!

Modify the text prompt and image in example.py, then run the following command.

CUDA_VISIBLE_DEVICES=0 python example.py
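The repo's example.py is not reproduced here, but a minimal sketch of BLIP-2 VQA inference with the pinned transformers version might look like the following. The model directory and image path are placeholders: point `model_dir` at the pre-trained files downloaded in step 2.1. Heavy dependencies are imported lazily so the prompt helper stays importable on its own.

```python
def format_vqa_prompt(question: str) -> str:
    """BLIP-2 expects VQA prompts in the form 'Question: ... Answer:'."""
    return f"Question: {question} Answer:"

def answer_question(model_dir: str, image_path: str, question: str) -> str:
    """Run a single VQA query on one image with a BLIP-2 checkpoint."""
    # Imported here so the module can be inspected without torch installed.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(model_dir)
    model = Blip2ForConditionalGeneration.from_pretrained(
        model_dir, torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        images=image, text=format_vqa_prompt(question), return_tensors="pt"
    ).to("cuda", torch.float16)
    out = model.generate(**inputs, max_new_tokens=30)
    return processor.batch_decode(out, skip_special_tokens=True)[0].strip()
```

Example call (paths are placeholders): `answer_question("path/to/blip2-pretrained", "example.jpg", "what is the main elements in the picture?")`.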

2.3 Reproducibility

Running example.py should reproduce the following question/answer pair:

  • "Question: what is the main elements in the picture? "

  • "Answer: the eiffel tower"
