A Python tool for calculating the number of tokens generated when processing images with various Vision Language Models (VLMs).
- Calculate image tokens for different VLMs
- Support for both existing images and dummy images
- Detailed token analysis including image size and token count
- Easy-to-use command line interface
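# Install from PyPI
pip install vt-calc
# Or install from source in editable mode (run from the repository root)
pip install -e .
# Or install just the dependencies listed in requirements.txt
pip install -r requirements.txt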
After installing with `pip install -e .`, you can use the `vt-calc` command directly:
# Using an existing image
vt-calc --image path/to/your/image.jpg
# Creating a dummy image with specific dimensions
vt-calc --size 1920 1080
# Specifying a different model
vt-calc --image path/to/your/image.jpg --model-path "model/path"
Alternatively, run the script directly with Python:
# Using an existing image
python calculate.py --image path/to/your/image.jpg
# Creating a dummy image with specific dimensions
python calculate.py --size 1920 1080
# Specifying a different model
python calculate.py --image path/to/your/image.jpg --model-path "model/path"
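If you want to drive the tool from a Python script, one option is to build the command line and pass it to `subprocess`. The sketch below uses only the flags shown above (`--image`, `--size`, `--model-path`). The helper names `build_vt_calc_command` and `run_vt_calc` are hypothetical and not part of vt-calc, and the sketch assumes `--image` and `--size` are alternatives to each other.

```python
import subprocess
from typing import List, Optional, Tuple


def build_vt_calc_command(
    image: Optional[str] = None,
    size: Optional[Tuple[int, int]] = None,
    model_path: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the vt-calc CLI (hypothetical helper)."""
    # Assumption: exactly one of --image / --size is given per call.
    if (image is None) == (size is None):
        raise ValueError("pass exactly one of image or size")
    cmd = ["vt-calc"]
    if image is not None:
        cmd += ["--image", image]
    else:
        # Two values, passed in the same order as in the example above (1920 1080).
        cmd += ["--size", str(size[0]), str(size[1])]
    if model_path is not None:
        cmd += ["--model-path", model_path]
    return cmd


def run_vt_calc(**kwargs) -> str:
    """Run vt-calc and return its raw stdout; requires vt-calc on PATH."""
    result = subprocess.run(
        build_vt_calc_command(**kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `run_vt_calc(size=(1920, 1080))` would run the same command as the dummy-image example above. The output format is not specified here, so the sketch returns the text unparsed.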
| Model | Model sizes |
|---|---|
| Qwen2.5-VL | 3B / 7B / 32B / 72B |
| Gemma3 | 4B / 12B / 27B |
| InternVL3 | 1B / 2B / 8B / 14B / 38B / 78B |
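
For intuition about why the token count can depend on image size: many vision encoders cut the image into fixed-size patches and then merge neighbouring patches into a single token. The sketch below estimates the count under that assumption only. It is not vt-calc's implementation; `estimate_patch_tokens` and its defaults (`patch_size=14`, `merge_size=2`) are illustrative. Real models apply their own resizing limits, tiling schemes, and special tokens, and some use a fixed number of tokens per image or tile, so actual counts can differ.

```python
def estimate_patch_tokens(width: int, height: int,
                          patch_size: int = 14, merge_size: int = 2) -> int:
    """Rough token estimate for a patch-based vision encoder (illustrative only)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Each output token covers a (patch_size * merge_size)-pixel square.
    factor = patch_size * merge_size
    # Assumption: the image is resized so each side is the nearest
    # multiple of `factor`, with at least one cell per side.
    grid_w = max(1, round(width / factor))
    grid_h = max(1, round(height / factor))
    return grid_w * grid_h


# A 1920x1080 image maps to a 69 x 39 grid under these assumptions.
print(estimate_patch_tokens(1920, 1080))  # 69 * 39 = 2691
```

Use the tool itself for real numbers, since it works with the actual model configuration rather than a generic formula.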
This project is licensed under the MIT License - see the LICENSE file for details.