
tavily-ai/image-caption-evaluator


Image Caption Evaluator

Compare and benchmark image-to-text models from OpenAI and AWS Bedrock on the XTD10 dataset, measuring accuracy, latency, and cost in one place.

Language: Python · License: Apache-2.0


Features

  • Automatic dataset setup
    Downloads and extracts the XTD10 multilingual image corpus.
  • Multi-model captioning
    Generates captions with OpenAI GPT-4o variants and with Amazon Nova Lite and Nova Pro served through AWS Bedrock.
  • LLM-based evaluation
    Scores each generated caption against its ground-truth caption using a judge LLM.
  • Comprehensive metrics
    Aggregates accuracy, latency, and cost; exports results as CSV.
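The judge step can be pictured as a prompt that pairs the ground-truth caption with the generated one, followed by pulling a number out of the judge's reply. The sketch below is illustrative only: it assumes a 0 to 10 scale and uses hypothetical names (JUDGE_TEMPLATE, build_judge_prompt, parse_judge_score). The repository's actual prompt, scale, and parsing may differ, and the call to the judge model itself is left out.

```python
import re
from typing import Optional

# Hypothetical judge prompt; the scale (0 to 10) is an assumption, not taken from the repo.
JUDGE_TEMPLATE = (
    "You are grading an image caption.\n"
    "Reference caption: {reference}\n"
    "Candidate caption: {candidate}\n"
    "Rate how well the candidate matches the reference on a scale from 0 to 10. "
    "Reply with the number only."
)


def build_judge_prompt(reference: str, candidate: str) -> str:
    """Fill the judge template with a ground-truth caption and a generated caption."""
    return JUDGE_TEMPLATE.format(reference=reference, candidate=candidate)


def parse_judge_score(reply: str, low: float = 0.0, high: float = 10.0) -> Optional[float]:
    """Extract the first number in the judge's reply and clamp it to [low, high].

    Returns None when the reply contains no number, so the caller can retry or skip.
    """
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    if match is None:
        return None
    value = float(match.group())
    return max(low, min(high, value))
```

Clamping guards against a judge that answers outside the requested range; returning None instead of raising keeps one malformed reply from aborting a long benchmark run.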

Prerequisites

  • Python 3.8+
  • OpenAI API key: set OPENAI_API_KEY
  • AWS credentials with Bedrock access: set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (plus AWS_SESSION_TOKEN when using temporary credentials)
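Because a missing key usually only surfaces after the first API call fails, it can be worth checking the environment before a run. A minimal sketch, assuming only the variable names listed above; missing_credentials is a hypothetical helper, not part of the repository, and AWS_SESSION_TOKEN is treated as optional since it only applies to temporary credentials.

```python
import os
from typing import List, Mapping, Optional

# Variables from the Prerequisites list; AWS_SESSION_TOKEN is deliberately optional.
REQUIRED_VARS = ("OPENAI_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty.

    Defaults to the real process environment; pass a dict to check something else.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_credentials() with no arguments checks the real environment; an empty list means all required variables are present (it does not verify that the keys are valid).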

Installation

git clone https://github.com/tavily-ai/image-caption-evaluator.git
cd image-caption-evaluator
pip install -r requirements.txt

Usage

python run_evaluation.py

The script will:

  1. Download & extract images (if needed)
  2. Fetch the ground-truth captions for the chosen language
  3. Generate and evaluate captions across all models
  4. Save results.csv with per-image metrics
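Steps 3 and 4 amount to a nested loop over images and models. The sketch below is not the code in run_evaluation.py: the captioner and judge callables are hypothetical stand-ins for the OpenAI, Bedrock, and judge-LLM calls, each captioner is assumed to report its own cost, and latency is assumed to be wall-clock seconds for the captioning call only.

```python
import csv
import time
from typing import Callable, Dict, List, Tuple

# The five columns documented in the Output section.
FIELDS = ["image_filename", "model", "similarity_score", "latency", "cost_usd"]

# Hypothetical shapes: a captioner maps an image filename to (caption, cost_usd);
# a judge maps (reference, candidate) to a similarity score.
Captioner = Callable[[str], Tuple[str, float]]
Judge = Callable[[str, str], float]


def evaluate(images: Dict[str, str], models: Dict[str, Captioner], judge: Judge) -> List[dict]:
    """Caption every image with every model and score each caption against its reference."""
    rows = []
    for filename, reference in images.items():
        for model_name, caption_fn in models.items():
            start = time.perf_counter()
            caption, cost = caption_fn(filename)
            latency = time.perf_counter() - start  # seconds, captioning call only
            rows.append({
                "image_filename": filename,
                "model": model_name,
                "similarity_score": judge(reference, caption),
                "latency": round(latency, 3),
                "cost_usd": cost,
            })
    return rows


def save_results(rows: List[dict], path: str = "results.csv") -> None:
    """Write one CSV row per (image, model) pair with the documented columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Keeping providers behind plain callables means adding a new model (one of the contribution ideas below) only requires another entry in the models dict; save_results(evaluate(images, models, judge)) then writes the file.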

Output

results.csv has one row per image and model, with the columns:

  • image_filename
  • model
  • similarity_score
  • latency
  • cost_usd
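The README does not say how cost_usd is computed. One common approach, assumed here, is to multiply the input and output token counts a provider reports by its per-1,000-token prices. The price table and model name below are hypothetical placeholders, not real OpenAI or Bedrock pricing.

```python
from typing import Dict, Tuple

# Hypothetical placeholder prices in USD per 1,000 tokens: (input, output).
# Replace with the rates currently published by each provider.
PRICES_PER_1K: Dict[str, Tuple[float, float]] = {
    "example-model": (0.001, 0.002),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Per-request cost: (tokens / 1000) * price, summed over input and output.

    Raises KeyError for a model with no entry in PRICES_PER_1K.
    """
    in_price, out_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price
```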

Load results.csv into your favorite plotting library to visualize the accuracy, latency, and cost trade-offs between models.
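Before plotting, a per-model summary is often the quickest way to see those trade-offs. A minimal standard-library sketch, assuming the column names listed above; summarize is a hypothetical helper that takes rows as string dicts, which is exactly what csv.DictReader yields.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List


def summarize(rows: Iterable[Dict[str, str]]) -> Dict[str, dict]:
    """Group rows by model and report count, mean score, mean latency and total cost."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        groups[row["model"]].append(row)

    summary = {}
    for model, items in groups.items():
        summary[model] = {
            "n": len(items),
            "mean_score": mean(float(r["similarity_score"]) for r in items),
            "mean_latency": mean(float(r["latency"]) for r in items),
            "total_cost_usd": sum(float(r["cost_usd"]) for r in items),
        }
    return summary
```

To run it on the real file, open results.csv with newline="" and pass csv.DictReader(f) to summarize; the returned dict can then feed a scatter plot of mean score against total cost, one point per model.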


Contributing

  1. Fork the repo
  2. Create a feature branch
  3. Submit a PR

Ideas welcome:

  • Add new LLM providers
  • Support batching or async evaluation
  • Extend to other vision-language tasks

Contact

Questions or custom integrations? Reach out to Tomer Weiss:



Powered by Tavily — The web API built for AI agents

About

A public repo for evaluating image captions
