Images synthesized by text-to-image models (e.g., Stable Diffusion) often fail to follow their text inputs faithfully. TIFA is a simple tool for evaluating the fine-grained alignment between a text input and the generated image. This repository contains the code and models for our paper, TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering, which was accepted to ICCV 2023. Please refer to the project page for a quick overview.
If you find our work helpful, please cite us:
@article{hu2023tifa,
  title={TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering},
  author={Hu, Yushi and Liu, Benlin and Kasai, Jungo and Wang, Yizhong and Ostendorf, Mari and Krishna, Ranjay and Smith, Noah A},
  journal={arXiv preprint arXiv:2303.11897},
  year={2023}
}