- 🔥 🔥 We are developing an enhanced version of Chimera that aims for stronger multimodal reasoning by integrating stronger expert models, more effective general models, and optimized fusion strategies.
- Congratulations, Chimera has been accepted to ICCV 2025!
- Release the training code and data recipe
- Release the inference code and model checkpoints
- Clone this repository:

      git clone https://github.com/UniModal4Reasoning/Chimera.git
- Create a conda virtual environment and activate it:

      conda create -n chimera python=3.9 -y
      conda activate chimera
- Install dependencies using `requirements.txt`:

      pip install -r requirements.txt
- Install other requirements:

      cd chimera/
      pip install --upgrade pip  # enable PEP 660 support
      pip install -e .
- Install `flash-attn==2.3.4`:

      pip install flash-attn==2.3.4 --no-build-isolation

  Alternatively, you can compile from source:

      git clone https://github.com/Dao-AILab/flash-attention.git
      cd flash-attention
      git checkout v2.3.4
      python setup.py install
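After the installation steps above, a quick check that the expected packages are discoverable can save a failed model load later. The sketch below is not part of the Chimera repository: `missing_packages` is a hypothetical helper name, and the module names `torch`, `flash_attn`, `PIL`, and `chimera` are assumed to be the import names of the packages installed above. It uses only the standard library and does not import the packages themselves.

```python
from importlib.util import find_spec


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the packages installed in the steps above.
    required = ["torch", "flash_attn", "PIL", "chimera"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the activated `chimera` environment; an empty result only shows that the packages are installed, not that their versions match.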
Multimodal reasoning with a Chimera-Reasoner checkpoint:

    from chimera.chimera_infer import Chimera4easyuse
    import torch
    from PIL import Image

    # Prepare the model (pick one checkpoint)
    # model_path = "U4R/Chimera-Reasoner-2B"
    # model_path = "U4R/Chimera-Reasoner-4B"
    model_path = "U4R/Chimera-Reasoner-8B"
    generation_config = dict(max_new_tokens=256, do_sample=False)
    model = Chimera4easyuse(model_path, dtype=torch.bfloat16, generation_config=generation_config)

    # Prepare the input; replace the path and the prompt text with your own
    image_path = "path/to/image"
    user_prompt = "<image>\nuser prompt"
    input_image = Image.open(image_path).convert('RGB')

    response = model.get_response(user_prompt, [input_image])
    print(response)
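To ask the same question about several images, the call above can be wrapped in a loop. This is a minimal sketch, not part of the Chimera codebase: `answer_images` is a hypothetical helper, and it assumes only what the snippet above shows, namely that the model object exposes `get_response(prompt, [image])` and that the prompt begins with the `<image>` placeholder followed by a newline. The image loader is passed in as an argument so the helper itself does not depend on PIL.

```python
def answer_images(model, image_paths, question, load_image):
    """Ask `question` about each image and return {path: response}.

    `model` is any object with get_response(prompt, [image]), as in the
    snippet above; `load_image` maps a path to an image object.
    """
    prompt = "<image>\n" + question  # one image placeholder per call
    results = {}
    for path in image_paths:
        image = load_image(path)
        results[path] = model.get_response(prompt, [image])
    return results
```

With the model from the snippet above, a call might look like `answer_images(model, ["a.png", "b.png"], "What is shown?", lambda p: Image.open(p).convert("RGB"))`, where the file names are placeholders.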
Document-to-Markdown extraction with the Chimera-Extractor-1B checkpoint:

    from chimera.chimera_infer import Chimera4easyuse
    import torch
    from PIL import Image

    # Prepare the model
    model_path = "U4R/Chimera-Extractor-1B"
    generation_config = dict(max_new_tokens=4096, do_sample=False, no_repeat_ngram_size=20)
    model = Chimera4easyuse(model_path, dtype=torch.float16, generation_config=generation_config)

    # Prepare the input; replace the path with a document page image
    image_path = "path/to/document"
    user_prompt = "<image>\nAs a smart PDF to Markdown conversion tool, please convert the content of the provided PDF into Markdown format."
    input_image = Image.open(image_path).convert('RGB')

    response = model.get_response(user_prompt, [input_image])
    print(response)
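When converting many pages, it is usually more convenient to write each result to a file than to print it. The following is a small sketch under stated assumptions: `save_markdown` is a hypothetical helper that is not provided by Chimera, and it assumes that `response` is a plain string, as the `print(response)` call above suggests.

```python
from pathlib import Path


def save_markdown(response, image_path, out_dir):
    """Write `response` to <out_dir>/<image stem>.md and return the output path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    out_path = out_dir / (Path(image_path).stem + ".md")
    out_path.write_text(response, encoding="utf-8")
    return out_path
```

For example, `save_markdown(response, image_path, "outputs")` would store the converted page under the `outputs` directory, named after the input image.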
Chimera is released under the Apache License 2.0.
If you find our models, code, or paper useful in your research, please consider giving us a ⭐ and a citation 📝, thx :)
@article{peng2024chimera,
title={Chimera: Improving generalist model with domain-specific experts},
author={Peng, Tianshuo and Li, Mingsheng and Zhou, Hongbin and Xia, Renqiu and Zhang, Renrui and Bai, Lei and Mao, Song and Wang, Bin and Zhou, Aojun and others},
journal={arXiv preprint arXiv:2412.05983},
year={2024}
}
If you encounter any issues or have questions, please feel free to contact us via bo.zhangzx@gmail.com or pengtianshuo@pjlab.org.cn.