(ICCV-2025 Official Code) Chimera: Improving Generalist Model with Domain-Specific Experts

[ Paper ] [ Website ] [ Dataset🤗 ] [ Models🤗 ]

News 🔥

🔥 🔥 We are developing an enhanced version of Chimera that aims for stronger multimodal reasoning by integrating stronger expert models, more effective general models, and optimized fusion strategies.
Congratulations! Chimera has been accepted to ICCV 2025.
Release the training code and data recipe
Release the inference code and model checkpoints

🛠️ Installation

Clone this repository and enter it:

```
git clone https://github.com/UniModal4Reasoning/Chimera.git
cd Chimera
```

Create a conda virtual environment and activate it:

```
conda create -n chimera python=3.9 -y
conda activate chimera
```

Install dependencies using requirements.txt:
```
pip install -r requirements.txt
```

Install the chimera package in editable mode:

```
cd chimera/
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

Additional Instructions

Install flash-attn==2.3.4:

```
pip install flash-attn==2.3.4 --no-build-isolation
```

Alternatively, you can compile it from source:

```
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.3.4
python setup.py install
```
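
After installation, a quick import check can catch a missing dependency before any checkpoint is downloaded. The following is a minimal sketch using only the Python standard library. The helper names `check_modules` and `installed_version` are hypothetical and are not part of Chimera; the module names are taken from the install commands and import statements in this README.

```python
from importlib import metadata
from importlib.util import find_spec


def check_modules(modules):
    """Map each top-level module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in modules}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Module names assumed from this README: torch, flash_attn, chimera.
    for name, found in check_modules(["torch", "flash_attn", "chimera"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    # The README pins flash-attn to 2.3.4; print what is actually installed.
    print("flash-attn version:", installed_version("flash-attn"))
```

The check only locates the modules without importing them, so it does not initialize CUDA.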

Quick Start

Multi-modal reasoning

```
from chimera.chimera_infer import Chimera4easyuse
import torch
from PIL import Image

# Prepare the model (uncomment one of the other checkpoints to use a smaller model)
# model_path = "U4R/Chimera-Reasoner-2B"
# model_path = "U4R/Chimera-Reasoner-4B"
model_path = "U4R/Chimera-Reasoner-8B"
generation_config = dict(max_new_tokens=256, do_sample=False)
model = Chimera4easyuse(model_path, dtype=torch.bfloat16, generation_config=generation_config)

# Prepare the input: the prompt starts with the <image> placeholder
image_path = "path/to/image"
user_prompt = "<image>\nuser prompt"
input_image = Image.open(image_path).convert("RGB")
response = model.get_response(user_prompt, [input_image])
print(response)
```
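
The prompt in the example above begins with the `<image>` placeholder followed by a newline. A small helper can keep that convention consistent across questions. This is a sketch only: `build_prompt` is a hypothetical name, not a Chimera API, and it assumes a single image per prompt, as in the example.

```python
IMAGE_TOKEN = "<image>"  # placeholder used in the Quick Start prompt


def build_prompt(question: str) -> str:
    """Prefix a question with the <image> placeholder unless it already has one."""
    question = question.strip()
    if IMAGE_TOKEN in question:
        # The caller already placed the placeholder; leave the prompt unchanged.
        return question
    return f"{IMAGE_TOKEN}\n{question}"
```

The result can be passed as `user_prompt` to `model.get_response(user_prompt, [input_image])`.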

Visual content extraction

```
from chimera.chimera_infer import Chimera4easyuse
import torch
from PIL import Image

# Prepare the model
model_path = "U4R/Chimera-Extractor-1B"
generation_config = dict(max_new_tokens=4096, do_sample=False, no_repeat_ngram_size=20)
model = Chimera4easyuse(model_path, dtype=torch.float16, generation_config=generation_config)

# Prepare the input: an image of the document page to convert
image_path = "path/to/document"
user_prompt = "<image>\nAs a smart PDF to Markdown conversion tool, please convert the content of the provided PDF into Markdown format."
input_image = Image.open(image_path).convert("RGB")
response = model.get_response(user_prompt, [input_image])
print(response)
```
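
The example above converts a single page image. For a document with several pages, one possible approach is to call the model once per page and join the results. The sketch below assumes that `get_response` returns a string, as the `print(response)` call suggests. `pages_to_markdown` and the blank-line separator are illustrative choices, not part of Chimera.

```python
from typing import Any, Iterable, List

# Prompt copied from the extraction example above.
EXTRACT_PROMPT = (
    "<image>\nAs a smart PDF to Markdown conversion tool, please convert "
    "the content of the provided PDF into Markdown format."
)


def pages_to_markdown(
    model: Any,
    pages: Iterable[Any],
    prompt: str = EXTRACT_PROMPT,
    separator: str = "\n\n",
) -> str:
    """Run the extractor once per page image and join the per-page Markdown."""
    parts: List[str] = []
    for page in pages:
        # One call per page, mirroring the single-image call shown above.
        parts.append(model.get_response(prompt, [page]).strip())
    return separator.join(parts)
```

Here `pages` would be a sequence of RGB `PIL.Image` objects, one per page, loaded the same way as `input_image` in the example.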

License

Chimera is released under the Apache License 2.0.

Citation

If you find our models, code, or paper useful in your research, please consider giving the repository a ⭐ and citing our work 📝. Thanks :)

```
@article{peng2024chimera,
  title={Chimera: Improving generalist model with domain-specific experts},
  author={Peng, Tianshuo and Li, Mingsheng and Zhou, Hongbin and Xia, Renqiu and Zhang, Renrui and Bai, Lei and Mao, Song and Wang, Bin and Zhou, Aojun and others},
  journal={arXiv preprint arXiv:2412.05983},
  year={2024}
}
```

Contact Us

If you encounter any issues or have questions, please feel free to contact us via bo.zhangzx@gmail.com or pengtianshuo@pjlab.org.cn.
