
Alpha-Innovator/Chimera


(ICCV-2025 Official Code) Chimera: Improving Generalist Model with Domain-Specific Experts

[ Paper ] [ Website ] [ Dataset🤗 ] [ Models🤗 ]

News 🔥

  • 🔥 🔥 We are developing an enhanced version of Chimera that targets superior multimodal reasoning by integrating stronger expert models, more effective generalist models, and optimized fusion strategies.
  • Congratulations, Chimera has been accepted to ICCV 2025!
  • Released the training code and data recipe.
  • Released the inference code and model checkpoints.

🛠️ Installation

  • Clone this repository:

    git clone https://github.com/UniModal4Reasoning/Chimera.git
  • Create a conda virtual environment and activate it:

    conda create -n chimera python=3.9 -y
    conda activate chimera
  • Install dependencies using requirements.txt:

    pip install -r requirements.txt
  • Install the Chimera package itself in editable mode:

    cd chimera/
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
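After installation, a quick sanity check can confirm that the environment looks right. The sketch below is not part of the repository: the helper name `check_environment` is hypothetical, and it only checks that Python matches the 3.9 used in the conda command above and that the `torch` and `chimera` packages can be located on the import path.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "chimera"), expected_python=(3, 9)):
    """Hypothetical helper: report whether key packages can be found.

    importlib.util.find_spec locates a top-level package without importing it,
    so the check stays cheap even for large libraries such as torch.
    """
    report = {
        # Compare only major.minor against the version used in the conda setup.
        "python_ok": sys.version_info[:2] == tuple(expected_python),
    }
    for name in packages:
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {ok}")
```

A `False` entry for `chimera` usually means the editable install was run in a different environment from the one currently active.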

Additional Instructions

  • Install flash-attn==2.3.4:

    pip install flash-attn==2.3.4 --no-build-isolation

    Alternatively, you can compile it from source:

    git clone https://github.com/Dao-AILab/flash-attention.git
    cd flash-attention
    git checkout v2.3.4
    python setup.py install
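Whichever route you take, it can help to confirm that the pinned flash-attn build is the one Python actually sees. The sketch below uses the standard library's `importlib.metadata`; it is not a repository script, the helper names `installed_version` and `flash_attn_status` are hypothetical, and both `flash-attn` and `flash_attn` are tried because distribution-name normalization may differ across Python and packaging versions. It relies on installed package metadata, so an unusual install layout may not be detected.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(*dist_names):
    """Return the version string of the first distribution found, else None."""
    for name in dist_names:
        try:
            return version(name)
        except PackageNotFoundError:
            continue  # try the next spelling of the distribution name
    return None


def flash_attn_status(expected="2.3.4"):
    """Hypothetical helper: describe the installed flash-attn against the pin."""
    found = installed_version("flash-attn", "flash_attn")
    if found is None:
        return "flash-attn is not installed"
    if found != expected:
        return f"flash-attn {found} found, but {expected} is expected"
    return f"flash-attn {expected} is installed"


if __name__ == "__main__":
    print(flash_attn_status())
```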

Quick Start

Multi-modal reasoning

from chimera.chimera_infer import Chimera4easyuse
import torch
from PIL import Image

# prepare model
# model_path = "U4R/Chimera-Reasoner-2B"
# model_path = "U4R/Chimera-Reasoner-4B"
model_path = "U4R/Chimera-Reasoner-8B"
generation_config = dict(max_new_tokens=256, do_sample=False)
model = Chimera4easyuse(model_path, dtype=torch.bfloat16, generation_config=generation_config)

# prepare input
image_path = "path/to/image"
user_prompt = "<image>\nuser prompt"
input_image = Image.open(image_path).convert('RGB')
response = model.get_response(user_prompt, [input_image])
print(response)
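The `get_response(prompt, images)` call above can be wrapped to ask several single-image questions in one pass. This sketch assumes only what the snippet shows: a prompt made of one `<image>` placeholder, a newline, and the question, plus a list holding the matching image. The helpers `build_prompt` and `ask_many` are hypothetical, and `EchoModel` is a stand-in for `Chimera4easyuse` so the logic can be tried without downloading weights.

```python
from typing import Any, Iterable, List, Tuple


def build_prompt(question: str) -> str:
    # Mirrors the README's format: an <image> placeholder, a newline, then the text.
    return "<image>\n" + question


def ask_many(model: Any, items: Iterable[Tuple[Any, str]]) -> List[str]:
    """Hypothetical helper: run one single-image query per (image, question) pair.

    `model` is anything exposing get_response(prompt, images), such as the
    Chimera4easyuse instance created above.
    """
    answers = []
    for image, question in items:
        answers.append(model.get_response(build_prompt(question), [image]))
    return answers


class EchoModel:
    """Stand-in for Chimera4easyuse, used only to exercise ask_many offline."""

    def get_response(self, prompt, images):
        # Echo the number of images and the question line of the prompt.
        return f"{len(images)} image(s): {prompt.splitlines()[-1]}"


if __name__ == "__main__":
    print(ask_many(EchoModel(), [("img-a", "What is shown?"), ("img-b", "Count the objects.")]))
```

With the real model, pass RGB PIL images opened as in the snippet above, for example `ask_many(model, [(input_image, "Describe the chart.")])`.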

Visual content extraction

from chimera.chimera_infer import Chimera4easyuse
import torch
from PIL import Image

# prepare model
model_path = "U4R/Chimera-Extractor-1B"
generation_config = dict(max_new_tokens=4096, do_sample=False, no_repeat_ngram_size=20)
model = Chimera4easyuse(model_path, dtype=torch.float16, generation_config=generation_config)

# prepare input
image_path = "path/to/document"
user_prompt = "<image>\nAs a smart PDF to Markdown conversion tool, please convert the content of the provided PDF into Markdown format."
input_image = Image.open(image_path).convert('RGB')
response = model.get_response(user_prompt, [input_image])
print(response)
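The extraction example processes one image at a time. For a multi-page PDF, one option is to render each page to an image, convert the pages independently, and join the resulting Markdown. This sketch assumes the pages are already rendered to RGB images (with a rasterizer of your choice) and reuses the exact prompt from the snippet above; `pages_to_markdown`, the `---` page separator, and the `RecordingModel` stub are hypothetical choices, not part of the repository.

```python
from typing import Any, Iterable

# Prompt copied verbatim from the visual content extraction example above.
PDF_TO_MD_PROMPT = (
    "<image>\nAs a smart PDF to Markdown conversion tool, please convert "
    "the content of the provided PDF into Markdown format."
)


def pages_to_markdown(model: Any, pages: Iterable[Any], separator: str = "\n\n---\n\n") -> str:
    """Hypothetical helper: convert page images one by one and join the results.

    `model` must expose get_response(prompt, images) like Chimera4easyuse;
    each page is sent alone so the prompt always refers to a single image.
    """
    chunks = []
    for page in pages:
        text = model.get_response(PDF_TO_MD_PROMPT, [page])
        chunks.append(text.strip())  # drop stray leading/trailing whitespace
    return separator.join(chunks)


class RecordingModel:
    """Stand-in for Chimera4easyuse that records calls and returns fake Markdown."""

    def __init__(self):
        self.calls = []

    def get_response(self, prompt, images):
        self.calls.append((prompt, images))
        return f"  # page {len(self.calls)}  "


if __name__ == "__main__":
    print(pages_to_markdown(RecordingModel(), ["page-1", "page-2"]))
```

With the real model, `markdown = pages_to_markdown(model, page_images)` returns a single string that can be written to a `.md` file.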

License

Chimera is released under the Apache License 2.0.

Citation

If you find our models, code, or paper useful in your research, please consider giving us a ⭐ and citing our work 📝. Thanks :)

@article{peng2024chimera,
  title={Chimera: Improving generalist model with domain-specific experts},
  author={Peng, Tianshuo and Li, Mingsheng and Zhou, Hongbin and Xia, Renqiu and Zhang, Renrui and Bai, Lei and Mao, Song and Wang, Bin and Zhou, Aojun and others},
  journal={arXiv preprint arXiv:2412.05983},
  year={2024}
}

Contact Us

If you encounter any issues or have questions, please feel free to contact us via bo.zhangzx@gmail.com or pengtianshuo@pjlab.org.cn.
