Closed
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
"Meta Chameleon is a family of models that can combine text and images as input and output any combination of text and images with a single unified architecture for both encoding and decoding. While most current late-fusion models use diffusion-based learning, Meta Chameleon uses tokenization for text and images. This enables a more unified approach and makes the model easier to design, maintain, and scale. The possibilities are endless—imagine generating creative captions for images or using a mix of text prompts and images to create an entirely new scene."
Motivation
This would be a great addition to llama.cpp!
The image features look interesting, but the model can also do plain Text -> Text generation as well as many other input/output combinations:
- Text -> Text
- Image -> Image
- Text -> Image
- Image -> Text
- Image -> Text + Image
- Text + Image -> Text
- Text -> Text + Image
- Text + Image -> Text + Image
chameleon.mp4