-
Hmm, it was moved to a discussion; maybe that's fine. Though I posted it in enhancements to suggest supporting the model.
-
That would be really nice to have. Also, I don't think CLIP is that far from being finished.
-
Well, I hate to say it given that llava is not even 100% implemented yet, but the release of Sphinx sadly puts llava 1.5 into a corner.
It's also quite manageable in size: at 4-bit quantization it would be just around 5 GB of weights.
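For scale, the back-of-the-envelope estimate is just parameters × bits per weight / 8; the parameter count below is an assumed round number for illustration (not Sphinx's official size), and real quantization formats add some overhead for scales:

```python
# Rough weight-size estimate for a 4-bit quantized model.
# The parameter count is an assumed round number, not an official figure;
# real formats (block-wise scales, zero points) add some overhead on top.
params = 10e9            # assumed number of weights
bits_per_weight = 4
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.1f} GB")  # ~5.0 GB
```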
http://imagebind-llm.opengvlab.com/
I ran a few tests against llava 1.5: it didn't hallucinate, it mentioned small details llava cannot see, and it correctly detected parts that llava mixed up.
I haven't tested every type of scene, just a few quite difficult ones, specifically to see whether Sphinx can do things llava couldn't. And it was remarkably better.
It appears they use a refined image encoding process that feeds multiple encoders and multiple parts of the image, so the projector learns object locations better and gets both a general view of the image and detailed zoomed views of it.
That said, it should be possible to train the llava-1.5 LLM in the same way.
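Here is a minimal sketch of that idea (not the actual Sphinx code; the encoder dimensions, the 2x2 crop layout, and the projector shape are all made-up placeholders): several visual encoders see both the full image and zoomed sub-crops, and their concatenated features pass through one projector into the LLM embedding space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiEncoderProjector(nn.Module):
    """Toy illustration: multiple encoders over a global view plus zoomed crops,
    concatenated and passed through a single projector into the LLM space."""

    def __init__(self, encoder_dims=(768, 1024), llm_dim=4096):
        super().__init__()
        # Stand-ins for real visual encoders (e.g. ViT variants); dims are assumptions.
        self.encoders = nn.ModuleList(nn.Linear(3 * 224 * 224, d) for d in encoder_dims)
        # One global 224x224 view plus a 2x2 grid of 224x224 crops from a 448x448 input.
        num_views = 1 + 4
        self.projector = nn.Sequential(
            nn.Linear(sum(encoder_dims) * num_views, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, image):
        # image: (B, 3, 448, 448)
        global_view = F.interpolate(image, size=224)            # coarse view of the whole scene
        crops = [image[:, :, i * 224:(i + 1) * 224, j * 224:(j + 1) * 224]
                 for i in range(2) for j in range(2)]           # detailed zoomed views
        views = [global_view] + crops
        feats = [enc(v.flatten(1)) for v in views for enc in self.encoders]
        return self.projector(torch.cat(feats, dim=-1))         # (B, llm_dim)


if __name__ == "__main__":
    out = MultiEncoderProjector()(torch.randn(1, 3, 448, 448))
    print(out.shape)  # torch.Size([1, 4096])
```

In a real setup the `nn.Linear` stand-ins would be pretrained vision backbones, and the projector would emit a sequence of visual tokens rather than a single vector; the point is only that combining a global view with zoomed crops gives the projector both coarse layout and fine detail.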