LLaVA LLaVA LLaVA LLaVA #3600

mirek190 · 2023-10-12T16:52:18Z

mirek190
Oct 12, 2023

Finally, LLaVA under llama.cpp!

In case anyone doesn't know, LLaVA is for picture recognition, and maybe for video in the future :D

ianscrivener · 2023-10-12T20:37:05Z

ianscrivener
Oct 12, 2023

Hooray !!!

0 replies

ianscrivener · 2023-10-12T21:19:30Z

ianscrivener
Oct 12, 2023

Is there a plan to implement grammar support for LLaVA? That would be very handy and powerful!

0 replies

damian0815 · 2023-10-13T07:49:02Z

damian0815
Oct 13, 2023

I am taking a stab at llama-cpp-python binding support and then LMQL support. Taking a leaf from the text-generation-webui book, I will try to encode the image as a base64 blob that can be embedded in the prompt string, e.g. <img src="data:image/jpeg;base64,{img_str}">, where img_str is the base64 encoding of the image (ref: https://github.com/oobabooga/text-generation-webui/tree/main/extensions/multimodal).

I'm not sure if this work has already been started elsewhere; please lmk if this is double effort, ha.
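
For anyone who wants to try the base64 embedding described above, here is a minimal Python sketch. The helper names `image_to_prompt_tag` and `build_prompt` are hypothetical, and placing the question on the line after the tag is an assumption. It shows only how the `<img>` tag string is built, not how llama-cpp-python or LMQL will eventually parse it.

```python
import base64


def image_to_prompt_tag(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Return an <img> tag whose src is a base64 data URI for the image."""
    # b64encode returns bytes; decode to str so it can be embedded in a prompt.
    img_str = base64.b64encode(image_bytes).decode("ascii")
    return f'<img src="data:{mime_type};base64,{img_str}">'


def build_prompt(question: str, image_bytes: bytes) -> str:
    """Hypothetical prompt layout: image tag first, then the question."""
    return f"{image_to_prompt_tag(image_bytes)}\n{question}"
```

In practice `image_bytes` would come from something like `open("photo.jpg", "rb").read()`. Note that base64 grows the payload by about a third, so large images make for very long prompt strings.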

0 replies

damian0815 · 2023-10-13T10:24:43Z

damian0815
Oct 13, 2023

Work on the above has started here (draft state): #3613

0 replies

Jipok · 2023-10-13T19:00:37Z

Jipok
Oct 13, 2023

Any chance of support for CogVLM?
It seems to be much superior to LLaVA in terms of capabilities.

4 replies

mirek190 Oct 13, 2023
Author

WOW - I ran some tests - that model is insane!! Better than GPT-4 Vision... how is that even possible?

ianscrivener Oct 13, 2023

Agreed! CogVLM is amazing. I'd love to be able to run it on my M2 Mac with 16 GB!!!

However, the model files are around 28 GB each, zipped. There are five (5) models, and I am not sure how many of them are required for CogVLM to run. I wonder whether the very large model size means that there is no use case for "running CogVLM at the edge", i.e. on consumer-grade hardware.

mirek190 Oct 13, 2023
Author

It's 28 GB because it's not quantized - I checked already ;)
btw - who is playing with AI models with only 16 GB of RAM ... lol

ianscrivener Oct 13, 2023

Yes, quantization would substantially reduce the size...

16 GB RAM - me!
Check out the llama.cpp manifesto: "inference at the edge"
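
As a rough back-of-the-envelope for the quantization point above, here is a small Python sketch. It assumes the 28 GB file stores weights at 16 bits each, which the thread does not confirm, and it ignores the per-block metadata that real quantized formats add, so actual files would come out somewhat larger. The function name `estimated_size_gb` is made up for illustration.

```python
def estimated_size_gb(original_gb: float, original_bits: float, target_bits: float) -> float:
    """Scale a model file size by the ratio of bits per weight.

    Assumes the file is dominated by weights and ignores quantization
    metadata (scales, offsets), so this is a lower bound.
    """
    return original_gb * target_bits / original_bits


if __name__ == "__main__":
    # Assumption: the unquantized 28 GB file uses 16 bits per weight.
    for bits in (8, 4):
        print(f"{bits}-bit: ~{estimated_size_gb(28, 16, bits):.1f} GB")
```

Under those assumptions, a 4-bit version would be roughly a quarter of the original size, which is the kind of reduction that could bring a file like this within reach of a 16 GB machine.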

aiaicode · 2023-10-14T13:06:27Z

aiaicode
Oct 14, 2023

Finally, LLaVA under llama.cpp!

In case anyone doesn't know, LLaVA is for picture recognition, and maybe for video in the future :D

Let's not forget, video is nothing but 30 pictures per second. So technically, if you take input from the user via voice or text and run inference on the frame at that exact second of the video, you will get a reply. That means you can talk to a video today if you have a high-performance computer or a new M2 MacBook Pro.

Understanding the entire video, on the other hand, is slightly different. Run inference on one frame from every second of the video, combine all the results, and feed them back to an LLM, and you should know what's happening at each second of the video.
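
The per-second approach above could be sketched in Python roughly as follows. This is only an illustration: `describe_frame` is a hypothetical callable standing in for whatever image model you run (e.g. LLaVA), the frames are assumed to be already decoded into a sequence, and the wording of the summary prompt is made up.

```python
from typing import Callable, List, Sequence


def one_frame_per_second(total_frames: int, fps: int = 30) -> List[int]:
    """Return the index of the first frame of each second of video."""
    return list(range(0, total_frames, fps))


def build_summary_prompt(
    frames: Sequence[object],
    describe_frame: Callable[[object], str],
    fps: int = 30,
) -> str:
    """Describe one frame per second and combine the results into one prompt.

    describe_frame is a placeholder for an image-to-text model call.
    """
    lines = []
    for second, index in enumerate(one_frame_per_second(len(frames), fps)):
        lines.append(f"[{second}s] {describe_frame(frames[index])}")
    return (
        "Per-second descriptions of a video:\n"
        + "\n".join(lines)
        + "\nSummarize what happens in the video."
    )
```

The resulting string would then be sent to a text-only LLM. Sampling one frame per second keeps the number of image inferences manageable, at the cost of missing anything that happens between the sampled frames.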

0 replies
