Replies: 6 comments 4 replies
-
Hooray!!!
-
Is there a plan to implement grammar support for LLaVA? That would be very handy and powerful!
-
I am taking a stab at llama-cpp-python binding support, and then LMQL support. Taking a leaf from the text-generation-webui book, I will try to encode the image as a base64 blob that can be embedded in the prompt string. I'm not sure if this work has already been started elsewhere, so please let me know if this is duplicate effort, ha.
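For anyone following along, the base64 idea above can be sketched roughly as below. This is only an illustration using the Python standard library: the `<img src="data:...">` tag layout and the helper names `image_to_data_uri` and `build_prompt` are assumptions made for the example, not the actual format or API used by llama-cpp-python, LMQL, or text-generation-webui.

```python
import base64


def image_to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URI string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_prompt(image_bytes: bytes, question: str) -> str:
    """Embed the image in the prompt text ahead of the user's question.

    The <img> tag layout is a hypothetical convention for this sketch;
    a real binding would define its own marker for where the image goes.
    """
    return f'<img src="{image_to_data_uri(image_bytes)}">\n{question}'


if __name__ == "__main__":
    fake_image = b"\xff\xd8\xff\xe0 not a real JPEG"
    print(build_prompt(fake_image, "What is in this picture?"))
```

The appeal of this approach is that the prompt stays a single plain string, so it can pass through layers that only know about text. The cost is size: base64 inflates the image by about a third, and whatever receives the prompt has to find and decode the blob before tokenizing the rest.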
-
Work on the above has started here (draft state): #3613
-
Any chance of support for CogVLM?
-
Let's not forget that video is just a sequence of pictures, typically around 30 per second. So if you take the user's voice or text input and run inference on the frame at that exact moment of the video, you get a reply about that moment. In that sense you can already talk to a video today, provided you have a high-performance computer or a new M2 MacBook Pro. Understanding the entire video is a slightly different problem: run inference on one frame for every second of the video, combine the results, and feed them back to an LLM, and you should know what is happening at each second.
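A rough sketch of the per-second idea above, in plain Python. Everything here is a stand-in: `describe_frame` is a hypothetical callable representing whatever image model you run (LLaVA or otherwise), and the frames are just items in a list, so no real video decoding is shown.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def one_frame_per_second(num_frames: int, fps: float = 30.0) -> List[int]:
    """Return the index of the first frame in each second of video."""
    step = max(1, round(fps))  # frames to skip between samples
    return list(range(0, num_frames, step))


def build_timeline(
    frames: Sequence[Frame],
    describe_frame: Callable[[Frame], str],
    fps: float = 30.0,
) -> str:
    """Describe one frame per second and join the results into a timeline.

    describe_frame is a hypothetical hook for the image model. The returned
    text is what you would feed back to an LLM to summarize the video.
    """
    lines = []
    for second, index in enumerate(one_frame_per_second(len(frames), fps)):
        lines.append(f"[{second}s] {describe_frame(frames[index])}")
    return "\n".join(lines)


if __name__ == "__main__":
    fake_frames = list(range(90))  # stands in for 3 seconds at 30 fps
    print(build_timeline(fake_frames, lambda f: f"description of frame {f}"))
```

Sampling one frame per second keeps the number of model calls manageable, but it will miss anything that happens between samples. For non-integer frame rates such as 29.97 the second labels are only approximate.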
-
Finally, LLaVA under llama.cpp!
In case anyone doesn't know, LLaVA is for picture recognition, and maybe for video in the future :D