When you process llava, three separate batch-processing passes run in sequence before any output is generated.
In addition, time is spent computing the CLIP/ViT embeddings, which currently happens on the CPU.
I guess this could be optimized by converting the two text prompts into embeddings first and then combining the evaluation, which would allow larger batches in a single run. But I am doubtful about the gains.
Looking at your overall speed, you do not have a batch-processing problem but a general performance problem.
I assume you run this on very low-end hardware? With a good GPU you can get a batch speed of thousands of tokens/second, but you sit at 74.
When your hardware is that weak, you should optimize the configuration first; you can likely gain a lot more from that.
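
To put rough numbers on the batching idea, here is a minimal sketch in plain Python. It is not llama.cpp code and calls no real API. The segment lengths (30 system-prompt tokens, 576 image embeddings, 20 user-prompt tokens), the `n_batch` value of 512, and all function names are assumptions chosen for illustration. It also assumes throughput stays constant per position, which is a simplification.

```python
import math


def batch_calls(n_positions: int, n_batch: int) -> int:
    """Evaluation calls needed to push n_positions through in chunks of n_batch."""
    return math.ceil(n_positions / n_batch)


def separate_calls(segments, n_batch):
    """Each segment (text prompt, image embeddings, text prompt) is evaluated on its own."""
    return sum(batch_calls(n, n_batch) for n in segments)


def combined_calls(segments, n_batch):
    """All segments are first turned into embeddings, concatenated, then evaluated together."""
    return batch_calls(sum(segments), n_batch)


def prompt_seconds(n_positions: int, tokens_per_second: float) -> float:
    """Prompt-processing time if throughput is constant (an assumption)."""
    return n_positions / tokens_per_second


# Hypothetical lengths: system prompt, image embeddings, user prompt.
segments = [30, 576, 20]
n_batch = 512  # hypothetical batch size

print("separate calls:", separate_calls(segments, n_batch))
print("combined calls:", combined_calls(segments, n_batch))
print("seconds at 74 t/s:", round(prompt_seconds(sum(segments), 74), 2))
print("seconds at 2000 t/s:", round(prompt_seconds(sum(segments), 2000), 2))
```

Under these assumptions, combining saves a couple of evaluation calls, but the number of positions to process stays the same. That is why the raw tokens/second figure matters far more here than how the batches are grouped.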