Description
I found that GPU memory usage is quite low when `batch_size=1` is set for inference. I want to make fuller use of the GPU by setting a larger `batch_size`, but I ran into the error below. Can anyone help me?
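For reference, this is roughly the command I'm running (the checkpoint path and the batch size value are placeholders for my setup):

```bash
# Evaluate LLaVA-OneVision on videomme with a larger batch size (values are placeholders)
python3 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=lmms-lab/llava-onevision-qwen2-7b-ov \
    --tasks videomme \
    --batch_size 8 \
    --log_samples \
    --output_path ./logs/
```

The full traceback is: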
```
Traceback (most recent call last):
  File "/home/user/MLLM/LLaVA-NeXT/lmms-eval/lmms_eval/__main__.py", line 330, in cli_evaluate
    results, samples = cli_evaluate_single(args)
  File "/home/user/MLLM/LLaVA-NeXT/lmms-eval/lmms_eval/__main__.py", line 471, in cli_evaluate_single
    results = evaluator.simple_evaluate(
  File "/home/user/MLLM/LLaVA-NeXT/lmms-eval/lmms_eval/utils.py", line 533, in _wrapper
    return fn(*args, **kwargs)
  File "/home/user/MLLM/LLaVA-NeXT/lmms-eval/lmms_eval/evaluator.py", line 177, in simple_evaluate
    lm = lmms_eval.models.get_model(model).create_from_arg_string(
  File "/home/user/MLLM/LLaVA-NeXT/lmms-eval/lmms_eval/api/model.py", line 111, in create_from_arg_string
    return cls(**args, **args2)
  File "/home/user/MLLM/LLaVA-NeXT/lmms-eval/lmms_eval/models/llava_onevision.py", line 148, in __init__
    assert self.batch_size_per_gpu == 1, "Llava currently does not support batched generation. See https://github.com/haotian-liu/LLaVA/issues/754. HF Llava also has this issue."
AssertionError: Llava currently does not support batched generation. See https://github.com/haotian-liu/LLaVA/issues/754. HF Llava also has this issue.
```
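If it helps, the failing check is a hard-coded assertion in the model's `__init__` (copied from the traceback above, just re-wrapped for readability):

```python
# lmms_eval/models/llava_onevision.py, line 148 per the traceback
assert self.batch_size_per_gpu == 1, (
    "Llava currently does not support batched generation. "
    "See https://github.com/haotian-liu/LLaVA/issues/754. HF Llava also has this issue."
)
```

So any `batch_size > 1` is rejected up front when the model is constructed, before generation even starts.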
By the way, I am evaluating LLaVA-OneVision on the `videomme` task.