Performance Issues running NVILA #230


Open
UelisonSantos opened this issue Apr 22, 2025 · 1 comment

@UelisonSantos

Hello everyone, thanks for sharing this work.

I am trying to benchmark it using a different dataset/task. For now, I am more concerned about the latency numbers.

[Image: latency benchmark results comparing Qwen2-VL and NVILA]

I am comparing Qwen2-VL and NVILA on the same machine with an A100 80GB GPU.
Based on the results in the paper, I would expect a significant speedup over Qwen2-VL, but I am not seeing one.

I ran the same experiments using the provided vila-infer and got the same numbers.

The only thing I noticed in my environment is these two warnings:

You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use mean_resizing=False

Could Flash Attention be the problem? Any clues on how to fix it?
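
In case it helps, my understanding is that the first warning refers to a loading pattern like the one below (a minimal sketch, assuming the standard Hugging Face transformers API; the checkpoint path is a placeholder, not the actual NVILA loading code):

```python
import torch
from transformers import AutoModelForCausalLM

# Loading onto CPU with Flash Attention 2 enabled triggers the warning;
# moving the model to the GPU afterwards, as the message suggests, avoids it.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/checkpoint",                    # placeholder checkpoint path
    torch_dtype=torch.float16,               # Flash Attention 2 requires fp16/bf16
    attn_implementation="flash_attention_2",
)
model = model.to("cuda")
```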

Thank you,
Uélison

@zhijian-liu
Collaborator

Latency can vary significantly depending on factors like the number of image tokens and output tokens. In our paper, we made sure to align both the number of image tokens and output tokens as closely as possible when comparing against other baselines.

Could you share more details about your specific benchmarking setup? For example, Qwen2-VL includes a configurable setting that can drastically change the number of visual tokens it generates. Similarly, the NVILA-lite models tend to produce significantly fewer tokens overall, which can also impact latency.
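
For instance (a minimal sketch, assuming the Hugging Face Qwen2-VL processor; the pixel budgets below are illustrative, not recommended values), Qwen2-VL lets you bound the image resolution, and therefore the number of visual tokens, via min_pixels/max_pixels:

```python
from transformers import AutoProcessor

# Each 28x28-pixel region becomes one visual token after patch merging,
# so capping max_pixels directly caps how many image tokens are generated.
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    min_pixels=256 * 28 * 28,    # illustrative lower bound
    max_pixels=1280 * 28 * 28,   # illustrative upper bound; lower it to shrink the token count
)
```

Aligning these budgets (and the number of output tokens, e.g. via max_new_tokens at generation time) across both models should make the latency numbers directly comparable.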
