Description:
I've noticed that when using DeepSeek-VL2-tiny, predictions for the same input differ significantly between single inference (processing one input at a time) and batch inference (processing multiple inputs together): the output for an item in a batch is notably different from its output when that same item is run alone. I suspect this might relate to how batch normalization or dropout is handled at inference time, or to a difference in how the data is preprocessed (e.g. padding). Any guidance on why this discrepancy occurs, or how to align the results, would be greatly appreciated.
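
For reference, here is a minimal, model-agnostic sketch of the comparison I have in mind. It uses GPT-2 from `transformers` purely as a stand-in for the actual model (the `deepseek_vl2` processor/model API differs, so this is an assumption about the comparison, not the repo's code): the model is put in eval mode to disable dropout, decoding is greedy, and the shorter prompt is left-padded inside the batch with its attention mask passed through.

```python
# Minimal stand-in repro: compare greedy output for one prompt run alone
# vs. inside a padded batch. GPT-2 is a placeholder; the same comparison
# would apply to the language backbone of deepseek-vl2-tiny.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; swap in your actual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # keeps the prompt flush against the generated tokens

model = AutoModelForCausalLM.from_pretrained(model_name).eval()  # eval() disables dropout

prompts = [
    "The capital of France is",
    "A much longer prompt that forces the shorter one to be padded in the batch:",
]

with torch.no_grad():
    # Single inference for the first prompt.
    single = tokenizer(prompts[0], return_tensors="pt")
    out_single = model.generate(**single, max_new_tokens=20, do_sample=False)

    # Batch inference: same prompt, now padded alongside a longer one.
    batch = tokenizer(prompts, return_tensors="pt", padding=True)
    out_batch = model.generate(**batch, max_new_tokens=20, do_sample=False)

print("single: ", tokenizer.decode(out_single[0], skip_special_tokens=True))
print("batched:", tokenizer.decode(out_batch[0], skip_special_tokens=True))
```

With these settings the single and batched decodes of the first prompt should agree token-for-token, apart from tiny floating-point differences introduced by batched kernels. If the DeepSeek-VL2 outputs still diverge substantially under greedy decoding and eval mode, my guess would be the preprocessing path (padding side, truncation, or how images are batched), though I'd appreciate confirmation.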