Description
System Info
Google Colab using T4 and V100 GPUs
Reproduction
Here is a Google Colab link: https://colab.research.google.com/drive/1KH2oBL0h1L3_PTmGgvHVtpaIeIpB9wv_?usp=sharing
This Colab notebook loads the state-spaces/mamba-370m-hf model from Hugging Face with load_in_8bit=True and then runs a perplexity test.
When the notebook runs on a T4 GPU, the perplexity comes out as NaN.
When the same notebook runs on a V100 GPU, the perplexity is reasonable (between 10 and 20).
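
For readers who cannot open the notebook, here is a minimal sketch of the kind of check described above. It is not the notebook's exact code: the helper names (`perplexity_from_loss`, `load_8bit_model`, `model_perplexity`) are hypothetical, `device_map="auto"` is an assumption, and perplexity is assumed to be the exponential of the mean token-level cross-entropy loss. Running the model part requires a CUDA GPU with `transformers`, `torch`, and `bitsandbytes` installed.

```python
import math


def perplexity_from_loss(mean_nll):
    """Perplexity as exp(mean negative log-likelihood); a NaN loss stays NaN."""
    return math.exp(mean_nll)


def load_8bit_model(model_id="state-spaces/mamba-370m-hf"):
    """Load the model in 8-bit, as in the report (hypothetical helper)."""
    # Imports are deferred so the pure-Python helper above works without a GPU stack.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        load_in_8bit=True,   # the setting named in this report
        device_map="auto",   # assumption: let the library place the weights
    )
    return tokenizer, model


def model_perplexity(model, tokenizer, text):
    """Score one text: mean cross-entropy over its tokens, then exponentiate."""
    import torch

    input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        loss = model(input_ids=input_ids, labels=input_ids).loss
    return perplexity_from_loss(loss.item())


def is_nan_perplexity(ppl):
    """True when the perplexity is NaN, the symptom seen on the T4."""
    return math.isnan(ppl)
```

With a GPU available, `tokenizer, model = load_8bit_model()` followed by `model_perplexity(model, tokenizer, "some text")` would give one score per GPU type to compare.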
Expected behavior
I would expect the results to be similar when run on a T4 and on a V100 GPU.