
Quantized model using load_in_8bit produces very different results on T4 vs V100 GPU on Colab #1186

Open

Description

@cjiangz

System Info

Google Colab using T4 and V100 GPUs

Reproduction

Here is a Google Colab link: https://colab.research.google.com/drive/1KH2oBL0h1L3_PTmGgvHVtpaIeIpB9wv_?usp=sharing

In this Colab notebook, we load the state-spaces/mamba-370m-hf model from Hugging Face with load_in_8bit=True and then run a perplexity test.

When running the notebook on a T4 GPU, the perplexity comes out as NaN.

When running the notebook on a V100 GPU, we get a reasonable perplexity score (between 10 and 20).
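
For reference, here is a minimal sketch of the kind of setup described above. It is not the exact notebook code; the model name and load_in_8bit flag come from the report, while the evaluation text and perplexity loop are illustrative assumptions.

```python
# Sketch of the reported setup (the actual Colab notebook may differ).
# Assumes transformers, accelerate, and bitsandbytes are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "state-spaces/mamba-370m-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Hypothetical perplexity check; the issue's notebook uses its own text and loop.
text = "The quick brown fox jumps over the lazy dog. " * 50
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())  # NaN on T4, ~10-20 on V100 per the report
```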

Expected behavior

I would expect the results to be similar when run on a T4 vs. a V100 GPU.
