
Quantized model using load_in_8bit produces very different results on T4 vs V100 GPU on Colab #1186

Open

Description

@cjiangz

System Info

Google Colab using T4 and V100 GPUs

Reproduction

Here is a Google Colab link: https://colab.research.google.com/drive/1KH2oBL0h1L3_PTmGgvHVtpaIeIpB9wv_?usp=sharing

In this Colab notebook, we load the state-spaces/mamba-370m-hf model from Hugging Face with load_in_8bit=True and then run a perplexity test.

When running the notebook on a T4 GPU, the perplexity comes out as NaN.

When running the notebook on a V100 GPU, we get a reasonable perplexity score (between 10 and 20).
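
For reference, here is a minimal sketch of the kind of setup described above. It is not the exact notebook code; the model name and load_in_8bit flag come from the report, while the evaluation text and perplexity loop are illustrative assumptions.

```python
# Sketch of the reported setup (the actual Colab notebook may differ).
# Assumes transformers, accelerate, and bitsandbytes are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "state-spaces/mamba-370m-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Hypothetical perplexity check; the issue's notebook uses its own text and loop.
text = "The quick brown fox jumps over the lazy dog. " * 50
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())  # NaN on T4, ~10-20 on V100 per the report
```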

Expected behavior

I would expect the results to be similar when run on a T4 vs. a V100 GPU.
