No Significant Improvement Observed in Model Training Speed #664

albertbou92 · 2025-04-11T23:47:28Z

I am trying to speedup inference and training of a mistralai/Mistral-Small-3.1-24B-Instruct-2503 model.

Simply replacing AutoModelForCausalLM with AutoLigerKernelForCausalLM does not lead to any speedup in my sampling speed or memory usage. I am also using DeepSpeed for distributed training.

model = AutoLigerKernelForCausalLM.from_pretrained(
            "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
            torch_dtype=TorchDtype.float32,
            attn_implementation="sdpa",
        )

I have also tried this with the same result:

model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
        torch_dtype=TorchDtype.float32,
       attn_implementation="sdpa",
)
apply_liger_kernel_to_mistral(
    rope=True,
    cross_entropy=False,
    fused_linear_cross_entropy=True,
    rms_norm=True,
    swiglu=True,
    model=model,
)

Am I missing anything? Thanks for any help.
Should I expect to see the speedup and memory optimization in the autoregressive generative sampling or in the backward pass? or in both?

########################

Python version: 3.12.9
PyTorch version: 2.6.0+cu124
CUDA version: 12.4
Triton version: 3.2.0
Transformers version: 4.51.1
DeepSpeed version: 0.15.4

The text was updated successfully, but these errors were encountered:

shivam15s · 2025-04-21T22:39:34Z

Hi, you should see the perf gains while training (fwd + bwd)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No Significant Improvement Observed in Model Training Speed #664

No Significant Improvement Observed in Model Training Speed #664

albertbou92 commented Apr 11, 2025 •

edited

Loading

shivam15s commented Apr 21, 2025

No Significant Improvement Observed in Model Training Speed #664

No Significant Improvement Observed in Model Training Speed #664

Comments

albertbou92 commented Apr 11, 2025 • edited Loading

shivam15s commented Apr 21, 2025

albertbou92 commented Apr 11, 2025 •

edited

Loading