I am trying to speed up inference and training of a mistralai/Mistral-Small-3.1-24B-Instruct-2503 model.
Simply replacing AutoModelForCausalLM with AutoLigerKernelForCausalLM does not lead to any improvement in sampling speed or memory usage. I am also using DeepSpeed for distributed training.
import torch
from liger_kernel.transformers import AutoLigerKernelForCausalLM

model = AutoLigerKernelForCausalLM.from_pretrained(
    "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    torch_dtype=torch.float32,
    attn_implementation="sdpa",
)
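I have also tried applying the Liger kernels explicitly before loading the model, with the same result. Roughly along these lines (a sketch, assuming apply_liger_kernel_to_mistral is the right patching entry point for this checkpoint):

import torch
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_mistral

# Monkey-patch the Mistral modeling code with the Liger kernels
# (RMSNorm, RoPE, SwiGLU, fused linear cross-entropy) before loading.
apply_liger_kernel_to_mistral()

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Small-3.1-24B-Instruct-2503",
    torch_dtype=torch.float32,
    attn_implementation="sdpa",
)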
Am I missing anything? Thanks for any help.
Should I expect the speedup and memory savings during autoregressive sampling, during the backward pass, or in both?
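In case it helps, this is roughly how I am measuring sampling speed and peak memory (a minimal sketch using standard PyTorch/Transformers calls; the prompt and generation settings are placeholders):

import time
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)

torch.cuda.reset_peak_memory_stats()
start = time.perf_counter()
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"throughput: {new_tokens / elapsed:.1f} tokens/s")
print(f"peak memory: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")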
########################
Python version: 3.12.9
PyTorch version: 2.6.0+cu124
CUDA version: 12.4
Triton version: 3.2.0
Transformers version: 4.51.1
DeepSpeed version: 0.15.4