[Pipelines] infer model device with optional override (#1572)
## Purpose ##
* Fix support for deepseekv2.5
* Add more robust inference of model devices when calibrating
## Prerequisites ##
* neuralmagic/compressed-tensors#363
## Background ##
Normally, starting model inputs on the CPU is not an issue for the
sequential pipeline, since the sequential pipeline offloads models and
offloaded models automatically place inputs on the proper devices.
However, the deepseekv2.5 model is an exception: the model [performs
an add
operation](https://huggingface.co/deepseek-ai/DeepSeek-V2.5/blob/main/modeling_deepseek.py#L886)
between a module output (`attn_weights`) and a model input
(`attention_mask`) before the model input has a chance to be placed on
the proper device.
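The failure mode above can be sketched as follows. This is a minimal, hypothetical illustration (the tensor names mirror the deepseekv2.5 code but the shapes and devices are made up): an elementwise add between a module output and a raw model input only succeeds if both tensors live on the same device, so the input must be moved to the model device first.

```python
import torch

# Assumed setup: in practice `model_device` would be cuda for an onloaded
# module; CPU is used here so the sketch runs anywhere.
model_device = torch.device("cpu")

attention_mask = torch.zeros(1, 4)                      # model input, starts on CPU
attn_weights = torch.randn(1, 4, device=model_device)   # module output, on model device

# Moving the input onto the model device before the add avoids the
# cross-device mismatch that deepseekv2.5 would otherwise hit.
attention_mask = attention_mask.to(model_device)
out = attn_weights + attention_mask
```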
## Changes ##
* Use `model_device` when deciding the onload device for model inputs
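A hedged sketch of what inferring a model device with an optional override might look like. The helper name and signature are assumptions for illustration, not the actual implementation in this PR:

```python
from typing import Optional

import torch


def infer_model_device(
    model: torch.nn.Module, override: Optional[torch.device] = None
) -> torch.device:
    # Hypothetical helper: prefer an explicit override when given
    if override is not None:
        return override

    # Otherwise fall back to the device of the first materialized parameter;
    # offloaded modules may report "meta", so skip those
    for param in model.parameters():
        if param.device.type != "meta":
            return param.device

    return torch.device("cpu")
```

With such a helper, calibration inputs could be onloaded to `infer_model_device(model)` instead of defaulting to CPU.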
## Testing ##
* Ran deepseekv2.5 example to completion
* TODO: run nightly to confirm other models work with new input device
placement
---------
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>