Sparse-quantized model runs without VNNI acceleration #1

@ylep

Description

Describe the bug

Hi Dr @clementpoiret! Now that you have graduated 🎉, here is a technical issue to keep you busy 😉

On a workstation with AVX512 and VNNI CPU capabilities, I am getting the following message:

DeepSparse Optimization Status (minimal: AVX2 | partial: AVX512 | full: AVX512 VNNI): full
[nm_ort 7fda254c7280 >WARN<  is_supported_graph src/onnxruntime_neuralmagic/supported/ops.cc:134] Warning: Optimized runtime disabled - Detected dynamic input input dim 2. Set inputs to static shapes to enable optimal performance.

The performance is indeed worse than with the non-sparse model (although I am not sure how user CPU-time is counted here with respect to Hyper-Threading):

  • 6 min 52 s wall-time / 40 min 56 s user CPU-time for segmentation=bagging_sq hardware=deepsparse
  • vs 4 min 16 s wall-time / 79 min 29 s user CPU-time for hardware=onnxruntime model=bagging_accurate hardware.engine_settings.execution_providers="['CPUExecutionProvider']".

Environment

  • OS: Ubuntu 22.04
  • Python: 3.10.12
  • HSF Version: 1.1.3
  • Relevant settings: segmentation=bagging_sq hardware=deepsparse
  • Versions of a few relevant dependencies:
deepsparse==1.5.3
onnx==1.12.0
onnxruntime==1.16.1
onnxruntime-gpu==1.16.1
sparsezoo==1.5.2
torch==2.1.0
torchio==0.18.92
