Sparse-quantized model runs without VNNI acceleration #1

@ylep

Description

Describe the bug

Hi Dr @clementpoiret! Now that you have graduated 🎉, here is a technical issue to keep you busy 😉

On a workstation with AVX512 and VNNI CPU capabilities, I am getting the following message:

DeepSparse Optimization Status (minimal: AVX2 | partial: AVX512 | full: AVX512 VNNI): full
[nm_ort 7fda254c7280 >WARN<  is_supported_graph src/onnxruntime_neuralmagic/supported/ops.cc:134] Warning: Optimized runtime disabled - Detected dynamic input input dim 2. Set inputs to static shapes to enable optimal performance.

The performance is indeed worse than with the non-sparse model (although I am not sure how user CPU-time is counted here with respect to Hyper-Threading):

  • 6 min 52 s wall-time / 40 min 56 s user CPU-time for segmentation=bagging_sq hardware=deepsparse
  • vs 4 min 16 s wall-time / 79 min 29 s user CPU-time for hardware=onnxruntime model=bagging_accurate hardware.engine_settings.execution_providers="['CPUExecutionProvider']".

Environment

  • OS: Ubuntu 22.04
  • Python: 3.10.12
  • HSF Version: 1.1.3
  • Relevant settings: segmentation=bagging_sq hardware=deepsparse
  • Versions of a few relevant dependencies:
deepsparse==1.5.3
onnx==1.12.0
onnxruntime==1.16.1
onnxruntime-gpu==1.16.1
sparsezoo==1.5.2
torch==2.1.0
torchio==0.18.92
