Describe the issue
I have an in-house model in safetensors format. I used the following command to convert it to ONNX format, quantize it, and optimize it:
olive auto-opt --model_name_or_path . --output_path . --device gpu --provider CUDAExecutionProvider --precision int4 --use_model_builder --log_level 1
The model misbehaves when used in the prefill phase or for speculative decoding. As a reminder, in both cases the model is fed a number of tokens and processes them all in parallel in one iteration. The time for a model to process N tokens in parallel is expected to be roughly the same for any (small) N; that is the principle on which the speed-up from speculative decoding is based.
However, with the model produced by the above command, the inference time depends strongly on N. Here are typical figures:
N = 1, time = 29ms
N = 2, time = 100ms
N = 3, time = 195ms
N = 4, time = 195ms
N = 5, time = 195ms
The time grows with N and then "saturates" at N = 3. This behavior makes the speculative decoding optimization impractical.
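For reference, this kind of measurement can be sketched with the ONNX Runtime Python API as below (the actual measurements were taken through the C++ API). This is a minimal sketch, assuming a decoder exported by the model builder with inputs named input_ids, attention_mask, position_ids, and past_key_values.* — these names, shapes, and the empty-KV-cache handling are assumptions that should be checked against sess.get_inputs() for the real graph:

```python
# Minimal latency-vs-N sketch (Python shown for brevity; the original
# measurement used the C++ API). Input names below are assumptions based on
# typical model-builder exports -- verify with sess.get_inputs().
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
input_names = {i.name for i in sess.get_inputs()}

def empty_past_feeds():
    # First-iteration run: give every past-KV input a zero-length sequence
    # dimension (assumed to be the symbolic dim whose name contains "sequence").
    feeds = {}
    for inp in sess.get_inputs():
        if not inp.name.startswith("past"):
            continue
        dims = [d if isinstance(d, int) else (0 if "sequence" in str(d) else 1)
                for d in inp.shape]
        dtype = np.float16 if "float16" in inp.type else np.float32
        feeds[inp.name] = np.zeros(dims, dtype=dtype)
    return feeds

def time_n_tokens(n, repeats=20):
    # Feed n tokens in a single forward pass, as prefill or the verification
    # step of speculative decoding would.
    feeds = empty_past_feeds()
    feeds["input_ids"] = np.ones((1, n), dtype=np.int64)
    feeds["attention_mask"] = np.ones((1, n), dtype=np.int64)
    if "position_ids" in input_names:
        feeds["position_ids"] = np.arange(n, dtype=np.int64).reshape(1, n)
    sess.run(None, feeds)                      # warm-up
    start = time.perf_counter()
    for _ in range(repeats):
        sess.run(None, feeds)
    return (time.perf_counter() - start) / repeats * 1000.0  # ms

for n in range(1, 6):
    print(f"N = {n}: {time_n_tokens(n):.1f} ms")
```

With a correctly behaving model, the printed times stay essentially flat over this range of N.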
I believe there is a bug in Olive. Evidence for this claim:
- the same model, when converted to ONNX without quantization, behaves as expected, i.e. the time to process N tokens in parallel is constant and does not depend on N
- the ONNX models from Hugging Face behave as expected too.
Is there anything else that I am missing?
To reproduce
The conversion command:
olive auto-opt --model_name_or_path . --output_path . --device gpu --provider CUDAExecutionProvider --precision int4 --use_model_builder --log_level 1
I am not allowed to provide the model.
Urgency
No response
Platform
Linux
OS Version
Linux + CUDA
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.20.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.2