
Missing Out Variants When Running Llama3.2 Example Without XNNPack #6975

@sheetalarkadam

Description


I am following the instructions in the Llama2 README to test a Llama model with ExecuTorch.
I want to compare the model's performance with and without XNNPack. From the code, DQLinear operations appear to be delegated to XNNPack by default, but I would like to use the quantized ops defined in ExecuTorch itself, as listed in quantized.yaml. Could you provide guidance on configuring the export to use ExecuTorch's quantized ops instead of the XNNPack delegate?
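As I understand it, the lowering step rewrites every functional op to an "out variant", an overload that writes into a caller-provided tensor so the runtime can plan memory statically, and the error below means no such overload is registered for the listed ops. Here is a minimal sketch of the functional/out pairing, using a hypothetical "demo" namespace for illustration only, not any ExecuTorch API:

import torch
from torch.library import Library

# Hypothetical "demo" namespace, for illustration only (not an ExecuTorch API).
lib = Library("demo", "DEF")
lib.define("scale(Tensor x, float s) -> Tensor")
lib.define("scale.out(Tensor x, float s, *, Tensor(a!) out) -> Tensor(a!)")

def scale(x, s):
    # Functional form: allocates and returns a fresh tensor.
    return x * s

def scale_out(x, s, out):
    # Out variant: writes into storage the caller preallocated.
    out.copy_(x * s)
    return out

lib.impl("scale", scale, "CompositeExplicitAutograd")
lib.impl("scale.out", scale_out, "CompositeExplicitAutograd")

x = torch.randn(4)
buf = torch.empty(4)
assert torch.equal(torch.ops.demo.scale(x, 2.0),
                   torch.ops.demo.scale.out(x, 2.0, out=buf))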

I encounter the following error when the -X (--xnnpack) flag is removed from the Python export command:

raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'quantized_decomposed::choose_qparams_per_token_asymmetric', 'quantized_decomposed::dequantize_per_channel', 'quantized_decomposed::dequantize_per_channel_group', 'quantized_decomposed::dequantize_per_token', 'quantized_decomposed::quantize_per_token'}
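To see which of these ops actually expose an out overload in my environment, I used the diagnostic below. The op names are copied from the error message; importing the private torch.ao.quantization.fx._decomposed module is what registers the quantized_decomposed namespace in my PyTorch build, and whether any .out overloads appear depends on which kernel libraries the process has loaded:

import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  registers quantized_decomposed ops

# Op names copied from the "Missing out variants" error above.
missing = [
    "choose_qparams_per_token_asymmetric",
    "dequantize_per_channel",
    "dequantize_per_channel_group",
    "dequantize_per_token",
    "quantize_per_token",
]

for name in missing:
    packet = getattr(torch.ops.quantized_decomposed, name, None)
    has_out = packet is not None and "out" in packet.overloads()
    print(f"{name}: out variant registered = {has_out}")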

LLAMA_QUANTIZED_CHECKPOINT=/content/SpinQuant_workspace/consolidated.00.pth
LLAMA_PARAMS=/src/gitrepo/llama/Llama3.2-1B/params.json
python -m examples.models.llama2.export_llama \
   --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
   --params "${LLAMA_PARAMS:?}" \
   --use_sdpa_with_kv_cache \
   --preq_mode 8da4w_output_8da8w \
   --preq_group_size 32 \
   --max_seq_length 2048 \
   --output_name "llama3_2_noxnn.pte" \
   -kv \
   -d fp32 \
   --preq_embedding_quantize 8,0 \
   --use_spin_quant native \
   --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}'

What adjustments are required to resolve the "missing out variants" error when the -X flag is omitted?
Thank you for your assistance!
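One workaround I tried to reason about, though I am not sure it is correct: loading a quantized-kernels AOT library before export, so that .out overloads for the ops in quantized.yaml are registered with the dispatcher. The import path below is an assumption about my build's install layout, not a confirmed API:

# Hedged sketch, not a confirmed fix: try to load ExecuTorch's quantized
# kernel AOT library so that *.out overloads for ops in quantized.yaml are
# registered before export. The module path is an assumption about this
# particular build's layout.
try:
    import executorch.kernels.quantized  # noqa: F401
    print("quantized AOT kernel library loaded")
except ImportError as e:
    # If this fails, the build likely does not ship the AOT library, and the
    # listed ops may simply have no out variants outside the XNNPack delegate.
    print(f"could not load quantized kernels: {e}")

If the import fails, or an op still appears in the missing set afterwards, I assume that op is effectively delegate-only without additional kernel work, and the runtime runner would likewise need the quantized kernels linked in.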

Versions

Collecting environment information...
PyTorch version: 2.6.0.dev20240927+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.31.0
Libc version: glibc-2.35

Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.167.1-1.cm2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 12.6.77

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+20a157f
[pip3] numpy==1.26.4
[pip3] torch==2.6.0.dev20240927+cpu
[pip3] torchao==0.5.0+git0916b5b2
[pip3] torchaudio==2.5.0.dev20240927+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240927+cpu
[conda] executorch 0.5.0a0+20a157f pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20240927+cpu pypi_0 pypi
[conda] torchaudio 2.5.0.dev20240927+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240927+cpu pypi_0 pypi

cc @digantdesai @mcr229 @JacobSzwejbka @dbort
