
Missing Out Variants When Running Llama3.2 Example Without XNNPack #6975

@sheetalarkadam

Description


I am following the instructions in the Llama2 README to test a Llama model with ExecuTorch.
I want to compare the model's performance with and without XNNPack. From the code, DQLinear operations appear to be delegated to XNNPack by default, but I would like to use the quantized ops defined in ExecuTorch itself, as listed in quantized.yaml. Could you provide guidance on configuring the export to use ExecuTorch's quantized ops instead of the XNNPack delegate?
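As I understand it, the lowering step rewrites every functional op to an "out variant", an overload that writes into a caller-provided tensor so the runtime can plan memory statically, and the error below means no such overload is registered for the listed ops. Here is a minimal sketch of the functional/out pairing, using a hypothetical "demo" namespace for illustration only, not any ExecuTorch API:

import torch
from torch.library import Library

# Hypothetical "demo" namespace, for illustration only (not an ExecuTorch API).
lib = Library("demo", "DEF")
lib.define("scale(Tensor x, float s) -> Tensor")
lib.define("scale.out(Tensor x, float s, *, Tensor(a!) out) -> Tensor(a!)")

def scale(x, s):
    # Functional form: allocates and returns a fresh tensor.
    return x * s

def scale_out(x, s, out):
    # Out variant: writes into storage the caller preallocated.
    out.copy_(x * s)
    return out

lib.impl("scale", scale, "CompositeExplicitAutograd")
lib.impl("scale.out", scale_out, "CompositeExplicitAutograd")

x = torch.randn(4)
buf = torch.empty(4)
assert torch.equal(torch.ops.demo.scale(x, 2.0),
                   torch.ops.demo.scale.out(x, 2.0, out=buf))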

I encounter the following error when the -X (--xnnpack) flag is removed from the Python export command:

raise RuntimeError(f"Missing out variants: {missing_out_vars}")
RuntimeError: Missing out variants: {'quantized_decomposed::choose_qparams_per_token_asymmetric', 'quantized_decomposed::dequantize_per_channel', 'quantized_decomposed::dequantize_per_channel_group', 'quantized_decomposed::dequantize_per_token', 'quantized_decomposed::quantize_per_token'}
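To see which of these ops actually expose an out overload in my environment, I used the diagnostic below. The op names are copied from the error message; importing the private torch.ao.quantization.fx._decomposed module is what registers the quantized_decomposed namespace in my PyTorch build, and whether any .out overloads appear depends on which kernel libraries the process has loaded:

import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  registers quantized_decomposed ops

# Op names copied from the "Missing out variants" error above.
missing = [
    "choose_qparams_per_token_asymmetric",
    "dequantize_per_channel",
    "dequantize_per_channel_group",
    "dequantize_per_token",
    "quantize_per_token",
]

for name in missing:
    packet = getattr(torch.ops.quantized_decomposed, name, None)
    has_out = packet is not None and "out" in packet.overloads()
    print(f"{name}: out variant registered = {has_out}")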

LLAMA_QUANTIZED_CHECKPOINT=/content/SpinQuant_workspace/consolidated.00.pth
LLAMA_PARAMS=/src/gitrepo/llama/Llama3.2-1B/params.json
python -m examples.models.llama2.export_llama \
   --checkpoint "${LLAMA_QUANTIZED_CHECKPOINT:?}" \
   --params "${LLAMA_PARAMS:?}" \
   --use_sdpa_with_kv_cache \
   --preq_mode 8da4w_output_8da8w \
   --preq_group_size 32 \
   --max_seq_length 2048 \
   --output_name "llama3_2_noxnn.pte" \
   -kv \
   -d fp32 \
   --preq_embedding_quantize 8,0 \
   --use_spin_quant native \
   --metadata '{"append_eos_to_prompt": 0, "get_bos_id":128000, "get_eos_ids":[128009, 128001], "get_n_bos": 0, "get_n_eos": 0}'

What adjustments are required to resolve the "missing out variants" error when the -X flag is omitted?
Thank you for your assistance!
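One workaround I tried to reason about, though I am not sure it is correct: loading a quantized-kernels AOT library before export, so that .out overloads for the ops in quantized.yaml are registered with the dispatcher. The import path below is an assumption about my build's install layout, not a confirmed API:

# Hedged sketch, not a confirmed fix: try to load ExecuTorch's quantized
# kernel AOT library so that *.out overloads for ops in quantized.yaml are
# registered before export. The module path is an assumption about this
# particular build's layout.
try:
    import executorch.kernels.quantized  # noqa: F401
    print("quantized AOT kernel library loaded")
except ImportError as e:
    # If this fails, the build likely does not ship the AOT library, and the
    # listed ops may simply have no out variants outside the XNNPack delegate.
    print(f"could not load quantized kernels: {e}")

If the import fails, or an op still appears in the missing set afterwards, I assume that op is effectively delegate-only without additional kernel work, and the runtime runner would likewise need the quantized kernels linked in.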

Versions

Collecting environment information...
PyTorch version: 2.6.0.dev20240927+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.31.0
Libc version: glibc-2.35

Python version: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.15.167.1-1.cm2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: 12.6.77

Versions of relevant libraries:
[pip3] executorch==0.5.0a0+20a157f
[pip3] numpy==1.26.4
[pip3] torch==2.6.0.dev20240927+cpu
[pip3] torchao==0.5.0+git0916b5b2
[pip3] torchaudio==2.5.0.dev20240927+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240927+cpu
[conda] executorch 0.5.0a0+20a157f pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.6.0.dev20240927+cpu pypi_0 pypi
[conda] torchaudio 2.5.0.dev20240927+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240927+cpu pypi_0 pypi

cc @digantdesai @mcr229 @JacobSzwejbka @dbort
