Zen3 scheduler model for the latency of VEXTRACTF128rri is probably incorrect

See also discussion at https://discourse.llvm.org/t/are-the-latencies-of-vextractf128-correct-for-zen2-3-in-mca/86422

LLVM MCA relies on LLVM's scheduler models to predict cycle counts. This is the predicted timeline graph for a small snippet on Zen3:
```
[0,0]     DeeeeeeeeER    .    .   vmovapd       (%rdi), %ymm0
[0,1]     D=eeeeeeeeeeER .    .   vsubpd        (%rsi), %ymm0, %ymm0
[0,2]     D===========eeeER   .   vmulpd        %ymm0, %ymm0, %ymm0
[0,3]     D==============eeeeER   vextractf128  $1, %ymm0, %xmm1
[0,4]     D==============eE---R   vmovhlps      %xmm0, %xmm0, %xmm2
```
As you can see, `vextractf128` is predicted to have 4 cycles of latency. This, however, is inconsistent with both Agner Fog's latency tables (which list 3 cycles) and my own measurements with llvm-exegesis, which report roughly 3.15 cycles:

```
./llvm-exegesis -mode=latency -opcode-name=VEXTRACTF128rri -mcpu=znver3 --benchmark-repeat-count=100000 -min-instructions=1000  --repetition-mode=loop
---
mode:            latency
key:
  instructions:
    - 'VEXTRACTF128rri XMM0 YMM0 i_0x1'
  config:          ''
  register_initial_values:
    - 'YMM0=0x0'
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
min_instructions: 1000
measurements:
  - { key: latency, value: 3.15, per_snippet_value: 3.15, validation_counters: {} }
error:           ''
info:            Repeating a single explicitly serial instruction
assembled_snippet: 4883EC20C7042400000000C744240400000000C744240800000000C744240C00000000C744241000000000C744241400000000C744241800000000C744241C00000000C5FE6F04244883C42049B80200000000000000662E0F1F840000000000C4E37D19C001C4E37D19C0014983C0FF75EEC3
...
```
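For anyone who wants to reproduce the comparison, here is a minimal sketch of how the predicted latency can be read off the timeline mechanically. It assumes the usual llvm-mca timeline legend, where each lowercase `e` marks one cycle in which the instruction is executing, so the number of `e` characters in a row is the modeled latency. The helper name `predicted_latency` and the hard-coded `MEASURED` value (copied from the llvm-exegesis output above) are mine for illustration, not part of any LLVM tool.

```python
# Rows copied verbatim from the llvm-mca timeline in this report.
TIMELINE = """\
[0,0]     DeeeeeeeeER    .    .   vmovapd       (%rdi), %ymm0
[0,1]     D=eeeeeeeeeeER .    .   vsubpd        (%rsi), %ymm0, %ymm0
[0,2]     D===========eeeER   .   vmulpd        %ymm0, %ymm0, %ymm0
[0,3]     D==============eeeeER   vextractf128  $1, %ymm0, %xmm1
[0,4]     D==============eE---R   vmovhlps      %xmm0, %xmm0, %xmm2
"""

# Characters that can appear in the timeline columns (assumed legend):
# D dispatched, = waiting, e executing, E executed, - waiting to retire,
# R retired, . column marker.
TIMELINE_CHARS = set("D=eER-.")


def predicted_latency(row):
    """Return (mnemonic, cycles) for one timeline row.

    Hypothetical helper: counts the lowercase 'e' cells that precede
    the mnemonic, assuming one 'e' per cycle of execution.
    """
    tokens = row.split()[1:]  # drop the "[iteration,index]" prefix
    cells = []
    for tok in tokens:
        if set(tok) <= TIMELINE_CHARS:
            cells.append(tok)  # still inside the timeline columns
        else:
            return tok, "".join(cells).count("e")
    raise ValueError("no mnemonic found in row: %r" % row)


latencies = dict(predicted_latency(r) for r in TIMELINE.splitlines())

# Value reported by llvm-exegesis for VEXTRACTF128rri above.
MEASURED = 3.15

model = latencies["vextractf128"]
print("model:", model, "measured:", MEASURED, "delta:", round(model - MEASURED, 2))
```

With the rows above, the model's figure for `vextractf128` comes out one cycle higher than the rounded measurement, which is the discrepancy this issue is about.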

Confusingly, AMD's official instruction latency table for Zen3 (Family_19h_Instruction_Latencies_version_1-00.xlsx, AMD Publication No. 56665 Revision 3.00 November 2020) lists `vextractf128` as having 4 cycles of latency. Perhaps I am misinterpreting my measurement results, but I cannot see how that figure could be correct. My confidence in the accuracy of the official latency table is further eroded by the fact that the two `vextractf128` variants are both listed with empty operand fields.
