Open
Description
.att_syntax
.loop:
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
nop
incq %rax
cmpq $999, %rax
jne .loop
UICA predicts 4.50 cycles per iteration due to an issue bottleneck; MCA claims 3.2
According to Agner,
The maximum throughput from the decoders is four instructions or five μops per clock cycle
Similar problem with Icelake, again from Agner
The maximum throughput is improved to five instructions per clock cycle, where Skylake has four.