AArch64 UMOV instruction fails with Aarch64InvalidInstruction when VAS is ARM64_VAS_INVALID #2677

@dguido

Summary

The AArch64 UMOV instruction handler raises Aarch64InvalidInstruction when processing certain valid UMOV/MOV instructions. This occurs when Capstone decodes instructions like umov w0, v1.s[0] or mov w0, v1.s[0] (MOV is an alias for UMOV) and returns a Vector Access Specifier (VAS) value of ARM64_VAS_INVALID (0x0).

Environment

  • Manticore version: Latest (from ekilmer/use-pyproject-toml branch)
  • Python version: 3.11
  • OS: Linux
  • Architecture: Testing AArch64 emulation
  • Dependencies:
    • Capstone: 5.0.0
    • Keystone: 0.9.2
    • Unicorn: 2.1.3

Steps to Reproduce

Run the AArch64 CPU tests:

uv run pytest tests/native/test_aarch64cpu.py::Aarch64CpuInstructions::test_umov -xvs
uv run pytest tests/native/test_aarch64cpu.py::Aarch64CpuInstructions::test_mov_to_general -xvs

Actual Behavior

Both tests fail with an Aarch64InvalidInstruction exception:

FAILED tests/native/test_aarch64cpu.py::Aarch64CpuInstructions::test_umov - manticore.native.cpu.aarch64.Aarch64InvalidInstruction
FAILED tests/native/test_aarch64cpu.py::Aarch64CpuInstructions::test_mov_to_general - manticore.native.cpu.aarch64.Aarch64InvalidInstruction

The same failure occurs for the symbolic instruction tests:

  • Aarch64SymInstructions::test_umov
  • Aarch64SymInstructions::test_mov_to_general

Expected Behavior

Tests should pass. The UMOV instruction handler should correctly process instructions even when Capstone returns ARM64_VAS_INVALID as the vector access specifier.

Root Cause Analysis

The Problem

The UMOV instruction handler in /root/manticore/manticore/native/cpu/aarch64.py (lines 5131-5145) only handles specific VAS values:

if vas == cs.arm64.ARM64_VAS_1B:    # value = 4
    elem_size = 8
elif vas == cs.arm64.ARM64_VAS_1H:  # value = 8
    elem_size = 16
elif vas == cs.arm64.ARM64_VAS_1S:  # value = 11
    elem_size = 32
elif vas == cs.arm64.ARM64_VAS_1D:  # value = 13
    elem_size = 64
else:
    raise Aarch64InvalidInstruction  # Line 5145

However, for the following UMOV instructions Capstone returns ARM64_VAS_INVALID (value = 0) as the VAS value:

  • umov w0, v1.s[0]
  • umov x0, v1.d[0]
  • mov w0, v1.s[0] (MOV alias for UMOV)
  • mov x0, v1.d[0] (MOV alias for UMOV)

Since the handler has no case for VAS = 0, it falls through to the else branch and raises the exception.

Verification

Assembling these instructions with Keystone and disassembling the result with Capstone gives:

Instruction        Keystone Encoding   Capstone Decode     VAS Value               Result
umov w0, v1.b[0]   203c010e            umov w0, v1.b[0]    4 (ARM64_VAS_1B)        ✓ Works
umov w0, v1.h[0]   203c020e            umov w0, v1.h[0]    8 (ARM64_VAS_1H)        ✓ Works
umov w0, v1.s[0]   203c040e            mov w0, v1.s[0]     0 (ARM64_VAS_INVALID)   ✗ Fails
umov x0, v1.d[0]   203c084e            mov x0, v1.d[0]     0 (ARM64_VAS_INVALID)   ✗ Fails

Note how Capstone decodes the last two as mov (the alias form) and returns VAS_INVALID.

Proposed Fix

Add handling for ARM64_VAS_INVALID in the UMOV handler. When VAS is 0, the element size needs to be inferred from the instruction operands:

if vas == cs.arm64.ARM64_VAS_INVALID:  # value = 0
    # Handle MOV alias form - infer element size from operand
    # This commonly happens with .s and .d element specifiers
    op_str = insn.op_str
    if '.b[' in op_str:
        elem_size = 8
    elif '.h[' in op_str:
        elem_size = 16
    elif '.s[' in op_str:
        elem_size = 32
    elif '.d[' in op_str:
        elem_size = 64
    else:
        raise Aarch64InvalidInstruction
elif vas == cs.arm64.ARM64_VAS_1B:
    elem_size = 8
elif vas == cs.arm64.ARM64_VAS_1H:
    elem_size = 16
elif vas == cs.arm64.ARM64_VAS_1S:
    elem_size = 32
elif vas == cs.arm64.ARM64_VAS_1D:
    elem_size = 64
else:
    raise Aarch64InvalidInstruction
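
The operand-string check above could also be factored into a small helper that can be unit-tested without Capstone. The following is only a sketch: the name elem_size_from_op_str and the regular expression are hypothetical and not part of Manticore, and it assumes op_str has the form Capstone prints in the table above (for example "w0, v1.s[0]").

```python
import re

# Element-specifier letter -> element size in bits.
_ELEM_BITS = {"b": 8, "h": 16, "s": 32, "d": 64}

# Matches an indexed vector element such as "v1.s[0]".
_ELEM_RE = re.compile(r"\bv\d+\.([bhsd])\[\d+\]")


def elem_size_from_op_str(op_str):
    """Return the element size in bits named in op_str, or None if absent.

    Hypothetical helper: a caller in the UMOV handler would raise
    Aarch64InvalidInstruction when this returns None.
    """
    match = _ELEM_RE.search(op_str.lower())
    if match is None:
        return None
    return _ELEM_BITS[match.group(1)]
```

Matching the whole "vN.x[i]" token rather than a bare ".s[" substring makes the check slightly stricter, at the cost of depending on Capstone's operand formatting.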

Alternatively, the element size could be inferred from the destination register size and instruction encoding.
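
A minimal sketch of the encoding-based alternative, assuming the standard A64 layout for UMOV (fixed opcode bits, Q in bit 30, imm5 in bits 20:16, where the lowest set bit of imm5 selects the element size and the bits above it give the index). The function name umov_elem_from_encoding is hypothetical, and the sketch works on the raw little-endian instruction bytes rather than on any Manticore or Capstone object.

```python
# Fixed bits of the UMOV encoding; Q (bit 30), imm5 (bits 20:16),
# Rn (bits 9:5) and Rd (bits 4:0) are masked out.
_UMOV_MASK = 0xBFE0FC00
_UMOV_BITS = 0x0E003C00


def umov_elem_from_encoding(encoding):
    """Return (element size in bits, element index) for a UMOV encoding.

    `encoding` is the 4-byte little-endian instruction, e.g.
    bytes.fromhex("203c040e") for umov w0, v1.s[0].
    """
    word = int.from_bytes(encoding, "little")
    if (word & _UMOV_MASK) != _UMOV_BITS:
        raise ValueError("not a UMOV encoding")

    imm5 = (word >> 16) & 0x1F
    if imm5 == 0:
        raise ValueError("imm5 has no element-size bit set")

    # Position of the lowest set bit: 0 -> B, 1 -> H, 2 -> S, 3 -> D.
    shift = (imm5 & -imm5).bit_length() - 1
    if shift > 3:
        raise ValueError("unsupported element size")

    return 8 << shift, imm5 >> (shift + 1)
```

Applied to the four encodings in the Verification table, this yields element sizes of 8, 16, 32, and 64 bits with index 0, independent of whether Capstone prints the instruction as umov or mov.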

Impact

  • Four test methods fail (two in Aarch64CpuInstructions, two in Aarch64SymInstructions)
  • Each test method contains multiple test cases, so coverage of UMOV/MOV instructions is reduced accordingly
  • This blocks testing of vector-to-general register moves for 32-bit and 64-bit elements

Additional Notes

  • The issue only affects certain forms of UMOV that Capstone decodes as MOV aliases
  • The byte and halfword variants (umov w0, v1.b[i], umov w0, v1.h[i]) work correctly
  • This appears to be a quirk in how Capstone handles the MOV alias form of UMOV instructions
