Description
AArch64 UMOV instruction fails with Aarch64InvalidInstruction when VAS is ARM64_VAS_INVALID
Summary
The AArch64 UMOV instruction handler raises `Aarch64InvalidInstruction` when processing certain valid UMOV/MOV instructions. This occurs when Capstone decodes instructions such as `umov w0, v1.s[0]` or `mov w0, v1.s[0]` (MOV is an alias of UMOV) and returns a Vector Access Specifier (VAS) value of `ARM64_VAS_INVALID` (0x0).
Environment
- Manticore version: Latest (from ekilmer/use-pyproject-toml branch)
- Python version: 3.11
- OS: Linux
- Architecture: Testing AArch64 emulation
- Dependencies:
  - Capstone: 5.0.0
  - Keystone: 0.9.2
  - Unicorn: 2.1.3
Steps to Reproduce
Run the AArch64 CPU tests:

```shell
uv run pytest tests/native/test_aarch64cpu.py::Aarch64CpuInstructions::test_umov -xvs
uv run pytest tests/native/test_aarch64cpu.py::Aarch64CpuInstructions::test_mov_to_general -xvs
```
Actual Behavior
Tests fail with an `Aarch64InvalidInstruction` exception:

```
FAILED tests/native/test_aarch64cpu.py::Aarch64CpuInstructions::test_umov - manticore.native.cpu.aarch64.Aarch64InvalidInstruction
FAILED tests/native/test_aarch64cpu.py::Aarch64CpuInstructions::test_mov_to_general - manticore.native.cpu.aarch64.Aarch64InvalidInstruction
```

The same failure occurs for the symbolic instruction tests:

- `Aarch64SymInstructions::test_umov`
- `Aarch64SymInstructions::test_mov_to_general`
Expected Behavior
Tests should pass. The UMOV instruction handler should correctly process these instructions even when Capstone returns `ARM64_VAS_INVALID` as the vector access specifier.
Root Cause Analysis
The Problem
The UMOV instruction handler in `/root/manticore/manticore/native/cpu/aarch64.py` (lines 5131-5145) only handles specific VAS values:
```python
if vas == cs.arm64.ARM64_VAS_1B:    # value = 4
    elem_size = 8
elif vas == cs.arm64.ARM64_VAS_1H:  # value = 8
    elem_size = 16
elif vas == cs.arm64.ARM64_VAS_1S:  # value = 11
    elem_size = 32
elif vas == cs.arm64.ARM64_VAS_1D:  # value = 13
    elem_size = 64
else:
    raise Aarch64InvalidInstruction  # Line 5145
```
However, when Capstone decodes certain UMOV instructions, particularly:

- `umov w0, v1.s[0]`
- `umov x0, v1.d[0]`
- `mov w0, v1.s[0]` (MOV alias for UMOV)
- `mov x0, v1.d[0]` (MOV alias for UMOV)

it returns `ARM64_VAS_INVALID` (value = 0) as the VAS value. Since the handler has no case for VAS = 0, it raises the exception.
Verification
Assembling each instruction with Keystone and disassembling the result with Capstone gives:

| Instruction | Keystone Encoding | Capstone Decode | VAS Value | Result |
|---|---|---|---|---|
| `umov w0, v1.b[0]` | `203c010e` | `umov w0, v1.b[0]` | 4 (`ARM64_VAS_1B`) | ✓ Works |
| `umov w0, v1.h[0]` | `203c020e` | `umov w0, v1.h[0]` | 8 (`ARM64_VAS_1H`) | ✓ Works |
| `umov w0, v1.s[0]` | `203c040e` | `mov w0, v1.s[0]` | 0 (`ARM64_VAS_INVALID`) | ✗ Fails |
| `umov x0, v1.d[0]` | `203c084e` | `mov x0, v1.d[0]` | 0 (`ARM64_VAS_INVALID`) | ✗ Fails |
Note how Capstone decodes the last two as `mov` (the alias form) and returns `ARM64_VAS_INVALID`.
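As an independent cross-check of the Keystone column, the UMOV (vector, to general) encoding can be assembled by hand. This is a minimal sketch based on the A64 field layout `0 | Q | 0 | 01110000 | imm5 | 0 | 0111 | 1 | Rn | Rd`; the helper name `encode_umov` is hypothetical and is not part of Manticore or Keystone:

```python
def encode_umov(rd: int, rn: int, elem: str, index: int) -> bytes:
    """Hypothetical helper: encode UMOV <Wd|Xd>, <Vn>.<T>[<index>].

    Field layout (bit 31 down to bit 0):
    0 | Q | 0 | 01110000 | imm5 | 0 | 0111 | 1 | Rn | Rd
    """
    # The lowest set bit of imm5 selects the element size (B/H/S/D);
    # the bits above it hold the element index.
    shift = {"b": 0, "h": 1, "s": 2, "d": 3}[elem]
    if not 0 <= index < (16 >> shift):
        raise ValueError("index out of range for element size")
    imm5 = ((index << 1) | 1) << shift
    # 64-bit elements (Xd destination) require Q=1; B/H/S use Q=0.
    q = 1 if elem == "d" else 0
    word = (
        (q << 30)
        | (0b01110000 << 21)
        | (imm5 << 16)
        | (0b0111 << 11)
        | (1 << 10)
        | (rn << 5)
        | rd
    )
    # A64 instructions are stored little-endian, matching the table's byte order.
    return word.to_bytes(4, "little")
```

For example, `encode_umov(0, 1, "s", 0).hex()` gives `"203c040e"`, matching the table. The `.s` and `.d` encodings differ from the working `.b`/`.h` ones only in `imm5` (and `Q` for `.d`), so the instruction bytes alone fully determine the element size.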
Proposed Fix
Add handling for `ARM64_VAS_INVALID` in the UMOV handler. When VAS is 0, the element size needs to be inferred from the instruction operands:
```python
if vas == cs.arm64.ARM64_VAS_INVALID:  # value = 0
    # Handle MOV alias form: infer element size from the operand text.
    # This commonly happens with .s and .d element specifiers.
    op_str = insn.op_str
    if '.b[' in op_str:
        elem_size = 8
    elif '.h[' in op_str:
        elem_size = 16
    elif '.s[' in op_str:
        elem_size = 32
    elif '.d[' in op_str:
        elem_size = 64
    else:
        raise Aarch64InvalidInstruction
elif vas == cs.arm64.ARM64_VAS_1B:
    elem_size = 8
elif vas == cs.arm64.ARM64_VAS_1H:
    elem_size = 16
elif vas == cs.arm64.ARM64_VAS_1S:
    elem_size = 32
elif vas == cs.arm64.ARM64_VAS_1D:
    elem_size = 64
else:
    raise Aarch64InvalidInstruction
```
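If the string-based fallback is kept, the four substring checks can be collapsed into a single regular-expression lookup on `op_str`. This is a standalone sketch: `elem_size_from_op_str` is a hypothetical helper, and `ValueError` stands in for `Aarch64InvalidInstruction` so the snippet runs without Manticore:

```python
import re

# Matches the vector element operand, e.g. "v1.s[0]" in "w0, v1.s[0]",
# and captures the element-size letter.
_ELEM_RE = re.compile(r"\bv\d+\.([bhsd])\[\d+\]")
_ELEM_BITS = {"b": 8, "h": 16, "s": 32, "d": 64}


def elem_size_from_op_str(op_str: str) -> int:
    """Hypothetical helper: element size in bits from Capstone's op_str."""
    match = _ELEM_RE.search(op_str.lower())
    if match is None:
        # In the real handler this would raise Aarch64InvalidInstruction.
        raise ValueError(f"no vector element operand in {op_str!r}")
    return _ELEM_BITS[match.group(1)]
```

Anchoring on the `vN.` prefix means only the vector operand is inspected, rather than any `.x[` substring anywhere in the operand text.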
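Alternatively, the element size could be inferred from the instruction itself rather than from its text. The MOV alias is only used for the 32-bit (`Wd, Vn.S[i]`) and 64-bit (`Xd, Vn.D[i]`) forms, so the destination register width determines the element size, and the lowest set bit of the encoding's `imm5` field identifies B, H, S, or D directly.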
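A minimal sketch of the encoding-based approach, assuming the handler can reach the raw 4-byte instruction word (Capstone's Python binding exposes it as `CsInsn.bytes`; how Manticore's UMOV handler would obtain the `CsInsn` is not shown). The helper name `umov_elem_size` is hypothetical, and `ValueError` stands in for `Aarch64InvalidInstruction`. It checks the fixed opcode bits of UMOV (vector, to general), then reads the element size from the lowest set bit of `imm5`:

```python
def umov_elem_size(raw: bytes) -> int:
    """Hypothetical helper: element size in bits for a UMOV/MOV (to general)
    encoding. `raw` is the 4-byte little-endian instruction word."""
    word = int.from_bytes(raw, "little")
    # Fixed bits: bit 31 = 0, bit 29 = 0, bits 28..21 = 01110000,
    # bit 15 = 0, bits 14..11 = 0111, bit 10 = 1. Q, imm5, Rn, Rd vary.
    if word & 0xBFE0FC00 != 0x0E003C00:
        raise ValueError("not a UMOV (vector, to general) encoding")
    imm5 = (word >> 16) & 0x1F
    q = (word >> 30) & 1
    # The lowest set bit of imm5 selects B/H/S/D.
    for bit, size in ((0, 8), (1, 16), (2, 32), (3, 64)):
        if imm5 & (1 << bit):
            # Q must be 1 exactly when the element is 64 bits (Xd destination).
            if q != int(size == 64):
                raise ValueError("unallocated Q/imm5 combination")
            return size
    raise ValueError("reserved imm5 encoding")
```

With the encodings from the verification table, `umov_elem_size(bytes.fromhex("203c040e"))` returns 32 and `umov_elem_size(bytes.fromhex("203c084e"))` returns 64, regardless of whether Capstone prints the instruction as `umov` or `mov`.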
Impact
- 4 test methods fail completely (2 in Aarch64CpuInstructions, 2 in Aarch64SymInstructions)
- Each test method contains multiple test cases, affecting coverage of UMOV/MOV instructions
- This blocks proper testing of vector-to-general register moves for 32-bit and 64-bit elements
Additional Notes
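- The issue only affects the forms of UMOV that Capstone decodes as MOV aliases.
- The byte and halfword variants (`umov w0, v1.b[i]`, `umov w0, v1.h[i]`) work correctly.
- Printing these instructions as `mov` is expected, since MOV is the preferred disassembly of UMOV for 32-bit and 64-bit elements. The missing VAS value appears to be a quirk in how Capstone fills in operand details for the MOV alias form.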