first, i'll explain how our function dispatcher transform works and what it currently solves
the function dispatcher
detection of the standard dispatcher
we scan the contract's CFG/IR for the typical Solidity-style dispatcher pattern:
```
PUSH1 0x00 CALLDATALOAD PUSH1 0xE0 SHR   ← extract the 4-byte selector
DUP1 PUSH4 <sel₁> EQ PUSH2 <addr₁> JUMPI
DUP1 PUSH4 <sel₂> EQ PUSH2 <addr₂> JUMPI
...
REVERT
```
this linear chain of DUP1 PUSH4 <selector> EQ PUSH2 <addr> JUMPI checks is a primary fingerprint that analyzers use
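to make the detection step concrete, here is a minimal sketch of what that pattern match could look like, assuming the bytecode has already been disassembled into (mnemonic, operand) tuples. the function name and the tuple shape are hypothetical, not Azoth's actual internals:

```python
# hypothetical sketch: spot the standard Solidity dispatcher fingerprint
# in a disassembled opcode stream. `ops` is assumed to be a list of
# (mnemonic, operand_bytes) tuples from some disassembler.

def looks_like_standard_dispatcher(ops):
    mnemonics = [m for m, _ in ops]

    # prologue: PUSH1 0x00, CALLDATALOAD, PUSH1 0xE0, SHR
    # (operands 0x00 / 0xE0 are not checked here to keep the sketch short)
    if mnemonics[:4] != ["PUSH1", "CALLDATALOAD", "PUSH1", "SHR"]:
        return False

    # at least one selector comparison block after the prologue
    check = ["DUP1", "PUSH4", "EQ", "PUSH2", "JUMPI"]
    return any(
        mnemonics[i:i + len(check)] == check
        for i in range(4, len(mnemonics) - len(check) + 1)
    )
```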
harvesting selectors & destinations
once detected, we pull out the list of (selectorᵢ, addressᵢ) pairs from those comparison blocks.
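continuing with the same hypothetical (mnemonic, operand) representation, the harvesting step could look roughly like this; a sketch, not the real pass:

```python
# hypothetical sketch: walk the comparison chain and collect
# (selector, jump_target) pairs from each
# DUP1 PUSH4 <sel> EQ PUSH2 <addr> JUMPI block.

def harvest_dispatch_table(ops):
    pairs, i = [], 0
    while i + 4 < len(ops):
        window = [m for m, _ in ops[i:i + 5]]
        if window == ["DUP1", "PUSH4", "EQ", "PUSH2", "JUMPI"]:
            selector = int.from_bytes(ops[i + 1][1], "big")  # PUSH4 operand
            target = int.from_bytes(ops[i + 3][1], "big")    # PUSH2 operand
            pairs.append((selector, target))
            i += 5
        else:
            i += 1
    return pairs
```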
rewriting into an obfuscated “Jump Table”
now, instead of emitting that flat chain, we rebuild the dispatch logic by randomizing across several “patterns” (chosen per‐seed):
- Standard: the original pattern, but with entries shuffled into a random order.
- Arithmetic: obscure the selector check via ADD/SUB/XOR before comparing.
- Inverted: use NEQ or inverted comparisons plus conditional jumps to swap taken paths.
- Cascaded: layer multiple comparisons and branches (e.g. check half the selectors, branch, then check the rest).
- Dummy Branches: insert dead‑code comparisons or opaque predicates that never fire, purely to add noise.
we then allocate new basic blocks for each check, re-index PCs, and wire up the final jumps so that at runtime the selector still routes to the correct function. at that point the on-chain bytecode no longer bears the simple signature chain that analyzers look for.
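to illustrate one of those variants, here is a rough sketch of how an Arithmetic check could be emitted: XOR the duplicated selector with a seed-derived mask and compare against the pre-masked constant, which preserves routing because (s ^ mask) == (sel ^ mask) exactly when s == sel. the helper names and the opcode-tuple format are hypothetical, not the actual transform code:

```python
import random

# hypothetical sketch of the "Arithmetic" variant: instead of
#   DUP1 PUSH4 <sel> EQ PUSH2 <addr> JUMPI
# emit
#   DUP1 PUSH4 <mask> XOR PUSH4 <sel ^ mask> EQ PUSH2 <addr> JUMPI

def emit_arithmetic_check(selector, target, rng):
    mask = rng.getrandbits(32)
    return [
        ("DUP1", None),
        ("PUSH4", mask.to_bytes(4, "big")),
        ("XOR", None),
        ("PUSH4", (selector ^ mask).to_bytes(4, "big")),
        ("EQ", None),
        ("PUSH2", target.to_bytes(2, "big")),
        ("JUMPI", None),
    ]

def rebuild_arithmetic_dispatcher(pairs, seed):
    rng = random.Random(seed)   # per-seed determinism
    entries = list(pairs)
    rng.shuffle(entries)        # also shuffle entry order
    ops = []
    for selector, target in entries:
        ops.extend(emit_arithmetic_check(selector, target, rng))
    return ops
```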
curious, what does this solve anyway?
- hide Function Signatures: you see, i discovered that unobfuscated, the 4-byte selectors (e.g. 0xa9059cbb) appear literally in the bytecode. any static analyzer can extract them and map them back to function names. so we hide or rewrite the selector comparisons, and in doing so we deny easy signature recovery (a sketch of what such an analyzer does follows this list).
- defeat classifiers (not totally though, i'd come back to this): many bulk-analysis tools fingerprint contracts by their dispatcher shape. our transform yields a unique dispatch structure per seed, so automated clustering can't group all Mirage/Azoth contracts under a common fingerprint.
- increase reverse engineering cost: even if someone disassembles a single contract, the randomized dispatcher forces them into manual CFG reconstruction (i am yet to test this though; most of the transformations here are based on research). and because each obfuscated build differs, the manual reverse engineering done on one obfuscated contract can't be reused on another; each one still costs full effort.
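for context, this is roughly the kind of naive selector extraction the first point defends against; a sketch that assumes raw runtime bytecode as input:

```python
# hypothetical sketch of what a naive static analyzer does to an
# unobfuscated contract: pull every PUSH4 immediate out of the runtime
# bytecode and treat it as a candidate selector, then map it back to a
# signature (e.g. 0xa9059cbb -> transfer(address,uint256)).

def extract_push4_selectors(code: bytes):
    selectors, i = set(), 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:        # PUSH1..PUSH32
            n = op - 0x5F             # number of immediate bytes
            if n == 4:                # PUSH4 -> likely a selector
                selectors.add(code[i + 1:i + 1 + 4].hex())
            i += 1 + n
        else:
            i += 1
    return selectors
```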
statistical indistinguishability concerns
when we talk about statistical indistinguishability, we mean that an adversary, even with broad access to many obfuscated bytecodes, cannot reliably tell which contracts were processed by Azoth versus ordinary deployments, nor cluster Azoth outputs together. can we say we have fully achieved this? i would say NO, not even with the current function dispatcher transform. since i built these transforms, i know that they currently optimize for individual contract obfuscation rather than distributional, large-corpus indistinguishability.
finite variant space
our transforms operate within a bounded, finite pattern space:
- the function dispatcher supports five variants (Standard, Arithmetic, Inverted, Cascaded, Dummy-Branches), each following a deterministic template per seed.
- OpaquePredicate currently implements only constant-equality checks; no arithmetic or hash-based predicates yet.
- JumpAddressTransformer uses only addition-based splitting of jump targets.
- Shuffle reorders existing basic blocks without creating new control structures.
because each pass follows a small, known set of templates (even though the order and parameters are randomized), a large-scale analysis could catalog every possible output pattern and then cluster or identify Azoth-obfuscated contracts.
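to make that concern concrete, the cataloging could be as simple as hashing the opcode skeleton of each dispatcher region (operands stripped): because every pass draws from a small template set, the set of distinct skeletons an analyst sees across many outputs stays narrow. a hypothetical sketch, not an existing tool:

```python
import hashlib

# hypothetical sketch: reduce a code region to its opcode "skeleton"
# (mnemonics only, operands dropped) and hash it. randomized operands and
# orderings disappear, while the small set of structural templates remains.

def structural_fingerprint(ops):
    skeleton = " ".join(m for m, _ in ops)
    return hashlib.sha256(skeleton.encode()).hexdigest()[:16]

# an analyst crawling many contracts would then just count fingerprints,
# e.g. Counter(structural_fingerprint(disassemble(code)) for code in corpus);
# a short, heavy-tailed catalog is itself a clustering signal.
```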
a part of the solution
one of the solutions we might wanna consider is implementing a statistical analysis framework to measure our current detectability baseline (this would live within analysis\), then iteratively improving our transforms (adding or removing passes) until we achieve genuine distributional indistinguishability.
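as a first cut, that framework could be an opcode n-gram classifier whose cross-validated accuracy against a corpus of ordinary deployments becomes our detectability baseline (0.5 would mean statistically indistinguishable). a sketch using scikit-learn, with hypothetical corpus inputs:

```python
# hypothetical sketch of a detectability baseline: represent each contract
# as a string of opcode mnemonics, fit a simple n-gram classifier to
# separate Azoth outputs from ordinary deployments, and report
# cross-validated accuracy. `azoth_corpus` / `wild_corpus` are assumed
# lists of mnemonic strings, one per contract.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def detectability_baseline(azoth_corpus, wild_corpus):
    texts = azoth_corpus + wild_corpus
    labels = [1] * len(azoth_corpus) + [0] * len(wild_corpus)
    model = make_pipeline(
        TfidfVectorizer(analyzer="word", ngram_range=(1, 3)),  # opcode n-grams
        LogisticRegression(max_iter=1000),
    )
    scores = cross_val_score(model, texts, labels, cv=5, scoring="accuracy")
    return scores.mean()  # distance from 0.5 ~= distinguishing advantage
```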