discussion: statistical indistinguishability in azoth #52

@g4titanx

first, i'll explain how our function dispatcher transform works and what it currently solves

the function dispatcher

detection of the standard dispatcher

we scan the contract’s CFG/IR for the typical Solidity‐style dispatcher pattern:

PUSH1 0x00    CALLDATALOAD    PUSH1 0xE0    SHR          ← extract 4‑byte selector  
DUP1           PUSH4 <sel₁>   EQ             PUSH2 <addr₁>   JUMPI  
DUP1           PUSH4 <sel₂>   EQ             PUSH2 <addr₂>   JUMPI  

REVERT  

this linear chain of DUP1 PUSH4 <selector> EQ PUSH2 <addr> JUMPI checks is a primary fingerprint that analyzers use.

harvesting selectors & destinations

once detected, we pull out the list of (selectorᵢ, addressᵢ) pairs from those comparison blocks.
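
the detection and harvesting steps can be sketched roughly as below. this is a minimal illustration, not Azoth's actual code: it works on raw bytecode instead of our CFG/IR, the names `disassemble`, `harvest_dispatcher` and `SAMPLE` are hypothetical, and the second selector and both destinations in `SAMPLE` are made-up example values.

```python
# Hypothetical sketch: scans raw bytecode, not Azoth's real CFG/IR.
DUP1, PUSH4, EQ, PUSH2, JUMPI = 0x80, 0x63, 0x14, 0x61, 0x57
CHECK_SHAPE = [DUP1, PUSH4, EQ, PUSH2, JUMPI]


def disassemble(code: bytes):
    """Return a list of (pc, opcode, immediate_bytes) tuples."""
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        # PUSH1..PUSH32 (0x60..0x7F) carry 1..32 immediate bytes.
        size = op - 0x5F if 0x60 <= op <= 0x7F else 0
        out.append((pc, op, code[pc + 1 : pc + 1 + size]))
        pc += 1 + size
    return out


def harvest_dispatcher(code: bytes):
    """Collect (selector, destination) for each DUP1 PUSH4 EQ PUSH2 JUMPI run."""
    ins = disassemble(code)
    pairs = []
    for i in range(len(ins) - 4):
        window = ins[i : i + 5]
        if [op for _, op, _ in window] == CHECK_SHAPE:
            selector = int.from_bytes(window[1][2], "big")
            dest = int.from_bytes(window[3][2], "big")
            pairs.append((selector, dest))
    return pairs


# Selector-extraction prologue, two comparison blocks, then REVERT.
SAMPLE = bytes.fromhex(
    "60003560e01c"            # PUSH1 0x00 CALLDATALOAD PUSH1 0xE0 SHR
    "8063a9059cbb1461004057"  # DUP1 PUSH4 0xa9059cbb EQ PUSH2 0x0040 JUMPI
    "806370a082311461008057"  # DUP1 PUSH4 0x70a08231 EQ PUSH2 0x0080 JUMPI
    "fd"                      # REVERT
)

pairs = harvest_dispatcher(SAMPLE)
```

here `pairs` holds the two (selector, destination) entries in the order they appear in the chain, which is the input the rewriting step needs.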

rewriting into an obfuscated “Jump Table”

now, instead of emitting that flat chain, we rebuild the dispatch logic by randomizing across several “patterns” (chosen per‐seed):

  • Standard: the original pattern, but with entries shuffled into a random order.
  • Arithmetic: obscure the selector check via ADD/SUB/XOR before comparing.
  • Inverted: use inverted comparisons (the EVM has no NEQ opcode, so this means e.g. EQ followed by ISZERO) plus conditional jumps to swap taken paths.
  • Cascaded: layer multiple comparisons and branches (e.g. check half the selectors, branch, then check the rest).
  • Dummy Branches: insert dead‑code comparisons or opaque predicates that never fire, purely to add noise.

we then reassign new basic blocks for each check, re‑index PCs, and wire up the final jumps so that at runtime the selector still routes to the correct function. at this point the onchain bytecode no longer bears the simple signature chain analyzers look for.
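
to make the "still routes to the correct function" part concrete, here is a small sketch of how an Arithmetic-style variant could look. it is an assumption-heavy toy, not our real pass: instructions are symbolic `(op, immediate)` tuples, the mask is a single XOR constant derived from the seed, and `build_arithmetic_dispatcher` and `route` are hypothetical names.

```python
import random


def build_arithmetic_dispatcher(pairs, seed):
    """Emit a shuffled dispatcher that compares (selector XOR mask) to masked constants."""
    rng = random.Random(seed)
    mask = rng.randrange(1, 2**32)  # nonzero, so no constant equals its own selector
    entries = list(pairs)
    rng.shuffle(entries)
    prog = [("PUSH4", mask), ("XOR", None)]  # stack becomes [selector ^ mask]
    for selector, dest in entries:
        prog += [
            ("DUP1", None),
            ("PUSH4", selector ^ mask),
            ("EQ", None),
            ("PUSH2", dest),
            ("JUMPI", None),
        ]
    prog.append(("REVERT", None))
    return prog


def route(prog, selector):
    """Tiny stack interpreter for the ops above; returns the jump target or None on REVERT."""
    stack = [selector]
    for op, imm in prog:
        if op in ("PUSH4", "PUSH2"):
            stack.append(imm)
        elif op == "DUP1":
            stack.append(stack[-1])
        elif op == "XOR":
            stack.append(stack.pop() ^ stack.pop())
        elif op == "EQ":
            stack.append(int(stack.pop() == stack.pop()))
        elif op == "JUMPI":
            dest, cond = stack.pop(), stack.pop()  # destination is on top of the stack
            if cond:
                return dest
        elif op == "REVERT":
            return None
    return None


PAIRS = [(0xA9059CBB, 0x40), (0x70A08231, 0x80)]
prog = build_arithmetic_dispatcher(PAIRS, seed=7)
routed = [route(prog, selector) for selector, _ in PAIRS]
```

every known selector still reaches its original destination and an unknown selector falls through to REVERT, while the PUSH4 constants in the emitted chain are no longer the raw selectors. a single XOR mask is of course trivial to undo once spotted; it only stands in for whatever ADD/SUB/XOR mix the real variant uses.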

curious, what does this solve anyway?

  • hide Function Signatures
    in unobfuscated bytecode, the 4‑byte selectors (e.g. 0xa9059cbb) appear literally as PUSH4 operands. any static analyzer can extract them and map them back to function names. by hiding or transforming the selector comparisons, we deny easy signature recovery.

  • defeat classifiers (not totally though, i'll come back to this)
    many bulk-analysis tools fingerprint contracts by their dispatcher shape. our transform yields a different dispatch structure per seed, so automated clustering can’t simply group all Mirage/Azoth contracts under a common fingerprint.

  • increase reverse engineering cost
    even if someone disassembles a single contract, the randomized dispatcher forces them into manual CFG reconstruction (i am yet to test this though; most of the transformations here are based on research). and because each obfuscated build differs, the reverse engineering work done on one obfuscated contract can’t be reused on the others; each one costs full effort again.
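
as a quick illustration of the signature-recovery point, a naive analyzer only needs to walk the bytecode and look up every PUSH4 operand. the sketch below is hypothetical: `KNOWN` is a two-entry stand-in for a real signature database, and the masked sample just XORs the selector with an arbitrary constant (0x5a5a5a5a) to mimic a transformed comparison.

```python
# Tiny stand-in for a public selector-to-signature database.
KNOWN = {
    0xA9059CBB: "transfer(address,uint256)",
    0x70A08231: "balanceOf(address)",
}


def recover_signatures(code: bytes):
    """Return the known signatures whose selectors appear as PUSH4 operands."""
    found = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        size = op - 0x5F if 0x60 <= op <= 0x7F else 0  # PUSH1..PUSH32 immediates
        if op == 0x63:  # PUSH4
            value = int.from_bytes(code[pc + 1 : pc + 5], "big")
            if value in KNOWN:
                found.append(KNOWN[value])
        pc += 1 + size
    return found


# DUP1 PUSH4 <selector> EQ PUSH2 0x0040 JUMPI, plain and with a masked constant.
plain = bytes.fromhex("8063a9059cbb1461004057")
masked = bytes.fromhex("8063" + format(0xA9059CBB ^ 0x5A5A5A5A, "08x") + "1461004057")

plain_hits = recover_signatures(plain)
masked_hits = recover_signatures(masked)
```

the plain block gives up `transfer(address,uint256)` immediately, while the masked block yields nothing to this naive lookup. an analyzer that models the masking arithmetic would still recover it, which is why this only raises the cost rather than hiding the selector outright.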

statistical indistinguishability concerns

when we talk about statistical indistinguishability, we mean that an adversary, even with broad access to many obfuscated bytecodes, cannot reliably tell which contracts were processed by Azoth versus ordinary deployments, nor cluster Azoth outputs together. can we say we have fully achieved this? i would say no, not even with the current function dispatcher transform. since i built these transforms, i know that right now they optimize for individual contract obfuscation rather than distributional, large-dataset indistinguishability.

finite variant space

our transforms operate within a bounded, finite pattern space:

  • function dispatcher: supports five variants (Standard, Arithmetic, Inverted, Cascaded, Dummy‑Branches), each following a deterministic template per seed.
  • OpaquePredicate: currently implements only constant-equality checks; no arithmetic or hash-based predicates yet.
  • JumpAddressTransformer: uses only addition-based splitting of jump targets.
  • Shuffle: reorders existing basic blocks without creating new control structures.

because each pass follows a small, known set of templates, even though the order and parameters are randomized, a large-scale analysis could catalog every possible output pattern and then cluster or identify Azoth-obfuscated contracts.
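
a toy example of why randomized parameters alone don't help: take addition-based jump splitting, generate it under many seeds, and strip the immediates. the sketch below is hypothetical (symbolic `(op, immediate)` tuples, made-up helper names `split_jump` and `skeleton`, an arbitrary target of 0x0123), but it shows the kind of normalization a bulk analyzer could apply.

```python
import random


def split_jump(target, rng):
    """Replace a direct jump with PUSH a, PUSH b, ADD, JUMP where a + b == target."""
    a = rng.randrange(0, target + 1)
    return [("PUSH2", a), ("PUSH2", target - a), ("ADD", None), ("JUMP", None)]


def skeleton(prog):
    """Drop immediates and keep only the opcode sequence, as a fingerprint."""
    return tuple(op for op, _ in prog)


TARGET = 0x0123
builds = [split_jump(TARGET, random.Random(seed)) for seed in range(1000)]

distinct_programs = {tuple(p) for p in builds}
distinct_skeletons = {skeleton(p) for p in builds}
```

across 1000 seeds there are many distinct programs, but `distinct_skeletons` contains exactly one entry: `("PUSH2", "PUSH2", "ADD", "JUMP")`. an adversary who catalogs that skeleton matches every seed at once, and the same argument applies to each template in the list above.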

a part of the solution

one of the solutions we might want to consider is implementing a statistical analysis framework to measure our current detectability baseline (this will live within analysis\). we can then iteratively improve our transforms (add/remove) until we achieve genuine distributional indistinguishability.
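
as a starting point for that baseline, one possible metric (my assumption, nothing is decided yet) is the Jensen-Shannon divergence between the opcode histograms of a corpus of ordinary contracts and a corpus of Azoth outputs. the function names and the two tiny sample corpora below are hypothetical, and both corpora are assumed to be non-empty.

```python
import math
from collections import Counter


def opcode_histogram(corpus):
    """Count opcodes across a list of bytecodes, skipping PUSH immediates."""
    counts = Counter()
    for code in corpus:
        pc = 0
        while pc < len(code):
            op = code[pc]
            counts[op] += 1
            pc += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
    return counts


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, 1 for disjoint ones."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    div = 0.0
    for op in set(p_counts) | set(q_counts):
        p = p_counts[op] / p_total
        q = q_counts[op] / q_total
        m = (p + q) / 2
        if p:
            div += 0.5 * p * math.log2(p / m)
        if q:
            div += 0.5 * q * math.log2(q / m)
    return div


# Toy corpora: a plain comparison block vs. one preceded by PUSH4 <mask> XOR.
ordinary = [bytes.fromhex("8063a9059cbb1461004057")]
obfuscated = [bytes.fromhex("635a5a5a5a18" "8063f35fc6e11461004057")]

score = js_divergence(opcode_histogram(ordinary), opcode_histogram(obfuscated))
```

a score near 0 would mean the two corpora look alike at the opcode-frequency level, and a score near 1 would mean they are trivially separable. single-opcode frequencies are a weak test, so n-grams or CFG-level features would likely be needed before calling anything indistinguishable.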
