discussion: statistical indistinguishability in azoth #52

first, i'll explain how our function dispatcher transform works and what it currently solves.
## the function dispatcher
### detection of the standard dispatcher
we scan the contract’s `CFG/IR` for the typical Solidity‐style dispatcher pattern:
```assembly
PUSH1 0x00    CALLDATALOAD    PUSH1 0xE0    SHR          ← extract 4‑byte selector  
DUP1           PUSH4 <sel₁>   EQ             PUSH2 <addr₁>   JUMPI  
DUP1           PUSH4 <sel₂>   EQ             PUSH2 <addr₂>   JUMPI  
…  
REVERT  
```
this linear chain of `DUP1 PUSH4 <selector> EQ PUSH2 <addr> JUMPI` checks is a primary fingerprint that analyzers use.

### harvesting selectors & destinations
once detected, we pull out the list of (`selectorᵢ`, `addressᵢ`) pairs from those comparison blocks.
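
to make the harvesting step concrete, here is a minimal sketch in python. it is not Azoth's actual implementation: it assumes it can work on raw runtime bytecode (rather than the CFG/IR), and that every check is laid out exactly as `DUP1 PUSH4 <sel> EQ PUSH2 <addr> JUMPI`. the names `harvest_dispatcher` and `SAMPLE` are hypothetical.

```python
# Hypothetical sketch, not Azoth's code: scan raw bytecode for the
# 11-byte check DUP1 PUSH4 <sel> EQ PUSH2 <addr> JUMPI.
DUP1, PUSH4, EQ, PUSH2, JUMPI = 0x80, 0x63, 0x14, 0x61, 0x57

def harvest_dispatcher(code: bytes) -> list[tuple[int, int]]:
    """Return (selector, destination) pairs found in the dispatcher chain."""
    pairs = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        window = code[pc:pc + 11]
        if (len(window) == 11 and window[0] == DUP1 and window[1] == PUSH4
                and window[6] == EQ and window[7] == PUSH2
                and window[10] == JUMPI):
            selector = int.from_bytes(window[2:6], "big")
            dest = int.from_bytes(window[8:10], "big")
            pairs.append((selector, dest))
            pc += 11
            continue
        # PUSH1..PUSH32 (0x60..0x7F) carry 1..32 immediate bytes; skip them
        # so immediate data is never misread as an opcode.
        if 0x60 <= op <= 0x7F:
            pc += 1 + (op - 0x5F)
        else:
            pc += 1
    return pairs

# Selector extraction prefix, two checks, then REVERT (illustrative bytes).
SAMPLE = bytes.fromhex(
    "60003560e01c"
    "8063a9059cbb1461004057"
    "806370a082311461008057"
    "fd"
)

for selector, dest in harvest_dispatcher(SAMPLE):
    print(f"selector=0x{selector:08x} dest=0x{dest:04x}")
```

a real pass would also need to handle checks that use a different `PUSH` width for the address or that are split across basic blocks, which this sketch ignores.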

### rewriting into an obfuscated “Jump Table”
now, instead of emitting that flat chain, we rebuild the dispatch logic by randomizing across several “patterns” (chosen per‐seed):
- Standard: the original pattern, but with entries shuffled into a random order.
- Arithmetic: obscure the selector check via ADD/SUB/XOR before comparing.
- Inverted: use negated comparisons (`EQ` followed by `ISZERO`, since the EVM has no NEQ opcode) plus conditional jumps to swap taken paths.
- Cascaded: layer multiple comparisons and branches (e.g. check half the selectors, branch, then check the rest).
- Dummy Branches: insert dead‑code comparisons or opaque predicates that never fire, purely to add noise.

we then assign new basic blocks for each check, re‑index PCs, and wire up the final jumps so that at runtime the selector still routes to the correct function. at this point the on-chain bytecode no longer bears the simple signature chain analyzers look for.
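
as an illustration of the idea behind the Arithmetic pattern combined with per‑seed shuffling, here is a small python model of the dispatch logic. it is a sketch under assumptions: it uses a single XOR key per seed (the real transform may mix ADD/SUB/XOR differently), it models routing rather than emitting bytecode, and `build_arithmetic_table` and `dispatch` are hypothetical names.

```python
import random

MASK32 = 0xFFFFFFFF

def build_arithmetic_table(pairs, seed):
    """Hypothetical 'Arithmetic' variant: shuffle the entries and store
    selector XOR key instead of the raw selector."""
    rng = random.Random(seed)
    key = rng.getrandbits(32)
    entries = [(sel ^ key, dest) for sel, dest in pairs]
    rng.shuffle(entries)
    return key, entries

def dispatch(selector, key, entries):
    """Model of the runtime check: XOR the incoming selector with the key,
    then compare against the masked constants."""
    masked = (selector ^ key) & MASK32
    for masked_sel, dest in entries:
        if masked == masked_sel:
            return dest
    return None  # no match: the real dispatcher would fall through to REVERT

pairs = [(0xA9059CBB, 0x0040), (0x70A08231, 0x0080)]
key, entries = build_arithmetic_table(pairs, seed=7)
print(hex(dispatch(0xA9059CBB, key, entries)))
```

because XOR with a fixed key is a bijection, `(sel ^ key) == (target ^ key)` holds exactly when `sel == target`, so routing is preserved while the literal selector constants no longer appear in the comparison.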

### curious, what does this solve anyway?
- hide function signatures
unobfuscated, the 4‑byte selectors (e.g. `0xa9059cbb`) appear literally in the bytecode, so any static analyzer can extract them and map them back to function names. by hiding or transforming the selector comparisons, we deny easy signature recovery.

- defeat classifiers (not totally though, i'll come back to this)
many bulk‐analysis tools fingerprint contracts by their dispatcher shape. our transform yields a different dispatch structure per seed, so automated clustering can’t group all Mirage/Azoth contracts under a common dispatcher fingerprint.

- increase reverse engineering cost
even if someone disassembles a single contract, the randomized dispatcher forces them into manual CFG reconstruction (i am yet to test this though, most of the transformations here are based on research). and because each obfuscated build differs, the manual reverse engineering done on one obfuscated contract can’t be reused on others.

## statistical indistinguishability concerns
when we talk about statistical indistinguishability, we mean that an adversary, even with broad access to many obfuscated bytecodes, cannot reliably tell which contracts were processed by Azoth versus ordinary deployments, nor cluster Azoth outputs together. can we say we have fully achieved this? i would say **NO**: even with the current function dispatcher transform, we haven't. since i built these transforms, i know that they currently optimize for individual contract obfuscation rather than distributional, large‑scale indistinguishability.

### finite variant space
our transforms operate within a bounded, finite pattern space:
- the function dispatcher supports five variants (Standard, Arithmetic, Inverted, Cascaded, Dummy‑Branches), each following a deterministic template per seed.
- `OpaquePredicate` currently implements only constant‐equality checks, with no arithmetic or hash‐based predicates yet.
- `JumpAddressTransformer` uses only addition‐based splitting of jump targets.
- `Shuffle` reorders existing basic blocks without creating new control structures.

because each pass follows a small, known set of templates (even though the order and parameters are randomized), a large‑scale analysis could catalog every possible output pattern and then cluster or identify Azoth‑obfuscated contracts.
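
to show why a small template set is catalogable, here is a sketch of the kind of normalization an analyst might apply: drop every `PUSH` immediate and keep only the opcode sequence. this is an assumed attack model for illustration, not a tool we know to be in use, and `shape` is a hypothetical name.

```python
def shape(code: bytes) -> tuple[int, ...]:
    """Opcode-only fingerprint: keep opcode bytes, drop PUSH immediates."""
    ops = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        ops.append(op)
        # PUSH1..PUSH32 (0x60..0x7F) are followed by 1..32 immediate bytes.
        pc += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
    return tuple(ops)

# Two dispatcher checks with different selectors and jump targets.
check_a = bytes.fromhex("8063a9059cbb1461004057")
check_b = bytes.fromhex("806370a082311461008057")

# Randomized parameters vanish once immediates are stripped.
print(shape(check_a) == shape(check_b))
```

randomizing only constants and ordering leaves the shape of each template intact, so a catalog needs roughly one entry per template rather than one per seed.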

### a part of the solution
one solution we might want to consider is implementing a statistical analysis framework, living within `analysis/`, to measure our current detectability baseline. we would then iteratively improve our transforms (adding or removing them) until we achieve genuine distributional indistinguishability.
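
as one possible starting point for such a baseline, the sketch below compares the opcode frequency distributions of two bytecode corpora using the Jensen‑Shannon divergence. the choice of metric and the function names (`opcode_histogram`, `js_divergence`) are my assumptions for illustration, not something the framework already defines; a real baseline would likely also look at n‑grams and structural features.

```python
import math
from collections import Counter

def opcode_histogram(corpus):
    """Relative opcode frequencies over a list of bytecodes (immediates skipped)."""
    counts = Counter()
    for code in corpus:
        pc = 0
        while pc < len(code):
            op = code[pc]
            counts[op] += 1
            pc += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with disjoint support."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)

# Toy corpora: in practice these would be many obfuscated vs. ordinary contracts.
obfuscated = [bytes.fromhex("8063a9059cbb1461004057")]
ordinary = [bytes.fromhex("806370a082311461008057")]
print(js_divergence(opcode_histogram(obfuscated), opcode_histogram(ordinary)))
```

a divergence close to 0 between obfuscated and ordinary corpora would suggest the two are hard to separate on this one feature; it says nothing about features the metric does not look at.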
