TikTok Stack-Based Virtual Machine Reverse Engineering

_{OutpostProxies.com - Premium USA/Global proxies for security research. HTTP/SOCKS5 support, unlimited bandwidth, flexible plans starting at $2/GB. Updates | Support}

If you want to buy all algorithms reversed in js or any lang that i will rewrite, hit me up, dont write if ur not willing to pay 3k+$

TikTok Stack-Based Virtual Machine Reverse Engineering

Reverse engineering analysis of TikTok's client-side stack-based virtual machine implementation, including string deobfuscation, bytecode parsing, and instruction disassembly.

Overview

This repo contains tools and analysis for reverse engineering TikTok's JavaScript virtual machine used for client-side protection and code obfuscation. The VM implements a stack-based architecture with 77 different opcodes and uses custom bytecode interpretation.

Note: This entire analysis was completed in 5-6 hours, so there may be some mistakes or areas for improvement. Feel free to contribute and fix any issues.

I'm not planning to continue working on this project because after making the devirtualizer my interest is totally gone (most of the time when I work on VMs). I wanted to help people who are struggling with VM reverse engineering, so I'm sharing all my work here.

Important: If TikTok is not happy about this repository, please contact me first instead of sending a DMCA takedown: (All work done in this repository was for my own learning purposes and I have not coded any complete signer.)

Discord: emrovsky
Discord server: https://discord.gg/switchuwu
Email: eemrovsky@proton.me

String Deobfuscation

First thing I had to deal with was the heavily obfuscated strings. TikTok's code uses multiple string arrays to hide what's actually going on.

How the obfuscation works

The original code had references like:

// Obfuscated references
Kg[123]  // References to main string array
aa[45]   // References to numeric constants array

My approach

Found the string arrays: Located the main arrays (giggernigger and aa) containing 1000+ encoded strings
Used Babel for static analysis: Traversed the AST to find all MemberExpression nodes referencing these arrays
Direct replacement: Just replaced the array lookups with actual values

// deobf.js - The core deobfuscation logic
const deobfuscateEncodedStringVisitor = {
  MemberExpression(path) {
    if (path.node && path.node.object && path.node.object.name == "Kg") {
      toReplace = giggernigger[path.node.property.value]
      path.replaceWith(t.valueToNode(toReplace))
    }
    
    if (path.node && path.node.object && path.node.object.name == "aa") {
      toReplace = aa[path.node.property.value]
      path.replaceWith(t.valueToNode(toReplace))
    }
  },
};

Results

After running this, meaningless array references became readable:

// Before: Kg[123]
// After: "navigator"

// Before: aa[45] 
// After: 2654435769

Bytecode Analysis

Understanding the instruction format

The VM uses custom bytecode stored in Uint8Array. Each instruction has:

Opcode (1 byte): Operation type (0-76)
Operands (variable): Immediate values, offsets, or indices

The fetchInstructions function

This function parses the binary format. I had to reverse engineer how it reads the bytecode:

function fetchInstructions() {
    var R = [];
    var r = blabla(t);  // Read string count
    
    j = []
    for (var n = 0; n < r; ++n) {
        j.push(decodestring(t));  // Decode strings
    }
    
    var o = blabla(t);  // Read instruction count
    
    for (n = 0; n < o; ++n) {
        var i = blabla(t);        // Function index
        var u = Boolean(blabla(t)); // Strict mode flag
        var a = new Array();
        var s = blabla(t);
        for (var c = 0; c < s; ++c) {
            a.push([blabla(t), blabla(t), blabla(t), blabla(t)]); // Exception handlers
        }
        var f = new Array();
        var d = blabla(t);
        for (var l = 0; l < d; ++l) {
            f.push(blabla(t));  // Bytecode instructions
        }
        R.push([f, i, u, a]);
    }
    
    return {strings: j, instructions: R}
}

Why I hardcoded the bytecode

Instead of implementing the zip decompression algorithm (which would have taken more time), I just extracted the bytecode and hardcoded it for quicker static analysis:

t = { 
    d: new Uint8Array([/* extracted bytecode */]),
    i: 0
}

This let me focus on the VM analysis rather than spending time on decompression implementation.

Control Flow Handling

Manual analysis approach

I did this part manually because automated control flow analysis would have taken way longer. Here's what I did:

Mapped all opcodes: Created a complete list of all 77 opcodes and what they do
Traced jumps: Followed conditional and unconditional jumps manually
Exception handlers: Figured out how try-catch-finally blocks work
Function calls: Tracked how functions call each other

Converting nested conditionals to clean opcodes

The original VM code was a mess of nested if-else statements that made it hard to understand. I converted this:

function h() {
  while (true) {
    var e = o[c++];
    if (e < 38) {
      if (e < 19) {
        if (e < 9) {
          if (e < 4) {
            if (e < 2) {
              if (e === 0) {
                var t = v[l--];
                v[l] -= t;
              } else {
                var r = o[c++];
                if (!v[l--]) {
                  c += r;
                }
              }
            } else if (e === 2) {
              var n = o[c++];
              l -= n;
              var h = v.slice(l + 1, l + n + 1);
              var m = v[l--];
              var b = v[l--];
              // ... more nested code

Into a clean switch statement:

function h() {
  while (true) {
    var opcode = o[c++];
    
    switch (opcode) {
      case 0: //SUB
        var t = v[l--];
        v[l] -= t;
        break;
        
      case 1: //JUMP_IF_FALSE
        var offset = o[c++];
        if (!v[l--]) {
          c += offset;
        }
        break;
        
      case 2: //CALL
        var argCount = o[c++];
        l -= argCount;
        var args = v.slice(l + 1, l + argCount + 1);
        var func = v[l--];
        var thisArg = v[l--];
        if (typeof func != "function") {
          f = 3;
          d = new TypeError(typeof func + " is not a function");
          return;
        }
        var w = C.get(func);
        if (w) {
          g.push([o, i, u, a, s, c, f, d]);
          Ag[1](w[0], thisArg, args, w[1]);
        } else {
          var result = func.apply(thisArg, args);
          v[++l] = result;
        }
        break;
        
      // ... 74 more cases
    }
  }
}

Key control flow instructions

Opcode	Instruction	What it does
1	JUMP_IF_FALSE	Jump if top of stack is falsy
3	JUMP	Unconditional jump
19	TRY	Set up exception handler
20	JUMP_IF_FALSE_OR_POP	Jump or pop based on condition
26	JUMP_IF_TRUE_OR_POP	Jump or pop based on condition
29	SWITCH_CASE	Switch statement case handling
31	JUMP_IF_TRUE	Jump if top of stack is truthy

Virtual Machine Architecture

Stack-based design

The VM is a classic stack machine with these components:

Evaluation Stack (v): Where operands go for computations
Stack Pointer (l): Points to current top of stack
Program Counter (c): Current instruction being executed
Scope Chain (a): Handles variable scoping
Exception State (f, d): For error handling

The main VM loop

function h() {
  while (true) {
    var opcode = o[c++];
    
    switch (opcode) {
      case 0: // SUB
        var t = v[l--];
        v[l] -= t;
        break;
        
      case 2: // CALL
        var argCount = o[c++];
        l -= argCount;
        var args = v.slice(l + 1, l + argCount + 1);
        var func = v[l--];
        var thisArg = v[l--];
        var result = func.apply(thisArg, args);
        v[++l] = result;
        break;
        
      // ... 75 more opcodes
    }
  }
}

Instruction types

The VM supports these categories of instructions:

Stack Operations: PUSH_, POP, DUP
Arithmetic: ADD, SUB, MUL, DIV, MOD
Bitwise: AND, OR, XOR, NOT, SHIFT
Comparison: EQ, NEQ, LT, GT, LTE, GTE
Control Flow: JUMP, CALL, RETURN, TRY
Object Operations: GET_PROP, SET_PROP, NEW_OBJECT
Variable Access: GET_VAR, SET_VAR, GET_GLOBAL

Finding Encryption Functions

One of the main goals was to find where TikTok handles request signing and encryption. Here's the methodology I used to track down these functions:

Step 1: Finding the entry points

I started by locating where the tokens are actually generated. Through debugging, I found these key calls:

a = yn(u, new Uint8Array(t));  // a is the X-Bogus token
s = bn(u, new Uint8Array(r));  // s is the X-Gnarly token

Step 2: VM execution tracing

Then I put a breakpoint in the main VM execution loop and added logging to track what's happening:

while (true) {
    var opcode = o[c++];  // <-- Breakpoint here
    
    if (window.thatarray) {
        console.log(`[VM] Opcode ${opcode} at position ${c-1}, stack level: ${l}, instruction length ${o.length}`);
        window.thatarray.push(`[VM] Opcode ${opcode} at position ${c-1}, stack level: ${l}, instruction length ${o.length}`);
    }
    
    if (window.oparrays) {
        for (let i = 0; i < R.length; i++) {
            if (R[i][0] === o) {
                window.oparrays.push(`${i}`)
            }
        }
    }
    // ... rest of VM loop
}

Step 3: Finding strData function

For the strData function, I used a slightly different approach. I found this return statement in the code:

return xe({
    magic: 538969122,
    version: 1,
    dataType: e,
    strData: t,           // <-- This is what I was looking for
    tspFromClient: new Date().getTime()
});

I put a breakpoint here, then stepped back a bit in the execution to trace where the strData value was coming from. Using the same logging approach with window.thatarray and window.oparrays, I tracked the execution flow to find which VM function was responsible for generating that data.

Step 4: A small debugging advantage

Here's where it gets interesting - if you trace back a bit from the breakpoint, you can actually see the raw version of strData before it gets processed by the VM. This is a small oversight that's quite useful for browser engineers who want to understand what data is being fed into the encryption process.

While you still need to reverse engineer the actual encryption algorithm, being able to see the input data makes it easier to understand the data flow and validate your analysis.

Step 5: Collecting execution traces

To track which VM functions are involved in token generation:

Set window.thatarray = []; to collect opcode execution info
Set window.oparrays = []; to collect function indices being used
Resume execution from the breakpoint
Use [...new Set(window.oparrays)] to get unique function indices

Step 6: Results

After tracing the execution during token generation, I identified these key functions:

vm249 | "X-Gnarly"
vm103 | "X-Bogus" 
vm42  | strData

These functions are the starting points of the flow for token generation - they handle the generation of request headers that TikTok uses for API authentication and anti-bot protection. The "X-Bogus" and "X-Gnarly" headers contain encrypted request signatures that are critical for bypassing TikTok's protection mechanisms.

Note that the return can be another opcode depending on the execution path. You can check debug_asf/bogus_flow.txt for an example of a complete execution flow to see how the VM processes these functions step by step.

The beauty of this approach is that you can trace exactly which parts of the VM are involved in any specific operation just by setting up the logging arrays and following the execution flow. Whether you're tracking token generation or any other functionality, the same methodology applies - find the output, breakpoint there, trace backwards to see what VM functions contributed to that result.

In my opinion, the best approach is getting the execution flow and used opcodes from debugging, then matching those patterns with the disassembly output to build working signers. This gives you both the high-level understanding from the disasm and the real-world execution traces from debugging.

Usage

Running the deobfuscator

# Deobfuscate strings in JavaScript code
node deobf.js

# Generate disassembly from bytecode  
node disasm.js

Analyzing VM functions

# The disassembler outputs individual function files
ls functions/
# vm0.js, vm1.js, vm2.js, ... vm271.js

Example output

------------------------103--------------------------
// 0 PUSH_STRING → stack[0] = "d41d8cd98f00b204e9800998ecf8427e"
// 3 SET_VAR scope[0][4] ← stack[0]
// 6 GET_VAR → stack[0] = scope[0][3]
// 9 PUSH_UNDEFINED → stack[1] = undefined
// 10 STRICT_NOT_EQUAL stack[0] = stack[0] !== stack[1]

Contributing

This was a quick analysis and there are definitely things that could be improved:

Some opcode interpretations might be wrong
Exception handling logic needs more work
Variable scoping analysis could be better
More test cases would be helpful

Feel free to submit PRs with fixes or improvements. If you find mistakes, just fix them - I probably missed some details in the rush.

License

MIT License - Use this for educational and research purposes.

Disclaimer

This project is for educational and research purposes only. Don't use it for anything malicious.

This was a 5-6 hour speedrun, so double-check everything and improve what you can.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
debug_asf		debug_asf
deobf		deobf
functions		functions
README.md		README.md
bytearray.js		bytearray.js
disasm.js		disasm.js

notemrovsky/tiktok-reverse-engineering

Folders and files

Latest commit

History

Repository files navigation

TikTok Stack-Based Virtual Machine Reverse Engineering

Overview

Table of Contents

String Deobfuscation

How the obfuscation works

My approach

Results

Bytecode Analysis

Understanding the instruction format

The fetchInstructions function

Why I hardcoded the bytecode

Control Flow Handling

Manual analysis approach

Converting nested conditionals to clean opcodes

Key control flow instructions

Virtual Machine Architecture

Stack-based design

The main VM loop

Instruction types

Finding Encryption Functions

Step 1: Finding the entry points

Step 2: VM execution tracing

Step 3: Finding strData function

Step 4: A small debugging advantage

Step 5: Collecting execution traces

Step 6: Results

Usage

Running the deobfuscator

Analyzing VM functions

Example output

Contributing

License

Disclaimer

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages