
Reverse engineering analysis of TikTok's client-side stack-based virtual machine implementation, including string deobfuscation, bytecode parsing, and instruction disassembly.
This repo contains tools and analysis for reverse engineering TikTok's JavaScript virtual machine used for client-side protection and code obfuscation. The VM implements a stack-based architecture with 77 different opcodes and uses custom bytecode interpretation.
Note: This entire analysis was completed in 5-6 hours, so there may be some mistakes or areas for improvement. Feel free to contribute and fix any issues.
I'm not planning to continue working on this project: once the devirtualizer was done my interest was gone, which is what usually happens when I work on VMs. I wanted to help people who are struggling with VM reverse engineering, so I'm sharing all my work here.
Important: If TikTok is not happy about this repository, please contact me first (details below) instead of sending a DMCA takedown. All work in this repository was done for my own learning purposes, and I have not coded any complete signer.
- Discord: emrovsky
- Discord server: https://discord.gg/switchuwu
- Email: eemrovsky@proton.me
- String Deobfuscation
- Bytecode Analysis
- Control Flow Handling
- Virtual Machine Architecture
- Finding Encryption Functions
- Usage
- Contributing
The first thing I had to deal with was the heavily obfuscated strings. TikTok's code uses multiple string arrays to hide what's actually going on.
The original code had references like:
// Obfuscated references
Kg[123] // References to main string array
aa[45] // References to numeric constants array
- Found the string arrays: Located the main arrays (`Kg` and `aa`) containing 1000+ encoded strings
- Used Babel for static analysis: Traversed the AST to find all `MemberExpression` nodes referencing these arrays
- Direct replacement: Just replaced the array lookups with actual values
// deobf.js - The core deobfuscation logic
// kgStrings and aa are the extracted copies of the Kg and aa arrays
const t = require("@babel/types");

const deobfuscateEncodedStringVisitor = {
  MemberExpression(path) {
    const { object, property } = path.node;
    // Only handle lookups with a numeric index, such as Kg[123]
    if (!t.isIdentifier(object) || !t.isNumericLiteral(property)) return;
    if (object.name === "Kg") {
      path.replaceWith(t.valueToNode(kgStrings[property.value]));
    } else if (object.name === "aa") {
      path.replaceWith(t.valueToNode(aa[property.value]));
    }
  },
};
After running this, meaningless array references became readable:
// Before: Kg[123]
// After: "navigator"
// Before: aa[45]
// After: 2654435769
The VM uses custom bytecode stored in a `Uint8Array`. Each instruction has:
- Opcode (1 byte): Operation type (0-76)
- Operands (variable): Immediate values, offsets, or indices
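To make the opcode-plus-operands layout concrete, here is a minimal sketch of a linear instruction lister. It assumes it is given an already-decoded instruction array (one number per opcode or operand), and it only knows the operand counts of the three opcodes whose handlers are shown in this README. `OPERAND_COUNTS` and `listInstructions` are hypothetical names, not part of the repo.

```javascript
// Operand counts for the three opcodes whose handlers appear in this README.
// Every other opcode is unknown here and has to be filled in by hand.
const OPERAND_COUNTS = {
  0: { name: "SUB", operands: 0 },
  1: { name: "JUMP_IF_FALSE", operands: 1 },
  2: { name: "CALL", operands: 1 },
};

// Hypothetical helper: walks a flat instruction array and lists each instruction.
function listInstructions(code, table = OPERAND_COUNTS) {
  const out = [];
  let pos = 0;
  while (pos < code.length) {
    const opcode = code[pos];
    const info = table[opcode];
    if (!info) {
      // Stop rather than guess: a wrong operand count desynchronises everything after it.
      out.push({ pos, opcode, name: "UNKNOWN", operands: [] });
      break;
    }
    const operands = Array.from(code.slice(pos + 1, pos + 1 + info.operands));
    out.push({ pos, opcode, name: info.name, operands });
    pos += 1 + info.operands;
  }
  return out;
}

// CALL with 1 argument, JUMP_IF_FALSE with offset 4, then SUB
console.log(listInstructions([2, 1, 1, 4, 0]));
```

Stopping at the first unknown opcode is deliberate: with variable-length operands, one wrong guess shifts every instruction that follows.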
This function parses the binary format. I had to reverse engineer how it reads the bytecode:
function fetchInstructions() {
  var R = [];
  var r = blabla(t); // Read string count
  var j = [];
  for (var n = 0; n < r; ++n) {
    j.push(decodestring(t)); // Decode strings
  }
  var o = blabla(t); // Read function count
  for (n = 0; n < o; ++n) {
    var i = blabla(t); // Function index
    var u = Boolean(blabla(t)); // Strict mode flag
    var a = [];
    var s = blabla(t); // Exception handler count
    for (var c = 0; c < s; ++c) {
      a.push([blabla(t), blabla(t), blabla(t), blabla(t)]); // Exception handlers
    }
    var f = [];
    var d = blabla(t); // Instruction count for this function
    for (var l = 0; l < d; ++l) {
      f.push(blabla(t)); // Bytecode instructions
    }
    R.push([f, i, u, a]);
  }
  return { strings: j, instructions: R };
}
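Once the bytecode is parsed, a quick overview of the result helps decide which functions to look at first. The sketch below assumes the return shape of `fetchInstructions()` shown above (`strings` plus one `[code, index, strict, handlers]` entry per function). `describeFunctions` is a hypothetical helper and the sample data is made up.

```javascript
// Assumes the shape returned by fetchInstructions():
//   { strings: [...], instructions: [[code, index, strict, handlers], ...] }
function describeFunctions(parsed) {
  return parsed.instructions.map(([code, index, strict, handlers], position) => ({
    position,                      // position in the instructions array
    index,                         // function index read from the bytecode
    strict,                        // strict mode flag
    codeLength: code.length,       // number of decoded instruction values
    handlerCount: handlers.length, // number of exception handler entries
  }));
}

// Made-up sample data, only to show the shape.
const sample = {
  strings: ["navigator"],
  instructions: [
    [[2, 1, 0], 0, true, []],
    [[1, 4, 0, 0], 1, false, [[0, 2, 3, 4]]],
  ],
};
console.table(describeFunctions(sample));
```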
Instead of implementing the zip decompression algorithm (which would have taken more time), I just extracted the bytecode and hardcoded it for quicker static analysis:
t = {
d: new Uint8Array([/* extracted bytecode */]),
i: 0
}
This let me focus on the VM analysis rather than spending time on decompression implementation.
I did this part manually because automated control flow analysis would have taken way longer. Here's what I did:
- Mapped all opcodes: Created a complete list of all 77 opcodes and what they do
- Traced jumps: Followed conditional and unconditional jumps manually
- Exception handlers: Figured out how try-catch-finally blocks work
- Function calls: Tracked how functions call each other
The original VM code was a mess of nested if-else statements that made it hard to understand. I converted this:
function h() {
while (true) {
var e = o[c++];
if (e < 38) {
if (e < 19) {
if (e < 9) {
if (e < 4) {
if (e < 2) {
if (e === 0) {
var t = v[l--];
v[l] -= t;
} else {
var r = o[c++];
if (!v[l--]) {
c += r;
}
}
} else if (e === 2) {
var n = o[c++];
l -= n;
var h = v.slice(l + 1, l + n + 1);
var m = v[l--];
var b = v[l--];
// ... more nested code
Into a clean switch statement:
function h() {
while (true) {
var opcode = o[c++];
switch (opcode) {
case 0: //SUB
var t = v[l--];
v[l] -= t;
break;
case 1: //JUMP_IF_FALSE
var offset = o[c++];
if (!v[l--]) {
c += offset;
}
break;
case 2: //CALL
var argCount = o[c++];
l -= argCount;
var args = v.slice(l + 1, l + argCount + 1);
var func = v[l--];
var thisArg = v[l--];
if (typeof func != "function") {
f = 3;
d = new TypeError(typeof func + " is not a function");
return;
}
var w = C.get(func);
if (w) {
g.push([o, i, u, a, s, c, f, d]);
Ag[1](w[0], thisArg, args, w[1]);
} else {
var result = func.apply(thisArg, args);
v[++l] = result;
}
break;
// ... 74 more cases
}
}
}
Opcode | Instruction | What it does |
---|---|---|
1 | JUMP_IF_FALSE | Jump if top of stack is falsy |
3 | JUMP | Unconditional jump |
19 | TRY | Set up exception handler |
20 | JUMP_IF_FALSE_OR_POP | Jump or pop based on condition |
26 | JUMP_IF_TRUE_OR_POP | Jump or pop based on condition |
29 | SWITCH_CASE | Switch statement case handling |
31 | JUMP_IF_TRUE | Jump if top of stack is truthy |
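When tracing jumps by hand, it helps to compute the target the same way the VM does. The sketch below mirrors the `JUMP_IF_FALSE` handler (opcode 1) from the switch statement: the operand is read first, then added to the program counter. That the other jump opcodes in the table use the same relative encoding is an assumption, and `jumpTarget` is a hypothetical helper.

```javascript
// Mirrors case 1 of the VM loop: the operand is read first (c++), then added to c,
// so the target is relative to the position right after the operand.
function jumpTarget(code, pos) {
  const offset = code[pos + 1];
  const next = pos + 2; // position after the opcode and its operand
  return { fallthrough: next, target: next + offset };
}

// JUMP_IF_FALSE at position 3 with offset 5
const code = [0, 0, 0, 1, 5, 0, 0, 0, 0, 0, 0];
console.log(jumpTarget(code, 3)); // { fallthrough: 5, target: 10 }
```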
The VM is a classic stack machine with these components:
- Evaluation Stack (`v`): Where operands go for computations
- Stack Pointer (`l`): Points to current top of stack
- Program Counter (`c`): Current instruction being executed
- Scope Chain (`a`): Handles variable scoping
- Exception State (`f`, `d`): For error handling
function h() {
while (true) {
var opcode = o[c++];
switch (opcode) {
case 0: // SUB
var t = v[l--];
v[l] -= t;
break;
case 2: // CALL
var argCount = o[c++];
l -= argCount;
var args = v.slice(l + 1, l + argCount + 1);
var func = v[l--];
var thisArg = v[l--];
var result = func.apply(thisArg, args);
v[++l] = result;
break;
// ... 75 more opcodes
}
}
}
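To see the stack discipline in action, here is a self-contained toy interpreter. Opcodes 0, 1 and 2 follow the handlers shown in this README (without the nested VM function path of `CALL`); `PUSH` (100) and `RETURN` (101) use made-up opcode numbers, and the constants table is also an assumption, added only so the toy program can run on its own. It is a sketch of the mechanism, not TikTok's VM.

```javascript
// Opcodes 0, 1 and 2 follow the README. PUSH (100) and RETURN (101) are
// hypothetical numbers invented so this toy program can run on its own.
const OP = { SUB: 0, JUMP_IF_FALSE: 1, CALL: 2, PUSH: 100, RETURN: 101 };

function run(code, constants) {
  const v = []; // evaluation stack
  let l = -1;   // stack pointer (index of the top element)
  let c = 0;    // program counter
  while (true) {
    const opcode = code[c++];
    switch (opcode) {
      case OP.PUSH:
        v[++l] = constants[code[c++]];
        break;
      case OP.SUB: {
        const t = v[l--];
        v[l] -= t;
        break;
      }
      case OP.JUMP_IF_FALSE: {
        const offset = code[c++];
        if (!v[l--]) c += offset;
        break;
      }
      case OP.CALL: {
        const argCount = code[c++];
        l -= argCount;
        const args = v.slice(l + 1, l + argCount + 1);
        const func = v[l--];
        const thisArg = v[l--];
        v[++l] = func.apply(thisArg, args);
        break;
      }
      case OP.RETURN:
        return v[l];
      default:
        throw new Error(`unknown opcode ${opcode}`);
    }
  }
}

// Computes Math.max(10 - 3, 5): push thisArg, then func, then the arguments.
const constants = [Math, Math.max, 10, 3, 5];
const program = [
  OP.PUSH, 0, // thisArg = Math
  OP.PUSH, 1, // func = Math.max
  OP.PUSH, 2, // 10
  OP.PUSH, 3, // 3
  OP.SUB,     // 10 - 3 = 7
  OP.PUSH, 4, // 5
  OP.CALL, 2, // Math.max(7, 5)
  OP.RETURN,
];
console.log(run(program, constants)); // 7
```

Note the calling convention this implies: `thisArg` sits below the function on the stack, and the arguments sit above it.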
The VM supports these categories of instructions:
Stack Operations: PUSH_*, POP, DUP
Arithmetic: ADD, SUB, MUL, DIV, MOD
Bitwise: AND, OR, XOR, NOT, SHIFT
Comparison: EQ, NEQ, LT, GT, LTE, GTE
Control Flow: JUMP, CALL, RETURN, TRY
Object Operations: GET_PROP, SET_PROP, NEW_OBJECT
Variable Access: GET_VAR, SET_VAR, GET_GLOBAL
One of the main goals was to find where TikTok handles request signing and encryption. Here's the methodology I used to track down these functions:
I started by locating where the tokens are actually generated. Through debugging, I found these key calls:
a = yn(u, new Uint8Array(t)); // a is the X-Bogus token
s = bn(u, new Uint8Array(r)); // s is the X-Gnarly token
Then I put a breakpoint in the main VM execution loop and added logging to track what's happening:
while (true) {
var opcode = o[c++]; // <-- Breakpoint here
if (window.thatarray) {
console.log(`[VM] Opcode ${opcode} at position ${c-1}, stack level: ${l}, instruction length ${o.length}`);
window.thatarray.push(`[VM] Opcode ${opcode} at position ${c-1}, stack level: ${l}, instruction length ${o.length}`);
}
if (window.oparrays) {
for (let i = 0; i < R.length; i++) {
if (R[i][0] === o) {
window.oparrays.push(`${i}`)
}
}
}
// ... rest of VM loop
}
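The collected log lines get long quickly, so a small post-processing step helps. The sketch below parses lines in the exact format produced by the logging snippet above and counts how often each opcode ran. `summarizeTrace` is a hypothetical helper and the sample lines are made up.

```javascript
// Matches the log format used in the VM loop logging snippet.
const LINE = /^\[VM\] Opcode (\d+) at position (\d+), stack level: (-?\d+), instruction length (\d+)$/;

function summarizeTrace(lines) {
  const counts = new Map();
  for (const line of lines) {
    const m = LINE.exec(line);
    if (!m) continue; // ignore anything that is not a VM log line
    const opcode = Number(m[1]);
    counts.set(opcode, (counts.get(opcode) || 0) + 1);
  }
  // Most frequent opcodes first, as [opcode, count] pairs
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sampleTrace = [
  "[VM] Opcode 2 at position 10, stack level: 3, instruction length 120",
  "[VM] Opcode 0 at position 12, stack level: 1, instruction length 120",
  "[VM] Opcode 2 at position 13, stack level: 4, instruction length 120",
];
console.log(summarizeTrace(sampleTrace)); // [ [ 2, 2 ], [ 0, 1 ] ]
```

In the browser you would pass `window.thatarray` instead of `sampleTrace`.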
For the strData function, I used a slightly different approach. I found this return statement in the code:
return xe({
magic: 538969122,
version: 1,
dataType: e,
strData: t, // <-- This is what I was looking for
tspFromClient: new Date().getTime()
});
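A small aside on the `magic` field: constants like this are often easier to spot in hex when searching through memory or disassembly. The conversion below is plain arithmetic. Reading the resulting digits as a date is only a guess on my part.

```javascript
const magic = 538969122;
console.log("0x" + magic.toString(16)); // 0x20200422
```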
I put a breakpoint here, then stepped back a bit in the execution to trace where the `strData` value was coming from. Using the same logging approach with `window.thatarray` and `window.oparrays`, I tracked the execution flow to find which VM function was responsible for generating that data.
Here's where it gets interesting - if you trace back a bit from the breakpoint, you can actually see the raw version of strData before it gets processed by the VM. This is a small oversight that's quite useful for reverse engineers who want to understand what data is being fed into the encryption process.
While you still need to reverse engineer the actual encryption algorithm, being able to see the input data makes it easier to understand the data flow and validate your analysis.
To track which VM functions are involved in token generation:
- Set `window.thatarray = [];` to collect opcode execution info
- Set `window.oparrays = [];` to collect function indices being used
- Resume execution from the breakpoint
- Use `[...new Set(window.oparrays)]` to get unique function indices
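The last step can be wrapped in a small helper. Because the logging code pushes indices as strings, the sketch converts them to numbers before sorting. Mapping index N to `functions/vmN.js` is an assumption about how the disassembler names its output files, and both function names are hypothetical.

```javascript
// window.oparrays holds indices as strings (they are pushed as `${i}`),
// so convert them to numbers before sorting.
function uniqueFunctionIndices(oparrays) {
  return [...new Set(oparrays)].map(Number).sort((a, b) => a - b);
}

// Assumption: index N corresponds to functions/vmN.js from the disassembler.
function toFileNames(indices) {
  return indices.map((n) => `vm${n}.js`);
}

const indices = uniqueFunctionIndices(["103", "42", "103", "249", "42"]);
console.log(toFileNames(indices)); // [ 'vm42.js', 'vm103.js', 'vm249.js' ]
```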
After tracing the execution during token generation, I identified these key functions:
vm249 | "X-Gnarly"
vm103 | "X-Bogus"
vm42 | strData
These functions are the starting points of the flow for token generation - they handle the generation of request headers that TikTok uses for API authentication and anti-bot protection. The "X-Bogus" and "X-Gnarly" headers contain encrypted request signatures that are critical for bypassing TikTok's protection mechanisms.
Note that the return can be another opcode depending on the execution path. You can check `debug_asf/bogus_flow.txt` for an example of a complete execution flow to see how the VM processes these functions step by step.
The beauty of this approach is that you can trace exactly which parts of the VM are involved in any specific operation just by setting up the logging arrays and following the execution flow. Whether you're tracking token generation or any other functionality, the same methodology applies - find the output, breakpoint there, trace backwards to see what VM functions contributed to that result.
In my opinion, the best approach is getting the execution flow and used opcodes from debugging, then matching those patterns with the disassembly output to build working signers. This gives you both the high-level understanding from the disasm and the real-world execution traces from debugging.
# Deobfuscate strings in JavaScript code
node deobf.js
# Generate disassembly from bytecode
node disasm.js
# The disassembler outputs individual function files
ls functions/
# vm0.js, vm1.js, vm2.js, ... vm271.js
------------------------103--------------------------
// 0 PUSH_STRING → stack[0] = "d41d8cd98f00b204e9800998ecf8427e"
// 3 SET_VAR scope[0][4] ← stack[0]
// 6 GET_VAR → stack[0] = scope[0][3]
// 9 PUSH_UNDEFINED → stack[1] = undefined
// 10 STRICT_NOT_EQUAL stack[0] = stack[0] !== stack[1]
This was a quick analysis and there are definitely things that could be improved:
- Some opcode interpretations might be wrong
- Exception handling logic needs more work
- Variable scoping analysis could be better
- More test cases would be helpful
Feel free to submit PRs with fixes or improvements. If you find mistakes, just fix them - I probably missed some details in the rush.
MIT License - Use this for educational and research purposes.
This project is for educational and research purposes only. Don't use it for anything malicious.
This was a 5-6 hour speedrun, so double-check everything and improve what you can.