Analysis of Virtual.1 by Bagolymadar
In the following post will describe the analysis of Virtual.1 a crakcme that implements a simple virtual machine, I used Triton emulation capabilities to solve it. Here I show an analysis of the VM, and the resources I used to do it.
You can find all the resources here: Virtual.1.
Authors
Writer:
- Eduardo Blazquez
Virtual.1 by Bagolymadar
Welcome my friend, today we are going to do an analysis of an old VM crackme I discovered on crackmes.one — bagolymadar's virtual.1 (Linux), available at: https://crackmes.one/crackme/5f3a64df33c5d42a7c667d45
I was a bit bored and looked for a simple VM crackme. The idea of the crackme is to write a keygen, and to do that, two things are mandatory:
- Analyze the VM to understand how it works.
- Understand the code run by the VM in order to know the real executed logic.
Code Virtualization
Code Virtualization is a protection technique where the program implements its own small CPU. Together with this CPU, the protection also contains a structure that represents things like registers, a stack, and memory. The difficulty here is not only understanding the x86 code that we can see in a disassembler — we also need to understand how this virtual CPU works, and then separately understand the code that runs inside it.
A good analogy: imagine you find a document written in a made-up language. Before you can understand what the document says, you first need to learn the rules of that language. The VM is the language, and the bytecode is the document.
There are commercial protections with virtualization capabilities (like VMProtect or Themida), but in this crackme the protection is implemented directly in the binary, and the bytecode was likely produced with a small custom assembler written by the author.
The Crackme
The crackme is a 64-bit ELF binary, not stripped, so we can access the symbols. The virtualized bytecode is not easy to read directly, and neither is the virtual CPU that executes it. We can check it out with file:
$ file virtual.1
virtual.1: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=3e7ac84584b944be377a3a42335f38dd1c85ee95, for GNU/Linux 3.2.0, not stripped
If we run the binary we get:
$ ./virtual.1
__ ___ ____ _____ _ _ _ _ _
\ \ / (_) _ \_ _| | | | / \ | | / |
\ \ / /| | |_) || | | | | |/ _ \ | | | |
\ V / | | _ < | | | |_| / ___ \| |___ _| |
\_/ |_|_| \_\|_| \___/_/ \_\_____(_)_|
bagolymadar
2020.08
Username:
If we try whatever when the program asks for a username and a serial:
Username: Pepe
Serial: ABCD1234567890
Nope, this is incorrect!
The username and the serial are not correct, so we need to dig deeper.
Analysis of the Binary
I will use Binary Ninja for this challenge, but you can use any disassembler/decompiler you like. If we open the binary, it finds the main function right away. The High Level IL view gives us something close to C source code:
00401130 int32_t main(int32_t argc, char** argv, char** envp)
0040113c printmenu()
00401154 printf(format: "%s", "Username: ")
00401170 char name_buffer[0x100]
00401170
00401170 if (fgets(&name_buffer, n: 0xfa, fp: __TMC_END__) == 0)
00401212 puts(str: "\n Nope, this is incorrect!\n")
00401170 else
00401176 char i = name_buffer[0]
0040117a char (* rdx_1)[0x100] = &name_buffer
0040117f if (i != 0)
00401192 do
00401196 if (i == 0xa)
00401198 *rdx_1 = 0
00401198 break
00401188 i = (*rdx_1)[1]
0040118c rdx_1 = &(*rdx_1)[1]
00401192 while (i != 0)
00401192
004011ab printf(format: "%s", "Serial: ")
004011cf char serial_value[0x108]
004011cf
004011cf if (fgets(buf: &serial_value, n: 0xfa, fp: __TMC_END__) == 0)
00401212 puts(str: "\n Nope, this is incorrect!\n")
004011cf else
004011d1 char i_1 = serial_value[0]
004011db if (i_1 != 0)
004011dd char (* rdx_3)[0x108] = &serial_value
004011f2 do
004011f6 if (i_1 == 0xa)
004011f8 *rdx_3 = 0
004011f8 break
004011e8 i_1 = (*rdx_3)[1]
004011ec rdx_3 = &(*rdx_3)[1]
004011f2 while (i_1 != 0)
004011f2
00401209 if (vmstart(name: &name_buffer, serial: &serial_value) == 1)
0040122b puts(str: "\n Yep, you got it!\n")
00401209 else
00401212 puts(str: "\n Nope, this is incorrect!\n")
00401223 return 0
The structure is simple — fgets reads both the username and the serial, a small loop strips the trailing newline, and then everything is handed off to vmstart. If vmstart returns 1 we win, otherwise we lose. The goal of the crackme is not to patch the binary but to understand the validation well enough to generate a correct serial for any username.
If we look at vmstart:
004017f0 int64_t vmstart(char* name, char* serial)
00401803 __builtin_memset(dest: &vm_data, ch: 0, count: 0xc0)
00401810 vm_data.NAME = name
00401814 vm_data.SERIAL = serial
0040181f vm_data.SP = &__TMC_END__
00401826 virtualcpu(vm: &vm_data)
0040182f return vm_data.result
A block of memory (vm_data) is zeroed out, the name and serial pointers are stored inside it, and then the whole structure is passed to virtualcpu. Whatever virtualcpu leaves in vm_data.result becomes the return value. So the real question is — what does virtualcpu do?
How a Virtual Machine Works
Before entering virtualcpu, it helps to understand the general model. A code virtualization protection tries to emulate a real CPU by providing:
- Registers — locations to store temporary values, implemented as fields in a struct or slots in an array.
- A program counter (PC) — a register that holds the address of the next instruction to execute, equivalent to
RIPin x86-64. - A stack pointer (SP) — a register that points to the top of the stack, equivalent to
RSPin x86-64. - Flags — bits that record the result of comparisons, like the zero flag and the less-than flag.
- A dispatcher — a big
switchstatement that reads the next opcode from the bytecode stream and runs the corresponding handler.
The bytecode is just an array of bytes in memory. The dispatcher reads one byte (the opcode), optionally reads one or two more bytes (the operands), executes the operation, advances the PC, and loops back. This is exactly the fetch-decode-execute cycle of a real CPU, just implemented in software.
In this crackme the VM struct looks like this once we rename the fields:
| Offset | Field | Description |
|---|---|---|
+0x00 |
ACC |
Accumulator register (r0) |
+0x08 |
r1 |
General purpose register |
+0x10 |
result |
Result register (r2), returned by vmstart |
+0x18 |
PC |
Program counter |
+0x20 |
NAME |
Pointer to the name buffer |
+0x28 |
SERIAL |
Pointer to the serial buffer |
+0x30 |
FLAGS |
Condition flags (bit0 = zero/equal, bit1 = less-than) |
+0x38 |
SP |
Stack pointer |
Each field is 8 bytes wide, so register index N maps to vm + (N * 8). That is why every handler in the switch does vm + (zx.q(op.b) << 3) — it is just an index multiplied by 8.
Do you want to know what happens next? Keep reading!
Analysis of the Binary — Part 2: Understanding the VM Handlers
Now let’s enter virtualcpu. The outer loop and dispatch look like this:
00401830 uint64_t virtualcpu(struct vm_struct* vm)
00401837 uint64_t op
00401837
00401837 while (true)
00401837 char* p_instr = &bytecode + vm->PC
0040183b op.b = *p_instr
0040183b int16_t* p_operands = &p_instr[1]
0040183e int64_t rcx
0040183e
0040183e switch (op.b)
At each iteration, the current instruction is fetched from bytecode + PC. The first byte is the opcode, and p_operands points to the bytes that follow. Let’s go through some handlers to build intuition.
MOVQ — copies a full 64-bit register value into another register. Both operands are register indices:
case MOVQ // opcode 0x01
op.b = *p_operands // destination register index
uint64_t* dst = vm + (zx.q(op.b) << 3)
op.b = *(p_operands + 1) // source register index
*dst = *(vm + (zx.q(op.b) << 3))
vm->PC += 3
MOVB — same idea but copies only the low byte, leaving the upper 7 bytes of the destination untouched:
case MOVB // opcode 0x02
op.b = *p_operands
char* dst = vm + (zx.q(op.b) << 3)
op.b = *(p_operands + 1)
op.b = *(vm + (zx.q(op.b) << 3)) // read low byte of source
*dst = op.b // write low byte of destination
vm->PC += 3
JMP — unconditional jump. Reads a 16-bit big-endian address from the operands and sets PC directly:
case JMP // opcode 0x11
// read 16-bit big-endian address
op.w = *p_operands
char temp = op.b
op.b = op:1.b
op:1.b = temp
vm->PC = zx.q(op.w)
The byte swap is there because the bytecode stores addresses in big-endian order but x86 is little-endian. This pattern — read word, swap bytes, assign to PC — appears in every instruction that encodes an address.
PUSH_IMM8 — pushes an immediate byte onto the VM stack. The stack grows downward, just like on a real CPU:
case PUSH_IMM8 // opcode 0x09
vm->SP -= 1
op.b = *p_operands
*vm->SP = op.b
vm->PC += 2
INC / DEC — increment or decrement a register by 1. The register index is the single operand:
case INC // opcode 0x41
op.b = *p_operands
int64_t* reg = vm + (zx.q(op.b) << 3)
*reg += 1
vm->PC += 2
case DEC // opcode 0x14
op.b = *p_operands
int64_t* reg = vm + (zx.q(op.b) << 3)
*reg -= 1
vm->PC += 2
PTR_NULL — dereferences a register as a pointer, reads a byte from that address, and sets FLAGS bit0 to 1 if the byte is zero. This is used to check for string termination:
case PTR_NULL // opcode 0x1e
op.b = *p_operands
op.b = **(vm + (zx.q(op.b) << 3)) // dereference pointer in register
op.b = op.b == 0
vm->FLAGS = (vm->FLAGS & ~1) | op.b
vm->PC += 2
CMPZ — similar to PTR_NULL but tests the register value itself rather than what it points to:
case CMPZ // opcode 0x54
op.b = *p_operands
op.b = *(vm + (zx.q(op.b) << 3)) == 0
vm->FLAGS = (vm->FLAGS & ~1) | op.b
vm->PC += 2
LOAD_SERIAL / LOAD_NAME — read the next byte from the serial or name buffer into ACC, then advance the pointer. These are single-byte instructions with no operands:
case LOAD_SERIAL // opcode 0x52
vm->ACC = zx.q(*vm->SERIAL)
vm->SERIAL += 1
vm->PC += 1
case LOAD_NAME // opcode 0x69
vm->ACC = zx.q(*vm->NAME)
vm->NAME += 1
vm->PC += 1
CMP_IMM8 — compares a register against an immediate byte and writes the result into two bits of FLAGS. Bit 0 is the equal flag, bit 1 is the less-than flag:
case CMP_IMM8 // opcode 0x7f
int64_t* reg = vm + (zx.q(*p_operands) << 3)
uint8_t imm = *(p_operands + 1)
int64_t val = *reg
bool lt = val s< imm
op.b = (val == imm)
op.b |= (lt << 1)
vm->FLAGS = (vm->FLAGS & ~3) | op.b
vm->PC += 3
JZ / JNZ / JL — conditional jumps that read FLAGS and either take the big-endian address from the operands or fall through by incrementing PC by 3:
case JZ // opcode 0x55 -- jump if FLAGS bit0 == 1 (equal/zero)
if ((vm->FLAGS & 1) == 0)
vm->PC += 3
else
// big-endian address -> PC
case JNZ // opcode 0x81 -- jump if FLAGS bit0 == 0 (not equal)
if ((vm->FLAGS & 1) != 0)
vm->PC += 3
else
// big-endian address -> PC
case JL // opcode 0x95 -- jump if FLAGS bit1 == 1 (less-than)
if ((vm->FLAGS & 2) == 0)
vm->PC += 3
else
// big-endian address -> PC
CALL / RET — subroutine call and return. CALL pushes the return address (PC + 3) as a 16-bit value onto the stack and jumps. RET pops it and jumps back:
case CALL // opcode 0xab
vm->SP -= 2
*vm->SP = (vm->PC + 3).w // return address stored as 16-bit value
// big-endian address -> PC
case RET // opcode 0xba
op = zx.q(*vm->SP)
vm->SP += 2
vm->PC = op
HLT — terminates the VM loop and returns. The result of the validation is whatever is left in vm->result, which the bytecode sets explicitly before calling HLT:
default
if (op.b == HLT) // opcode 0x4c
return op
The remaining handlers follow the same pattern. Here is the complete opcode table:
| Opcode | Mnemonic | Description |
|---|---|---|
0x01 |
MOVQ r_dst, r_src |
64-bit register-to-register copy |
0x02 |
MOVB r_dst, r_src |
8-bit register-to-register copy (low byte only) |
0x09 |
PUSH_IMM8 imm8 |
Push immediate byte onto stack |
0x11 |
JMP imm16 |
Unconditional jump (big-endian address) |
0x14 |
DEC r |
Decrement register by 1 |
0x1e |
PTR_NULL r |
Test if *reg == 0, write result to FLAGS bit0 |
0x41 |
INC r |
Increment register by 1 |
0x4c |
HLT |
Halt and return |
0x52 |
LOAD_SERIAL |
Load byte from *SERIAL into ACC, advance pointer |
0x54 |
CMPZ r |
Test if reg == 0, write result to FLAGS bit0 |
0x55 |
JZ imm16 |
Jump if FLAGS bit0 == 1 (equal/zero) |
0x56 |
MOV_IMM8 r, imm8 |
Load immediate byte into register (zero-extended) |
0x69 |
LOAD_NAME |
Load byte from *NAME into ACC, advance pointer |
0x7f |
CMP_IMM8 r, imm8 |
Compare register vs immediate, set FLAGS bit0/bit1 |
0x81 |
JNZ imm16 |
Jump if FLAGS bit0 == 0 (not equal) |
0x8d |
PUSH_REG r |
Push full 64-bit register onto stack |
0x8e |
POP_REG r |
Pop full 64-bit value from stack into register |
0x8f |
SHR r, imm8 |
Logical shift right |
0x90 |
TEST_IMM8 r, imm8 |
AND register with immediate, set FLAGS bit0 if zero |
0x93 |
ADD_REG r_dst, r_src |
Add register to register |
0x94 |
AND_IMM64 r, imm64 |
AND register with 8-byte immediate |
0x95 |
JL imm16 |
Jump if FLAGS bit1 == 1 (less-than) |
0x96 |
SUB_IMM8 r, imm8 |
Subtract immediate byte from register |
0x97 |
SHL r, imm8 |
Logical shift left |
0x98 |
CMP_REG r_dst, r_src |
Compare two registers, set FLAGS bit0/bit1 |
0x99 |
MOV_IMM64 r, imm64 |
Load 8-byte immediate into register |
0x9a |
ROL r_dst, r_src |
Rotate left by register value |
0x9b |
NOT r |
Bitwise NOT of low byte |
0x9c |
XOR r_dst, r_src |
XOR register with register (byte) |
0xab |
CALL imm16 |
Push return address, jump |
0xba |
RET |
Pop 16-bit address, jump |
And the full virtualcpu function with all symbols renamed:
00401830 uint64_t virtualcpu(struct vm_struct* vm)
00401837 uint64_t op
00401837
00401837 while (true)
00401837 char* p_instr = &bytecode + vm->PC
0040183b op.b = *p_instr
0040183b int16_t* p_operands = &p_instr[1]
0040183e int64_t rcx
0040183e
0040183e switch (op.b)
0040183c case MOVQ // MOV dst_reg, src_reg
00401939 op.b = *p_operands // get first operand
00401942 uint64_t* dst = vm + (zx.q(op.b) << 3)
00401946 op.b = *(p_operands + 1) // get second operand
00401956 *dst = *(vm + (zx.q(op.b) << 3))
00401959 vm->PC += 3
00401844 case MOVB // MOVB dst_reg, src_reg
00401963 op.b = *p_operands
0040196c char* r8_1 = vm + (zx.q(op.b) << 3)
00401970 op.b = *(p_operands + 1)
0040197d op.b = *(vm + (zx.q(op.b) << 3))
00401980 *r8_1 = op.b
00401983 vm->PC += 3
0040184c case PUSH_IMM8 // PUSH imm8
00401991 vm->SP -= 1
00401997 op.b = *p_operands
00401998 *vm->SP = op.b
0040199b vm->PC += 2
00401854 case JMP
004019a5 op.w = *p_operands
004019a7 char temp0_1 = op.b
004019a7 op.b = op:1.b
004019a7 op:1.b = temp0_1
004019ad vm->PC = zx.q(op.w)
0040185c case DEC
004019b6 op.b = *p_operands
004019bf int64_t* r8_4 = vm + (zx.q(op.b) << 3)
004019c3 *r8_4 -= 1
004019c6 vm->PC += 2
00401864 case PTR_NULL
004019d0 op.b = *p_operands
004019e0 op.b = **(vm + (zx.q(op.b) << 3))
004019e8 op.b = op.b == 0
004019f2 rcx.b = (vm->FLAGS & 0xfffffffffffffffe).b | op.b
004019f4 vm->FLAGS = rcx
004019f7 vm->PC += 2
0040186c case INC
00401a01 op.b = *p_operands
00401a0a int64_t* r8_7 = vm + (zx.q(op.b) << 3)
00401a0e *r8_7 += 1
00401a11 vm->PC += 2
0040183e default
00401876 if (op.b == HLT)
00401a1c return op
00401a1c
0040187e void* r10
0040187e
0040187e switch (op.b)
0040187c case LOAD_SERIAL
00401a2e vm->ACC = zx.q(*vm->SERIAL)
00401a31 vm->SERIAL = &vm->SERIAL[1]
00401a34 vm->PC += 1
00401884 case CMPZ
00401a3e op.b = *p_operands
00401a55 op.b = *(vm + (zx.q(op.b) << 3)) == 0
00401a5f rcx.b = (vm->FLAGS & 0xfffffffffffffffe).b | op.b
00401a61 vm->FLAGS = rcx
00401a64 vm->PC += 2
0040188c case JZ
00401a79 if ((vm->FLAGS & 1) == 0)
00401a8c vm->PC += 3
00401a79 else
00401a7b op.w = *p_operands
00401a7d char temp0_2 = op.b
00401a7d op.b = op:1.b
00401a7d op:1.b = temp0_2
00401a83 vm->PC = zx.q(op.w)
00401894 case MOV_IMM8
00401a96 op.b = *p_operands
00401aa3 uint64_t rax_5
00401aa3 rax_5.b = *(p_operands + 1)
00401aa8 *(vm + (zx.q(op.b) << 3)) = zx.q(rax_5.b)
00401aab vm->PC += 3
0040189c case LOAD_NAME
00401ac6 vm->ACC = zx.q(*vm->NAME)
00401ac9 vm->NAME = &vm->NAME[1]
00401acc vm->PC += 1
004018a4 case CMP_IMM8
00401ad6 op.b = *p_operands
00401adf int64_t* r1 = vm + (zx.q(op.b) << 3)
00401ae3 uint64_t rax_7
00401ae3 rax_7.b = *(p_operands + 1)
00401ae4 op = zx.q(rax_7.b)
00401ae8 int64_t r1_content = *r1
00401ae8 bool cond:2_1 = r1_content s< op
00401aeb op.b = r1_content == op
00401aee r10.b = cond:2_1
00401af2 r10.b <<= 1
00401af5 op.b |= r10.b
00401b03 rcx.b = (vm->FLAGS & 0xfffffffffffffffc).b | op.b
00401b05 vm->FLAGS = rcx
00401b08 vm->PC += 3
004018ac case JNZ
00401b1d if ((vm->FLAGS & 1) != 0)
00401b30 vm->PC += 3
00401b1d else
00401b1f op.w = *p_operands
00401b21 char temp0_3 = op.b
00401b21 op.b = op:1.b
00401b21 op:1.b = temp0_3
00401b27 vm->PC = zx.q(op.w)
004018b4 case PUSH_REG
00401b3e vm->SP -= 8
00401b45 op.b = *p_operands
00401b55 *vm->SP = *(vm + (zx.q(op.b) << 3))
00401b58 vm->PC += 2
004018bc case POP_REG
00401b62 op.b = *p_operands
00401b73 int64_t SP = vm->SP
00401b76 vm->SP += 8
00401b7d *(vm + (zx.q(op.b) << 3)) = *SP
00401b80 vm->PC += 2
004018c4 case SHR
00401b8a op.b = *p_operands
00401b93 uint64_t* reg = vm + (zx.q(op.b) << 3)
00401b97 op.b = *(p_operands + 1)
00401b98 rcx.b = op.b
00401b9a *reg u>>= rcx.b
00401b9d vm->PC += 3
004018cc case TEST_IMM8
00401ba7 op.b = *p_operands
00401bb0 int64_t* r9_8 = vm + (zx.q(op.b) << 3)
00401bb4 op.b = *(p_operands + 1)
00401bbc op.b = (*r9_8 & zx.q(op.b)) == 0
00401bca r10.b = (vm->FLAGS & 0xfffffffffffffffe).b | op.b
00401bcd vm->FLAGS = r10
00401bd0 vm->PC += 3
004018d4 case ADD_REG
00401bda op.b = *p_operands
00401be3 uint64_t* reg1 = vm + (zx.q(op.b) << 3)
00401be7 op.b = *(p_operands + 1)
00401bfa *reg1 += *(vm + (zx.q(op.b) << 3))
00401bfd vm->PC += 3
004018dc case AND_IMM64
00401c07 op.b = *p_operands
00401c07 char* rsi_25 = p_operands + 1
00401c10 void** r9_10 = vm + (zx.q(op.b) << 3)
00401c14 op.b = *rsi_25
00401c14 char* rsi_26 = &rsi_25[1]
00401c19 uint64_t rax_9
00401c19 rax_9.b = *rsi_26
00401c19 char* rsi_27 = &rsi_26[1]
00401c1e uint64_t rax_10
00401c1e rax_10.b = *rsi_27
00401c1e char* rsi_28 = &rsi_27[1]
00401c23 uint64_t rax_11
00401c23 rax_11.b = *rsi_28
00401c23 char* rsi_29 = &rsi_28[1]
00401c28 uint64_t rax_12
00401c28 rax_12.b = *rsi_29
00401c28 char* rsi_30 = &rsi_29[1]
00401c2d uint64_t rax_13
00401c2d rax_13.b = *rsi_30
00401c2d char* rsi_31 = &rsi_30[1]
00401c32 uint64_t rax_14
00401c32 rax_14.b = *rsi_31
00401c37 op.b = rsi_31[1]
00401c3e *r9_10 &= op
00401c41 vm->PC += 0xa
004018e4 case JL
00401c56 if ((vm->FLAGS & 2) == 0)
00401c69 vm->PC += 3
00401c56 else
00401c58 op.w = *p_operands
00401c5a char temp0_4 = op.b
00401c5a op.b = op:1.b
00401c5a op:1.b = temp0_4
00401c60 vm->PC = zx.q(op.w)
004018ec case SUB_IMM8
00401c73 op.b = *p_operands
00401c7c char* r9_11 = vm + (zx.q(op.b) << 3)
00401c80 op.b = *(p_operands + 1)
00401c81 *r9_11 -= op.b
00401c84 vm->PC += 3
004018f4 case SHL
00401c8e op.b = *p_operands
00401c97 int64_t* r9_12 = vm + (zx.q(op.b) << 3)
00401c9b op.b = *(p_operands + 1)
00401c9c rcx.b = op.b
00401c9e *r9_12 <<= rcx.b
00401ca1 vm->PC += 3
004018fc case CMP_REG
00401cab op.b = *p_operands
00401cb4 int64_t* r8_25 = vm + (zx.q(op.b) << 3)
00401cb8 uint64_t rax_16
00401cb8 rax_16.b = *(p_operands + 1)
00401cc5 op = *(vm + (zx.q(rax_16.b) << 3))
00401cc8 int64_t temp2_1 = *r8_25
00401cc8 bool cond:3_1 = temp2_1 s< op
00401ccb op.b = temp2_1 == op
00401cce r10.b = cond:3_1
00401cd2 r10.b <<= 1
00401cd5 op.b |= r10.b
00401ce3 rcx.b = (vm->FLAGS & 0xfffffffffffffffc).b | op.b
00401ce5 vm->FLAGS = rcx
00401ce8 vm->PC += 3
00401904 case MOV_IMM64
00401cf2 op.b = *p_operands
00401cf2 char* rsi_41 = p_operands + 1
00401cfb uint64_t* r8_27 = vm + (zx.q(op.b) << 3)
00401cff uint64_t rax_20
00401cff rax_20.b = *rsi_41
00401cff char* rsi_42 = &rsi_41[1]
00401d04 uint64_t rax_21
00401d04 rax_21.b = *rsi_42
00401d04 char* rsi_43 = &rsi_42[1]
00401d09 uint64_t rax_22
00401d09 rax_22.b = *rsi_43
00401d09 char* rsi_44 = &rsi_43[1]
00401d0e uint64_t rax_23
00401d0e rax_23.b = *rsi_44
00401d0e char* rsi_45 = &rsi_44[1]
00401d13 uint64_t rax_24
00401d13 rax_24.b = *rsi_45
00401d13 char* rsi_46 = &rsi_45[1]
00401d18 uint64_t rax_25
00401d18 rax_25.b = *rsi_46
00401d18 char* rsi_47 = &rsi_46[1]
00401d1d uint64_t rax_26
00401d1d rax_26.b = *rsi_47
00401d22 op.b = rsi_47[1]
00401d23 *r8_27 = op
00401d26 vm->PC += 0xa
0040190c case ROL
00401d30 op.b = *p_operands
00401d39 int64_t* r8_28 = vm + (zx.q(op.b) << 3)
00401d3d uint64_t rax_28
00401d3d rax_28.b = *(p_operands + 1)
00401d4a rcx.b = *(vm + (zx.q(rax_28.b) << 3))
00401d4d *r8_28 = rol.q(*r8_28, rcx.b)
00401d50 vm->PC += 3
00401914 case NOT
00401d5a op.b = *p_operands
00401d63 char* r8_29 = vm + (zx.q(op.b) << 3)
00401d67 *r8_29 = not.b(*r8_29)
00401d6a vm->PC += 2
0040191c case XOR
00401d74 op.b = *p_operands
00401d7d char* r8_30 = vm + (zx.q(op.b) << 3)
00401d81 uint64_t rax_32
00401d81 rax_32.b = *(p_operands + 1)
00401d8e op.b = *(vm + (zx.q(rax_32.b) << 3))
00401d91 *r8_30 ^= op.b
00401d94 vm->PC += 3
00401924 case CALL
00401da2 vm->SP -= 2
00401daa op.w = vm->PC.w
00401dae op.w += 3
00401db5 *vm->SP = op.w
00401db9 op.w = *p_operands
00401dbb char temp0_5 = op.b
00401dbb op.b = op:1.b
00401dbb op:1.b = temp0_5
00401dc1 vm->PC = zx.q(op.w)
0040187e default
0040192e if (op.b != RET)
0040192e break
0040192e
00401dd0 op = zx.q(*vm->SP)
00401dd4 vm->SP += 2
00401ddc vm->PC = op
00401ddc
00401de4 vm->result = 0
00401dec return op
At this point we have a complete picture of the virtual CPU. We know what every instruction does, we know how the registers and flags are laid out, and we know where the bytecode lives in memory. The next step is to actually trace the execution and read the bytecode logic — and for that we will use Triton as a concrete emulator, building a VM-level disassembler on top of it.
Triton, your favorite emulator
As I mentioned, we cannot simply symbolize the input data, run the VM code until the data has been transformed into a final value, and then use Z3 to solve for what we need. The problem here is different — the VM does not compute a single expression from the serial and compare it against a constant. Instead, it reads the serial byte by byte and uses those values to drive control flow. A wrong serial byte does not just produce the wrong hash value — it makes the VM jump to the failure path immediately. Symbolic execution would be useful if all paths eventually converged at a single comparison, but here incorrect input causes the VM to terminate early, which means Z3 never sees a constraint worth solving.
So symbolic execution is off the table for this one. What we can do instead is use Triton purely as a concrete x86-64 emulator — no symbolization, no Z3 — and instrument it to print the VM instructions as they execute. That gives us a clean VM-level trace that we can read manually, which is exactly what we need to understand the algorithm.
Let’s start building the emulator (solve-vm.py). First, the configuration variables — binary path, base address, the VM struct address, and the offsets of each field inside it:
BINARY_PATH = "./virtual.1"
BASE_ADDRESS = 0x400000
# VM struct base
VM_STRUCT = 0x404230
# VM struct offsets
VM_ACC = VM_STRUCT + 0x00 # r0 / accumulator
VM_R1 = VM_STRUCT + 0x08 # r1
VM_RESULT = VM_STRUCT + 0x10 # r2 / result / return value
VM_PC = VM_STRUCT + 0x18 # r3 / program counter
VM_NAME = VM_STRUCT + 0x20 # r4 / pointer to name buffer
VM_SERIAL = VM_STRUCT + 0x28 # r5 / pointer to serial buffer
VM_FLAGS = VM_STRUCT + 0x30 # r6 / flags
VM_SP = VM_STRUCT + 0x38 # r7 / stack pointer
Next, the bytecode base address. This is computed from the lea rsi, [rip + 0x28d1] instruction at 0x401830 — rip at execution time is 0x401837 (the next instruction), so the base is 0x401837 + 0x28d1 = 0x404108. We also define where the emulation starts and ends:
# Bytecode base: lea rsi, [rip + 0x28d1] at 0x401830, rip = 0x401837
BYTECODE_BASE = 0x401837 + 0x28d1
# Emulation range
ADDR_VMSTART = 0x4017f1
ADDR_VMSTART_RET = 0x40182f
The key insight for the VM-level disassembler is that instead of checking whether RIP is at the top of the dispatch loop, we check whether it has just entered a specific handler. Each opcode maps to exactly one handler address — the target of the je in the switch. We build a reverse map so the emulation loop can look up the opcode from the current RIP:
# Map opcode -> x86 handler address (je targets from the switch)
OPCODE_HANDLERS = {
0x01: 0x401939, # MOVQ
0x02: 0x401963, # MOVB
0x09: 0x40198d, # PUSH_IMM8
0x11: 0x4019a5, # JMP
0x14: 0x4019b6, # DEC
0x1e: 0x4019d0, # PTR_NULL
0x41: 0x401a01, # INC
0x4c: 0x401a1b, # HLT
0x52: 0x401a1d, # LOAD_SERIAL
0x54: 0x401a3e, # CMPZ
0x55: 0x401a6e, # JZ
0x56: 0x401a96, # MOV_IMM8
0x69: 0x401ab5, # LOAD_NAME
0x7f: 0x401ad6, # CMP_IMM8
0x81: 0x401b12, # JNZ
0x8d: 0x401b3a, # PUSH_REG
0x8e: 0x401b62, # POP_REG
0x8f: 0x401b8a, # SHR
0x90: 0x401ba7, # TEST_IMM8
0x93: 0x401bda, # ADD_REG
0x94: 0x401c07, # AND_IMM64
0x95: 0x401c4b, # JL
0x96: 0x401c73, # SUB_IMM8
0x97: 0x401c8e, # SHL
0x98: 0x401cab, # CMP_REG
0x99: 0x401cf2, # MOV_IMM64
0x9a: 0x401d30, # ROL
0x9b: 0x401d5a, # NOT
0x9c: 0x401d74, # XOR
0xab: 0x401d9e, # CALL
0xba: 0x401dc9, # RET
}
# Reverse map: handler address -> opcode
HANDLER_TO_OPCODE = {v: k for k, v in OPCODE_HANDLERS.items()}
HANDLER_ADDRESSES = set(OPCODE_HANDLERS.values())
Finally, the fake buffer addresses where we place the name and serial in Triton’s memory, together with a cap on the number of instructions and two command-line flags — one to print the underlying x86 instructions alongside the VM ones, and another to suppress printing inside known utility subroutines:
# Fake buffers
FAKE_NAME_ADDR = 0x00200000
FAKE_SERIAL_ADDR = 0x00300000
NAME_VALUE = b"Pepe\x00"
SERIAL_VALUE = b"ABCD1234\x00"
MAX_INSTRUCTIONS = 500_000
# CLI flags
PRINT_X86 = "--x86" in sys.argv
SKIP_STDLIB = "--no-stdlib" in sys.argv
The print_vm_instruction function reads the current VM PC from memory, then reads the operand bytes from the bytecode region and formats them according to the opcode. Crucially, we read the operands before the instruction executes, while the VM state is still consistent:
def print_vm_instruction(ctx, opcode):
"""Print the current VM instruction using the known opcode."""
pc = ctx.getConcreteMemoryValue(MemoryAccess(VM_PC, CPUSIZE.QWORD))
flags = ctx.getConcreteMemoryValue(MemoryAccess(VM_FLAGS, CPUSIZE.QWORD))
acc = ctx.getConcreteMemoryValue(MemoryAccess(VM_ACC, CPUSIZE.QWORD))
def bc(offset):
return ctx.getConcreteMemoryValue(
MemoryAccess(BYTECODE_BASE + pc + offset, CPUSIZE.BYTE))
def bc16(offset):
return (bc(offset) << 8) | bc(offset + 1) # big-endian
def bc64(offset):
val = 0
for i in range(8):
val = (val << 8) | bc(offset + i)
return val
match opcode:
case 0x01: s = f"MOVQ {rname(bc(1))}, {rname(bc(2))}"
case 0x02: s = f"MOVB {rname(bc(1))}, {rname(bc(2))}"
case 0x09: s = f"PUSH_IMM8 0x{bc(1):02x}"
case 0x11: s = f"JMP 0x{bc16(1):04x}"
case 0x14: s = f"DEC {rname(bc(1))}"
case 0x1e: s = f"PTR_NULL {rname(bc(1))}"
case 0x41: s = f"INC {rname(bc(1))}"
case 0x4c: s = f"HLT"
case 0x52: s = f"LOAD_SERIAL ; ACC=0x{acc:02x}"
case 0x54: s = f"CMPZ {rname(bc(1))}"
case 0x55: s = f"JZ 0x{bc16(1):04x} ; FLAGS=0x{flags:x}"
case 0x56: s = f"MOV_IMM8 {rname(bc(1))}, 0x{bc(2):02x}"
case 0x69: s = f"LOAD_NAME ; ACC=0x{acc:02x}"
case 0x7f: s = f"CMP_IMM8 {rname(bc(1))}, 0x{bc(2):02x} ; FLAGS=0x{flags:x}"
case 0x81: s = f"JNZ 0x{bc16(1):04x} ; FLAGS=0x{flags:x}"
case 0x8d: s = f"PUSH_REG {rname(bc(1))}"
case 0x8e: s = f"POP_REG {rname(bc(1))}"
case 0x8f: s = f"SHR {rname(bc(1))}, 0x{bc(2):02x}"
case 0x90: s = f"TEST_IMM8 {rname(bc(1))}, 0x{bc(2):02x}"
case 0x93: s = f"ADD_REG {rname(bc(1))}, {rname(bc(2))}"
case 0x94: s = f"AND_IMM64 {rname(bc(1))}, 0x{bc64(2):016x}"
case 0x95: s = f"JL 0x{bc16(1):04x} ; FLAGS=0x{flags:x}"
case 0x96: s = f"SUB_IMM8 {rname(bc(1))}, 0x{bc(2):02x}"
case 0x97: s = f"SHL {rname(bc(1))}, 0x{bc(2):02x}"
case 0x98: s = f"CMP_REG {rname(bc(1))}, {rname(bc(2))} ; FLAGS=0x{flags:x}"
case 0x99: s = f"MOV_IMM64 {rname(bc(1))}, 0x{bc64(2):016x}"
case 0x9a: s = f"ROL {rname(bc(1))}, {rname(bc(2))}"
case 0x9b: s = f"NOT {rname(bc(1))}"
case 0x9c: s = f"XOR {rname(bc(1))}, {rname(bc(2))}"
case 0xab: s = f"CALL 0x{bc16(1):04x}"
case 0xba: s = f"RET"
case _: s = f"UNKNOWN 0x{opcode:02x}"
print(f" [VM:{pc:04x}] {s}")
Triton does not provide a built-in way to load binaries, so we use LIEF to parse the ELF and map each loadable segment into the Triton context. We add BASE_ADDRESS to each segment’s virtual address because the binary was compiled as a PIE — without the base offset the segments land at the wrong addresses and the opcodes decode as garbage:
def load_binary(ctx: TritonContext, path: str) -> lief.Binary:
binary = lief.parse(path)
if binary is None:
raise RuntimeError(f"LIEF could not parse '{path}'")
for segment in binary.segments:
if segment.type == lief.ELF.Segment.TYPE.LOAD:
vaddr = BASE_ADDRESS + segment.virtual_address
content = bytes(segment.content)
ctx.setConcreteMemoryAreaValue(vaddr, content)
print(f"[LIEF] Mapped segment @ 0x{vaddr:08x} size=0x{len(content):x}")
return binary
There is also a binary patch needed. The two movabs instructions in vmstart that load the VM struct address (0x404230) were assembled without the PIE base, so they contain 0x4230 instead of 0x404230. We fix them directly in Triton’s memory after loading:
def patch_binary(ctx: TritonContext) -> None:
# movabs rdi, 0x404230 at 0x4017f9 (encoding: 48 BF + 8 bytes LE)
ctx.setConcreteMemoryAreaValue(0x4017f9 + 2, (0x404230).to_bytes(8, 'little'))
# movabs r10, 0x404230 at 0x401805 (encoding: 49 BA + 8 bytes LE)
ctx.setConcreteMemoryAreaValue(0x401805 + 2, (0x404230).to_bytes(8, 'little'))
print("[PATCH] Fixed VM struct address in movabs instructions")
The setup method places the name and serial at their fake addresses, zeroes out the VM struct, and initializes the x86 registers to match what vmstart would receive at its entry point:
def setup_state(ctx: TritonContext) -> None:
ctx.setConcreteMemoryAreaValue(FAKE_NAME_ADDR, NAME_VALUE)
ctx.setConcreteMemoryAreaValue(FAKE_SERIAL_ADDR, SERIAL_VALUE)
ctx.setConcreteMemoryAreaValue(VM_STRUCT, b"\x00" * 0xc0)
ctx.setConcreteRegisterValue(ctx.registers.rip, ADDR_VMSTART)
ctx.setConcreteRegisterValue(ctx.registers.rsp, 0x7ffff000)
ctx.setConcreteRegisterValue(ctx.registers.rbp, 0x7ffff000)
ctx.setConcreteRegisterValue(ctx.registers.rdi, FAKE_NAME_ADDR)
ctx.setConcreteRegisterValue(ctx.registers.rsi, FAKE_SERIAL_ADDR)
print(f"[SETUP] name = {NAME_VALUE!r} @ 0x{FAKE_NAME_ADDR:08x}")
print(f"[SETUP] serial = {SERIAL_VALUE!r} @ 0x{FAKE_SERIAL_ADDR:08x}")
print(f"[SETUP] BYTECODE_BASE= 0x{BYTECODE_BASE:08x}")
print(f"[SETUP] Concrete state initialised")
The Triton context is initialized with x86-64 architecture and a few optimization flags that speed up concrete emulation:
def init_triton() -> TritonContext:
ctx = TritonContext(ARCH.X86_64)
ctx.setMode(MODE.ALIGNED_MEMORY, True)
ctx.setMode(MODE.CONSTANT_FOLDING, True)
ctx.setMode(MODE.AST_OPTIMIZATIONS, True)
ctx.setAstRepresentationMode(AST_REPRESENTATION.PYTHON)
return ctx
Before the emulation loop, here is main to show how everything fits together:
def main() -> None:
print(f"[INFO] name = {NAME_VALUE!r}")
print(f"[INFO] serial = {SERIAL_VALUE!r}")
print(f"[INFO] x86 printing = {PRINT_X86}\n")
ctx = init_triton()
load_binary(ctx, BINARY_PATH)
patch_binary(ctx)
setup_state(ctx)
emulate(ctx)
if __name__ == "__main__":
main()
The emulation loop is the standard Triton pattern — fetch the instruction bytes at RIP, create an Instruction object, call ctx.processing() to execute it, then move on. The only addition is the handler check: if RIP matches a known handler address we print the VM instruction before the x86 code executes. The --no-stdlib flag suppresses printing inside known utility subroutines to keep the trace focused on the high-level logic:
def emulate(ctx: TritonContext) -> None:
print(f"\n[EMU] Starting emulation from 0x{ADDR_VMSTART:08x}")
print(f"[EMU] Printing: VM instructions {'+ x86' if PRINT_X86 else 'only'}\n")
skip_depth = 0
for count in range(MAX_INSTRUCTIONS):
rip = ctx.getConcreteRegisterValue(ctx.registers.rip)
if rip == ADDR_VMSTART_RET:
break
should_print_vm = False
vm_opcode = None
if rip in HANDLER_ADDRESSES:
vm_opcode = HANDLER_TO_OPCODE[rip]
pc = ctx.getConcreteMemoryValue(MemoryAccess(VM_PC, CPUSIZE.QWORD))
if SKIP_STDLIB:
if pc in SKIP_SUBROUTINES:
skip_depth += 1
if vm_opcode == 0xba and skip_depth > 0:
skip_depth -= 1
if skip_depth == 0:
should_print_vm = True
if should_print_vm:
print_vm_instruction(ctx, vm_opcode)
opcodes = ctx.getConcreteMemoryAreaValue(rip, 16)
insn = Instruction(rip, opcodes)
ctx.processing(insn)
if insn.getType() == OPCODE.X86.HLT:
print(f"[EMU] HLT @ 0x{rip:08x}")
break
if PRINT_X86 and skip_depth == 0:
print(f" [x86] 0x{rip:08x} {insn.getDisassembly()}")
else:
print(f"[EMU] Reached instruction cap ({MAX_INSTRUCTIONS})")
print("\n[VM FINAL STATE]")
print(f" r0/ACC = 0x{ctx.getConcreteMemoryValue(MemoryAccess(VM_ACC, CPUSIZE.QWORD)):016x}")
print(f" r1 = 0x{ctx.getConcreteMemoryValue(MemoryAccess(VM_R1, CPUSIZE.QWORD)):016x}")
print(f" r2/RESULT = 0x{ctx.getConcreteMemoryValue(MemoryAccess(VM_RESULT, CPUSIZE.QWORD)):016x}")
print(f" r3/PC = 0x{ctx.getConcreteMemoryValue(MemoryAccess(VM_PC, CPUSIZE.QWORD)):016x}")
print(f" r4/NAME = 0x{ctx.getConcreteMemoryValue(MemoryAccess(VM_NAME, CPUSIZE.QWORD)):016x}")
print(f" r5/SERIAL = 0x{ctx.getConcreteMemoryValue(MemoryAccess(VM_SERIAL, CPUSIZE.QWORD)):016x}")
print(f" r6/FLAGS = 0x{ctx.getConcreteMemoryValue(MemoryAccess(VM_FLAGS, CPUSIZE.QWORD)):016x}")
print(f" r7/SP = 0x{ctx.getConcreteMemoryValue(MemoryAccess(VM_SP, CPUSIZE.QWORD)):016x}")
Analysis of the Bytecode
Now that we have the emulator working, let’s run it and analyze the trace. We start with NAME_VALUE = b"Pepe\x00" and SERIAL_VALUE = b"ABCD1234\x00":
$ python3 solve-vm.py
The first thing the VM does is check that neither the name nor the serial pointer is null, then call the strlen routine at 0x0110 on the serial:
[VM:0000] CMPZ r4/NAME
[VM:0002] JZ 0x0120 ; FLAGS=0x0
[VM:0005] CMPZ r5/SERIAL
[VM:0007] JZ 0x0120 ; FLAGS=0x0
[VM:000a] MOVQ r0/ACC, r5/SERIAL
[VM:000d] CALL 0x0110
The subroutine at 0x0110 is a standard strlen — it walks the string pointer in r0 byte by byte until it hits a null terminator, counting the characters in r1:
[VM:0113] PTR_NULL r0/ACC
[VM:0115] JZ 0x011f ; FLAGS=0x0
[VM:0118] INC r1
[VM:011a] INC r0/ACC
[VM:011c] JMP 0x0113
After strlen returns, the length in r1 is compared against 0x15 (21 in decimal). Our test serial ABCD1234 is only 8 characters, so it fails immediately:
[VM:0010] CMP_IMM8 r1, 0x15 ; FLAGS=0x1
[VM:0013] JNZ 0x0120 ; FLAGS=0x2
[VM:0120] MOV_IMM8 r2/RESULT, 0x00
[VM:0123] HLT
The serial must be exactly 21 characters. Let’s update it and run again:
SERIAL_VALUE = b"AAAAAAAAAAAAAAAAAAAAA\x00"
Now strlen passes, and the VM calls strlen on the name too (storing the result in r1), then immediately calls the subroutine at 0x00be which parses the first two serial characters as a hex digit pair. The parsed value is compared against the name length:
[VM:0016] MOVQ r0/ACC, r4/NAME
[VM:0019] CALL 0x0110 # strlen(name) -> r1
[VM:001c] CALL 0x00be # parse 2 hex chars from serial -> r2
[VM:001f] CMP_REG r1, r2/RESULT
[VM:0022] JNZ 0x0120
The subroutine at 0x009a (called by 0x00be) validates that each character is a valid hex digit (0-9 or A-F) and converts it to its numeric value. 0x00be calls it twice, shifts the first result left by 4 bits, and adds the second to produce a full byte. So the first two serial characters must be the hex representation of the name length. For Pepe (length 4) that is 04.
The next check parses two more hex characters and compares them against the sum of the popcount of all characters in the name. The popcount of a byte is the number of set bits in its binary representation:
[VM:0025] CALL 0x00ef # sum of popcount of all name chars -> r1
[VM:0028] CALL 0x00be # parse 2 more hex chars -> r2
[VM:002b] CMP_REG r1, r2/RESULT
[VM:002e] JNZ 0x0120
For Pepe:
P=0x50=01010000b— 2 set bitse=0x65=01100101b— 4 set bitsp=0x70=01110000b— 3 set bitse=0x65=01100101b— 4 set bits- Total = 13 =
0x0D
So the second pair of serial characters must be 0D.
After that, the VM reads one more byte and checks it against 0x2d, which is the ASCII code for -:
[VM:0031] LOAD_SERIAL ; read next serial byte
[VM:0032] CMP_IMM8 r0/ACC, 0x2d ; compare with '-'
[VM:0035] JNZ 0x0120
The serial format so far is 040D-. We have consumed 5 characters and 16 remain. To keep the trace clean from here, we add the known utility subroutines to a skip list and run with --no-stdlib:
SKIP_SUBROUTINES = {
0x0110, # strlen
0x00ce, # popcount single char
0x00ef, # popcount whole string
0x009a, # parse hex digit
0x00be, # parse hex byte (2 digits)
}
Reading the 64-bit Serial Value
After the dash separator is consumed, the VM calls subroutine 0x004b. This one reads the remaining 16 hex characters from the serial and assembles them into a single 64-bit value stored in ACC:
[VM:0038] CALL 0x004b
[VM:004b] MOV_IMM8 r0/ACC, 0x00 ; r0 = 0 (will hold the result)
[VM:004e] MOV_IMM8 r1, 0x08 ; r1 = 8 (loop counter, one per byte)
; loop:
[VM:0051] CMP_IMM8 r1, 0x00 ; counter == 0?
[VM:0054] JZ 0x0069 ; yes -> done
[VM:0057] PUSH_REG r0/ACC ; save current result
[VM:0059] CALL 0x00be ; parse next 2 hex chars -> r2
[VM:005c] POP_REG r0/ACC ; restore result
[VM:005e] SHL r0/ACC, 0x08 ; shift left 8 bits (make room for new byte)
[VM:0061] MOVB r0/ACC, r2 ; insert parsed byte into low byte of r0
[VM:0064] DEC r1 ; counter--
[VM:0066] JMP 0x0051 ; loop back
[VM:0069] RET ; r0 = full 64-bit value
Each iteration shifts the accumulator left by 8 bits and slots the next parsed byte into the low position. After 8 iterations (16 hex characters) the accumulator holds the complete 64-bit big-endian value.
Computing and Comparing the Hash
Once 0x004b returns, the parsed 64-bit value sits in r0. The VM pushes it and calls the hash function at 0x006a:
[VM:003b] PUSH_REG r0/ACC ; save the parsed serial value
[VM:003d] CALL 0x006a ; compute hash(name) -> r2
The hash function starts from a fixed constant and processes the name character by character:
[VM:006a] MOV_IMM64 r2, 0xb7e151628aed2a6a ; h = initial constant
; loop -- read one name byte per iteration:
[VM:0074] LOAD_NAME ; r0 = next byte from name, advance pointer
[VM:0075] CMPZ r0/ACC ; r0 == 0? (end of string)
[VM:0077] JZ 0x0099 ; yes -> done
[VM:007a] PUSH_REG r2 ; save current hash
[VM:007c] PUSH_REG r0/ACC ; save current char
[VM:007e] CALL 0x00ce ; popcount(r0) -> r2
[VM:0081] MOVQ r1, r2 ; r1 = popcount(char)
[VM:0084] POP_REG r0/ACC ; restore char
[VM:0086] POP_REG r2 ; restore hash
[VM:0088] ROL r2, r1 ; h = ROL(h, popcount(char))
[VM:008b] TEST_IMM8 r1, 0x01 ; is popcount odd?
[VM:008e] JZ 0x0093 ; yes -> skip NOT (jump over it)
[VM:0091] NOT r0/ACC ; no (even popcount) -> NOT the char byte
[VM:0093] XOR r2, r0/ACC ; h ^= char (or ~char)
[VM:0096] JMP 0x0074 ; next character
[VM:0099] RET ; r2 = final hash
For each character the algorithm does three things — rotates the hash left by the popcount of the character, conditionally NOTs the character byte if that popcount is even, and XORs the result into the hash. The TEST_IMM8 r1, 0x01 / JZ 0x0093 pair controls the NOT — when the jump is taken (FLAGS bit0 == 1, meaning popcount & 1 == 0, i.e. popcount is even) the NOT is skipped, so only characters with an odd popcount get their byte inverted before the XOR.
After the hash is computed, the final comparison is straightforward:
[VM:0040] POP_REG r0/ACC ; restore parsed serial value
[VM:0042] CMP_REG r0, r2 ; serial_value == hash(name)?
[VM:0045] JZ 0x0124 ; yes -> success
[VM:0048] JMP 0x0120 ; no -> fail
[VM:0124] MOV_IMM8 r2/RESULT, 0x01
[VM:0127] HLT ; return 1 -> "Yep, you got it!"
[VM:0120] MOV_IMM8 r2/RESULT, 0x00
[VM:0123] HLT ; return 0 -> "Nope, this is incorrect!"
The hash computed from the name must exactly match the 64-bit value encoded in the last 16 characters of the serial. If it does, vm->result is set to 1 and vmstart returns success.
Writing a Keygen
Now that we know what each field of the serial must contain, we can write a standalone keygen (keygen.py) that computes the correct serial for any name:
#!/usr/bin/env python3
"""
VirtUAL 1 serial generator (bagolymadar 2020)
Usage: python3 keygen.py <username>
"""
import sys
def popcount(n: int) -> int:
return bin(n).count('1')
def compute_hash(name: str) -> int:
h = 0xb7e151628aed2a6a
for c in name:
b = ord(c)
pc = popcount(b) & 63
h = ((h << pc) | (h >> (64 - pc))) & 0xffffffffffffffff
if pc & 1: # odd popcount -> NOT the char
b = (~b) & 0xff
h ^= b
h &= 0xffffffffffffffff
return h
def make_serial(name: str) -> str:
length = len(name)
pop_sum = sum(popcount(ord(c)) for c in name) & 0xff
hash_val = compute_hash(name)
return f"{length:02X}{pop_sum:02X}-{hash_val:016X}"
def main():
if len(sys.argv) < 2:
print("Usage: python3 keygen.py <username>")
sys.exit(1)
name = sys.argv[1]
serial = make_serial(name)
print(f"Username : {name}")
print(f"Serial : {serial}")
if __name__ == "__main__":
main()
Let’s run it with Pepe:
$ python3 keygen.py Pepe
Username : Pepe
Serial : 040D-2A2C515DA54FECE9
We can verify the result by putting the serial back into the Triton emulator. The trace should now end with JZ 0x0124 taken, r2/RESULT set to 1, and HLT:
[VM:0042] CMP_REG r0/ACC, r2/RESULT ; FLAGS=0x1
[VM:0045] JZ 0x0124 ; FLAGS=0x1
[VM:0124] MOV_IMM8 r2/RESULT, 0x01
[VM:0127] HLT
[VM FINAL STATE]
r0/ACC = 0x2a2c515da54fece9
r1 = 0x0000000000000004
r2/RESULT = 0x0000000000000001
r3/PC = 0x0000000000000127
r4/NAME = 0x0000000000200005
r5/SERIAL = 0x0000000000300015
r6/FLAGS = 0x0000000000000001
r7/SP = 0x00000000004042f0
r2/RESULT = 0x01 confirms success. And the binary agrees:
$ ./virtual.1
Username: Pepe
Serial: 040D-2A2C515DA54FECE9
Yep, you got it!
Conclusion
Code virtualization is a strong protection technique. Even for a relatively simple crackme like this one, we had to build a proper disassembler and emulator before we could read any of the logic. Symbolic execution with Triton and Z3 turned out not to be the right tool here — the serial drives control flow rather than feeding into a single comparison, so there was nothing useful to symbolize. What actually helped was using Triton purely as a concrete emulator and building a VM-level disassembler on top of it, which gave us a clean trace of the bytecode execution.
Once the trace was readable, the algorithm fell out naturally: the serial is 21 characters, the first two encode the name length in hex, the next two encode the total popcount of the name in hex, the fifth is a dash separator, and the remaining sixteen encode a 64-bit hash of the name computed by rotating and XORing with popcount-conditioned byte inversions.
Extra — Binary Ninja Architecture Plugin
As an extra step, we can take the full opcode table we decoded and use it to write a Binary Ninja architecture plugin (virtual1_arch.py). This lets us extract the raw bytecode from the binary, open it as a standalone file in Binary Ninja with the custom architecture selected, and get full disassembly, LLIL lifting, and decompiler output directly — without any of the surrounding x86 code getting in the way.
To extract the bytecode from the open binary in the Binary Ninja scripting console:
bytecode_addr = 0x404108
bytecode_size = 0x200
bytecode = bv.read(bytecode_addr, bytecode_size)
with open("/tmp/vm_bytecode.bin", "wb") as f:
f.write(bytecode)
Open vm_bytecode.bin as a Raw binary with the Virtual1VM platform selected, press P at offset 0x0000 to define the entry point, and let analysis run. The disassembly looks like this:
00000000 void main(int64_t arg1 @ NAME, int64_t arg2 @ SERIAL, int64_t arg3 @ FLAGS) __noreturn
00000000 5404 CMPZ NAME
00000002 550120 JZ bad_ending
00000005 5405 CMPZ SERIAL
00000007 550120 JZ bad_ending
0000000a 010005 MOVQ ACC, SERIAL
0000000d ab0110 CALL strlen
00000010 7f0115 CMP_IMM8 r1, 0x15
00000013 810120 JNZ bad_ending
00000016 010004 MOVQ ACC, NAME
00000019 ab0110 CALL strlen
0000001c ab00be CALL hexstr_to_hex_number
0000001f 980102 CMP_REG r1, RESULT
00000022 810120 JNZ bad_ending
00000025 ab00ef CALL bitcount_string
00000028 ab00be CALL hexstr_to_hex_number
0000002b 980102 CMP_REG r1, RESULT
0000002e 810120 JNZ bad_ending
00000031 52 LOAD_SERIAL
00000032 7f002d CMP_IMM8 ACC, '-'
00000035 810120 JNZ bad_ending
00000038 ab004b CALL str_to_hex
0000003b 8d00 PUSH_REG ACC {var_10}
0000003d ab006a CALL create_hash_from_name
00000040 8e00 POP_REG ACC {var_8}
00000042 980002 CMP_REG ACC, RESULT
00000045 550124 JZ good_ending
00000048 110120 JMP bad_ending
And the decompiler output for the main entry point:
00000000 void main(int64_t arg1 @ NAME, int64_t arg2 @ SERIAL, int64_t arg3 @ FLAGS) __noreturn
00000000 {
00000000 if (!arg1)
00000002 return bad_ending();
00000007 if (!arg2)
00000007 return bad_ending();
0000000d int64_t r1 = strlen();
00000013 if (r1 != 0x15)
00000013 return bad_ending();
00000019 int64_t r1_1 = strlen();
0000001c RESULT = hexstr_to_hex_number();
0000001f if (r1_1 != RESULT)
00000022 return bad_ending();
00000025 int64_t r1_2 = bitcount_string();
00000028 RESULT_1 = hexstr_to_hex_number();
0000002b if (r1_2 != RESULT_1)
0000002e return bad_ending();
00000031 if (ACC_2 != 0x2d)
00000035 return bad_ending();
0000003b int64_t var_10 = str_to_hex();
0000003d RESULT_2 = create_hash_from_name();
00000045 if (var_8 == RESULT_2)
00000045 return good_ending();
00000048 return bad_ending();
00000048 }
The validation logic that took us some time to read from the raw x86 trace is now expressed clearly by the decompiler, with named functions and readable control flow. The saved Binary Ninja database for the bytecode is included as vm_bytecode.bin.bndb.
Files
| File | Description |
|---|---|
virtual.1 |
The original crackme binary |
solve-vm.py |
Triton + LIEF emulator with VM-level disassembler. Run with --x86 to also print x86 instructions, --no-stdlib to suppress output from internal subroutines |
keygen.py |
Serial generator — usage: python3 keygen.py <username> |
virtual1_arch.py |
Binary Ninja architecture plugin for the VM. Copy to ~/.binaryninja/plugins/ and restart Binary Ninja |
vm_bytecode.bin |
Raw extracted bytecode, suitable for opening with the architecture plugin |
vm_bytecode.bin.bndb |
Binary Ninja database for the bytecode with analysis already applied |