Writing a (toy) symbolic interpreter, and solving challenges, part 2

Written by Simon Marechal - 23/07/2021 - in Outils - Download
In the second part of this series, we write a concrete interpreter for a subset of WebAssembly.

This is a series, please check part 1 if you have not done already or even fast forward to part 3 or part 4!

The WASM architecture

This article will not be a crash course in WASM, but will just highlight enough to be able to follow the interpreter code. First of all, I am really not an expert in WASM, and I know that parts of the interpreter are really shoddy. For the definitive answer on how some feature works, please check the official documentation. You will see that it is written in a very formal way that can be annoying to parse. For our purpose, we can use a very simplified model of the WASM runtime.

First of all, the WASM instructions manipulate the following state (there is more to it, but let us gloss over it):

  • local variables, accessed by their indexes
  • a global memory area, accessed by an address
  • a stack, used by most instructions
  • but no registers!

Stack based languages will be familiar for HP calculator users, Lua VM implementors, or Forth programmers. For the others, here is an example. The following program fragment adds local variables 0 and 1, and stores the result in local variable 2:

# stack is represented at each steps with comments
# []  <-  the stack is empty at this point
local.get 0
# [value0]
local.get 1
# [value1, value0]
i32.add
# this instruction pops two values from the stack, sums them, and pushes the result back
# [value1 + value0]
local.set 2
# a value is popped, and stored in the second local variable
# []

Variables can be integer or floating point values, with a width of 32 or 64 bits, and all instructions are strongly typed. Another interesting design property of WASM is that everything can be typechecked, so that one can be sure that the program will not "go wrong".

But the weirdest part is how control flow works. Here is a sample C function:

Compiled, it will look something like:

block           
    block           
        local.get       0
        i32.const       1
        i32.ge_s
        br_if           0   # 0: down to label1
        i32.const       0
        local.set       1
        br              1   # 1: down to label0
    end_block               # label1:
    i32.const       0
    local.set       2
    i32.const       0
    local.set       1
    loop                    # label2:
        local.get       2
        local.get       1
        i32.xor 
        local.set       1
        local.get       0
        local.get       2
        i32.const       1
        i32.add 
        local.tee       2
        i32.ne  
        br_if           0  # 0: up to label2
    end_loop
end_block                  # label0:
local.get       1
end_function

What is interesting is that the execution blocks are explicitly defined (using the ... block keyword), and that the jump instructions (here, br and br_if) can't jump at arbitrary places, but only to the end of an execution block (or the beginning for loop blocks)! This means that all jumps are "outward".

Writing a concrete interpreter

The interpreter will be written in Haskell. There are several reasons for this:

  • it is exceedingly well suited for the task at hand ;
  • it is not Python. Most security related tools seems to be written in Python these days, and it is not so bad for some of them. But code analysis tools, such as symbolic interpreters, require significant processing to happen, which Python is really bad at. Also, the aversion from writing documentation that most tool writers have is at least slightly offset by having expressive static types :)
  • it has the high quality libraries we need for this project ;
  • it makes it really easy to perform step 2 ;
  • as a Haskell user, I can't help talking about it :)

Code can be browsed here (be careful about how you browse this repository, the final code is in the repository, so make sure you use this link to browse the code that corresponds to part2 tag).

Representing the interpreter state

The following things will be modified during execution:

  • the memory ;
  • the stack ;
  • the instruction pointer.

As we have seen beforehand, we will keep close to the abstract representation of the virtual machine, which means the stack will hold three types of values:

data StackElement
  = Value VNum -- standard values
  | Cont [Instruction Natural] -- continuation (for execution blocks)
  | Activation ResultType Frame -- continuation (for functions)

For those unfamiliar with the syntax, this is just like the following Rust declaration:

enum StackElement {
  Value(VNum),
  Cont(Vec<Instruction>),
  Activation(ResultType, Frame)
}

It means that a StackElement can either be a numerical value (32/64bits integer/float), a continuation (holding a list of instructions), or an Activation. The Activation represents the context a called function returns to (the Frame), and the type of result that is expected from the function. This is kind of how it is represented in the official specification, so we will keep close to this. Actual implementations will of course not work like that, for performance reasons, but we will be keeping the code as simple as possible here.

The interpreter state holds all the things that need to be altered during execution. It looks like:

data WState
  = WState
      { _smem :: M.Map Int Word8,
        _sstack :: [StackElement],
        _sframe :: Frame,
      }

Memory is represented as a sparse map, and the frame contains the local variables and current list of instructions to be executed.

Interpreting the instructions

We will write the functions as generally as possible, for two purposes. The first is that, contrary to what intuition says, it is usually easier to get a generalized function right, the reason being that you have less rope to hang yourself with. The second is that we do not yet know how turning the concrete interpreter into a symbolic interpreter will work.

We will write two functions : decodeInstr, that interprets a single WASM instruction, and interpret, that runs the program, using decodeInstr until we reach the end of execution.  We will also need several helper functions, for example for popping from and pushing into the stack, that will be used later, such as:

  • pop32: pops a value from the stack, making sure it is a 32 bits integer
  • push: pushes a stack element (value, continuation, etc.) on the stack
  • loadByte: loads a byte from a memory address

The decodeInstr function takes a single instruction as parameter, and modifies the interpreter state, potentially throwing errors. In order to keep the notation more readable to non haskellers, we will use the following type:

decodeInstr
  :: (MonadError String m, MonadState WState m)
  => Instruction Natural -> m ()

The part before the => arrow is a list of constraints (read traits for Rustaceans, or interface if you are familiar with languages such as Java).  It means basically that we are writing our interpreter in a monad m that:

  • can throw String errors ;
  • can manipulate a state of type WState.

For example, here is the part that interprets the I32Load8U instruction (load a byte from memory and push it on the stack as a 32 bits integer):

case instruction of
  -- case for the I32Load8U instruction
  -- ma represents memory offset, and memory alignment parameters
  I32Load8U ma -> do
    addr <- pop32  -- pop a 32 bit integer from the stack, representing the address
    byte <- loadByte addr ma  -- load the byte the address is pointing to
    pushV (VI32 (u8tou32 byte)) -- push the value on the stack after widening it
  ...

It starts by popping the address from the stack, a 32 bits value, and then loads the byte from memory. The byte is converted to a 32 bits unsigned integer and pushed on the stack.

Another example is the If instruction:

  ...
  -- here, tb is the "true branch" and fb the "false branch"
  If (Inline Nothing) tb fb -> do
    res <- popV  -- pop a value from the stack
    -- push the next instructions to be executed as a continuation
    -- on the stack:
    current_instructions <- use (wframe . finstrs)
    push (Cont current_instructions)
    -- set the current instructions to be executed:
    wframe . finstrs .= if toBool res -- convert the value to a boolean
                          then tb -- run the "true branch"
                          else fb -- run the "false branch"
  ...

The If instruction holds the instruction blocks for the "true" case, and the "false" case. It pops the first value from the stack, and checks if it is true (different from zero). It saves the current list of instructions on the stack, and replaces it with either the "true" or "false" code block.

Parsing the WASM module

Fortunately, there already is a good library for this task : the wasm package. This library is fairly capable, as it has a parser for the binary and test formats, a validator and an interpreter. However, we will only use the binary parser.

The challenge

For this series of articles, our goal will be to solve the ForumCrack challenge automatically. Getting the WASM module is left as an exercise to the reader for now :)

This module exposes the following function, as obtained with wasm-decompile:

Both sub-functions return either 0 or 1, so this function can return 0, 1, 2, or 3, theoretically. Looking at the JavaScript code that calls this module, it is possible to notice that 0 is a failure, 1 is a success, and 2 is a great success.

If you clone the repository and reset it to the part2 tag, you can test the code:

$ cd toy-wasm-symbexp
$ git checkout part2
$ stack run /path/to/keygenme.wasm username password
Right [VI32 0]
# solution revealed in the next episode :)
$ stack run /path/to/keygenme.wasm username *****
Right [VI32 1]
# solution revealed in the last episode!
$ stack run /path/to/keygenme.wasm username *****
Right [VI32 2]

Without the solutions, you will have to take my word that the interpreter works. In the meantime, you can check the test suite :)

Conclusion

You can check this simple interpreter here. You might notice that only a tiny subset of the instruction set is implemented, and that the way branching instructions are implemented is very hacky. But, and that is what is important here, it works for the challenge I targeted. Please do not hesitate to send feedback, especially if you have questions regarding the implementation.

Next time, we will abstract the values type away, and build the symbolic interpreter (almost) without modifying the interpreter functions!