Modologic. Decompiling proprietary Assembler into C++ code

This article describes the creation of a Hex-Rays-like decompiler for an exotic programming language used in the game “Pathologic”. You will learn how cross-compilation works and master the basics of the compilation theory so that you can write such a decompiler yourself.

info

All scripts and other results obtained using this decompiler are available in my repository on GitHub.

A few words about decompilation

I will use the assembly code obtained at the previous stage (see the article Modologic. Dissecting the Pathologic virtual machine). It represents a set of 89 opcodes. In fact, all I have is a set of instructions and file headers.

Decompilation into a higher-level language represents, in fact, cross-compilation. This procedure requires a compiler that takes the source code written in one language as input and converts it into code written in another language. To create such a tool, you have to construct an execution graph for the original code and write a simple emulator for the stack machine. But first things first. My previous experience with Angr described in the article Anger management. Welcome to Angr, a symbolic emulation framework helped me a lot in writing a new decompiler. In fact, I repeated typical solutions used in compilers and code analyzers, and they worked! Kudos to the compilation theory.

Control flow graph

When you analyze some code, the first step is to break it down into basic blocks; each block represents a set of instructions that terminates with a control transfer instruction. Then you can construct a graph consisting of such blocks and reflecting the connections: who transfers control to whom.

The screenshot shows that some blocks end with commands that don’t transfer control. This is because the code section following such a block is used as a jump address: another code (usually a loop) transfers control to the middle of the basic block. In such a case, the block is divided into two blocks so that the second one begins with the address to which control can be transferred.

Therefore, first of all, I go through the entire code and record jump addresses. Whether it’s a CALL or JMP instruction, I write down where do they transfer control. After that, the code can be divided into basic blocks. You go from the first instruction to the last one and divide them by known jump addresses. For this purpose, I introduced special fields in the instruction class: metadata storing addresses and jump type (conditional or unconditional).

After dividing the code into basic blocks and creating an array of BasicNode objects, you can start connecting them (i.e. establishing links: which block does the current one transfer control to). Each basic block contains information about its address (i.e. address of its first instruction) and about jump addresses known to its last instruction.

A graph must have a root: the first instruction that starts the control flow. Headers from the assembler file provide a list of possible entry points. I start recursive analysis from the first basic block; after that, the algorithm goes through all jump addresses and adds links to ‘child’ blocks.

Importantly, I go through each block only once. There is a global seen list: addresses of blocks already seen by the algorithm. They will be ignored during the repeat analysis. This makes it possible to quickly analyze a recursive graph that might include nested loops.

When all links to ‘child’ blocks are established, the graph is ready. This is an analogue of an abstract syntax tree (AST). From now on, my research will be focused on it.

Function parsing

Game’s proprietary language supports equivalents of CALL and RET commands in the regular x86 assembler. However, there are no calling conventions like stdcall. It’s impossible to determine the number of variables passed as function arguments without complete local stack emulation.

That’s why I didn’t cut out the beginnings of functions as separate graph roots. Each call is considered a regular transfer of control, and the body of the called function is part of the body of the calling function. This trick makes it possible to analyze a function in the context of the code that calls it. After the initial analysis, the function can be detached into a separate graph.

Graph construction example

I will demonstrate all analysis phases using a small fire.bin file as an example. Its assembly code looks as follows:

RunOp = 0x6
RunTask = 1
GlobalTasks:
    GTASK_0  Params = 0
        EVENT_5 Op = 0x3 Vars = ()
    GTASK_1  Params = 0
        EVENT_6 Op = 0x26 Vars = ()
0x0: @ Hold()
0x1: Pop(0)
0x2: Return(); Pop(0)
0x3: @ StopGroup0()
0x4: Pop(0)
0x5: Return(); Pop(0)
0x6: PushEmpty(object, object)
0x7: PushEmpty(bool)
0x8: Call 0x2c
0x9: Pop(0)
0xa: Pop(1); Push((bool) Stack[-1] == 0)
0xb: IF (Stack[-1] == 0) GOTO 0x11; Pop(1)
0xc: PushEmpty()
0xd: Push(-0, 0); TaskCall(0)
0xe: Call 0x0
0xf: Pop(-0, 0); TaskReturn
0x10: Pop(0)
0x11: Push("fire")
0x12: @ FindParticleSystem(Stack[-1], Stack[-2])
0x13: Pop(1)
0x14: Pop(0); Push((bool) Stack[-1] == 0)
0x15: IF (Stack[-1] == 0) GOTO 0x1a; Pop(1)
0x16: Push("Can't find fire particle system")
0x17: @ Trace(Stack[-1])
0x18: Pop(1)
0x19: Return(); Pop(2)
0x1a: Push(CVector(0.0, 0.0, 0.0))
0x1b: Push(CVector(0.0, 1.0, 0.0))
0x1c: Push((float)0.0)
0x1d: @@ AddSource(Stack[-3], Stack[-2], Stack[-1])
0x1e: Pop(3)
0x1f: @@ Enable()
0x20: Pop(0)
0x21: @ Hold()
0x22: Pop(0)
0x23: GOTO 0x21
0x24: Return(); Pop(2)
0x25: Stack[-1] = 0
0x26: PushEmpty()
0x27: Push(-0, 0); TaskCall(0)
0x28: Call 0x0
0x29: Pop(-0, 0); TaskReturn
0x2a: Pop(0)
0x2b: Return(); Pop(0)
0x2c: PushEmpty(bool, bool)
0x2d: @ IsLoaded(Stack[-1])
0x2e: Pop(0)
0x2f: Stack[-3] = Stack[-1]
0x30: Return(); Pop(2)

It can be seen from the headers that there are three entry points: 0x3, 0x6, and 0x26. After going through the code, you can notice two addresses where the control is transferred using Call: 0x0 and 0x2C. These are the function entry points. The initial graph looks as follows:

0x3: @ StopGroup0()
0x4: Pop(0)
0x5: Return(); Pop(0)
0x26: PushEmpty()
0x27: Push(-0, 0); TaskCall(0)
0x28: Call 0x0
    0x0: @ Hold()
    0x1: Pop(0)
    0x2: Return(); Pop(0)
    0x29: Pop(-0, 0); TaskReturn
    0x2a: Pop(0)
    0x2b: Return(); Pop(0)
0x6: PushEmpty(object, object)
0x7: PushEmpty(bool)
0x8: Call 0x2c
    0x2c: PushEmpty(bool, bool)
    0x2d: @ IsLoaded(Stack[-1])
    0x2e: Pop(0)
    0x2f: Stack[-3] = Stack[-1]
    0x30: Return(); Pop(2)
    0x9: Pop(0)
    0xa: Pop(1); Push((bool) Stack[-1] == 0)
    0xb: IF (Stack[-1] == 0) GOTO 0x11; Pop(1)
        0xc: PushEmpty()
        0xd: Push(-0, 0); TaskCall(0)
        0xe: Call 0x0
            0x0: @ Hold()
            0x1: Pop(0)
            0x2: Return(); Pop(0)
            0xf: Pop(-0, 0); TaskReturn
            0x10: Pop(0)
                0x11: Push("fire")
                0x12: @ FindParticleSystem(Stack[-1], Stack[-2])
                0x13: Pop(1)
                0x14: Pop(0); Push((bool) Stack[-1] == 0)
                0x15: IF (Stack[-1] == 0) GOTO 0x1a; Pop(1)
                    0x16: Push("Can't find fire particle system")
                    0x17: @ Trace(Stack[-1])
                    0x18: Pop(1)
                    0x19: Return(); Pop(2)
                    0x1a: Push(CVector(0.0, 0.0, 0.0))
                    0x1b: Push(CVector(0.0, 1.0, 0.0))
                    0x1c: Push((float)0.0)
                    0x1d: @@ AddSource(Stack[-3], Stack[-2], Stack[-1])
                    0x1e: Pop(3)
                    0x1f: @@ Enable()
                    0x20: Pop(0)
                        0x21: @ Hold()
                        0x22: Pop(0)
                        0x23: GOTO 0x21

For each entry point, a graph is constructed separately (i.e. there are three graphs in total). The left indentation denotes the beginning of a child block.

The block at 0x26 has two ‘children’: a function call at 0x0 and the next consecutive instruction 0x29. In the initial graph, the body of the called function is attached to the ‘parent’ block, but only once per graph.

The block at 0x9 contains two child blocks: the next consecutive 0xc and a transition to 0x11. Because of this transition, the block (with no control transfer instructions inside it) at 0xf is divided into two blocks. Importantly, the block at 0x11 is concurrently a ‘child’ of the block 0xf as the next consecutive one. But since the rule is “never enter the same block twice”, the graph will be displayed with such a weird nesting order.

Stack simulator

The main problem with decompiled code pertains to relative addresses on the stack. A variable is accessed via an index from the end of the stack, and adding a new variable to the end of the stack breaks old references. The only way to figure out which variable is used where is to go through the execution graph and simulate all operations on the stack.

To do this, I had to elaborate the assembly code decompiler by adding the following information to each instruction: (1) number of variables it takes from the stack; and (2) type of variables it pushes onto the stack.

class CInstructionPop:
    def __init__(self, r):
        self.OpCode = 'Pop'
        self.VarIn = r.uint32()
        self.PopCount = self.VarIn
    def __repr__(self):
        return f'Pop({self.VarIn})'
class CInstructionAdd:
    def __init__(self, r):
        self.OpCode = 'Add'
        self.Var1 = r.uint32()
        self.Var2 = r.uint32()
        self.TaskVar = r.int8() # signed
        self.PopCount = self.TaskVar & 0x3F
        self.PushType = [VAR_TYPE.INT]
    def __repr__(self):
        if self.TaskVar >= 0:
            var_1 = f'Stack[-{self.Var1}]'
        else:
            var_1 = f'Stack[{self.Var1} + StackPtr]'
        if self.TaskVar & 0x40:
            var_2 = f'Stack[{self.Var2} + StackPtr]'
        else:
            var_2 = f'Stack[-{self.Var2}]'
        return f'Pop({self.PopCount}); Push({var_1} + {var_2});'

The stack simulator goes from the top of the graph. Each instruction adds or removes values from the stack. The trick is that I take two stack snapshots for each instruction: before and after the opcode is executed. And then I add them as metadata to the instruction body. At the next stage of analysis, these stack snapshots will help me to identify the variable used in the instruction.

0x6: PushEmpty(object, object)
>before: []
>after: [var_0_object, var_1_object]
0x7: PushEmpty(bool)
>before: [var_0_object, var_1_object]
>after: [var_0_object, var_1_object, var_2_bool]

For instance, at the very beginning of the function 0x6, the stack is empty, but the first command pushes two variables of the object type onto it. The next instruction adds a variable of the bool type.

The stack simulator remembers the number of the last created variable; when a new variable is pushed onto the stack, it assigns the next ordinal number to it. A simple solution involving a global counter turned out to be really useful. Each variable receives a unique number, and its usage can be tracked!

Prior to entering the child block, a temporary copy of the stack is created. It’s passed to it as the current state. This ensures that the first child block cannot affect the state of the stack in the second code block. Such a trick was borrowed from Angr: if there is a conditional transition, the script goes through each branch in sequence. This way, it visits each block exactly once and calculates the current state of the stack for each of them.

0x8: Call 0x2c
>before: [var_0_object, var_1_object, var_2_bool]
>after: [var_0_object, var_1_object, var_2_bool]
    0x2c: PushEmpty(bool, bool)
    >before: [var_0_object, var_1_object, var_2_bool]
    >after: [var_0_object, var_1_object, var_2_bool, var_3_bool, var_4_bool]

A function call receives a copy of the stack from the caller code. This information will later help to determine the number of arguments passed to the function. It can be assumed that a function call doesn’t change the state of the stack. Fortunately, the language design ensures that the caller cleans up arguments of the called function (i.e. the above assumption is correct). Otherwise, I would have to write a fully-functional emulator for the called code.

Passes

The graph consists of blocks, and each block contains a reference to ‘child’ blocks and a list of assembly instructions. The idea is that information about the code in a higher-level language can be added to instructions. In other words, each low-level instruction should have a handler that replaces it with a higher-level instruction.

if isinstance(instr.opcode, CInstructionMovB):
    node.instructions[i].hl = HLInstructionMovB(instr)
if isinstance(instr.opcode, CInstructionPow):
    node.instructions[i].hl = HLInstructionPow(instr)
if isinstance(instr.opcode, CInstructionPop):
    node.instructions[i].hl = HLInstructionNop()

For example, interaction with the stack affects the stack simulator, but it must not be reflected in the C code. Therefore, CInstructionPop is replaced by HLInstructionNop, an object of the high-level NOP (No OPeration) instruction class that won’t be taken into account when C code is composed.

Interaction with the graph is implemented using a set of ‘passes’ (I borrowed this idea from the LLVM architecture). A pass is a function that is iteratively applied to each block in the graph; it changes the block on the fly or collects information about its code. In addition, the called functions should be ‘detached’ into separate graphs.

The first pass replaces instructions with their high-level language analogues. But it includes several special instances.

if isinstance(instr.opcode, CInstructionJump):
    node.instructions[i].hl = HLInstructionJump(instr)
    self.insert_label_list.append(instr.opcode.VarIn)
if isinstance(instr.opcode, CInstructionJumpB):
    node.instructions[i].hl = HLInstructionJumpB(instr)
    self.insert_label_list.append(instr.opcode.VarIn)

Control transfer instructions within the same function jump to a specific address. There are no addresses in C code; so, I add a new instruction to the code: a label with a unique number. On the next pass, the collected information about labels will be converted into new instructions.

def pass_InsertLabels(self, top_node):
    if hasattr(self, 'insert_label_list'):
        for i in self.insert_label_list:
            self.insert_label(i, top_node)
        self.insert_label_list = []

The pass that records variables that are in use requires special attention:

def pass_RecordUsedVars(self, node):
    node.Created = []
    node.Used = []
    for i in node.instructions:
        if hasattr(i, 'hl'):
            if hasattr(i.hl, 'Created'):
                node.Created += i.hl.Created
            if hasattr(i.hl, 'Used'):
                node.Used += i.hl.Used
    node.Created = list(set(node.Created))
    node.Used = list(set(node.Used))

Within each high-level instruction, the following fields containing metadata are collected: (1) what variables it uses; and (2) what variables were created in it.

class HLInstructionPush:
    def __init__(self, instr):
        opcode = instr.opcode
        stack = opcode.stack_snapshot_after
        stack_before = opcode.stack_snapshot_before
        self.VarIn = stack_before[-opcode.VarIn]
        self.VarOut = stack[-1]
        self.Used = [self.VarIn]
        self.Created = [self.VarOut]
    def __repr__(self):
        return f'{self.VarOut} = {self.VarIn};'

The original instruction contains an index for the variable used by it on the stack. This index is compared with the previously obtained stack snapshot to find out which variable was used and write it to the metadata.

Many instructions require information from both stack snapshots (i.e. before and after their execution) because offsets written in the instruction are valid only for the old snapshot; while the output variable created by the instruction is present only in the new snapshot.

Metadata from the Created and Used fields are used to determine input arguments of functions:

def get_func_args(self, top_node):
    Created = []
    Used = []
    for node, i in list(parse_graph(top_node, [])):
        if hasattr(node, 'Created'):
            Created += node.Created
            Used += node.Used
    # Used, but not created
    Args = list(set(Used) - set(Created))
    Args.sort(key=lambda x: x.index)
    return Args

The script collects all variables created and used inside the function body and subtracts the created variables from the used ones. The remaining variables were obviously passed by external code. There is no other way to understand which variables from the stack act as input arguments for a given function.

A separate pass adds a prologue and epilogue to each function (i.e. instructions that will be rendered into the text as func_44(var_2_bool){ ... }). The remaining pass adds left indents for the code.

The final tree printout looks as follows:

task_0_event_5()
{
    StopGroup0(); // 0x3 Func
    return 0; // 0x5 Return
}
task_1_event_6()
{
    TaskCall(0); // 0x27 TaskCall
    func_0(); // 0x28 Call
    TaskReturn(); // 0x29 TaskReturn
    return 0; // 0x2b Return
}
main()
{
    var_0_object = Obj(); var_1_object = Obj(); // 0x6 PushV
    var_2_bool = 0; // 0x7 PushV
    func_44(var_2_bool); // 0x8 Call
    var_5_bool = var_2_bool == 0; // 0xa Not
    if(var_5_bool == 0) goto Label_17; // 0xb JumpB
    TaskCall(0); // 0xd TaskCall
    func_0(); // 0xe Call
    TaskReturn(); // 0xf TaskReturn
Label_17:
    var_6_string = "fire"; // 0x11 PushS
    FindParticleSystem(var_6_string, var_1_object); // 0x12 Func
    var_7_bool = var_1_object == 0; // 0x14 NullEq
    if(var_7_bool == 0) goto Label_26; // 0x15 JumpB
    var_8_string = "Can't find fire particle system"; // 0x16 PushS
    Trace(var_8_string); // 0x17 Func
    return 2; // 0x19 Return
Label_26:
    var_9_cvector = CVector(0.0, 0.0, 0.0); // 0x1a PushVec
    var_10_cvector = CVector(0.0, 1.0, 0.0); // 0x1b PushVec
    var_11_float = 0.0; // 0x1c PushF
    AddSource(var_9_cvector, var_10_cvector, var_11_float); // 0x1d ObjFunc
    Enable(); // 0x1f ObjFunc
Label_33:
    Hold(); // 0x21 Func
    goto Label_33; // 0x23 Jump
}
func_0()
{
    Hold(); // 0x0 Func
    return 0; // 0x2 Return
}
func_44(var_2_bool)
{
    var_3_bool = 0; var_4_bool = 0; // 0x2c PushV
    IsLoaded(var_4_bool); // 0x2d Func
    var_2_bool = var_4_bool; // 0x2f Mov
    return 2; // 0x30 Return
}

For debugging purposes, I specify the address and opcode of the original assembler instruction in comments. Of course, decompilation errors are possible, but the code looks quite adequate (at least, types specified when variables are created correspond to the ways these variables are used).

Code compression

I decompiled all scripts in a similar way and identified several errors. Let’s examine the item_milk.bin script discussed in the previous article:

main()
{
    var_0_bool = 0; var_1_string = ""; var_2_float = 0; var_3_float = 0; var_4_float = 0; // 0x0 PushV
    var_1_string = "hunger"; // 0x1 MovS
    var_2_float = -0.17; // 0x2 MovF
    var_3_float = 0; // 0x3 MovI
    var_4_float = 1; // 0x4 MovI
    func_8(var_0_bool, var_1_string, var_2_float, var_3_float, var_4_float); // 0x5 Call
    return 0; // 0x7 Return
}
func_8(var_0_bool, var_1_string, var_2_float, var_3_float, var_4_float)
{
    var_5_bool = 0; var_6_float = 0; var_7_bool = 0; var_8_float = 0; // 0x8 PushV
    HasProperty(var_1_string, var_7_bool); // 0x9 Func
    var_9_bool = var_7_bool == 0; // 0xb Not
    if(var_9_bool == 0) goto Label_15; // 0xc JumpB
    var_0_bool = 0; // 0xd MovB
    return 4; // 0xe Return
Label_15:
    GetProperty(var_1_string, var_8_float); // 0xf Func
    var_10_float = 0; var_11_float = 0; var_12_float = 0; var_13_float = 0; // 0x11 PushV
    var_11_float = var_8_float + var_2_float; // 0x12 Add2
    var_12_float = var_3_float; // 0x13 Mov
    var_13_float = var_4_float; // 0x14 Mov
    func_27(var_10_float, var_11_float, var_12_float, var_13_float); // 0x15 Call
    SetProperty(var_1_string, var_10_float); // 0x17 Func
    var_0_bool = 1; // 0x19 MovB
    return 4; // 0x1a Return
}
func_27(var_10_float, var_11_float, var_12_float, var_13_float)
{
    var_14_bool = var_11_float < var_12_float; // 0x1c LT
    if(var_14_bool == 0) goto Label_32; // 0x1d JumpB
    var_10_float = var_12_float; // 0x1e Mov
    return 0; // 0x1f Return
Label_32:
    var_15_bool = var_11_float > var_13_float; // 0x20 GT
    if(var_15_bool == 0) goto Label_36; // 0x21 JumpB
    var_10_float = var_13_float; // 0x22 Mov
    return 0; // 0x23 Return
Label_36:
    var_10_float = var_11_float; // 0x24 Mov
    return 0; // 0x25 Return
}

Functions get names based on the address of their first instruction. In the original code, these functions had meaningful names, but they were lost in the course of compilation. The script reconstruction process resembles working in IDA Pro: the analyst adds names of variables and functions. Apparently, decompilation cannot be performed in a different way.

main()
{
    var_0_bool = 0;
    func_8(var_0_bool, "hunger", -0.17, 0, 1);
    return;
}
func_8(var_0_bool, var_1_string, var_2_float, var_3_float, var_4_float)
{
    var_7_bool = 0; var_8_float = 0;
    HasProperty(var_1_string, var_7_bool);
    if(var_7_bool != 0) goto Label_15;
    var_0_bool = 0;
    return;
Label_15:
    GetProperty(var_1_string, var_8_float);
    var_10_float = 0;
    func_27(var_10_float, var_8_float + var_2_float, var_3_float, var_4_float);
    SetProperty(var_1_string, var_10_float);
    var_0_bool = 1;
    return;
}
func_27(var_10_float, var_11_float, var_12_float, var_13_float)
{
    if(var_11_float >= var_12_float) goto Label_32;
    var_10_float = var_12_float;
    return;
Label_32:
    if(var_11_float <= var_13_float) goto Label_36;
    var_10_float = var_13_float;
    return;
Label_36:
    var_10_float = var_11_float;
    return;
}

The code can be significantly simplified by moving definitions of variables to the places where these variables are used. This process can be automated using RD analysis, but the problem is that I don’t have definitions of the APIs used (e.g. SetProperty). Without function prototypes, it’s impossible to say which arguments a function uses by value and which ones, by reference. For example, HasProperty takes the first string argument as a value, and writes the return value by reference to the second one. But let’s keep code compression for later…

Conclusions

Writing a cross-compiler turned out to be a difficult, but quite feasible task. The next step could be the inverse challenge: create a compiler that translates C code into assembly bytecode. Yes, decompilation isn’t ideal, but tons of scripts suitable for data mining are already available.

See you soon!

info

A few words about decompilation

Control flow graph

Function parsing

Graph construction example

Stack simulator

Passes

Code compression

Conclusions

01.06.2022 — First contact. Attacks on chip-based cards

20.04.2023 — Sad Guard. Identifying and exploiting vulnerability in AdGuard driver for Windows

09.02.2022 — First contact: An introduction to credit card security

07.07.2023 — VERY bad flash drive. BadUSB attack in detail

01.06.2022 — F#ck AMSI! How to bypass Antimalware Scan Interface and infect Windows

03.03.2023 — Infiltration and exfiltration. Data transmission techniques used in pentesting

12.01.2022 — Post-quantum VPN. Understanding quantum computers and installing OpenVPN to protect them against future threats

21.02.2023 — SIGMAlarity jump. How to use Sigma rules in Timesketch

01.06.2022 — Routing nightmare. How to pentest OSPF and EIGRP dynamic routing protocols

09.02.2022 — Kernel exploitation for newbies: from compilation to privilege escalation

01.06.2022 —
First contact. Attacks on chip-based cards

20.04.2023 —
Sad Guard. Identifying and exploiting vulnerability in AdGuard driver for Windows

09.02.2022 —
First contact: An introduction to credit card security

07.07.2023 —
VERY bad flash drive. BadUSB attack in detail

01.06.2022 —
F#ck AMSI! How to bypass Antimalware Scan Interface and infect Windows

03.03.2023 —
Infiltration and exfiltration. Data transmission techniques used in pentesting

12.01.2022 —
Post-quantum VPN. Understanding quantum computers and installing OpenVPN to protect them against future threats

21.02.2023 —
SIGMAlarity jump. How to use Sigma rules in Timesketch

01.06.2022 —
Routing nightmare. How to pentest OSPF and EIGRP dynamic routing protocols

09.02.2022 —
Kernel exploitation for newbies: from compilation to privilege escalation