Anger management. Welcome to Angr, a symbolic emulation framework

Angr is an unbelievably powerful emulator. This cross-platform tool supports all the most popular architectures: with it, you can search for vulnerabilities in a PE32 executable while working on Linux, or in router firmware while working on Windows. Let's examine this binary analysis framework in more detail, using Linux as an example.

A symbolic emulator makes it possible to reverse the direction of the search for security holes. A fuzzer such as AFL feeds the program enormous numbers of input variants and watches which ones reach interesting code. By contrast, Angr goes through the possible execution paths and reconstructs the input data required to reach the code section under investigation.

As an example, let’s examine the following piece of code:

#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[])
{
    if (argc == 2)
    {
        if (strcmp(argv[1], "secret") == 0)
        {
            printf("You did it!\n");
        }
        else
        {
            printf("Better luck next time\n");
        }
    }
}

Enumerating all six-character strings would take 256⁶ attempts. For Angr, however, there is only one fork: go left or go right. And it goes in both directions! All you have to do is tell it when to stop and then compute the input data.

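import sys
import angr
import claripy

project = angr.Project('get_pass.bin')
# Symbolic command-line argument: 10 bytes (80 bits), enough for a short password plus terminator
arg = claripy.BVS('arg', 8*10)
initial_state = project.factory.entry_state(args=['./a.out', arg])
# Treat reads from uninitialized memory as fresh symbolic values (also silences the related warnings)
initial_state.options.add(angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY)

def is_successful(state):
    # The target state is the one whose stdout already contains the success message
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'You did it' in stdout_output

simulation = project.factory.simgr(initial_state)
simulation.explore(find=is_successful)
if simulation.found:
    solution_state = simulation.found[0]
    # Concretize the argument and keep only the C string up to the first zero byte
    solution = solution_state.solver.eval(arg, cast_to=bytes).split(b'\x00')[0].decode()
    print('Password:', solution)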

You run the script and get the desired password in a second.

`$ python angr_get_pass.py`
`Password: secret`
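For scale, here is the brute-force arithmetic from the earlier paragraph. The rate of one billion guesses per second is an assumption chosen only to make the numbers concrete.

```python
# Back-of-the-envelope cost of brute-forcing a 6-byte password.
# ASSUMPTION: a (very optimistic) rate of 10**9 guesses per second.
CHARSET_SIZE = 256          # every possible byte value
PASSWORD_LEN = 6            # len("secret")
GUESSES_PER_SECOND = 10**9  # hypothetical attacker speed

candidates = CHARSET_SIZE ** PASSWORD_LEN
hours = candidates / GUESSES_PER_SECOND / 3600

print(f"{candidates:,} candidates, about {hours:.0f} hours")
```

Symbolic execution sidesteps this cost because it never enumerates the 2⁴⁸ candidates; it only explores the two sides of the fork and asks the solver for a matching input.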

For more details, you can review the documentation; this article just explains what you need to get started.

First, install Angr (Python 3.8 or later is required):

pip install angr

Angr is convenient to use interactively from a Python console. The documentation doesn't cover all of its functionality, so if you really want to master this tool, spend some time experimenting with it. For any object, you can get interactive help from its docstring with help(project) (or project? if you use IPython).

Help pages are navigated the same way as man pages: scroll with the arrow keys and exit with q.

Class fields and methods can be conveniently explored with autocompletion: type an object's name followed by a period and press Tab a couple of times.

Basic concepts

Everything begins with the Project class: you create a project to start interacting with Angr. The project is responsible for loading the file and performing the initial analysis.

To start a simulation, you pass the path to the file under investigation to Project and then create an initial state with factory.entry_state. The factory creates instances of Angr's main classes. A state is a SimState object, essentially a snapshot of the virtual machine: it contains the current code block, memory, registers, call stack, and other elements implemented as SimState plugins.

initial_state.regs.rip
<BV64 0x401080>

Data in the simulation are represented as bitvectors, i.e. bit arrays of fixed length. Like a processor register, a bitvector can overflow and wrap around. Its size is always specified in bits.

There are two main types: BVV (bit-vector value) and BVS (bit-vector symbol). A BVV holds a specific numeric value. A BVS has only a name and a size and no specific value; it is the basis of symbolic execution.
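To make the "behaves like a register" remark concrete, here is a minimal pure-Python sketch. The BVV and BVS classes below are hypothetical toy stand-ins written for illustration, not claripy's implementation: a value is always truncated to its width, so adding 1 to an 8-bit 0xFF wraps around to 0, while a symbol carries only a name and a width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BVV:
    """Toy concrete bitvector: an integer kept to a fixed width in bits."""
    value: int
    size: int  # width in bits

    def __post_init__(self):
        # Like a register, keep only the low `size` bits (wrap-around on overflow).
        object.__setattr__(self, "value", self.value & ((1 << self.size) - 1))

    def __add__(self, other: "BVV") -> "BVV":
        if self.size != other.size:
            raise ValueError("operands must have the same width")
        return BVV(self.value + other.value, self.size)


@dataclass(frozen=True)
class BVS:
    """Toy symbolic bitvector: just a name and a width, no concrete value."""
    name: str
    size: int  # width in bits


overflowed = BVV(0xFF, 8) + BVV(1, 8)
print(overflowed)             # BVV(value=0, size=8)
print(BVV(-1, 32).value)      # 4294967295, i.e. 0xFFFFFFFF
print(BVS("arg", 8 * 10))     # BVS(name='arg', size=80)
```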

Basic Blocks

Each state is associated with a specific block of code. A basic block is a sequence of instructions that executes from start to end and ends with a control-transfer instruction (a jump, call, or return). Each emulation step executes the next block and thus creates a new state.

The simulation manager operates only with block addresses, not with the individual instructions inside the blocks. Keep this in mind when choosing target addresses; otherwise, the manager may fail to find the desired state.
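As a rough illustration of where block boundaries fall, the sketch below splits a linear listing into blocks that end at control-transfer instructions. It is a simplification written for this article: a real lifter also starts a new block at every jump target, which this toy ignores. The addresses come from the _start listing shown below; the hlt after the call is an assumption for illustration.

```python
def is_control_transfer(mnemonic: str) -> bool:
    """True for jumps (jmp and all jcc variants), calls and returns."""
    return mnemonic.startswith("j") or mnemonic in {"call", "ret"}


def split_blocks(listing):
    """Split [(address, mnemonic), ...] into basic blocks.

    Simplification: a block ends only after a control-transfer instruction;
    jump targets are not treated as block starts here.
    """
    blocks, current = [], []
    for addr, mnemonic in listing:
        current.append((addr, mnemonic))
        if is_control_transfer(mnemonic):
            blocks.append(current)
            current = []
    if current:  # trailing instructions without a terminator
        blocks.append(current)
    return blocks


# Tail of _start, plus an ASSUMED instruction right after the call.
listing = [
    (0x401093, "xor"),
    (0x401096, "xor"),
    (0x401098, "lea"),
    (0x40109f, "call"),
    (0x4010a5, "hlt"),
]
for block in split_blocks(listing):
    print(hex(block[0][0]), [m for _, m in block])
```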

initial_state.block().pp()
       _start:
401080  endbr64
401084  xor     ebp, ebp
401086  mov     r9, rdx
401089  pop     rsi
40108a  mov     rdx, rsp
40108d  and     rsp, 0xfffffffffffffff0
401091  push    rax
401092  push    rsp
401093  xor     r8d, r8d
401096  xor     ecx, ecx
401098  lea     rdi, [main]
40109f  call    qword ptr [0x403fd8]

This is a disassembled basic block. Angr lifts its assembler instructions into VEX, an intermediate representation borrowed from the Valgrind project. You can think of VEX as code for a virtual processor that the analyzers work with.

initial_state.block().vex.pp()
IRSB {
  00 | ------ IMark(0x401080, 4, 0) ------
  01 | ------ IMark(0x401084, 2, 0) ------
  02 | PUT(rbp) = 0x0000000000000000
  03 | ------ IMark(0x401086, 3, 0) ------
  04 | t30 = GET:I64(rdx)
  05 | PUT(r9) = t30
  06 | PUT(rip) = 0x0000000000401089
  07 | ------ IMark(0x401089, 1, 0) ------
  08 | t4 = GET:I64(rsp)
  09 | t3 = LDle:I64(t4)
  10 | t31 = Add64(t4,0x0000000000000008)
  11 | PUT(rsi) = t3
  12 | ------ IMark(0x40108a, 3, 0) ------
  13 | PUT(rdx) = t31
  14 | ------ IMark(0x40108d, 4, 0) ------
  15 | t5 = And64(t31,0xfffffffffffffff0)
  16 | PUT(rip) = 0x0000000000401091
  17 | ------ IMark(0x401091, 1, 0) ------
  18 | t8 = GET:I64(rax)
  19 | t33 = Sub64(t5,0x0000000000000008)
  20 | PUT(rsp) = t33
  21 | STle(t33) = t8
  22 | PUT(rip) = 0x0000000000401092
  23 | ------ IMark(0x401092, 1, 0) ------
  24 | t35 = Sub64(t33,0x0000000000000008)
  25 | PUT(rsp) = t35
  26 | STle(t35) = t33
  27 | ------ IMark(0x401093, 3, 0) ------
  28 | PUT(r8) = 0x0000000000000000
  29 | ------ IMark(0x401096, 2, 0) ------
  30 | PUT(cc_op) = 0x0000000000000013
  31 | PUT(cc_dep1) = 0x0000000000000000
  32 | PUT(cc_dep2) = 0x0000000000000000
  33 | PUT(rcx) = 0x0000000000000000
  34 | ------ IMark(0x401098, 7, 0) ------
  35 | PUT(rdi) = 0x0000000000401169
  36 | PUT(rip) = 0x000000000040109f
  37 | ------ IMark(0x40109f, 6, 0) ------
  38 | t21 = LDle:I64(0x0000000000403fd8)
  39 | t51 = Sub64(t35,0x0000000000000008)
  40 | PUT(rsp) = t51
  41 | STle(t51) = 0x00000000004010a5
  42 | t53 = Sub64(t51,0x0000000000000080)
  43 | ====== AbiHint(0xt53, 128, t21) ======
  NEXT: PUT(rip) = t21; Ijk_Call
}

The intermediate code is organized as an IRSB (Intermediate Representation Super-Block). Let's examine its contents:

IMark – marks the address and size of the original instruction;
PUT – writes a value to a register;
GET:I64 – reads a register value; the I64 suffix denotes a 64-bit integer type;
t30 – a temporary variable (t for temp) and its number;
LDle – load, little-endian: reads a value from the specified address;
STle – store, little-endian: writes a value to the specified address;
Add64 – 64-bit addition;
Sub64 – 64-bit subtraction;
And64 – 64-bit bitwise AND; and
NEXT – the transition to the next block along with its jump kind, here Ijk_Call.
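To see how these statements fit together, here is a toy interpreter, written for this article and unrelated to pyvex's API, that runs statements 08 to 21 of the block above (pop rsi, mov rdx, rsp, and rsp, -16, push rax). Memory is modeled as a dict of 64-bit words, and the register and stack values are made up.

```python
MASK64 = (1 << 64) - 1


def run_irsb(stmts, regs, mem):
    """Execute toy VEX-like statements; temporaries live only inside the block."""
    temps = {}

    def val(x):
        # A string names a temporary (t4, t31, ...); an int is a constant.
        return temps[x] if isinstance(x, str) else x

    for op, dst, *args in stmts:
        if op == "GET":        # read a guest register into a temporary
            temps[dst] = regs[args[0]]
        elif op == "PUT":      # write a value into a guest register
            regs[dst] = val(args[0])
        elif op == "LDle":     # little-endian load from memory
            temps[dst] = mem[val(args[0])]
        elif op == "STle":     # little-endian store: dst is the address
            mem[val(dst)] = val(args[0])
        elif op == "Add64":
            temps[dst] = (val(args[0]) + val(args[1])) & MASK64
        elif op == "Sub64":
            temps[dst] = (val(args[0]) - val(args[1])) & MASK64
        elif op == "And64":
            temps[dst] = val(args[0]) & val(args[1])
        else:
            raise NotImplementedError(op)
    return regs, mem


# Statements 08-21 from the listing above (IMarks and PUT(rip) omitted).
stmts = [
    ("GET", "t4", "rsp"),
    ("LDle", "t3", "t4"),
    ("Add64", "t31", "t4", 0x8),
    ("PUT", "rsi", "t3"),
    ("PUT", "rdx", "t31"),
    ("And64", "t5", "t31", 0xFFFFFFFFFFFFFFF0),
    ("GET", "t8", "rax"),
    ("Sub64", "t33", "t5", 0x8),
    ("PUT", "rsp", "t33"),
    ("STle", "t33", "t8"),
]
regs = {"rsp": 0x7FFE0010, "rax": 0x1C}  # hypothetical register values
mem = {0x7FFE0010: 1}                     # hypothetical argc on the stack
regs, mem = run_irsb(stmts, regs, mem)
print({k: hex(v) for k, v in regs.items()})
```

Note how rsp is first rounded down to a 16-byte boundary by And64 and only then decremented by the push.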

The code is optimized: the rip instruction pointer is not updated after every instruction, only at certain points in the block.

Instead of computing the flags register (EFLAGS) after each instruction, the intermediate code records:

cc_op – the code of the operation that last affected the flags; and
cc_dep1, cc_dep2 and cc_ndep – the operands from which the flags can be derived.

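According to the VEX source code (guest_amd64_defs.h), 0x13 is AMD64G_CC_OP_LOGICL, a logical operation on a 32-bit operand; this matches the xor ecx, ecx instruction at 0x401096. (In VEX's 32-bit x86 guest, the same value would denote X86G_CC_OP_DECB, a one-byte decrement.) Specific flag values are calculated only when they are actually needed.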
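Lazy flag evaluation can be sketched in a few lines: store the operation and its operands, and derive a flag only when an instruction such as a conditional jump asks for it. The sketch below is a hypothetical simplification that handles just two 32-bit operation kinds, identified by name rather than by VEX's numeric codes.

```python
MASK32 = 0xFFFFFFFF


def compute_flags(cc_op: str, dep1: int, dep2: int) -> dict:
    """Derive CF, ZF and SF on demand from the recorded operation (toy model)."""
    if cc_op == "LOGICL":      # and/or/xor on 32-bit operands: dep1 is the result
        result, cf = dep1 & MASK32, 0
    elif cc_op == "SUBL":      # 32-bit subtraction/cmp: dep1 - dep2
        result = (dep1 - dep2) & MASK32
        cf = int((dep1 & MASK32) < (dep2 & MASK32))  # unsigned borrow
    else:
        raise NotImplementedError(cc_op)
    return {"CF": cf, "ZF": int(result == 0), "SF": result >> 31}


# xor ecx, ecx: the block only records (op, dep1, dep2); flags are computed later.
recorded = ("LOGICL", 0, 0)
print(compute_flags(*recorded))      # {'CF': 0, 'ZF': 1, 'SF': 0}
print(compute_flags("SUBL", 1, 2))   # {'CF': 1, 'ZF': 0, 'SF': 1}
```

Since most flag results are overwritten before anything reads them, recording three values is cheaper than computing every flag after every instruction.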

Simulation control

You can control the simulation manually, which is convenient for debugging.

>>> next = initial_state.step()
>>> type(next)
<class 'angr.engines.successors.SimSuccessors'>

>>> next.successors
[<SimState @ 0x529dc0>]

The initial_state.step() function returns a SimSuccessors object that contains a list of states that occur after the emulation of the current block.

If a fork (i.e. conditional jump) occurs at the end of a block, two new states are created. This happens when the choice of path depends on a symbolic variable. If a jump depends only on known values, only one option is available (which is equivalent to an unconditional jump).

The simulation manager controls code execution. A SimulationManager object is created with project.factory.simgr. In essence, it is a wrapper that calls step() in a loop and decides which branches to keep emulating and which to stop.

In the course of execution, all states are stored in special ‘stashes’:

simulation.stashes
defaultdict(list,
            {'active': [],
             'stashed': [<SimState @ 0x401080>],
             'pruned': [],
             'unsat': [],
             'errored': [],
             'deadended': [],
             'unconstrained': []})

stashes is a dictionary of regular Python lists. Before the start, states are stored in stashed; during the simulation, in active; and after their code has finished executing, in deadended. These aren't the only stashes: you can create new ones for your needs.

When you start a simulation with simulation.explore, you can pass the following arguments:

find takes the address of the desired block, or a function that receives a state and returns True or False depending on whether the desired state has been reached; and
avoid works in a similar way, but specifies the addresses to be avoided.

The arguments below belong to simulation.step, but they can also be passed to simulation.explore, which forwards them. This isn't explicitly stated in the documentation, so I suggest reviewing the source code.

step_func is called with the simulation manager after each step (useful for debugging);
filter_func receives a state and returns the name of the stash to put it into; and
selector_func receives a state and decides whether it should be executed at this step.

Simulation ends when the state from find is found or the code from all branches has been executed. Then simulation.explore returns control. If the desired state has been found, it’s placed into the found stash.

For clarity, let’s slightly alter the code:

def next_step(simulation):
    print({k: v for k, v in simulation.stashes.items() if v})
    return simulation
simulation = project.factory.simgr(initial_state)
simulation.explore(find=is_successful, step_func=next_step)

Now you can see the state at each step:

{'active': [<SimState @ 0x529dc0>]}
{'active': [<SimState @ 0x401169>]}
{'active': [<SimState @ 0x401182>]}
{'active': [<SimState @ 0x401070>]}
{'active': [<SimState @ 0x5a82e0>]}
{'active': [<SimState @ 0x40119f>]}
{'active': [<SimState @ 0x4011a3>, <SimState @ 0x4011b4>]}
{'active': [<SimState @ 0x401060>, <SimState @ 0x401060>]}
{'active': [<SimState @ 0x580e50>, <SimState @ 0x580e50>]}
{'active': [<SimState @ 0x4011b2>, <SimState @ 0x4011c3>]}
{'active': [<SimState @ 0x4011d4>], 'found': [<SimState @ 0x4011b2>]}

After reaching the fork, the emulator went both ways, and it stopped as soon as the found stash was filled.
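The behavior in this trace can be mimicked with a toy model. The explore function below is hypothetical and only illustrates the idea: the program is a hand-written graph of block addresses loosely based on the trace above (the edges are assumptions, not taken from the binary), and each loop iteration steps every active state, forking where a block has two successors and sorting the results into stashes.

```python
from collections import defaultdict


def as_predicate(cond):
    """find/avoid may be an address, a collection of addresses, or a function."""
    if callable(cond):
        return cond
    addrs = {cond} if isinstance(cond, int) else set(cond)
    return lambda addr: addr in addrs


def explore(cfg, start, find, avoid=()):
    """Toy breadth-first explore over a {block: [successor blocks]} graph."""
    is_found, is_avoided = as_predicate(find), as_predicate(avoid)
    stashes = defaultdict(list, active=[start])
    while stashes["active"] and not stashes["found"]:
        next_active = []
        for addr in stashes["active"]:
            successors = cfg.get(addr, [])
            if not successors:                # nothing left to execute
                stashes["deadended"].append(addr)
            for succ in successors:           # two successors = a fork
                if is_found(succ):
                    stashes["found"].append(succ)
                elif is_avoided(succ):
                    stashes["avoid"].append(succ)
                else:
                    next_active.append(succ)
        stashes["active"] = next_active
    return dict(stashes)


# ASSUMED, simplified graph: library calls between blocks are omitted.
cfg = {
    0x401169: [0x40119F],            # main -> comparison block
    0x40119F: [0x4011A3, 0x4011B4],  # the fork on strcmp's result
    0x4011A3: [0x4011B2],            # "You did it!" path
    0x4011B4: [0x4011C3],            # "Better luck next time" path
}
result = explore(cfg, 0x401169, find=0x4011B2, avoid=0x4011C3)
print({name: [hex(a) for a in states] for name, states in result.items() if states})
```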

Automatic path selection

The ‘normal’ explore proceeds breadth-first: all branches created by forks stay in the active stash and are executed simultaneously. However, loops and forks produce many possible routes, and the number of potential paths can grow exponentially (the so-called path explosion). To deal with this problem, you can use exploration techniques: search strategies that decide which states to step and in what order.

simulation.use_technique(angr.exploration_techniques.DFS())

The depth-first search (DFS) technique keeps only one active state at a time and moves the rest into the deferred stash, which serves as a queue of postponed states.

simulation.use_technique(angr.exploration_techniques.LengthLimiter(10))

This technique limits the maximum length of each path (in executed blocks), i.e. it performs a fast but shallow emulation.

There are plenty of ready-made techniques, but they are poorly documented, so check their source code for details.
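The difference between these strategies is easiest to see on a tiny tree of blocks. The sketch below is a hypothetical toy, not the implementation of Angr's DFS or LengthLimiter: breadth-first steps all forks level by level, depth-first follows one path to the end while the others wait in a deferred list, and max_length drops paths that grow too long.

```python
from collections import deque


def exploration_order(cfg, start, strategy="bfs", max_length=None):
    """Return the order in which blocks get stepped under a toy strategy.

    bfs: every fork stays pending together (like the plain explore).
    dfs: only the newest path is continued; the rest are deferred (LIFO).
    max_length: paths longer than this many blocks are cut off.
    """
    pending = deque([(start, 1)])  # (block, length of the path so far)
    order = []
    while pending:
        block, length = pending.popleft() if strategy == "bfs" else pending.pop()
        if max_length is not None and length > max_length:
            continue  # too deep: drop this path
        order.append(block)
        successors = cfg.get(block, [])
        if strategy == "dfs":
            successors = list(reversed(successors))  # explore the first branch first
        pending.extend((succ, length + 1) for succ in successors)
    return order


tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
print(exploration_order(tree, "A", "bfs"))                # ['A', 'B', 'C', 'D', 'E', 'F']
print(exploration_order(tree, "A", "dfs"))                # ['A', 'B', 'D', 'E', 'C', 'F']
print(exploration_order(tree, "A", "bfs", max_length=2))  # ['A', 'B', 'C']
```

With a loop in the graph (for example, {"A": ["A"]}), only max_length keeps the toy from running forever, which is exactly the kind of situation path-limiting techniques are meant for.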

Symbolic execution

Constraints on symbolic variables are solved with the Z3 SMT solver. As you remember, symbolic variables don't contain specific values: the emulator only knows their size and the constraints imposed on them in the course of execution. For instance, let's create a 64-bit variable called x and add the constraint x + 2 == 5.

x = state.solver.BVS('x', 64)
state.solver.add(x + 2 == 5)
state.solver.eval(x)
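Bitvector constraints live in modular arithmetic: a 64-bit x + 2 wraps around at 2**64. For a single equation of the form x + a == b, the unique solution can be written in closed form, as the sketch below does. This only illustrates the arithmetic; Z3 uses general decision procedures, not this formula.

```python
def solve_add(a: int, b: int, bits: int = 64) -> int:
    """Solve x + a == b over `bits`-wide bitvectors (wrap-around arithmetic)."""
    modulus = 1 << bits
    return (b - a) % modulus


print(hex(solve_add(2, 5)))   # 0x3
# Wrap-around: in 64 bits, x + 2 == 1 is satisfied by x = 2**64 - 1.
print(hex(solve_add(2, 1)))   # 0xffffffffffffffff
```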

If you ask the solver for a feasible solution, you'll get 0x3. Similar constraints are imposed on symbolic variables after each block. As an example, let's examine simple branching.

void test(int var)
{
    if(var > 5)
    {
        printf("A");
    }
    else
    {
        printf("B");
    }
}

After finding the function's address in a debugger, you start emulation at the beginning of this function. Under the System V AMD64 calling convention, the first int argument is passed in EDI (the lower half of RDI), so you place a 32-bit symbolic variable there.

state = project.factory.blank_state(addr=0x401149)
state.regs.edi = claripy.BVS('func_arg', 32)
state_left, state_right = state.step().successors
print('Left:', state_left.solver.constraints)
print('Right:', state_right.solver.constraints)

There is a constraint: to go left, the variable must be less than or equal to five.

`Left: [<Bool (0x0 .. func_arg_49_8) <=s 0x5>]`
`Right: [<Bool (0x0 .. func_arg_49_8) >s 0x5>]`
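The s in <=s and >s stands for a signed comparison: var is an int, so the check is signed. The toy sketch below (pure Python, not Angr code) shows why this matters: the bit pattern 0xFFFFFFFF is -1 as a signed 32-bit value, so it satisfies the Left constraint even though it is the largest unsigned 32-bit number.

```python
def to_signed(value: int, bits: int = 32) -> int:
    """Reinterpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def branch(edi: int) -> str:
    """Mirror test(): 'A' if var > 5 (signed), otherwise 'B'."""
    return "A" if to_signed(edi) > 5 else "B"


print(branch(1337))        # A
print(branch(5))           # B
print(branch(0xFFFFFFFF))  # B, because 0xFFFFFFFF is -1 when signed
```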

For comparison, let’s try to pass a specific value.

state.regs.edi = claripy.BVV(1337, 32)
successors = state.step().successors  # step once and reuse the result
print(successors)
print('Constraints:', successors[0].solver.constraints)

Obviously, 1337 is greater than 5, so there is only one possible successor, and it carries no constraints.

`[<SimState @ 0x40115e>]`
`Constraints: []`

Symbolic functions (stubs for library functions) also impose constraints, but sometimes they are hard to make sense of.

>>> solution_state.solver.constraints

It is enough to know that they form a mathematical formula for which the solver has to find a satisfying solution.

Symbolic functions

So, Angr emulates the application's own code, but what about external modules? The emulator uses hooks to replace calls to library functions with SimProcedures: Python stubs that simulate the behavior of those functions. The same happens with system calls. If no suitable stub exists, the standard ReturnUnconstrained stub is executed.

By default, external functions are intercepted. Angr's behavior can be configured when you create a new project (for example, via the auto_load_libs and use_sim_procedures arguments of angr.Project). If necessary, you can emulate individual functions or even all external code, but this hurts performance.

Let’s examine the list of set hooks.

>>> for addr, proc in project._sim_procedures.items():
...     print(hex(addr), proc.display_name)
0x529dc0 __libc_start_main
0x580e50 puts
0x5a82e0 strcmp
0x8181d0 __tls_get_addr
0x900000 LinuxLoader
# (...)

Built-in stubs are subclasses of SimProcedure, and Angr's source code contains plenty of examples. For instance, the puts stub looks like this:

class puts(angr.SimProcedure):
    def run(self, string):
        stdout = self.state.posix.get_fd(1)
        if stdout is None:
            return -1
        strlen = angr.SIM_PROCEDURES["libc"]["strlen"]
        length = self.inline_call(strlen, string).ret_expr
        out = stdout.write(string, length)
        stdout.write_data(self.state.solver.BVV(b"\n"))
        return (out + 1)[31:0]

Hooks can be set by specifying the address:

project.hook(0x401000, my_stub())

Or the symbol name:

project.hook_symbol('fgets', my_fgets())

In Angr, symbols tie names to specific addresses. They are created when the executable file is loaded.

print(project.loader.find_symbol('printf'))

<Symbol "printf" in libc.so.6 at 0x5606f0>

Out of curiosity, you can list them all:

for item in project.loader.symbols: print(item)

For example, those listed below were taken from debug symbols of an ELF file:

<Symbol "_start" in get_pass at 0x401080>
<Symbol "main" in get_pass at 0x401169>
<Symbol "_fini" in get_pass at 0x4011dc>

Serial number recovery

To keep things fair, let's take a real crackme for Windows.

$ wine SerialGen.exe

"SerialGen" crackme by DosX

USERNAME >>> user
SERIAL KEY >>> password

[-] Wrong serial. License is expired :(

The challenge is to generate the serial number for a specific username. After reconstructing the pseudocode in Binary Ninja, you can see the buffer size and input method.

int32_t var_c = 1;
printf("\n    "SerialGen" crackme by Dos…");
char* eax = malloc(0x20);
fgets(eax, 0x20, _iob);
eax[(strlen(eax) - var_c)] = 0;
if (strlen(eax) != i_1)
{
    printf(" SERIAL KEY >>> ");
    char* eax_7 = malloc(0x20);
    fgets(eax_7, 0x20, _iob);
    eax_7[(strlen(eax_7) - var_c)] = 0;

Both strings are read by fgets into buffers 0x20 bytes long. The program then computes the string length and writes a zero byte over the last character (i.e. at the address BEGINNING + SIZE - 1). This is needed because the real fgets keeps the line break: the string ends with two bytes, a newline character and a zero. Remember the addresses of the blocks that display the success and failure messages.
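Why the trailing newline matters can be checked with a small model of fgets and of the crackme's trimming line. Both functions below are hypothetical Python stand-ins for the C code, written for this article.

```python
def fgets_model(stream: bytes, size: int) -> bytes:
    """Model of C fgets: read at most size-1 bytes, stop after a newline
    (keeping it), then append the terminating zero byte."""
    out = bytearray()
    for byte in stream[: size - 1]:
        out.append(byte)
        if byte == 0x0A:  # the newline is stored in the buffer, then reading stops
            break
    return bytes(out) + b"\x00"


def crackme_trim(buf: bytes) -> bytes:
    """Model of eax[strlen(eax) - 1] = 0: drop the last character of the C string.
    Assumes a non-empty string (in C, an empty one would write before the buffer)."""
    s = buf.split(b"\x00", 1)[0]  # the C string ends at the first zero byte
    return s[:-1]


buf = fgets_model(b"xakep.ru\nSERIAL", 0x20)
print(buf)                            # b'xakep.ru\n\x00'
print(crackme_trim(buf))              # b'xakep.ru'
# Without a newline, trimming eats a real character:
print(crackme_trim(b"xakep.ru\x00"))  # b'xakep.r'
```

This is why the FgetsHook in the script below stores the username followed by b'\n\x00' rather than just a zero byte.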

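import sys
import angr

username = 'xakep.ru'
path_to_binary = "./SerialGen.exe"
project = angr.Project(path_to_binary)
initial_state = project.factory.entry_state(
    add_options={
        # Uninitialized memory and registers become fresh symbolic values
        angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
        angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS,
    }
)

# Remember the stock fgets stub so it can be restored later
old_hook = project.symbol_hooked_by('fgets')

class FgetsHook(angr.SimProcedure):
    def run(self, str_addr, count, stream):
        # First call (USERNAME): write the concrete name, keeping fgets' trailing '\n'
        self.state.memory.store(str_addr, username.encode() + b'\n\x00')
        # One-shot hook: the next call (SERIAL KEY) reads symbolic stdin as usual
        self.project.hook_symbol('fgets', old_hook, replace=True)
        # Like the real fgets, return the buffer address
        return str_addr

project.hook_symbol('fgets', FgetsHook(), replace=True)
simulation = project.factory.simgr(initial_state)
# 0x004012e3: success message block; 0x004012fa: failure message block
simulation.explore(find=0x004012e3, avoid=0x004012fa)
if simulation.found:
    solution_state = simulation.found[0]
    # Everything the program read from stdin on the way to the success block
    solution = solution_state.posix.dumps(sys.stdin.fileno()).decode()
    # upper(): works around the lowercase output of Angr's sprintf stub (see below)
    print(username, solution.upper())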

Symbolic variables aren't defined explicitly here: by default, the standard input stream is implemented as a SimPacketsStream, which can be thought of as a set of symbolic variables created on demand.

`>>> solution_state.posix.stdin.content`
`[(<BV248 packet_0_stdin_4_248>, <BV32 0x1f>)]`

To pass the username, replace the original fgets hook with your own one that writes the desired string to the buffer, bypassing STDIN. As soon as your hook is triggered, it restores the old one, so the second fgets call (the serial key) reads symbolic STDIN as usual. Substituting a function's result is more convenient than constraining the input stream. Once the simulation manager reaches the block at 0x004012e3, you extract the STDIN contents that lead there with posix.dumps.
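The hook-and-restore trick is a general "one-shot hook" pattern. Here it is in plain Python on a hypothetical dictionary of handlers, with no Angr involved: the replacement serves the first call, puts the original back, and every later call reaches the original.

```python
def install_one_shot_hook(handlers: dict, name: str, first_result):
    """Replace handlers[name] so the next call returns first_result,
    then restore the original handler for all later calls."""
    original = handlers[name]

    def one_shot(*args, **kwargs):
        handlers[name] = original  # restore first, as FgetsHook does
        return first_result

    handlers[name] = one_shot


# Hypothetical handler table: 'fgets' normally reads from (symbolic) stdin.
handlers = {"fgets": lambda: "<from stdin>"}
install_one_shot_hook(handlers, "fgets", "xakep.ru\n")
print(repr(handlers["fgets"]()))  # first call: the injected username
print(repr(handlers["fgets"]()))  # second call: back to the original handler
```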

`$ python crackme.py`
`xakep.ru SERIAL-00E19-01147-03D9A-11620`
`$ wine SerialGen.exe`
`    "SerialGen" crackme by DosX`
`USERNAME   >>> xakep.ru`
`SERIAL KEY >>> SERIAL-00E19-01147-03D9A-11620`
`[+] Welcome, xakep.ru! :)`

The solution looks simple, but I had to spend a lot of time debugging it. First, I didn't immediately notice that fgets keeps the line break. Second, Angr's sprintf stub returns a lowercase string, while sprintf in Wine returns an uppercase one. After checking on real Windows, I found that Wine was right. I have already reported this bug. As you can see, even Angr's own stubs can contain errors.

Conclusions

Angr is an amazingly powerful tool, but symbolic execution has its limitations. I managed to ‘crack’ a CRC32-based key in a couple of minutes, but the formula for recovering data from an MD5 hash turned out to be beyond Z3's capabilities.

Of course, Angr cannot simulate large applications in their entirety. But if you find a weak spot in a program, Angr will help you find the ‘right’ input data for reaching it, which you can then use for fuzzing.

Now you are aware of the basic Angr functionality. In the next article, I will provide practical examples to give an insight into vulnerability identification and exploitation.
