
A symbolic emulator makes it possible to reverse the direction of the search for security holes. For instance, a fuzzer such as AFL throws large amounts of mutated input data at the program and watches which code it reaches. By contrast, Angr goes through the possible execution paths and reconstructs the input data required to reach the code section under investigation.
As an example, let’s examine the following piece of code:
```c
#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[])
{
    if (argc == 2) {
        if (strcmp(argv[1], "secret") == 0) {
            printf("You did it!\n");
        } else {
            printf("Better luck next time\n");
        }
    }
}
```
A complete enumeration of six characters would take 256^6 (that is, 2^48, or about 2.8 × 10^14) attempts. But for Angr, there is only one fork: go left or go right. And it goes in both directions! All you have to do is tell it when to stop and compute the input data.
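```python
import sys
import angr
import claripy

project = angr.Project('get_pass.bin')

# Symbolic command-line argument: 10 bytes (80 bits) with no concrete value
arg = claripy.BVS('arg', 8*10)
initial_state = project.factory.entry_state(args=['./a.out', arg])
# Fill uninitialized memory with symbolic values instead of warning about it
initial_state.options.add(angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY)

def is_successful(state):
    # The state is "found" once the program has printed the success message
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return b'You did it' in stdout_output

simulation = project.factory.simgr(initial_state)
simulation.explore(find=is_successful)

if simulation.found:
    solution_state = simulation.found[0]
    # Ask the solver for a concrete value of arg; strcmp stops at the first
    # NUL byte, so the bytes after it are arbitrary and are cut off here
    solution = solution_state.solver.eval(arg, cast_to=bytes)
    print('Password:', solution.split(b'\x00')[0].decode())
```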
You run the script and get the desired password in a second.
```
$ python angr_get_pass.py
Password: secret
```
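For comparison, here is the brute-force arithmetic for a six-character password as a short Python sketch. The rate of one billion program runs per second is a hypothetical figure chosen only for illustration.

```python
# Each of the six password bytes can take 256 values
attempts = 256 ** 6
assert attempts == 2 ** 48

# Hypothetical throughput: one billion program runs per second
RATE = 1_000_000_000
hours = attempts / RATE / 3600

print(f"{attempts:,} attempts, about {hours:.0f} hours at {RATE:,} runs/s")
# 281,474,976,710,656 attempts, about 78 hours at 1,000,000,000 runs/s
```

Even at that optimistic rate, the search would take more than three days, while Angr only has to explore the two outcomes of a single comparison.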
For more details, you can review the documentation; this article just explains what you need to get started.
First, install Angr (Python 3.8 or later is required):
```
pip install angr
```
Angr is most convenient to explore interactively in a Python console. The documentation doesn’t cover all of its functionality, so if you really want to master this tool, spend some time experimenting with it. For each object, you can get interactive help from its docstring using the `help(project)` command (or `project?` if you use IPython). Help is navigated in the same way as `man`: you move with the arrow keys and exit with `q`.
Class fields and methods can be conveniently viewed using autocompletion: type a period after the object name and press Tab a couple of times.
Basic concepts
Everything begins with the `Project` class. You create it to start interacting with Angr. The project is responsible for loading the file and performing its primary analysis.
You pass it the path to the file under investigation, then create a new state using `project.factory.entry_state()` to start the simulation. The factory creates instances of the main classes. The state is a `SimState` object, essentially a snapshot of a virtual machine. It contains the current block of code, memory, registers, call stack, and other elements implemented as plugins to `SimState`.
```
>>> initial_state.regs.rip
<BV64 0x401080>
```
Data used in the simulation are stored as bitvectors (bit arrays). A bitvector has a fixed length and can overflow, so it behaves like a processor register. Its size is always specified in bits.
There are two main types: `BVV` (bit-vector value) and `BVS` (bit-vector symbol). The first represents a specific numerical value. The second has only a name and a size and holds no specific value. This is the basis of symbolic execution.
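The distinction can be made concrete with a toy Python model. This is not claripy's implementation; the class names only mirror claripy's `BVV`/`BVS`. A concrete value is truncated to its width, so it overflows like a register. A symbolic value carries only a name and a width, and operations on it build new expressions instead of numbers.

```python
from dataclasses import dataclass


def _as_int(x):
    """Accept either a plain int or a toy BVV operand."""
    return x.value if isinstance(x, BVV) else x


@dataclass(frozen=True)
class BVV:
    """Toy concrete bitvector: a fixed-width value that wraps like a register."""
    value: int
    size: int  # width in bits

    def __post_init__(self):
        # Keep only the low `size` bits (frozen dataclass, hence object.__setattr__)
        object.__setattr__(self, "value", self.value % (1 << self.size))

    def __add__(self, other):
        return BVV(self.value + _as_int(other), self.size)


@dataclass(frozen=True)
class BVS:
    """Toy symbolic bitvector: only a name and a width, no concrete value."""
    name: str
    size: int

    def __add__(self, other):
        # The result is a new symbolic expression, not a number
        return BVS(f"({self.name} + {_as_int(other)})", self.size)


print(BVV(0xFF, 8) + 1)   # BVV(value=0, size=8): the 8-bit value wraps to zero
print(BVS("x", 64) + 2)   # BVS(name='(x + 2)', size=64)
```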
Basic blocks
Each state is associated with a specific block of code. A basic block is a sequence of instructions that is entered only at its beginning and ends with a control transfer instruction (a jump, call, or return). Each emulation step executes one block and produces new states for the blocks that follow.
The simulation manager works with block addresses rather than with the individual instructions inside the blocks. Keep this in mind when choosing target addresses; otherwise, the manager may fail to find the desired state.
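The notion of a basic block can be illustrated with a simplified splitter in Python. This sketch makes several assumptions. Instructions are given as `(address, mnemonic)` pairs. The set of control transfer mnemonics is a small hypothetical x86 subset. Jump targets that land in the middle of a block, which would also start a new block, are ignored.

```python
# Hypothetical subset of x86 mnemonics that transfer control
CONTROL_TRANSFER = {"jmp", "je", "jne", "jg", "jle", "call", "ret"}


def split_basic_blocks(instructions):
    """Split (address, mnemonic) pairs into basic blocks.

    A block ends right after a control transfer instruction, and the next
    instruction starts a new block. A block is identified by the address
    of its first instruction.
    """
    blocks, current = [], []
    for addr, mnemonic in instructions:
        current.append((addr, mnemonic))
        if mnemonic in CONTROL_TRANSFER:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


code = [
    (0x401080, "endbr64"),
    (0x401084, "xor"),
    (0x40109F, "call"),   # the call ends the block
    (0x4010A5, "hlt"),    # hypothetical instruction after the call
]
blocks = split_basic_blocks(code)
print([hex(b[0][0]) for b in blocks])   # ['0x401080', '0x4010a5']
```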
```
>>> initial_state.block().pp()
_start:
401080 endbr64
401084 xor ebp, ebp
401086 mov r9, rdx
401089 pop rsi
40108a mov rdx, rsp
40108d and rsp, 0xfffffffffffffff0
401091 push rax
401092 push rsp
401093 xor r8d, r8d
401096 xor ecx, ecx
401098 lea rdi, [main]
40109f call qword ptr [0x403fd8]
```
This is a disassembled basic block. Angr lifts its assembler instructions into VEX, an intermediate representation borrowed from Valgrind. You can think of VEX as code for the virtual processor on which the analyzers run.
```
>>> initial_state.block().vex.pp()
IRSB {
00 | ------ IMark(0x401080, 4, 0) ------
01 | ------ IMark(0x401084, 2, 0) ------
02 | PUT(rbp) = 0x0000000000000000
03 | ------ IMark(0x401086, 3, 0) ------
04 | t30 = GET:I64(rdx)
05 | PUT(r9) = t30
06 | PUT(rip) = 0x0000000000401089
07 | ------ IMark(0x401089, 1, 0) ------
08 | t4 = GET:I64(rsp)
09 | t3 = LDle:I64(t4)
10 | t31 = Add64(t4,0x0000000000000008)
11 | PUT(rsi) = t3
12 | ------ IMark(0x40108a, 3, 0) ------
13 | PUT(rdx) = t31
14 | ------ IMark(0x40108d, 4, 0) ------
15 | t5 = And64(t31,0xfffffffffffffff0)
16 | PUT(rip) = 0x0000000000401091
17 | ------ IMark(0x401091, 1, 0) ------
18 | t8 = GET:I64(rax)
19 | t33 = Sub64(t5,0x0000000000000008)
20 | PUT(rsp) = t33
21 | STle(t33) = t8
22 | PUT(rip) = 0x0000000000401092
23 | ------ IMark(0x401092, 1, 0) ------
24 | t35 = Sub64(t33,0x0000000000000008)
25 | PUT(rsp) = t35
26 | STle(t35) = t33
27 | ------ IMark(0x401093, 3, 0) ------
28 | PUT(r8) = 0x0000000000000000
29 | ------ IMark(0x401096, 2, 0) ------
30 | PUT(cc_op) = 0x0000000000000013
31 | PUT(cc_dep1) = 0x0000000000000000
32 | PUT(cc_dep2) = 0x0000000000000000
33 | PUT(rcx) = 0x0000000000000000
34 | ------ IMark(0x401098, 7, 0) ------
35 | PUT(rdi) = 0x0000000000401169
36 | PUT(rip) = 0x000000000040109f
37 | ------ IMark(0x40109f, 6, 0) ------
38 | t21 = LDle:I64(0x0000000000403fd8)
39 | t51 = Sub64(t35,0x0000000000000008)
40 | PUT(rsp) = t51
41 | STle(t51) = 0x00000000004010a5
42 | t53 = Sub64(t51,0x0000000000000080)
43 | ====== AbiHint(0xt53, 128, t21) ======
NEXT: PUT(rip) = t21; Ijk_Call
}
```
The intermediate code is represented as an IRSB (Intermediate Representation Super-Block). Let’s examine its contents:
- `IMark` indicates the address and size of the original instruction;
- `PUT` writes a value to a register;
- `GET:I64` reads a register value; the `I64` postfix indicates its type and size in bits;
- `t30` is a temporary variable (`t` is short for temp) followed by its number;
- `LDle` (load, little-endian) reads a value from the specified address;
- `STle` (store, little-endian) writes a value to the specified address;
- `Add64` is the addition operation;
- `Sub64` is the subtraction operation;
- `And64` is the bitwise AND operation; and
- `NEXT` represents the transition to the next block and its type (here, `Ijk_Call`).
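To see how these statements fit together, here is a toy Python interpreter for a hand-picked subset of them. It is fed statements 08 to 26 of the listing above (`pop rsi`, `mov rdx, rsp`, `and rsp`, and the two `push` instructions), with `IMark` and `PUT(rip)` omitted. This is a hypothetical sketch, not pyvex. Memory is modeled as a dictionary of 64-bit words rather than bytes, and the initial register values are made up.

```python
MASK64 = (1 << 64) - 1


def run_ir(stmts, regs, mem):
    """Interpret a tiny subset of VEX-like statements.

    stmts: ("WrTmp", tmp, expr), ("PUT", reg, expr) or ("STle", addr_expr, expr)
    expr:  an int constant, a temp name, or a tuple such as ("Add64", "t4", 8)
    """
    tmps = {}

    def ev(expr):
        if isinstance(expr, int):
            return expr
        if isinstance(expr, str):
            return tmps[expr]
        op, *args = expr
        if op == "GET":
            return regs[args[0]]
        if op == "LDle":
            return mem[ev(args[0])]
        a, b = (ev(x) for x in args)
        # 64-bit arithmetic wraps around, just like a register
        return {"Add64": a + b, "Sub64": a - b, "And64": a & b}[op] & MASK64

    for kind, dst, expr in stmts:
        if kind == "WrTmp":
            tmps[dst] = ev(expr)
        elif kind == "PUT":
            regs[dst] = ev(expr)
        elif kind == "STle":
            mem[ev(dst)] = ev(expr)
    return regs, mem


STMTS = [
    ("WrTmp", "t4", ("GET", "rsp")),
    ("WrTmp", "t3", ("LDle", "t4")),
    ("WrTmp", "t31", ("Add64", "t4", 8)),
    ("PUT", "rsi", "t3"),
    ("PUT", "rdx", "t31"),
    ("WrTmp", "t5", ("And64", "t31", 0xFFFFFFFFFFFFFFF0)),
    ("WrTmp", "t8", ("GET", "rax")),
    ("WrTmp", "t33", ("Sub64", "t5", 8)),
    ("PUT", "rsp", "t33"),
    ("STle", "t33", "t8"),
    ("WrTmp", "t35", ("Sub64", "t33", 8)),
    ("PUT", "rsp", "t35"),
    ("STle", "t35", "t33"),
]

regs, mem = run_ir(STMTS, {"rsp": 0x7FFF0000, "rax": 0x1C}, {0x7FFF0000: 1})
print(hex(regs["rsp"]), regs["rsi"])   # 0x7ffefff0 1
```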
The VEX code is optimized: the `rip` instruction pointer isn’t updated after every instruction. In this listing, it is written only before the instructions that access memory.
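Instead of storing the flags register (`RFLAGS`), the intermediate code records:
- `cc_op`, the code of the last operation that affected the flags; and
- `cc_dep1`, `cc_dep2`, and `cc_ndep`, auxiliary values (the operands or the result of that operation).
According to the VEX source code, for the AMD64 guest `0x13` is `AMD64G_CC_OP_LOGICL` (a 32-bit logical operation). This matches the `xor ecx, ecx` instruction at `0x401096`. In the 32-bit x86 guest, the same number would mean `X86G_CC_OP_DECB`. Specific flag values are calculated only when they are actually needed.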
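Lazy flag evaluation can be sketched in a few lines of Python. This is a toy model, not VEX's actual helper code. The operation names mimic VEX naming, and the operand convention (the result in `dep1` for logical operations, the two operands for subtraction) is assumed to follow the comments in the VEX headers.

```python
from dataclasses import dataclass

MASK32 = (1 << 32) - 1


@dataclass
class LazyFlags:
    """Remember the last flag-setting operation instead of the flags themselves."""
    op: str     # "LOGICL" (32-bit logic op) or "SUBL" (32-bit sub/cmp)
    dep1: int
    dep2: int

    def result(self):
        if self.op == "LOGICL":   # for logic ops dep1 already holds the result
            return self.dep1 & MASK32
        if self.op == "SUBL":     # sub/cmp: dep1 - dep2, truncated to 32 bits
            return (self.dep1 - self.dep2) & MASK32
        raise NotImplementedError(self.op)

    def zf(self):
        # The zero flag is computed only when someone asks for it
        return self.result() == 0

    def sf(self):
        # The sign flag is the top bit of the 32-bit result
        return bool(self.result() >> 31)


xor_flags = LazyFlags("LOGICL", 0, 0)   # xor ecx, ecx: the result 0 is in dep1
cmp_flags = LazyFlags("SUBL", 3, 5)     # cmp with operands 3 and 5
print(xor_flags.zf(), cmp_flags.zf(), cmp_flags.sf())   # True False True
```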
Simulation control
You can control the simulation manually, which is convenient for debugging.
The `initial_state.step()` method returns a `SimSuccessors` object that holds the states produced by emulating the current block (in its `successors` list).
If a fork (i.e. conditional jump) occurs at the end of a block, two new states are created. This happens when the choice of path depends on a symbolic variable. If a jump depends only on known values, only one option is available (which is equivalent to an unconditional jump).
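This fork rule can be sketched as a toy Python model. The classes and block names are hypothetical and much simpler than Angr's `SimState`/`SimSuccessors`. Constraints are kept as plain strings, and the branch condition `var > 5` is just an example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyState:
    """Toy state: the next block plus the constraints collected on the way."""
    block: str
    var: object                      # an int (concrete) or a str (symbol name)
    constraints: list = field(default_factory=list)


def step_branch(state, limit, taken, fallthrough):
    """Model the block `if (var > limit) goto taken; else goto fallthrough`.

    A concrete var selects exactly one successor and adds no constraint.
    A symbolic var forks the state, and each copy records its own condition.
    """
    if isinstance(state.var, int):
        target = taken if state.var > limit else fallthrough
        return [ToyState(target, state.var, list(state.constraints))]
    return [
        ToyState(taken, state.var, state.constraints + [f"{state.var} > {limit}"]),
        ToyState(fallthrough, state.var, state.constraints + [f"{state.var} <= {limit}"]),
    ]


concrete = step_branch(ToyState("test", 1337), 5, "print_A", "print_B")
symbolic = step_branch(ToyState("test", "func_arg"), 5, "print_A", "print_B")
print([s.block for s in concrete])   # ['print_A']
print([(s.block, s.constraints) for s in symbolic])
```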
The simulation manager controls code execution. The `SimulationManager` object is created with `project.factory.simgr()`. In fact, it is a wrapper that runs `step()` in a loop. It decides in which branches to continue the emulation and in which to stop it.
In the course of execution, all states are stored in special ‘stashes’:
```
>>> simulation.stashes
defaultdict(list,
            {'active': [],
             'stashed': [<SimState @ 0x401080>],
             'pruned': [],
             'unsat': [],
             'errored': [],
             'deadended': [],
             'unconstrained': []})
```
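Stashes form a dictionary of regular Python lists. A freshly created manager puts its initial state into `active`; the `stashed` entry above is where `simulation.stash()` moves states by default. During the simulation, the states being stepped stay in `active`, and states whose execution has finished end up in `deadended`. This list isn’t complete, and you can create new stashes for your needs.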
When you start the simulation using `simulation.explore()`, the following arguments can be passed:
- `find` takes the address of the desired block (or a list of addresses), or a function that receives a state and returns `True` or `False` depending on whether the desired state has been reached; and
- `avoid` works in a similar way but specifies the states to be avoided.
The arguments below belong to `simulation.step()`, but they can also be passed to `simulation.explore()`, which forwards them. This isn’t explicitly stated in the documentation, so I suggest reviewing the source code.
- `step_func` is called with the simulation manager after each step (useful for debugging);
- `filter_func` decides which stash each new state is put into; and
- `selector_func` decides whether a given state should be stepped.
The simulation ends when a state matching `find` is found or when the code in all branches has been executed. Then `simulation.explore()` returns control. If the desired state has been found, it’s placed into the `found` stash.
For clarity, let’s slightly alter the code:
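```python
def next_step(simulation):
    # Print only the non-empty stashes after each step
    print({k: v for k, v in simulation.stashes.items() if v})
    # step_func must return the simulation manager
    return simulation

simulation = project.factory.simgr(initial_state)
simulation.explore(find=is_successful, step_func=next_step)
```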
Now you can see the state at each step:
```
{'active': [<SimState @ 0x529dc0>]}
{'active': [<SimState @ 0x401169>]}
{'active': [<SimState @ 0x401182>]}
{'active': [<SimState @ 0x401070>]}
{'active': [<SimState @ 0x5a82e0>]}
{'active': [<SimState @ 0x40119f>]}
{'active': [<SimState @ 0x4011a3>, <SimState @ 0x4011b4>]}
{'active': [<SimState @ 0x401060>, <SimState @ 0x401060>]}
{'active': [<SimState @ 0x580e50>, <SimState @ 0x580e50>]}
{'active': [<SimState @ 0x4011b2>, <SimState @ 0x4011c3>]}
{'active': [<SimState @ 0x4011d4>], 'found': [<SimState @ 0x4011b2>]}
```
After reaching the fork, the emulator went both ways, and it finished its job as soon as the `found` stash was filled.
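The trace above can be mimicked with a toy breadth-first manager. It is a hypothetical stand-in, not Angr's implementation. States are reduced to bare block addresses, and the control-flow graph is given up front, whereas Angr discovers successors by emulating each block. The stash names, the `find`/`avoid` semantics, and the `step_func` callback follow the description above.

```python
from collections import defaultdict


def explore(cfg, start, find, avoid=(), step_func=None):
    """Toy simulation manager over a graph: address -> successor addresses.

    find / avoid: an address, a collection of addresses, or a predicate.
    """
    def as_pred(cond):
        if callable(cond):
            return cond
        addrs = {cond} if isinstance(cond, int) else set(cond)
        return lambda addr: addr in addrs

    is_found, is_avoided = as_pred(find), as_pred(avoid)
    stashes = defaultdict(list, active=[start])
    while stashes["active"] and not stashes["found"]:
        next_active = []
        for addr in stashes["active"]:
            succs = cfg.get(addr, [])
            if not succs:                      # no successors: execution ended
                stashes["deadended"].append(addr)
            next_active.extend(succs)          # a fork yields two successors
        stashes["active"] = []
        for addr in next_active:
            if is_found(addr):
                stashes["found"].append(addr)
            elif is_avoided(addr):
                stashes["avoid"].append(addr)
            else:
                stashes["active"].append(addr)
        if step_func:
            step_func(stashes)
    return stashes


# Hypothetical graph: 0x1010 forks into a success path and a failure path
cfg = {0x1000: [0x1010], 0x1010: [0x1020, 0x1030], 0x1020: [0x1040], 0x1030: [0x1050]}
log = []
stashes = explore(cfg, 0x1000, find=0x1040, avoid=[0x1050],
                  step_func=lambda s: log.append(
                      {k: [hex(a) for a in v] for k, v in s.items() if v}))
for entry in log:
    print(entry)
```

The last printed line is `{'found': ['0x1040'], 'avoid': ['0x1050']}`: the manager stops as soon as the `found` stash is filled.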
Automatic path selection
The ‘normal’ `explore()` goes breadth-first: it keeps all branches in the `active` stash and steps them simultaneously. However, loops and forks create many possible routes, so the number of potential paths can grow exponentially. To solve this problem, you can use exploration techniques, which are search strategies that decide where to go and in what order.
```python
simulation.use_technique(angr.exploration_techniques.DFS())
```
The `DFS` (depth-first search) technique deals with only one state at a time and puts the rest into a queue. The `deferred` stash acts as that queue.
```python
simulation.use_technique(angr.exploration_techniques.LengthLimiter(10))
```
This technique limits the maximum emulation depth (the number of steps) of each path, which gives a fast but shallow exploration.
There are plenty of ready-made techniques, but they are poorly documented, so check their source code for more info.
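The difference between these strategies shows up in the order in which states are stepped. Below is a toy Python sketch over a hypothetical execution tree, not Angr's exploration techniques themselves. Breadth-first order mimics the default `explore()`. Depth-first order follows one path and parks the other branches, which is the role the `deferred` stash plays. The `max_length` parameter drops paths longer than a given number of steps, in the spirit of `LengthLimiter`.

```python
from collections import deque


def visit_order(tree, root, strategy="bfs", max_length=None):
    """Return the order in which a toy explorer steps states.

    tree: dict node -> list of child nodes (a fork has two children).
    "bfs" steps pending states level by level; "dfs" follows one path and,
    when it ends, resumes the most recently parked branch.
    max_length: drop paths longer than this many steps.
    """
    order = []
    pending = deque([(root, 0)])          # (node, path length so far)
    while pending:
        node, length = pending.popleft() if strategy == "bfs" else pending.pop()
        if max_length is not None and length > max_length:
            continue                       # path cut: too long
        order.append(node)
        children = [(c, length + 1) for c in tree.get(node, [])]
        # For DFS, push children reversed so that the first child runs next
        pending.extend(children if strategy == "bfs" else reversed(children))
    return order


tree = {"entry": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
print(visit_order(tree, "entry", "bfs"))                # level by level
print(visit_order(tree, "entry", "dfs"))                # one path at a time
print(visit_order(tree, "entry", "bfs", max_length=1))  # ['entry', 'a', 'b']
```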
Symbolic execution
Constraints on symbolic variables are solved using the Z3 SMT solver (via the claripy library). As you remember, symbolic variables don’t contain specific values: the emulator only knows their size and the constraints imposed on them in the course of execution. For instance, let’s create a 64-bit variable called `x` and add the following constraint: `x + 2 == 5`.
```python
x = state.solver.BVS('x', 64)
state.solver.add(x + 2 == 5)
state.solver.eval(x)
```
If you ask the solver for a feasible solution, you’ll get 3. Similar constraints are imposed on symbolic variables at each branch that depends on them. As an example, let’s examine simple branching.
```c
void test(int var)
{
    if (var > 5) {
        printf("A");
    } else {
        printf("B");
    }
}
```
After finding the function’s address in the debugger, you emulate the beginning of this function. The argument is passed in the `EDI` register (per the System V AMD64 calling convention), so a 32-bit symbolic variable is placed into it.
```python
state = project.factory.blank_state(addr=0x401149)
state.regs.edi = claripy.BVS('func_arg', 32)
state_left, state_right = state.step().successors
print('Left:', state_left.solver.constraints)
print('Right:', state_right.solver.constraints)
```
There is a constraint: to go left, the variable must be less than or equal to five (`<=s` denotes a signed comparison).
```
Left: [<Bool (0x0 .. func_arg_49_8) <=s 0x5>]
Right: [<Bool (0x0 .. func_arg_49_8) >s 0x5>]
```
For comparison, let’s try to pass a specific value.
```python
state.regs.edi = claripy.BVV(1337, 32)
print(state.step().successors)
print('Constraints:', state.step().successors[0].solver.constraints)
```
No doubt, 1337 is greater than 5. In such a case, there is only one possible choice without any constraints.
```
[<SimState @ 0x40115e>]
Constraints: []
```
Symbolic functions also impose certain constraints, but sometimes it’s difficult to comprehend them.
Suffice it to know that this is a mathematical formula, and a possible solution has to be found for it.
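As a tiny illustration of what finding a solution means for bitvector formulas, consider the earlier constraint `x + 2 == 5`. Because bitvectors wrap around, it is solved modulo 2^64 rather than over ordinary integers. For the special case `x + c == k`, the answer can be computed by hand. The helper below is hypothetical and has nothing to do with how Z3 works internally. Its second call shows a solution that is a "negative" number in disguise.

```python
def solve_add_eq(c, k, bits=64):
    """Solve x + c == k for an unsigned `bits`-wide bitvector x.

    Addition wraps modulo 2**bits, so the unique solution is (k - c) mod 2**bits.
    """
    return (k - c) % (1 << bits)


x = solve_add_eq(2, 5)
assert (x + 2) % (1 << 64) == 5      # the solution satisfies the constraint
print(hex(x))                        # 0x3
print(hex(solve_add_eq(5, 2)))       # 0xfffffffffffffffd, i.e. -3 in two's complement
```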
Symbolic functions
So, Angr emulates the application code, but what about external modules? The emulator uses hooks to replace calls to library functions with Python stubs called `SimProcedures`, which simulate the behavior of the required functions. The same happens with system calls. If no suitable stub can be found, the generic `ReturnUnconstrained` stub is executed; it simply returns an unconstrained symbolic value.
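By default, every external function for which Angr has a stub is intercepted. Angr’s behavior can be configured when you create a new project, for example, with the `use_sim_procedures` and `exclude_sim_procedures_list` arguments of `angr.Project`. If necessary, you can emulate individual functions or all external code, but this adversely affects performance.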
Let’s examine the list of installed hooks.
```
>>> for addr, proc in project._sim_procedures.items():
...     print(hex(addr), proc.display_name)
0x529dc0 __libc_start_main
0x580e50 puts
0x5a82e0 strcmp
0x8181d0 __tls_get_addr
0x900000 LinuxLoader
# (...)
```
Ready-made stubs inherit from the `SimProcedure` class; see the Angr source code for examples, such as the `puts` stub:
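```python
import angr

class puts(angr.SimProcedure):
    def run(self, string):
        # File descriptor 1 is stdout
        stdout = self.state.posix.get_fd(1)
        if stdout is None:
            return -1
        # Reuse the strlen stub to compute the (possibly symbolic) length
        strlen = angr.SIM_PROCEDURES["libc"]["strlen"]
        length = self.inline_call(strlen, string).ret_expr
        out = stdout.write(string, length)
        stdout.write_data(self.state.solver.BVV(b"\n"))
        # Bytes written plus the newline, truncated to a 32-bit int
        return (out + 1)[31:0]
```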
Hooks can be set by specifying the address:
```python
project.hook(0x401000, my_stub())
```
Or the symbol name:
```python
project.hook_symbol('fgets', my_fgets())
```
In Angr, symbols tie names to specific addresses. They are created when the executable file is loaded.
Out of curiosity, you can list them all:
```python
for item in project.loader.symbols:
    print(item)
```
For example, the symbols below were taken from the symbol table of an ELF file:
```
<Symbol "_start" in get_pass at 0x401080>
<Symbol "main" in get_pass at 0x401169>
<Symbol "_fini" in get_pass at 0x4011dc>
```
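The way hooks and symbols fit together can be sketched with a toy Python model. The `ToyProject` class is hypothetical, not Angr's `Project` API. Symbols map names to addresses, and hooks map addresses to Python stubs. Hooking by name simply resolves the symbol first, and an unhooked target falls back to a stand-in for `ReturnUnconstrained`. The two addresses are taken from the hook list shown earlier.

```python
class ToyProject:
    """Toy model of function hooks: names -> addresses -> Python stubs."""

    def __init__(self, symbols):
        self.symbols = dict(symbols)   # name -> address
        self.hooks = {}                # address -> Python callable

    def hook(self, addr, stub):
        self.hooks[addr] = stub

    def hook_symbol(self, name, stub):
        # Hooking by name is hooking the address the symbol resolves to
        self.hook(self.symbols[name], stub)

    def call(self, addr, *args):
        # A hooked address runs the Python stub instead of emulating code;
        # anything else gets a stand-in for ReturnUnconstrained
        stub = self.hooks.get(addr, lambda *a: "<unconstrained>")
        return stub(*args)


proj = ToyProject({"strcmp": 0x5A82E0, "puts": 0x580E50})
proj.hook_symbol("strcmp", lambda a, b: 0 if a == b else 1)
print(proj.call(0x5A82E0, "secret", "secret"))   # 0
print(proj.call(0x580E50, "hi"))                 # <unconstrained>
```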
Serial number recovery
To keep the game fair, let’s take a real crackme for Windows.
The challenge is to generate the serial number for a specific username. After reconstructing the pseudocode in Binary Ninja, you can see the buffer size and input method.
```c
int32_t var_c = 1;
printf("\n "SerialGen" crackme by Dos…");
char* eax = malloc(0x20);
fgets(eax, 0x20, _iob);
eax[(strlen(eax) - var_c)] = 0;
if (strlen(eax) != i_1)
{
    printf(" SERIAL KEY >>> ");
    char* eax_7 = malloc(0x20);
    fgets(eax_7, 0x20, _iob);
    eax_7[(strlen(eax_7) - var_c)] = 0;
```
The `fgets` function reads each string into its own buffer, 0x20 bytes in length. The program then computes the string length and writes a zero byte at the address BEGINNING + LENGTH - 1 (since `var_c` is 1), overwriting the last character of the string. The real `fgets` returns a string with two bytes at the end: a line break character and a zero. As a result, this code replaces the line break with the terminator. Note the addresses of the blocks that display the success and failure messages.
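```python
import sys
import angr

username = 'xakep.ru'
path_to_binary = "./SerialGen.exe"

project = angr.Project(path_to_binary)
# Fill uninitialized memory and registers with symbolic values
initial_state = project.factory.entry_state(
    add_options={
        angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
        angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS,
    })

# Remember the stock fgets stub so that it can be restored later
old_hook = project.symbol_hooked_by('fgets')

class FgetsHook(angr.SimProcedure):
    def run(self, str_addr, count, stream):
        # First call: write the username the way fgets would ('\n' + NUL)
        self.state.memory.store(str_addr, username.encode() + b'\n\x00')
        # Restore the stock stub: the serial is then read from symbolic STDIN
        self.project.hook_symbol('fgets', old_hook, replace=True)
        # Like fgets, return the buffer address
        return str_addr

project.hook_symbol('fgets', FgetsHook(), replace=True)

simulation = project.factory.simgr(initial_state)
# 0x004012e3: success message block; 0x004012fa: failure message block
simulation.explore(find=0x004012e3, avoid=0x004012fa)

if simulation.found:
    solution_state = simulation.found[0]
    # Concrete STDIN contents that lead to the success block
    solution = solution_state.posix.dumps(sys.stdin.fileno()).decode()
    # Angr's sprintf stub produced a lowercase string, unlike the real one
    print(username, solution.upper())
```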
Symbolic variables aren’t explicitly defined here. By default, the standard input stream is implemented as a `SimPacketsStream`, which can be treated as a set of symbolic variables created on demand.
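```
>>> solution_state.posix.stdin.content
[(<BV248 packet_0_stdin_4_248>, <BV32 0x1f>)]
```
The single packet is 248 bits (31, or 0x1f, bytes) long. This matches the maximum that `fgets` can read into a 0x20-byte buffer.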
To pass the username, replace the original `fgets` hook with your own one that writes the desired string into the buffer, bypassing STDIN. As soon as your hook is triggered, it restores the old one, so the serial number is read from the symbolic STDIN. Substituting a function’s result is more convenient than constraining the input stream. As soon as the simulation manager finds the block at `0x004012e3`, you extract the matching STDIN contents using `posix.dumps()`.
```
$ python crackme.py
xakep.ru SERIAL-00E19-01147-03D9A-11620
$ wine SerialGen.exe
 "SerialGen" crackme by DosX
USERNAME >>> xakep.ru
SERIAL KEY >>> SERIAL-00E19-01147-03D9A-11620
[+] Welcome, xakep.ru! :)
```
The solution seems simple, but I had to spend plenty of time on debugging. First, I didn’t immediately notice that `fgets` preserves the line break. Second, the `sprintf` stub in Angr returns a lowercase string, while `sprintf` in Wine returns an uppercase one. After checking the crackme on Windows, you can see that Wine is right. I have already reported this bug. As you can see, even the original Angr stubs can contain errors.
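The first pitfall is easy to reproduce with a rough Python model of the two steps involved. The first step is the C `fgets` semantics: read at most size-1 characters, keep the newline, and append a NUL. The second is the crackme’s `eax[strlen(eax) - 1] = 0`. Both helper names are hypothetical. Without the `\n`, the crackme silently chops the last character of the username.

```python
def fgets_model(stdin_data: bytes, size: int) -> bytes:
    """Rough model of C fgets: read at most size-1 bytes, stopping after a
    newline (which is kept), then append a NUL terminator."""
    out = bytearray()
    for b in stdin_data[: size - 1]:
        out.append(b)
        if b == ord("\n"):
            break
    return bytes(out) + b"\x00"


def crackme_strip(buf: bytes) -> bytes:
    """Model of buf[strlen(buf) - 1] = 0 (assumes a non-empty string)."""
    s = buf.split(b"\x00", 1)[0]   # the C string: bytes before the first NUL
    return s[:-1]                  # the last character is overwritten by NUL


line = fgets_model(b"xakep.ru\n", 0x20)
print(line)                             # b'xakep.ru\n\x00'
print(crackme_strip(line))              # b'xakep.ru'
print(crackme_strip(b"xakep.ru\x00"))   # b'xakep.r': what a hook without '\n' causes
```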
Conclusions
Angr is an amazingly powerful tool, but symbolic execution has its limitations. I managed to ‘crack’ a CRC32-based key in a couple of minutes, but the formula for recovering data from an MD5 hash turned out to be beyond Z3’s capacity.
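One plausible reason for this difference (an interpretation, not something the experiment above measured) is that CRC32 is affine over GF(2). For inputs of a fixed length, every output bit is an XOR of input bits plus a constant, so the resulting constraints are linear. MD5, by contrast, chains 64 steps of modular additions, rotations, and nonlinear Boolean functions. The sketch below checks the affine property with Python’s `zlib.crc32`: for equal-length inputs, `crc(a) ^ crc(b) ^ crc(c) == crc(a ^ b ^ c)`.

```python
import os
import zlib


def crc_xor_identity_holds(a: bytes, b: bytes, c: bytes) -> bool:
    """Check the affine property of CRC32 for three equal-length inputs."""
    assert len(a) == len(b) == len(c)
    abc = bytes(x ^ y ^ z for x, y, z in zip(a, b, c))
    return zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c) == zlib.crc32(abc)


a, b, c = (os.urandom(16) for _ in range(3))
print(crc_xor_identity_holds(a, b, c))   # True for any equal-length inputs
```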
Of course, Angr cannot simulate large applications in their entirety. But if you find a weak spot in a program, it will help you find the ‘right’ input data required to reach it, for example, when preparing inputs for fuzzing.
Now you are aware of the basic Angr functionality. In the next article, I will provide practical examples to give an insight into vulnerability identification and exploitation.
