What is fuzzing
If you aren’t familiar with this software testing technique, check our previous articles:
Similar to AFL, WinAFL collects code coverage information. For this purpose, it uses three techniques:
- dynamic instrumentation with DynamoRIO;
- static instrumentation with Syzygy; and
- tracing with IntelPT.
Let’s focus on the classical first variant since it’s the easiest and most straightforward one.
WinAFL fuzzes programs as follows:
- You pass the offset of the so-called ‘target’ function contained in the binary as one of the arguments;
- WinAFL is injected into the program and waits for the target function to execute;
- WinAFL starts recording code coverage information;
- When the target function returns, WinAFL pauses the program, substitutes the input file, overwrites the RIP/EIP with the address of the function start, and continues; and
- When the number of such iterations reaches some maximum (you determine it yourself), WinAFL restarts the program.
Such an approach allows you to avoid wasting extra time on the program launch and initialization and significantly increases the fuzzing speed.
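To get a feel for how much the restart interval matters, here is a minimal back-of-the-envelope sketch. The timings are made-up assumptions (2 seconds to launch and initialize the program, 10 ms per call of the target function), not measurements of any real target.

```python
def execs_per_second(startup_s: float, iteration_s: float, fuzz_iterations: int) -> float:
    """Average number of target-function executions per second when the
    program is restarted after every `fuzz_iterations` iterations."""
    if fuzz_iterations < 1:
        raise ValueError("fuzz_iterations must be at least 1")
    # One process lifetime = one launch plus fuzz_iterations calls of the target function.
    total_time = startup_s + iteration_s * fuzz_iterations
    return fuzz_iterations / total_time


if __name__ == "__main__":
    # Hypothetical timings: 2 s startup, 10 ms per iteration.
    for n in (1, 10, 100, 1000):
        rate = execs_per_second(2.0, 0.01, n)
        print(f"{n:>5} iterations per launch: {rate:8.2f} exec/s")
```

With these assumed numbers, restarting the program for every input is limited by the launch time, while a large iteration count makes the launch cost almost negligible.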
Requirements for the function
The logic used in WinAFL imposes a number of simple requirements on the target function used for fuzzing. The target function must:
- Open the input file;
- Parse this file and finish its work as neatly as possible (i.e. close the file and all open handles, not change global variables, etc.). In reality, it’s not always possible to find an ideal parsing function (see below); and
- Return normally. The execution must reach the point of return from the function chosen for fuzzing.
Precompiled binaries are available in the WinAFL repository on GitHub, but for some reason, they refuse to work on my computer. Therefore, to avoid any issues, let’s compile WinAFL together with the latest DynamoRIO version. Fortunately, WinAFL can be easily compiled on any machine.
- Download and install Visual Studio 2019 Community Edition (when installing, select the “Desktop development with C++” workload).
- While Visual Studio is installing, download the latest DynamoRIO release.
- Download WinAFL source code from its repository.
- After installing Visual Studio, you’ll find shortcuts to the Visual Studio command prompt in the Start menu: (1) x86 Native Tools Command Prompt for VS 2019; and (2) x64 Native Tools Command Prompt for VS 2019. Select the one that matches the bitness of the program you’re going to fuzz.
Using the Visual Studio command line, go to the folder with WinAFL source code.
To compile the 32-bit version, execute the following commands:
    mkdir build32
    cd build32
    cmake -G"Visual Studio 16 2019" -A Win32 .. -DDynamoRIO_DIR=..\path\to\DynamoRIO\cmake -DINTELPT=0 -DUSE_COLOR=1
    cmake --build . --config Release
For the 64-bit version:
    mkdir build64
    cd build64
    cmake -G"Visual Studio 16 2019" -A x64 .. -DDynamoRIO_DIR=..\path\to\DynamoRIO\cmake -DINTELPT=0 -DUSE_COLOR=1
    cmake --build . --config Release
In my case, these commands look as follows:
    cd C:\winafl_build\winafl-master\
    mkdir build32
    cd build32
    cmake -G"Visual Studio 16 2019" -A Win32 .. -DDynamoRIO_DIR=C:\winafl_build\DynamoRIO-Windows-8.0.18915\cmake -DINTELPT=0 -DUSE_COLOR=1
    cmake --build . --config Release
After the compilation, the folder <WinAFL dir>\build<32/64>\bin\Release will contain working WinAFL binaries. Copy them and the folder with DynamoRIO to the virtual machine you are going to use for fuzzing.
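If you build both variants regularly, the command sequence can be generated instead of typed. The helper below is only a sketch: the function name is hypothetical, it merely composes the same commands listed in this section, and it does not run them.

```python
from typing import List


def winafl_build_commands(bitness: int, dynamorio_cmake_dir: str) -> List[str]:
    """Return the build commands for a 32- or 64-bit WinAFL build.

    `dynamorio_cmake_dir` is the path to the `cmake` folder inside the
    DynamoRIO release; it is passed through unchanged.
    """
    if bitness not in (32, 64):
        raise ValueError("bitness must be 32 or 64")
    arch = "Win32" if bitness == 32 else "x64"
    build_dir = f"build{bitness}"
    return [
        f"mkdir {build_dir}",
        f"cd {build_dir}",
        f'cmake -G"Visual Studio 16 2019" -A {arch} .. '
        f"-DDynamoRIO_DIR={dynamorio_cmake_dir} -DINTELPT=0 -DUSE_COLOR=1",
        "cmake --build . --config Release",
    ]


if __name__ == "__main__":
    # The DynamoRIO path is the one used in this article; adjust it to your setup.
    for line in winafl_build_commands(32, r"C:\winafl_build\DynamoRIO-Windows-8.0.18915\cmake"):
        print(line)
```

The printed lines are meant to be pasted into the matching Native Tools Command Prompt from the WinAFL source folder.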
Selecting a suitable target for fuzzing
AFL was developed to fuzz programs that parse files. Although WinAFL can be applied to programs that use other input methods, the easiest way is to choose a target that uses files as input.
If, like me, you opt for an extra challenge, you can try fuzzing network programs. In this case, you’ll have to use custom_net_fuzzer from WinAFL or write your own wrapper.
custom_net_fuzzer works pretty slowly because it sends network requests to its target, and additional time is spent on their processing.
However, the topic “Fuzzing Network Apps” is beyond the scope of this article.
An ideal target for WinAFL meets the following criteria:
- it deals with files;
- it takes the file path as a command line argument; and
- the module containing the functions you want to fuzz is not compiled statically. Otherwise, WinAFL would instrument numerous library functions. This won’t bring you any additional findings, but will slow down the fuzzing process significantly.
Surprisingly, most developers don’t take the existence of WinAFL into account when they write their programs. So, if your target doesn’t meet the above criteria, you can still adapt it to WinAFL if you want to.
Finding a function for fuzzing inside the program
I have described an ideal target, but a real one may be far from this ideal. As an example, I use a statically compiled program from my old collection; its main executable file is 8 MB in size.
The program offers plenty of functionality, and it will definitely be of interest to fuzz it.
The target takes files as input; so, the first thing I do after loading the binary into IDA Pro is find the CreateFileA function in the imports and examine the cross-references to it.
As you can see, it’s used in four functions. Instead of reversing each of them statically, let’s use the debugger to see which function is called to parse files.
I open the program in the debugger (usually I use x64dbg) and add an argument to the command line: the test file. Where did I get it from? I simply opened the program, set the maximum number of options for the document, and saved it to disk.
Then I select the kernelbase. library on the Symbols tab and set breakpoints at the exports of the CreateFile* functions.
The CreateFile* functions are ‘officially’ provided by the kernelbase. library. But if you look closely, this library contains only jmp to the respective functions of
I prefer to set breakpoints exactly at the exports in the respective library. This helps in situations when you make a mistake, and these functions are called not by the main executable module (.exe), but, for instance, by some of your target’s libraries. It’s also useful if your program tries to call a function using
After setting the breakpoints, I continue executing the program and see how it makes the first call to CreateFileA. But if you pay attention to the arguments, you’ll realize that the target wants to open some of its service files, not the test file.
I resume the program execution and continue it until I see the path to my test file in the list of arguments.
I switch to the Call Stack tab and see that CreateFileA is called not from the test program, but from the CFile:: function in the
Since I am just looking for a function to fuzz, I have to keep in mind that it must take the path to the input file, do something with this file, and terminate as neatly as possible. So, my strategy is to go up the call stack until I find a suitable function.
I copy the return address from CFile::, follow it in IDA, look at the function, and immediately see that it takes two arguments that are subsequently used as arguments in two
Based on the CFile:: prototypes from the MSDN documentation, the
a2 variables are file paths. Note that in IDA, the file path is passed to the CFile:: function as the second argument because thiscall is used.
virtual BOOL Open(LPCTSTR lpszFileName, UINT nOpenFlags, CFileException* pError = NULL);
virtual BOOL Open(LPCTSTR lpszFileName, UINT nOpenFlags, CAtlTransactionManager* pTM, CFileException* pError = NULL);
This function looks very interesting and deserves a detailed examination. I set breakpoints at its beginning and end to examine its arguments and understand what happens to them by the end of its execution.
Then I restart the program and see that the two arguments are the paths to my test file and a temporary file.
Time to examine contents of these files. Based on the contents of the test file, it is compressed, or encrypted, or encoded in some way.
The temporary file is empty.
I wait until the function execution is completed and see that my test file is still encrypted, while the temporary file is still empty. So, I remove breakpoints from this function and continue monitoring calls to CreateFileA. The next call to CreateFileA gives me the following call stack.
The function that calls CFile:: turns out to be very similar to the previous one. I set breakpoints at its beginning and end and see what happens.
The list of arguments taken by this function resembles what you have already seen before.
The breakpoint set at the end of this function triggers, and you can see the decrypted, or rather unpacked contents of the test file in the temporary file.
In other words, this function unpacks files. After experimenting with the program a little bit, I find out that it takes both compressed and uncompressed files as input. This is good because it’s always preferable to fuzz uncompressed files: the code coverage is much better and the chance to discover more interesting features is higher.
Let’s see if it’s possible to find a function that does something to an already decrypted file.
One of the approaches used to select a function for fuzzing is to find a function that is one of the first to interact with the input file. Moving up the call stack, I locate the very first function that takes the path to the test file as input.
The function selected for fuzzing must be completely executed; therefore, I set a breakpoint at the end of this function to make sure that this requirement is met and press the F9 button in the debugger.
I also make sure that this function closes all open files after the return. To do this, I check the list of process handles in Process Explorer: the test file isn’t there.
As you can see, this function meets the WinAFL requirements. Now let’s do some fuzzing!
WinAFL arguments and pitfalls
My arguments for WinAFL look something like this. Let’s examine the most important of them in order.
afl-fuzz.exe -i c:\inputs -o c:\winafl_build\out-plain -D C:\winafl_build\DynamoRIO-Windows-8.0.18915\bin32 -t 40000 -x C:\winafl_build\test.dict -f test.test -- -coverage_module target.exe -fuzz_iterations 1000 -target_module target.exe -target_offset 0xA4390 -nargs 3 -call_convention thiscall -- "C:\Program Files (x86)\target.exe" "@@"
All arguments are divided into three groups separated from each other by two dashes.
The first group represents WinAFL arguments:
-D: path to DynamoRIO binaries;
-t: maximum timeout for one fuzzing iteration. If the target function isn’t completely executed within this time, WinAFL will conclude that the program has hung and restart it;
-x: path to the dictionary; and
-f: using this parameter, you can pass the name and extension of the input file. This is useful when the program decides how to parse the file depending on its extension.
The second group represents arguments for the winafl library that instruments the target process:
-coverage_module: module for which coverage is recorded (there may be more than one);
-target_module: module containing the target function that will be fuzzed (can be only one);
-target_offset: virtual offset of the function to be fuzzed from the start of the module;
-fuzz_iterations: number of fuzzing iterations to run before restarting the program. The lower this value is, the more often WinAFL will restart the program, which takes extra time. However, if a program is fuzzed for a long time without restarting, unwanted side effects may accumulate;
-call_convention: calling convention of the target function (thiscall in the command above); and
-nargs: number of arguments the fuzzed function takes. The this pointer (used in the thiscall calling convention) is also considered an argument.
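The value for target_offset is simply the distance between the function’s address and the base address of its module. The snippet below is a small sketch of that arithmetic; the image base 0x400000 is an assumption chosen for illustration, so take the real values from your disassembler.

```python
def target_offset(function_va: int, image_base: int) -> int:
    """Offset of a function from the start of its module."""
    if function_va < image_base:
        raise ValueError("function address lies below the module base")
    return function_va - image_base


if __name__ == "__main__":
    # Hypothetical addresses: module loaded at 0x400000, function at 0x4A4390.
    print(hex(target_offset(0x4A4390, 0x400000)))  # prints 0xa4390
```

If the disassembler shows addresses relative to a different base than the one the module has at run time, the offset stays the same, which is why WinAFL asks for an offset rather than an absolute address.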
The third group represents the path to the program. WinAFL will change
@@ to the full path to the input file.
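The three-group layout is easy to get wrong when the command line grows. As a sketch, the helper below (a hypothetical function, not part of WinAFL) keeps the groups in separate lists and joins them with the two separators; the example values are the ones from the command shown above.

```python
from typing import List, Sequence


def build_winafl_command(
    fuzzer_args: Sequence[str],
    instrumentation_args: Sequence[str],
    target_cmdline: Sequence[str],
) -> List[str]:
    """Join the three argument groups with the '--' separators."""
    return [*fuzzer_args, "--", *instrumentation_args, "--", *target_cmdline]


if __name__ == "__main__":
    command = build_winafl_command(
        # Group 1: arguments for afl-fuzz.exe itself.
        ["afl-fuzz.exe", "-i", r"c:\inputs", "-o", r"c:\winafl_build\out-plain",
         "-D", r"C:\winafl_build\DynamoRIO-Windows-8.0.18915\bin32",
         "-t", "40000", "-x", r"C:\winafl_build\test.dict", "-f", "test.test"],
        # Group 2: arguments for the instrumentation library.
        ["-coverage_module", "target.exe", "-fuzz_iterations", "1000",
         "-target_module", "target.exe", "-target_offset", "0xA4390",
         "-nargs", "3", "-call_convention", "thiscall"],
        # Group 3: the target program; '@@' is replaced with the input file path.
        [r"C:\Program Files (x86)\target.exe", "@@"],
    )
    print(command)
```

A list like this can be handed to `subprocess.run` as is, which avoids quoting problems with paths that contain spaces.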
Boosting WinAFL capacity: adding a dictionary
Your goal is to increase the number of paths found per second. To do so, you can parallelize the fuzzer, play with the number of fuzz_iterations, or try to fuzz in a smarter way. A dictionary will help you with the latter.
WinAFL can recover the syntax of the target’s data format (e.g. AFL was able to synthesize valid JPEG files without any additional information). It uses the detected syntax units to generate new cases for fuzzing. This takes plenty of time, and you can speed it up considerably: who knows the data format in your program better than you? To do that, you have to create a dictionary in the format
<. For instance, my dictionary begins as follows:
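As a separate illustration (the tokens below are invented and have nothing to do with the dictionary used for this target), here is a small sketch that writes tokens in the AFL dictionary syntax, one `name="value"` entry per line, with non-printable bytes written as `\xNN` escapes.

```python
def dictionary_line(name: str, token: bytes) -> str:
    """Format one dictionary entry as name="value".

    Printable ASCII is kept as is; everything else, including the double
    quote and the backslash, is written as a \\xNN escape.
    """
    escaped = "".join(
        chr(b) if 0x20 <= b < 0x7F and b not in (0x22, 0x5C) else f"\\x{b:02x}"
        for b in token
    )
    return f'{name}="{escaped}"'


if __name__ == "__main__":
    # Hypothetical tokens for an imaginary file format.
    tokens = {
        "magic": b"TEST\x00\x01",
        "tag_open": b"<item>",
        "tag_close": b"</item>",
    }
    with open("example.dict", "w", encoding="ascii") as handle:
        for name, token in tokens.items():
            handle.write(dictionary_line(name, token) + "\n")
```

The resulting file is what you would pass to the fuzzer with the -x argument described earlier.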
So, you have found a function to be fuzzed, concurrently deciphered the input file of the program, created a dictionary, selected arguments – and finally can start fuzzing!
The first minutes of fuzzing bring the first crashes! But things don’t always run so smoothly. Some WinAFL features that can facilitate (or hinder) the fuzzing process are addressed below.
As said above, the function selected for fuzzing shouldn’t have side effects. But in real life, developers often forget to add such ‘perfect’ functions to their programs, and you have to deal with what you have.
Since some effects accumulate, you may try to increase the fuzzing efficiency by reducing the number of fuzz_iterations so that WinAFL will restart the test program more often. This adversely affects the speed but reduces the number of side effects.
If WinAFL refuses to run, try running it in the debug mode. To do so, add the -debug parameter to the arguments of the instrumentation library. After that, a text log will appear in the current directory. If the program operates normally, it should have the same number of lines
In . In addition, there must be the phrase:
Don’t forget to disable the debug mode! In debug mode, WinAFL will refuse to fuzz even if everything works fine: it will claim that the target program has timed out. Don’t trust WinAFL here; just turn debugging off.
Sometimes the program gets so messed up during fuzzing that it crashes at the preparatory WinAFL stage, and WinAFL reasonably refuses to proceed further. To find out what the problem is, you can manually emulate the fuzzer’s operation. Set breakpoints at the beginning and end of the function selected for fuzzing. When the program execution reaches the end of the function, edit the arguments, align the stack, change the RIP/EIP to the beginning of the function, and so on, until something breaks.
Stability is a very important parameter. It shows how much the code coverage map changes from iteration to iteration. If it’s 100%, then the program behaves exactly the same at each iteration; if it’s 0%, then each iteration is completely different from the previous one. Real programs usually land somewhere in between; the creator of AFL believes that you should aim for about 85%. In the above example, stability was 9.5%. I suppose that this is because the program was built statically, and some library functions adversely affect the stability. Perhaps multithreading affects it, too.
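To make the idea of stability more tangible, here is a simplified model. It assumes that each run of the same input is represented as a set of coverage-map entries that were hit, and it reports the share of entries that appear in every run. AFL’s own calculation works on its coverage bitmap and differs in detail, so treat this only as an illustration.

```python
from typing import Sequence, Set


def stability(runs: Sequence[Set[int]]) -> float:
    """Percentage of observed coverage entries that were hit in every run."""
    if not runs:
        raise ValueError("at least one run is required")
    seen = set().union(*runs)  # entries hit in at least one run
    if not seen:
        return 100.0  # nothing was hit at all, so nothing varies
    stable = set.intersection(*map(set, runs))  # entries hit in all runs
    return 100.0 * len(stable) / len(seen)


if __name__ == "__main__":
    # Three hypothetical runs of the same input file.
    runs = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4}]
    print(f"stability: {stability(runs):.1f}%")  # prints "stability: 60.0%"
```

In this model, entries 4 and 5 show up only in some of the runs, which is exactly the kind of noise that threads or accumulated global state tend to produce.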
Set of input files
The greater the code coverage, the higher the chance of finding a bug. The maximum code coverage can be achieved by creating a suitable set of input files. If you intend to fuzz parsers of some well-known file formats, Google can help you a lot. Some researchers collect impressive sets of files by parsing Google search results. Such a set of files can be subsequently minimized using the
[ script available in the WinAFL repository. However, if you (like me) prefer parsers of proprietary file formats, the search engine won’t help you much. To generate a set of interesting files, you’ll have to experiment with the program for a while.
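The idea behind corpus minimization can be sketched as a greedy set cover: keep picking the file that adds the most coverage that is not yet covered. The code below is a simplified illustration under that assumption, not the algorithm of the script from the repository; the file names and coverage sets are hypothetical, and a real tool obtains the coverage by running each file under instrumentation.

```python
from typing import Dict, List, Set


def minimize_corpus(coverage: Dict[str, Set[int]]) -> List[str]:
    """Greedily select files until every covered entry is represented."""
    remaining = set().union(*coverage.values())  # entries still to be covered
    selected: List[str] = []
    while remaining:
        # Pick the file covering the most remaining entries;
        # ties are resolved by file name order.
        best = max(sorted(coverage), key=lambda name: len(coverage[name] & remaining))
        selected.append(best)
        remaining -= coverage[best]
    return selected


if __name__ == "__main__":
    # Hypothetical coverage data for three input files.
    corpus = {
        "a.bin": {1, 2, 3},
        "b.bin": {2, 3},
        "c.bin": {4},
    }
    print(minimize_corpus(corpus))  # prints ['a.bin', 'c.bin']
```

Here b.bin is dropped because everything it exercises is already exercised by a.bin, so fuzzing it separately would only cost time.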
Disabling error messages
My program was quite talkative and displayed pop-up messages claiming that the format of input files is wrong.
To fix this issue, patch the program or the library used by it.