Automating IDA Pro. Polishing the debugger with a coarse file

Plugins and scripts that drive IDA and its debugger make it possible to delegate routine debugging and vulnerability hunting to the machine. Using practical examples, I am going to demonstrate how to bypass anti-debugging, identify paths to vulnerable functions, and highlight important code sections. You will write a plugin in C++, examine the built-in IDC scripting language, write your own scripts in IDAPython, and learn how to apply them to several files at once.

info

I strongly recommend reading “The IDA Pro Book” by Chris Eagle. Its API coverage is outdated, but it still answers most questions.

Plugin in C++

The first way to extend IDA’s capabilities is to write a plugin as a compiled DLL. To build one, you need the C++ SDK. To download it legally, you have to purchase an IDA Pro license (illegal ways are beyond the scope of this article).

Let’s try to build and run a test plugin in Visual Studio 2019 (or newer). Create an empty Windows project (not a console one), select the x64 platform, and set the configuration type to Dynamic Library (.dll). In Additional Include Directories, specify the path to the folder with the SDK headers: idasdk90\include. You can simply drag the *.lib files from idasdk90\lib\x64_win_vc_64 into the project’s file list, and the linker will pick them up from there.

#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <ida.hpp>
#include <idp.hpp>
#include <loader.hpp>
#include <funcs.hpp>
#include <kernwin.hpp>  // msg()
class MyPlugmod : public plugmod_t
{
public:
    MyPlugmod()
    {
        msg("MyPlugmod: Constructor called.\n");
    }
    virtual ~MyPlugmod()
    {
        msg("MyPlugmod: Destructor called.\n");
    }
    virtual bool idaapi run(size_t arg) override
    {
        // arg is size_t, so cast it to match the %d format specifier
        msg("MyPlugmod.run() called with arg: %d\n", int(arg));
        size_t n = get_func_qty();
        for (size_t i = 0; i < n; i++) {
            func_t* pfn = getn_func(i);
            if (pfn == nullptr)
                continue;
            qstring name;
            get_func_name(&name, pfn->start_ea);
            msg("Function %s at address 0x%llX\n", name.length() ? name.c_str() : "-UNK-", pfn->start_ea);
        }
        return true;
    }
};
static plugmod_t* idaapi init(void)
{
    return new MyPlugmod();
}
static const char comment[] = "The plugin displays a list of functions along with the addresses.";
static const char help[] = "Information is not provided";
static const char wanted_name[] = "List functions";
static const char wanted_hotkey[] = "Ctrl+Q";
plugin_t PLUGIN =
{
  IDP_INTERFACE_VERSION,
  PLUGIN_MULTI,         // flags
  init,                 // initialize
  nullptr,              // terminate
  nullptr,              // invoke the plugin
  comment,
  help,
  wanted_name,
  wanted_hotkey
};

The architecture is standard. The DLL exports the PLUGIN structure, which contains the plugin description, the desired hotkey, and pointers to callbacks. Because of the PLUGIN_MULTI flag, the init function returns a pointer to a new instance of the MyPlugmod class, whose run method is executed when the plugin is invoked. This code gets the number of functions recognized by IDA using get_func_qty, retrieves each function with getn_func, obtains its name with get_func_name, and prints the name and start address to the Output window with msg.

Copy the built DLL to the IDA Professional 9.0 SP1\plugins folder and restart IDA. A new item, “List functions”, will appear in the Edit → Plugins menu. Alternatively, the plugin can be invoked with the Ctrl+Q hotkey specified in its description.

Compatibility issues

The key problem with plugins is the extremely poor compatibility between older and newer SDK versions. A plugin compiled against one SDK version will most likely not work with even a slightly newer IDA release, and rebuilding it against the new SDK can take significant effort. In other words, you have to update and rebuild your plugin for each IDA Pro release. In practice, hardly anyone does this, and as a result, many plugins have become unusable over time.

This approach also serves a commercial interest. The SDK documentation is extremely scant (basically only the names of functions and their arguments). To find out how a specific API function works, you have to search for someone else’s code or contact paid support. One could argue that poor compatibility is a kind of protection against piracy. Unfortunately, the main victim of this approach is the end user.

IDC

Support for scripts written in the IDC language was introduced with the second version of IDA Pro in 1994.

IDC is a C-like language without strong typing. Unlike C, it has no pointers: all arguments are passed by value. The language supports most C operators, except for +=. User-defined functions are declared using the static keyword. Library functions are documented but provide less control than the C++ SDK. IDC also has exceptions, classes, and some syntactic sugar (e.g. concatenating two strings with + and extracting a substring with slicing: str[0:2]).

#include <idc.idc>
static main()
{
  auto ea,x;
  for ( ea=get_next_func(0); ea != BADADDR; ea=get_next_func(ea) )
  {
    msg("Function at %08lX: %s", ea, get_func_name(ea));
    x = get_func_flags(ea);
    if ( x & FUNC_NORET ) msg(" Noret");
    if ( x & FUNC_FAR   ) msg(" Far");
    msg("\n");
  }
}

The code is similar to the example from the previous section. It gets the ea (effective address) of the first function via get_next_func and keeps requesting the addresses of subsequent functions in a loop until the API returns the BADADDR constant. The function name is returned by get_func_name, and flags with function metadata are obtained using get_func_flags. As before, msg prints data to the Output window.
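The value returned by get_func_flags is a bitmask, so each property is tested with a bitwise AND, exactly as the IDC loop does for FUNC_NORET and FUNC_FAR. The Python sketch below performs the same decoding outside IDA. The numeric values of the constants are assumptions recalled from the SDK headers (check them against idc.idc or idc.py in your installation; inside IDA you would simply use idc.FUNC_NORET and friends), and describe_func_flags is a hypothetical helper.

```python
# Assumed bit values, as defined in IDA's funcs.hpp and mirrored in idc.py;
# double-check them against the SDK of your IDA version.
FUNC_NORET = 0x00000001  # function does not return
FUNC_FAR = 0x00000002    # far function
FUNC_LIB = 0x00000004    # library function
FUNC_THUNK = 0x00000080  # thunk (jump) function

FLAG_NAMES = (
    (FUNC_NORET, "Noret"),
    (FUNC_FAR, "Far"),
    (FUNC_LIB, "Lib"),
    (FUNC_THUNK, "Thunk"),
)


def describe_func_flags(flags):
    """Return the names of the known bits set in a get_func_flags() value."""
    if flags == -1:  # get_func_flags returns -1 when there is no function
        return []
    return [name for bit, name in FLAG_NAMES if flags & bit]


# A library function that never returns (hypothetical flag value)
print(" ".join(describe_func_flags(FUNC_NORET | FUNC_LIB)))
```

The -1 check matters: without it, the "no function here" result would match every bit.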

The main advantage of IDC scripts is that they work out of the box. In all other respects, they are hopelessly outdated. Previously, all noteworthy solutions were written using the C++ SDK, but today most plugins and one-off scripts in public repositories are written in Python.

IDAPython

This plugin integrates Python into IDA Pro. Its first version was released in 2004. Its author, Dyce (Gergely Erdelyi), developed IDAPython with the backing of his then-employer, the Finnish company F-Secure, which produces the antivirus of the same name. In the mid-2010s, the author ran short of time, and the project was handed over to Hex-Rays. IDAPython is still maintained by Hex-Rays employee 0xeb (Elias Bachaalany). Starting with version 5.4, the plugin became part of IDA Pro.

Today, IDAPython is the main tool for writing new plugins, and old ones are being ported to it. The backward compatibility issues mentioned above have also contributed to its popularity. In addition, IDAPython is well documented.

IDAPython makes it possible to perform the same operations as the C++ SDK, and it’s easy to notice that the function names are the same. In fact, IDAPython is a thin and handy wrapper around the low-level APIs. Therefore, in IDAPython you can write loaders for unknown file formats or plugins with their own GUI windows.

import idaapi
import idautils
import idc
class ListFunctionsPlugin(idaapi.plugin_t):
    flags = idaapi.PLUGIN_UNL
    comment = "The plugin displays a list of functions along with the addresses."
    help = "Information is not provided"
    wanted_name = "List Functions v2"
    wanted_hotkey = "Alt-F8"
    def init(self):
        print("[ListFunctionsPlugin] Constructor called.")
        return idaapi.PLUGIN_OK
    def run(self, arg):
        for func_ea in idautils.Functions():
            func_name = idc.get_func_name(func_ea)
            print(f"0x{func_ea:08X}: {func_name}")
    def term(self):
        print("[ListFunctionsPlugin] Destructor called.")
def PLUGIN_ENTRY():
    return ListFunctionsPlugin()

As usual, let’s examine a plugin that displays the list of functions. Save the code as a .py file in the IDA Professional 9.0 SP1\plugins folder and restart IDA: a new item, List Functions v2, will appear in the Edit → Plugins menu. Alternatively, the plugin can be started by pressing Alt-F8.

This is a direct analogue of the C++ plugin; the only difference is that Python offers handy wrappers from the idautils module, such as idautils.Functions(), which yields the addresses of all functions. The get_func_name function is familiar from the previous example. Instead of the PLUGIN structure, class fields inherited from idaapi.plugin_t are used. The class has three predefined methods: init (called when the plugin is loaded, playing the role of a constructor), term (called when it is unloaded, playing the role of a destructor), and run (the code executed when the plugin is invoked). Because of the PLUGIN_UNL flag, the plugin is unloaded right after run returns.

Writing scripts in IDAPython

Below are a few examples illustrating how IDAPython can make your life easier. Scripts saved to disk are executed via File → Script file (Alt-F7); short snippets can be pasted into the File → Script command dialog (Shift-F2) and run from there.

Highlighting CALL

import idautils
import idc
CALL_COLOR = 0xFFDDCC  # BBGGRR format: pale blue
for seg_ea in idautils.Segments():
    for head in idautils.Heads(seg_ea, idc.get_segm_end(seg_ea)):
        if idc.is_code(idc.get_full_flags(head)):
            mnem = idc.print_insn_mnem(head)
            if mnem.lower() == "call":
                idc.set_color(head, idc.CIC_ITEM, CALL_COLOR)

Run the script, and all lines containing a CALL instruction will be highlighted in pale blue.

The idautils.Segments() function yields the start addresses of all segments (in a PE32 file, these correspond to its sections). Then idautils.Heads returns all items between the given start and end addresses; an item can be either code or data. The script checks whether the item is code using idc.is_code, gets the instruction mnemonic using idc.print_insn_mnem, and compares it with call. Finally, idc.set_color highlights the item in the specified color. Colors are specified in the BBGGRR format: for example, pure blue is FF0000.
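Because set_color expects BBGGRR rather than the familiar RRGGBB, it can be convenient to pack colors with a small helper instead of swapping bytes by hand. The sketch below is plain Python; rgb_to_ida_color is a hypothetical name, and the byte layout simply follows the BBGGRR rule described above.

```python
def rgb_to_ida_color(red, green, blue):
    """Pack an RGB triple into the 0xBBGGRR value that idc.set_color expects."""
    for component in (red, green, blue):
        if not 0 <= component <= 0xFF:
            raise ValueError(f"color component out of range: {component}")
    # Red goes into the lowest byte, blue into the highest one
    return (blue << 16) | (green << 8) | red


CALL_COLOR = rgb_to_ida_color(0xCC, 0xDD, 0xFF)  # the pale blue used above
print(f"0x{CALL_COLOR:06X}")
```

Writing colors as RGB triples this way avoids the classic mistake of passing a web-style RRGGBB value and getting orange where you wanted blue.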

Substituting WinAPI results

Let’s try to bypass a simple anti-debugging check. Suppose the program under examination checks whether IsDebuggerPresent returns 1. The obvious solution is to clear the BeingDebugged flag in the PEB (Process Environment Block), but to make this example more interesting, let’s substitute the WinAPI result on the fly.

import idc
import idaapi
IAT_NAME = "__imp__IsDebuggerPresent@0"
RETURN_VALUE = 0
global_hook = None
class ExitHook(idaapi.DBG_Hooks):
    def __init__(self, target_addr):
        super().__init__()
        self.target_addr = target_addr
        self.ret_addr = None
    def dbg_bpt(self, tid, ea):
        print(f"[+] Breakpoint hit at 0x{ea:X}")
        if ea == self.target_addr:
            esp = idc.get_reg_value("esp")
            print(f"[+] ESP: 0x{esp:X}")
            self.ret_addr = idc.get_wide_dword(esp)
            print(f"[+] Captured return address: 0x{self.ret_addr:X}")
            idc.add_bpt(self.ret_addr)
        elif self.ret_addr and ea == self.ret_addr:
            print(f"[+] Return point hit. Overwriting EAX with {RETURN_VALUE}")
            idc.set_reg_value(RETURN_VALUE, "EAX")
            idc.del_bpt(self.ret_addr)
            self.ret_addr = None
        return 0
def setup_hook():
    global global_hook
    imp_ptr = idc.get_name_ea_simple(IAT_NAME)
    if imp_ptr == idc.BADADDR:
        print(f"[-] Import {IAT_NAME} not found.")
        return
    target_addr = idc.get_wide_dword(imp_ptr)
    print(f"[+] Real IsDebuggerPresent address: 0x{target_addr:X}")
    idc.add_bpt(target_addr)
    global_hook = ExitHook(target_addr)
    global_hook.hook()
    print("[+] Hook installed. Start or resume process (F9).")
setup_hook()

The script uses idc.get_name_ea_simple to find the address of IAT_NAME: the four-byte slot in the import section where the loader writes the actual address of the imported WinAPI function. It then reads the address stored in that slot using idc.get_wide_dword and sets a breakpoint on it using idc.add_bpt.
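Note that idc.get_wide_dword reads exactly four bytes, which matches a 32-bit process; the script is 32-bit-only anyway, since it reads ESP and uses a stdcall-decorated import name. The sketch below models the slot read over a plain byte buffer to show how the slot width depends on bitness; read_iat_slot and the sample addresses are hypothetical. In a 64-bit target, the slot would be read inside IDA with idc.get_qword instead, and the return address would be taken from RSP.

```python
import struct


def read_iat_slot(image, offset, is_64bit):
    """Read one IAT entry, a little-endian pointer, from a memory snapshot.

    A 32-bit slot is 4 bytes (what idc.get_wide_dword reads); an x64 slot
    is 8 bytes (what idc.get_qword would read inside IDA).
    """
    fmt = "<Q" if is_64bit else "<I"
    (value,) = struct.unpack_from(fmt, image, offset)
    return value


# Hypothetical snapshot: a 32-bit slot followed by a 64-bit slot
snapshot = struct.pack("<I", 0x76A1B2C0) + struct.pack("<Q", 0x7FFB12345678)
print(hex(read_iat_slot(snapshot, 0, is_64bit=False)))
print(hex(read_iat_slot(snapshot, 4, is_64bit=True)))
```

Reading a slot with the wrong width does not fail; it silently glues the neighboring bytes into a bogus address, and the breakpoint then lands in the wrong place.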

The hook itself is the ExitHook class, which inherits from idaapi.DBG_Hooks. This is the modern way to intercept various debugger events, including breakpoints. The catch is that the class instance must be kept in a global variable: if you create it as a local variable in setup_hook, the garbage collector will destroy it as soon as the function returns. IDA won’t report anything, but the hook won’t work.

In the dbg_bpt breakpoint handler, the script checks whether execution has stopped at the address of the intercepted WinAPI function and, if so, reads the return address from the top of the stack ([ESP]). Then it sets a second breakpoint at that address. When this breakpoint is hit, the API has already returned, so the script overwrites the EAX register (which holds the return value) with zero and deletes the second breakpoint, which is no longer needed.

#include <windows.h>
int APIENTRY wWinMain(_In_ HINSTANCE hInstance,
                     _In_opt_ HINSTANCE hPrevInstance,
                     _In_ LPWSTR    lpCmdLine,
                     _In_ int       nCmdShow)
{
    if (IsDebuggerPresent())
    {
        MessageBoxA(0, "debugger", "!", 0);
    }
    return 0;
}

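Now start the test application in the debugger, run the script while the process is paused at its entry point (by then, the loader has already filled in the IAT), and resume execution. Voilà: IsDebuggerPresent returns zero, and the message isn’t displayed.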
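The logic of the ExitHook handler does not depend on IDA, so it can be checked outside the debugger. In the sketch below, the registers, the stack, and the breakpoint set are toy stand-ins rather than IDA objects, and ReturnValuePatcher is a hypothetical class that mirrors dbg_bpt: on entry to the API, it reads the return address from [ESP] and arms a breakpoint there; when that breakpoint fires, it overwrites EAX and disarms it.

```python
class ReturnValuePatcher:
    """Toy model of ExitHook: patch an API's return value via two breakpoints."""

    def __init__(self, target_addr, new_value, breakpoints):
        self.target_addr = target_addr
        self.new_value = new_value
        self.breakpoints = breakpoints  # set of armed breakpoint addresses
        self.ret_addr = None
        self.breakpoints.add(target_addr)  # like idc.add_bpt(target_addr)

    def on_breakpoint(self, ea, regs, stack):
        if ea == self.target_addr:
            # At the API's first instruction, CALL has just pushed
            # the return address, so it sits at [ESP]
            self.ret_addr = stack[regs["esp"]]
            self.breakpoints.add(self.ret_addr)
        elif self.ret_addr is not None and ea == self.ret_addr:
            # Back in the caller: EAX holds the API result, replace it
            regs["eax"] = self.new_value
            self.breakpoints.discard(self.ret_addr)
            self.ret_addr = None


# Simulated run; all addresses are arbitrary illustration values
API_ADDR, RET_ADDR = 0x77001000, 0x00401234
bpts = set()
patcher = ReturnValuePatcher(API_ADDR, 0, bpts)
regs = {"esp": 0x0019FF00, "eax": 0}
stack = {0x0019FF00: RET_ADDR}

patcher.on_breakpoint(API_ADDR, regs, stack)  # API entered
regs["eax"] = 1                               # API reports "debugger present"
regs["esp"] += 4                              # RET pops the return address
patcher.on_breakpoint(RET_ADDR, regs, stack)  # back in the caller
print(regs["eax"], sorted(hex(a) for a in bpts))
```

Like the original script, the model leaves the breakpoint on the API entry armed, so every subsequent call to IsDebuggerPresent gets patched as well.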

Searching for paths to unsafe functions

If the import table contains functions that can potentially cause a buffer overflow, it’s worth checking whether user-controlled data can reach them. To do this, you have to build a graph by going from a given function to all functions that call it, then to all functions that call those, and so on up to the topmost level.

import idautils
import idc
import idaapi
import ida_funcs
from collections import defaultdict
IMPORT_NAME = "lstrcpyW"
callers_map = defaultdict(set)
calls_to_func = []
def get_func_name(ea):
    return idc.get_func_name(ea) or f"sub_{ea:X}"
def get_func_start_ea(ea):
    f = ida_funcs.get_func(ea)
    return f.start_ea if f else ea
def find_import_address():
    ea = idc.get_name_ea_simple(IMPORT_NAME)
    if ea == idc.BADADDR:
        print(f"[-] Import {IMPORT_NAME} not found.")
        return None
    print(f"[+] Found {IMPORT_NAME} import at: 0x{ea:X}")
    return ea
def find_calls_to_import(imp_addr):
    print("[*] Scanning for calls to imported function...")
    for func_ea in idautils.Functions():
        for insn_ea in idautils.FuncItems(func_ea):
            if idc.print_insn_mnem(insn_ea).lower() != "call":
                continue
            op_type = idc.get_operand_type(insn_ea, 0)
            if op_type in [idc.o_mem, idc.o_displ]:
                target = idc.get_operand_value(insn_ea, 0)
                if target == imp_addr:
                    calls_to_func.append(func_ea)
                    callers_map[imp_addr].add(func_ea)
    print(f"[+] Found {len(calls_to_func)} direct callers of {IMPORT_NAME}")
def build_call_graph():
    print("[*] Building global call graph...")
    for func_ea in idautils.Functions():
        for insn_ea in idautils.FuncItems(func_ea):
            if idc.print_insn_mnem(insn_ea).lower() != "call":
                continue
            target = idc.get_operand_value(insn_ea, 0)
            if ida_funcs.get_func(target):
                callers_map[target].add(func_ea)
def build_paths(target_ea, path=None, visited=None):
    if path is None:
        path = []
    if visited is None:
        visited = set()
    path = [target_ea] + path
    visited.add(target_ea)
    if target_ea not in callers_map or not callers_map[target_ea]:
        yield path
    else:
        for caller in callers_map[target_ea]:
            if caller not in visited:
                yield from build_paths(caller, path, visited.copy())
def print_path_tree(path):
    for depth, ea in enumerate(path):
        print("  " * depth + f"- {get_func_name(ea)}")
def main():
    imp_addr = find_import_address()
    if not imp_addr:
        return
    find_calls_to_import(imp_addr)
    build_call_graph()
    print(f"\n[+] All unique call paths to {IMPORT_NAME}:\n")
    seen_paths = set()
    for caller in calls_to_func:
        for path in build_paths(caller):
            norm_path = tuple(get_func_start_ea(ea) for ea in path)
            if norm_path not in seen_paths:
                seen_paths.add(norm_path)
                print_path_tree(path)
    if not seen_paths:
        print("[-] No paths found.")
main()

Let me briefly explain what’s going on. The find_calls_to_import function scans every call instruction with a memory operand (o_mem or o_displ) and records each function that calls lstrcpyW through its import slot. Then build_call_graph extends this map: for every call instruction in every function, it records the calling function under the address of the called one. Finally, build_paths recursively walks the map upward from each direct caller and yields every chain of callers up to a function that has no known callers. The top-level code filters the resulting paths to print only unique combinations.
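Stripped of the IDA calls, the path search is an ordinary traversal of a reversed call graph. The sketch below works on a hand-written list of (caller, callee) pairs with hypothetical function names; in IDA, these pairs would come from the idautils.Functions() and idautils.FuncItems() loops shown above. Unlike the original build_paths, it starts from the import itself, and it also emits a path when every remaining caller has already been visited (a recursion cycle) instead of silently dropping it.

```python
from collections import defaultdict


def build_callers_map(call_edges):
    """Invert (caller, callee) pairs into a callee -> {callers} mapping."""
    callers = defaultdict(set)
    for caller, callee in call_edges:
        callers[callee].add(caller)
    return callers


def build_paths(node, callers, visited=frozenset()):
    """Yield caller chains that end at node, topmost caller first.

    A chain stops at a function with no unvisited callers, so recursive
    calls (cycles) terminate instead of looping forever.
    """
    visited = visited | {node}
    parents = [c for c in callers.get(node, ()) if c not in visited]
    if not parents:
        yield [node]
        return
    for parent in parents:
        for path in build_paths(parent, callers, visited):
            yield path + [node]


# Hypothetical call edges
EDGES = [
    ("main", "parse_args"),
    ("parse_args", "copy_name"),
    ("copy_name", "lstrcpyW"),
    ("main", "log_line"),
    ("log_line", "lstrcpyW"),
    ("log_line", "log_line"),  # direct recursion
]

callers = build_callers_map(EDGES)
unique_paths = sorted({tuple(p) for p in build_paths("lstrcpyW", callers)})
for path in unique_paths:
    for depth, name in enumerate(path):
        print("  " * depth + f"- {name}")
```

Passing visited down as an immutable frozenset gives each branch its own copy automatically, which is what the original achieves with visited.copy().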

The script output looks as shown below:

[+] Found lstrcpyW import at: 0x5A9170
[*] Scanning for calls to imported function...
[+] Found 13 direct callers of lstrcpyW
[*] Building global call graph...

[+] All unique call paths to lstrcpyW:

- sub_48CCA0
  - sub_490F90
    - sub_48D020
      - sub_41B820

- sub_48CCA0
  - sub_490F90
    - sub_48D020

- ?__scrt_common_main_seh@@YAHXZ
  - _WinMain@16
    - sub_525BB0

The script found thirteen calls to lstrcpyW, but all of them are located in just three functions (sub_41B820, sub_48D020, and sub_525BB0), which end the three paths.

Applying scripts to multiple targets

IDA Pro can also be controlled from the command line. To do this, run its text-mode version, idat64.exe, instead of ida64.exe, so that no graphical interface is displayed:

idat64.exe -A -S"path\to\script.py" path\to\target.exe

The -A switch runs IDA in autonomous mode, without any dialog boxes. The -S switch specifies the script to execute once the database is opened (note that there is no space between -S and the path). The last argument is the path to the file to be analyzed.

import ida_auto
import idc
ida_auto.auto_wait()
with open("result.txt", "w") as f:
    f.write("something")
idc.qexit(0)

The script must wait for the autoanalysis to complete, which is what ida_auto.auto_wait does. Its output is written to an external file, and then idc.qexit closes IDA.
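The command above processes a single file. To apply the same script to many targets, a simple option is a small driver that launches idat64.exe once per file. The sketch below is one way to do it under stated assumptions: the IDA executable path (IDAT), the samples folder, and script.py must be adjusted to your setup (the executable name may differ between IDA versions), build_command and run_batch are hypothetical helpers, and the only IDA options used are the -A and -S switches described above.

```python
import subprocess
from pathlib import Path

# Assumptions: adjust these to your installation and your script
IDAT = r"C:\Program Files\IDA Professional 9.0 SP1\idat64.exe"
SCRIPT = "script.py"


def build_command(idat, script, target):
    """Command line for one headless run: -A (no dialogs), -S<script>, target."""
    return [str(idat), "-A", f"-S{script}", str(target)]


def run_batch(idat, script, targets, timeout=600):
    """Run IDA once per target and collect exit codes (None on timeout)."""
    results = {}
    for target in targets:
        try:
            proc = subprocess.run(build_command(idat, script, target),
                                  timeout=timeout)
            results[str(target)] = proc.returncode
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the hung IDA instance
            results[str(target)] = None
    return results


if __name__ == "__main__":
    samples = sorted(Path("samples").glob("*.exe"))
    for target, code in run_batch(IDAT, SCRIPT, samples).items():
        print(f"{target}: exit code {code}")
```

Since every run executes the same script, it should write to a per-target file (for example, one named after ida_nalt.get_root_filename()) rather than to a fixed result.txt; otherwise, each run overwrites the previous result.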

Conclusions

Scripts with access to internal APIs significantly expand IDA’s capabilities and effectively turn this disassembler into a universal tool suitable for almost any task: a code analyzer, a situationally enhanced debugger, and much more. And if you run IDA from the command line, it becomes a weapon of mass destruction against enemy code!

Good luck!
