Silent call. Concealing NTAPI calls from security tools

EDR systems increasingly rely on call stack tracing to detect malicious applications, which makes life harder for red teamers. Let's analyze this powerful technique and find a way to fool EDR and call NTAPI covertly, so that even stack unwinding won't expose your calls.

warning

This article is intended for security specialists operating under a contract; all information provided in it is for educational purposes only. Neither the author nor the Editorial Board can be held liable for any damages caused by improper usage of this publication. Distribution of malware, disruption of systems, and violation of secrecy of correspondence are prosecuted by law.

WinAPI indirect calls

Why invent something new if a technique such as Indirect Syscalls already exists? It has certain advantages: it's easy to implement and well-documented. Let me briefly remind you how it works.

The essence of the Indirect Syscalls technique is that an NTAPI function is not called through its export in ntdll.dll; instead, special code emulates the call: the arguments are placed in registers and on the stack, the number of the system service is saved to eax, and then a jump to the address of a syscall instruction inside ntdll.dll is performed. The code looks something like this:

// The arguments have already been set up; the NTAPI call through ntdll.dll is performed next
mov r10, rcx
mov eax, <syscall_number>
jmp <address_of_syscall_instruction_in_ntdll>

From the outside, this looks fine, and it worked until EDR started using Event Tracing for Windows (ETW) to capture and unwind the call stack of a system call. Although the Indirect Syscalls technique makes it possible to avoid detection at the user-mode level, it still leaves traces in the call stack. EDR can now unwind this stack, analyze it, and detect the anomaly.

Stack, frames, and EDR

To proceed further, you need a basic understanding of stack frames: how they are generated and how they work. A stack frame is a memory area on the stack allocated for each function call in the program. It contains such data as function parameters, local variables, saved register values, and the return address. Below is a simple example of such code:

void func2(int y);  // forward declaration so that func1 can call it

void func1() {
    int x = 10;
    func2(x);
    // ...
}

void func2(int y) {
    int z = y + 5;
}

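When func2(x) is called, the following frame is generated (in a simplified model that uses a frame pointer):

RSP points to the top of the frame, where the local variable z is stored;
[RBP] holds the saved base pointer of the previous frame;
[RBP + 8] holds the return address, that is, the pointer to the next instruction in func1(); and
[RBP + 16] holds the parameter of the func2 function.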

In other words, a stack frame is generated every time a function is called, and it contains plenty of useful information, including a link to the previous frame (that is, to the calling function). This is how EDR reconstructs the control flow when it detects suspicious behavior in the system. Needless to say, commonly used obfuscation methods for WinAPI calls won't protect you from this detection technique; so, let's try to forge a special frame to fool EDR.
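To make the idea of walking from frame to frame more concrete, here is a minimal sketch of a frame chain. It is a model only: the Frame structure, the walk_frames and demo_trace functions, and all addresses are hypothetical, and real unwinding on 64-bit Windows relies on unwind metadata stored in the image rather than on a chain of saved RBP values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of a frame-pointer-based stack frame:
// [rbp] holds the caller's saved rbp, [rbp + 8] holds the return address.
struct Frame {
    const Frame*   saved_rbp;       // link to the caller's frame (nullptr at the bottom)
    std::uintptr_t return_address;  // where execution resumes in the caller
};

// Follow the saved-rbp links and collect return addresses, innermost first.
// max_depth guards against corrupted or cyclic chains.
std::vector<std::uintptr_t> walk_frames(const Frame* rbp, std::size_t max_depth = 64) {
    std::vector<std::uintptr_t> trace;
    while (rbp != nullptr && trace.size() < max_depth) {
        trace.push_back(rbp->return_address);
        rbp = rbp->saved_rbp;
    }
    return trace;
}

// Hypothetical addresses standing in for the chain func2 -> func1 -> main.
std::vector<std::uintptr_t> demo_trace() {
    Frame main_frame{nullptr, 0x1000};        // returns into the startup code
    Frame func1_frame{&main_frame, 0x2000};   // returns into main
    Frame func2_frame{&func1_frame, 0x3000};  // returns into func1
    return walk_frames(&func2_frame);
}
```

Each collected return address identifies the code that made the call, which is exactly the information a monitoring tool needs to decide whether the chain of callers looks plausible.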

info

Event Tracing for Windows (ETW) is a technology embedded in Windows OS; it collects and records event-related data from various system components and applications. ETW provides powerful telemetry tools, including collection of information about the call stack, which is essential for diagnostics, performance monitoring, and security analysis.

How EDR monitors the stack

The thing is that EDR knows how the stack should look in any given situation, and if its structure deviates from the norm, an alert is triggered. Among other things, EDR checks:

where the jump to syscall was made from. If the jump was made from a memory region that doesn't belong to ntdll.dll (e.g. the heap or dynamically allocated memory), this is considered an anomaly;
access rights to the memory region from where the jump was made. If the region has PAGE_EXECUTE_READWRITE rights, this might indicate an indirect call or shellcode; and
where the execution flow will return after the system call is completed. If the return address points to memory that isn't backed by a loaded module, this is also considered an anomaly.

Now let’s see how the correct (from the EDR point of view) call stack structure looks when memory is allocated from a user application:

0x00007ffb`12345678  ntdll!NtAllocateVirtualMemory
0x00007ffb`12345555  kernel32!VirtualAlloc
0x00007ffb`12345432  MyProgram!Main

The MyProgram application allocates memory from the main function using VirtualAlloc, which calls the NtAllocateVirtualMemory NTAPI function from ntdll.dll. But what happens if an NTAPI function is called manually (i.e. you set up the stack and registers for the call yourself)? In such a situation, the program makes an indirect call:

mov r10, rcx
// Number of NtAllocateVirtualMemory in the service table
mov eax, 0x18
// Making a syscall at a certain address in ntdll.dll
jmp 0x00007ffb`12345678

EDR captures the call stack:

0x00007ffb`12345678  ntdll!NtAllocateVirtualMemory
// Address in user memory (where you’ve generated arguments) from where the jump to the system call was performed inside ntdll.dll
0x000001a3`45678901  [RX Region]
0x00007ffb`12345432  MyProgram!Main

You can see that the topmost frame points to ntdll.dll, but the frame right below it lies in user-allocated memory that doesn't belong to any module, which triggers an alert!
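The comparison of the two stacks can be sketched from the defender's side as a small model. Everything here is an assumption made for illustration: the Module structure, the module ranges, the sample addresses, and the function names are hypothetical, and a real product would obtain the module list and the captured stack from the operating system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of a loaded image.
struct Module {
    std::string    name;
    std::uintptr_t base;
    std::uintptr_t size;
};

// Name of the module that contains addr, or nullopt if the address is not
// backed by any known image (heap, private allocation, and so on).
std::optional<std::string> owning_module(const std::vector<Module>& mods,
                                         std::uintptr_t addr) {
    for (const Module& m : mods) {
        if (addr >= m.base && addr - m.base < m.size) return m.name;
    }
    return std::nullopt;
}

// Index of the first frame that is not backed by a module, or nullopt if
// every return address resolves to a known image.
std::optional<std::size_t> first_unbacked_frame(const std::vector<Module>& mods,
                                                const std::vector<std::uintptr_t>& stack) {
    for (std::size_t i = 0; i < stack.size(); ++i) {
        if (!owning_module(mods, stack[i])) return i;
    }
    return std::nullopt;
}

// Illustrative layout (assumes a 64-bit build); not taken from a real process.
std::vector<Module> sample_modules() {
    return {
        {"ntdll.dll",     0x7ffb30000000ULL, 0x200000},
        {"kernel32.dll",  0x7ffb20000000ULL, 0x100000},
        {"MyProgram.exe", 0x7ff600000000ULL, 0x10000},
    };
}

// ntdll -> kernel32 -> MyProgram: every frame is backed by a module.
std::vector<std::uintptr_t> regular_stack() {
    return {0x7ffb30012345ULL, 0x7ffb20005555ULL, 0x7ff600001432ULL};
}

// ntdll -> private memory -> MyProgram: the second frame is unbacked.
std::vector<std::uintptr_t> indirect_stack() {
    return {0x7ffb30012345ULL, 0x000001a345678901ULL, 0x7ff600001432ULL};
}
```

With the sample data, first_unbacked_frame finds nothing in regular_stack() and reports index 1 for indirect_stack(), which corresponds to the [RX Region] entry in the listing above.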

It turns out that indirect calls are not as difficult to detect as it might seem. Well, let’s find a solution!

Calling NTAPI without traces

One of my previous articles describes a code injection technique called PoolParty that uses the Windows Thread Pools mechanism. Now I am going to show how to call an NTAPI function using the same mechanism. Note that this isn't the only Windows mechanism that can be used to proxy NTAPI calls; it is simply one of the better-researched ones.

The TpAllocWork, TpPostWork, and TpReleaseWork functions are of special interest in this regard: they are used to manage tasks in Windows Thread Pools. These functions make it possible to create, start, and release tasks that are processed by the thread pool, which facilitates parallel code execution. They are exported by ntdll.dll and, needless to say, undocumented. Let's use them to call an NTAPI function while bypassing stack monitoring. But first, let's examine their prototypes.

The TpAllocWork function creates a new task that can be executed by the thread pool. When you create a task, you have to specify a callback function that will be executed when the thread pool starts processing this task.

NTSTATUS TpAllocWork(
    PTP_WORK* WorkReturn,
    PTP_WORK_CALLBACK WorkCallback,
    PVOID Context,
    PTP_CALLBACK_ENVIRON CallbackEnviron
);

Parameters:

WorkReturn is a pointer to a variable that receives the created PTP_WORK (work object of the thread pool);
WorkCallback is a pointer to the callback function that will be called to execute the task;
Context is a pointer to the user context that is passed to the callback function; and
CallbackEnviron is a pointer to the TP_CALLBACK_ENVIRON structure that configures the environment for the task. If the value of this parameter is nullptr, the default environment is used.

After execution, TpAllocWork returns an NTSTATUS code: in the case of success, the pointer to the work object is stored in WorkReturn; in case of an error, an error status is returned.

The next function is TpPostWork: it posts the created task to the thread pool queue for execution. This function activates the execution of the task created by TpAllocWork.

void TpPostWork(
    PTP_WORK Work
);

The only parameter, PTP_WORK Work, is a pointer to the work object of the thread pool that was created using TpAllocWork.

The last function is TpReleaseWork: it releases the task object. This function is required to release resources allocated to the task created using TpAllocWork. If the task is still queued or running, the call doesn't block: the resources are released asynchronously after the outstanding callback completes.

void TpReleaseWork(
    PTP_WORK Work
);

Similar to the previous function, TpReleaseWork has only one argument: PTP_WORK Work.
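The allocate, post, and release lifecycle can be illustrated with a toy model. This is not the Windows implementation: ToyPool, Work, AddArgs, and the other names are hypothetical, the "pool" runs queued callbacks on the calling thread when run_pending is invoked, and release_work frees the object immediately. The documented Win32 counterparts of the three functions are CreateThreadpoolWork, SubmitThreadpoolWork, and CloseThreadpoolWork.

```cpp
#include <cassert>
#include <deque>

// Callback type: the pool passes back the context supplied at allocation time.
using WorkCallback = void (*)(void* context);

// Work object: remembers which callback to run and with which context.
struct Work {
    WorkCallback callback;
    void*        context;
};

// Toy single-threaded stand-in for a thread pool.
class ToyPool {
public:
    // Create a work object bound to a callback and its context.
    Work* alloc_work(WorkCallback cb, void* ctx) { return new Work{cb, ctx}; }

    // Queue the work object; nothing is executed yet.
    void post_work(Work* w) { queue_.push_back(w); }

    // Stands in for a pool worker: the callback is invoked from here,
    // not from the code that posted the work.
    void run_pending() {
        while (!queue_.empty()) {
            Work* w = queue_.front();
            queue_.pop_front();
            w->callback(w->context);
        }
    }

    // Free the work object (call only after it has been executed).
    void release_work(Work* w) { delete w; }

private:
    std::deque<Work*> queue_;
};

// Example context: inputs and an output slot packed into one structure.
struct AddArgs {
    int a;
    int b;
    int result;
};

void add_callback(void* context) {
    auto* args = static_cast<AddArgs*>(context);
    args->result = args->a + args->b;
}

int run_demo() {
    ToyPool pool;
    AddArgs args{2, 3, 0};
    Work* work = pool.alloc_work(add_callback, &args);
    pool.post_work(work);
    pool.run_pending();
    pool.release_work(work);
    return args.result;
}
```

The point of the model is the division of roles: the poster only hands over a callback pointer and a context pointer, while the code that actually invokes the callback belongs to the pool.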

An API call performed using these functions is formed as follows:

typedef NTSTATUS(NTAPI* TPALLOCWORK)(PTP_WORK* ptpWork, PTP_WORK_CALLBACK ptpCallback, PVOID arg, PTP_CALLBACK_ENVIRON CallbackEnv);
typedef VOID(NTAPI* TPPOSTWORK)(PTP_WORK);
typedef VOID(NTAPI* TPRELEASEWORK)(PTP_WORK);
...
VOID MakeCall(PTP_WORK_CALLBACK work_callback, PVOID args) {
    PTP_WORK ptpWork = NULL;
    // Create the task, queue it, and release the work object
    ((TPALLOCWORK)ptr_TpAllocWork)(&ptpWork, (PTP_WORK_CALLBACK)work_callback, args, NULL);
    ((TPPOSTWORK)ptr_TpPostWork)(ptpWork);
    ((TPRELEASEWORK)ptr_TpReleaseWork)(ptpWork);
    // Give the pool thread time to execute the callback
    WaitForSingleObject((HANDLE)-1, 0x1000);
}

PTP_WORK_CALLBACK work_callback is a pointer to an assembly routine that will make the NTAPI call, while PVOID args is the argument pack of the called function.

The argument pack is formed as a regular structure containing the function parameters. The only exception is that the first field in this structure is a pointer to NtAllocateVirtualMemory. In this particular case, it looks as follows:

// Argument structure
typedef struct _ZWALLOCATEVIRTUALMEMORY_ARG {
    // The first argument in the structure is a pointer to the NtAllocateVirtualMemory function.
    UINT_PTR    pNtAllocateVirtualMemory;
    HANDLE      ProcessHandle;
    PVOID*      BaseAddress;
    ULONG_PTR   ZeroBits;
    PSIZE_T     RegionSize;
    ULONG       AllocationType;
    ULONG       Protect;
} ZWALLOCATEVIRTUALMEMORY_ARG, * PZWALLOCATEVIRTUALMEMORY_ARG;
...
PVOID NtAllocateVirtualMemory(HANDLE hProcess) {
    PVOID alloc_addr = NULL;
    SIZE_T alloc_size = 0x1000;
    ZWALLOCATEVIRTUALMEMORY_ARG AllocateVirtualMemory_Arg = { 0 };
    // Here you get a pointer to NtAllocateVirtualMemory
    AllocateVirtualMemory_Arg.pNtAllocateVirtualMemory = (UINT_PTR)GetProcAddress(GetModuleHandleA("ntdll"), "NtAllocateVirtualMemory");
    AllocateVirtualMemory_Arg.ProcessHandle = hProcess;
    AllocateVirtualMemory_Arg.BaseAddress = &alloc_addr;
    AllocateVirtualMemory_Arg.ZeroBits = 0;
    AllocateVirtualMemory_Arg.RegionSize = &alloc_size;
    AllocateVirtualMemory_Arg.AllocationType = MEM_RESERVE;
    AllocateVirtualMemory_Arg.Protect = PAGE_EXECUTE_READWRITE;
    MakeCall((PTP_WORK_CALLBACK)NtAllocateVirtualMemoryCallback, &AllocateVirtualMemory_Arg);
    return alloc_addr;
}

work_callback is written in MASM, and its purpose is to parse the arguments, set up the registers and the stack for NtAllocateVirtualMemory, and transfer control to the function. In x64, arguments are passed to the function as follows: the first four arguments are passed via the rcx, rdx, r8, and r9 registers; if the number of arguments is greater, the subsequent arguments are passed via the stack. At function entry, the fifth argument is located at [rsp + 28h] and the sixth one at [rsp + 30h].
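Since the assembly routine addresses the structure fields by raw offsets, it helps to check the layout at compile time. The sketch below is an illustration under stated assumptions: AllocArgs is a hypothetical stand-in that mirrors the argument structure with fixed-width types (so that it compiles without the Windows headers), a 64-bit target is assumed, and arg_slot_at_entry is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the argument structure; the Windows type of each field
// is given in the comment next to it.
struct AllocArgs {
    std::uint64_t  pNtAllocateVirtualMemory;  // UINT_PTR
    void*          ProcessHandle;             // HANDLE
    void**         BaseAddress;               // PVOID*
    std::uint64_t  ZeroBits;                  // ULONG_PTR
    std::uint64_t* RegionSize;                // PSIZE_T
    std::uint32_t  AllocationType;            // ULONG
    std::uint32_t  Protect;                   // ULONG
};

static_assert(sizeof(void*) == 8, "this sketch assumes a 64-bit target");

// Offsets that code reading the structure through a base register must use.
static_assert(offsetof(AllocArgs, pNtAllocateVirtualMemory) == 0x00, "layout");
static_assert(offsetof(AllocArgs, ProcessHandle)            == 0x08, "layout");
static_assert(offsetof(AllocArgs, BaseAddress)              == 0x10, "layout");
static_assert(offsetof(AllocArgs, ZeroBits)                 == 0x18, "layout");
static_assert(offsetof(AllocArgs, RegionSize)               == 0x20, "layout");
static_assert(offsetof(AllocArgs, AllocationType)           == 0x28, "layout");
static_assert(offsetof(AllocArgs, Protect)                  == 0x2C, "layout");
static_assert(sizeof(AllocArgs) == 0x30, "layout");

// Offset from rsp, at function entry, of the slot that belongs to the n-th
// argument (1-based) in the Microsoft x64 calling convention: [rsp] holds the
// return address, slots 1..4 are the home space of the register arguments,
// and slots 5 and above carry the stack-passed arguments.
constexpr std::size_t arg_slot_at_entry(unsigned n) {
    return 8u * n;
}

static_assert(arg_slot_at_entry(5) == 0x28, "fifth argument");
static_assert(arg_slot_at_entry(6) == 0x30, "sixth argument");
```

The two 32-bit fields at the end share one 8-byte slot, so code that reads them should use 32-bit loads rather than 64-bit ones.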

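NtAllocateVirtualMemoryCallback proc
    ; rdx holds the Context pointer, i.e. the argument structure
    mov rbx, rdx
    ; Get NTAPI address from the structure
    mov rax, [rbx]
    ; Load the register arguments
    mov rcx, [rbx + 8]          ; ProcessHandle
    mov rdx, [rbx + 10h]        ; BaseAddress
    xor r8, r8                  ; ZeroBits is always 0 here
    mov r9, [rbx + 20h]         ; RegionSize
    ; Place the fifth and sixth arguments on the stack
    mov r10d, [rbx + 2Ch]       ; Protect
    mov [rsp + 30h], r10
    mov r10d, [rbx + 28h]       ; AllocationType
    mov [rsp + 28h], r10
    ; Jump to NtAllocateVirtualMemory
    jmp rax
NtAllocateVirtualMemoryCallback endp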

Note that an assembly function can be called from C++ code by declaring it with extern "C" in one of the headers.

extern "C" VOID CALLBACK NtAllocateVirtualMemoryCallback(PTP_CALLBACK_INSTANCE Instance, PVOID Context, PTP_WORK Work);

After that, you can call NtAllocateVirtualMemory(HANDLE hProcess), and the stack will look absolutely ‘innocent’ because the function call will be performed by the Windows Thread Pools mechanism.

Nothing in the stack exposes your actions

Conclusions

Congrats! Now you know what stack unwinding is, how it helps to detect the execution of malicious code, and how this detection mechanism can be fooled. Importantly, Windows itself helps you to avoid detection. Can this technique be considered a silver bullet? Of course not! As soon as EDR systems start monitoring callbacks in functions that are of interest to them, this secret will be exposed. But, as always, the battle between the shield and the sword continues!
