Disassembling REvil. The notorious ransomware hides WinAPI calls

Unknown hackers recently attacked the Travelex foreign exchange company using the REvil ransomware. This trojan employs simple but efficient obfuscation techniques that conceal its WinAPI calls from analysis tools. Let’s see how this obfuscation works.

As usual, I load the sample into Detect It Easy (DiE) and review the output.

REvil in DiE

DiE reports that the file is not packed. Still, let’s check the entropy of its sections.

Entropy of REvil sections

Judging by the section names, the file is packed with UPX; however, the entropy of these sections looks rather odd. Why hasn’t DiE recognized the packer? One possible reason is that the UPX signature was deliberately altered to confuse signature-based detectors. In any event, this is a packed file, so I load it into the x64dbg debugger, set a breakpoint on VirtualAlloc (which is called not far from the entry point), and launch the trojan.
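Section entropy figures like the ones above are generally computed as Shannon entropy in bits per byte. The following minimal Python sketch shows that calculation; it assumes you have already extracted each section’s raw bytes (for example, with a PE parsing library, not shown). As a rough rule of thumb rather than a hard threshold, values close to 8.0 suggest compressed or encrypted data, which is what UPX-packed sections usually look like.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum -p * log2(p) over the relative frequency p of every byte value present.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())


if __name__ == "__main__":
    uniform = bytes(range(256)) * 4   # every byte value equally frequent
    zeros = b"\x00" * 1024            # a single repeated byte value
    print(f"uniform: {shannon_entropy(uniform):.2f}")
    print(f"zeros:   {shannon_entropy(zeros):.2f}")
```

The uniform buffer hits the theoretical maximum of 8.0, while the zero-filled one gives 0.0; real packed sections land close to the upper end.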

INFO

Unpacking mechanisms are fairly standard, so if you encounter an unknown packer, start by setting breakpoints on the following WinAPI functions:

  • VirtualAlloc (allocates memory for the payload);
  • VirtualProtect (changes memory access attributes);
  • CreateProcessInternalW (process creation APIs ultimately pass control to this function); and
  • ResumeThread (resumes thread execution after an injection).

When execution reaches VirtualAlloc, the breakpoint triggers, and I find myself inside this function. After it executes, I return to the calling code and see the following picture:

Then I notice an interesting piece of code right after the VirtualAlloc call:

Keep in mind that VirtualAlloc returns the address of the allocated memory in eax. So I set a breakpoint at this jump, follow the address in eax in the dump, and watch what happens in the allocated memory. To do so, I set a one-time breakpoint on writes to the beginning of this region, and the debugger stops inside the loop that writes data there. Part of this loop looks as follows:

I step through the loop manually, and a painfully familiar signature appears in memory:

I resume execution in the debugger, stop at jmp eax, take one step forward, and finally land inside the unpacked file! Now I can dump it and load it into IDA Pro. After this simple procedure, I see the code of the start function:

Time to examine the functions and capabilities of the malware. I step into the first call and see the following code:

I note a call to the sub_406A4D subroutine followed by a call of the form call dword_41CB64. If everything is left “as is”, the application would crash at this point because dword_41CB64 points into a table that holds hash values rather than function addresses; it looks as follows (this is just a part of the table!):

In addition, the import table of this sample is empty: the WinAPI functions are resolved dynamically, their names are not stored in plain text, and the program most likely uses their hashes instead. In other words, the WinAPI calls are obfuscated. So I step into sub_406A4D in the debugger, see a single unconditional jump there, and follow it into sub_405BCD. At the beginning of this function, I notice some interesting code:

The sub_405DCF function immediately attracts my attention. It contains plenty of code, so I switch to the decompiled pseudocode (I am not an assembler guru and feel more comfortable with IDA Pro pseudocode).

The function is too large to list here in full; however, some of its parts deserve a detailed look. The execution of sub_405DCF can be divided into two phases. The first phase transforms the hash values stored in the program. The second phase retrieves function names from the export tables of system libraries, hashes them, and compares the results with the values taken from the table discussed above.

In pseudocode, parsing the export table of a system library looks as follows:

Why did this piece of pseudocode catch my attention? First and foremost, because of such telltale offsets as 0x3C and 0x78. In addition, the v13 variable operating on these values is cast to DWORD*, which indicates that it is an offset, too. Overall, everything suggests that this is the header of a PE file:

The 0x3C offset corresponds to the e_lfanew field. I follow e_lfanew to the PE signature and find the following field at offset 0x78 from it (see the pseudocode):

This means that the function reads the export table; in other words, the WinAPI functions are resolved dynamically.
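To make the e_lfanew and 0x78 logic concrete, here is a minimal Python sketch of the same kind of export-table walk. It is not REvil’s code. It assumes a 32-bit (PE32) image laid out as in memory, where RVAs can be used directly as offsets; that is the situation the malware is in when it parses DLLs already loaded into its process. For a DLL file read from disk, you would first have to translate RVAs into file offsets via the section table (not shown). In PE32+ images, the export directory entry sits at 0x88 rather than 0x78, and forwarded exports are not handled here.

```python
import struct


def read_u16(buf: bytes, off: int) -> int:
    return struct.unpack_from("<H", buf, off)[0]


def read_u32(buf: bytes, off: int) -> int:
    return struct.unpack_from("<I", buf, off)[0]


def read_cstr(buf: bytes, off: int) -> str:
    """Read a NUL-terminated ASCII string starting at `off`."""
    end = buf.index(b"\x00", off)
    return buf[off:end].decode("ascii")


def walk_exports(image: bytes):
    """Yield (name, function_rva) pairs from the export table of a PE32 image
    laid out as in memory (offsets equal RVAs), e.g. a dumped loaded DLL."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    pe = read_u32(image, 0x3C)                    # e_lfanew
    if image[pe:pe + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # 0x78 = 4 (signature) + 0x14 (file header) + 0x60 (DataDirectory in PE32)
    export_rva = read_u32(image, pe + 0x78)       # DataDirectory[0].VirtualAddress
    if export_rva == 0:
        return                                    # the module exports nothing
    # IMAGE_EXPORT_DIRECTORY fields used by the walk
    number_of_names = read_u32(image, export_rva + 0x18)
    address_of_functions = read_u32(image, export_rva + 0x1C)
    address_of_names = read_u32(image, export_rva + 0x20)
    address_of_name_ordinals = read_u32(image, export_rva + 0x24)
    for i in range(number_of_names):
        name_rva = read_u32(image, address_of_names + 4 * i)
        # AddressOfNameOrdinals holds an index into AddressOfFunctions
        index = read_u16(image, address_of_name_ordinals + 2 * i)
        func_rva = read_u32(image, address_of_functions + 4 * index)
        yield read_cstr(image, name_rva), func_rva
```

Each name yielded here is exactly what the sample hashes and compares against its table; the matching function address is what ends up in the resolved table.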

To make IDA Pro understand the export table structure, I declare it in Local Types (Shift + F1). Then I select Convert to struct* in the context menu of the v17 variable. The export table structure of a PE file looks as follows:

The AddressOfFunctions, AddressOfNames, and AddressOfNameOrdinals fields are used. The pseudocode also shows that the hashes are derived from the ‘incomplete’ hashes present in the code, as follows:

Yes, the hashes stored in the body of the sample are not ‘complete’ yet and must be converted into the ‘proper’ format. After stripping away all the clutter, I get the following algorithm:

where hash is the hash from the table, passed as an argument. Fortunately, IDA highlights identical variables; otherwise, the analysis would take forever. In the pseudocode, this hash is stored in a1, the function’s argument.
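Because the exact conversion formula is specific to the sample, the sketch below only shows the plumbing around it: reading the pseudohash table that dword_41CB64 points to from a raw dump and converting every entry. It assumes the table is stored as a plain array of little-endian DWORDs. The name hash_api_true comes from this article, but the body given here is a hypothetical placeholder (an XOR with an invented key plus a mask), not REvil’s real algorithm; replace it with the one recovered from sub_405DCF.

```python
import struct
from typing import Callable, Iterable, List


def read_hash_table(raw: bytes) -> List[int]:
    """Split a raw dump of the hash table into little-endian DWORD values."""
    usable = len(raw) - len(raw) % 4   # drop a trailing partial DWORD, if any
    return [value for (value,) in struct.iter_unpack("<I", raw[:usable])]


def convert_table(pseudo_hashes: Iterable[int],
                  transform: Callable[[int], int]) -> List[int]:
    """Apply the recovered conversion to every stored pseudohash."""
    return [transform(h) for h in pseudo_hashes]


def hash_api_true(pseudo_hash: int) -> int:
    # HYPOTHETICAL stand-in: the XOR key and the mask are placeholders,
    # not REvil's constants. Replace this body with the algorithm
    # recovered from the sample's pseudocode.
    return (pseudo_hash ^ 0x5A5A5A5A) & 0x1FFFFF
```

Keeping the transform as a parameter of convert_table makes it easy to swap in the real formula once it has been read off the pseudocode.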

If I apply this algorithm to the hashes stored in the code (remember the table?), I get the ‘correct’ hashes, which are compared with hashes computed from the export table of a system library (more precisely, from the names of its exported functions). In Python, the code that computes a hash from a function’s symbolic name looks as follows:

Calling the function:

So all I have to do now is apply the hash_api_true algorithm to the entire table of pseudohashes in the sample to produce a table of ‘correct’ hashes. Then I apply the hash_from_name algorithm to a list of WinAPI function names to get their hashes. Finally, I match the two lists against each other, thus mapping each hash to its function name. To speed up this process, I use a dedicated Python script for IDA.
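The matching step can be sketched as a dictionary lookup: hash every candidate WinAPI name once, then translate each deobfuscated hash from the sample into a name. The name hash_from_name comes from this article, but the hash below is a swapped-in technique: a djb2-style hash (seed 5381, multiplier 33, truncated to 32 bits) used purely as a stand-in, not REvil’s actual name hash. build_lookup and resolve are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Optional


def hash_from_name(name: str) -> int:
    # HYPOTHETICAL stand-in: djb2-style hash truncated to 32 bits.
    # It is NOT REvil's name hash; swap in the loop recovered from the sample.
    h = 5381
    for ch in name.encode("ascii"):
        h = (h * 33 + ch) & 0xFFFFFFFF
    return h


def build_lookup(api_names: Iterable[str]) -> Dict[int, str]:
    """Hash every candidate WinAPI name; on a collision, keep the first name."""
    lookup: Dict[int, str] = {}
    for name in api_names:
        lookup.setdefault(hash_from_name(name), name)
    return lookup


def resolve(true_hashes: Iterable[int], lookup: Dict[int, str]) -> List[Optional[str]]:
    """Map each deobfuscated hash from the sample to a name, or None if unknown."""
    return [lookup.get(h) for h in true_hashes]


if __name__ == "__main__":
    # Candidate names would normally come from the exports of system DLLs.
    candidates = ["VirtualAlloc", "VirtualProtect", "CreateFileW", "ResumeThread"]
    lookup = build_lookup(candidates)
    # Pretend these values came out of hash_api_true for the sample's table.
    table = [hash_from_name("CreateFileW"), hash_from_name("ResumeThread")]
    for index, name in enumerate(resolve(table, lookup)):
        print(f"slot {index}: {name}")
```

In the workflow described above, the hashes passed to resolve would be the output of hash_api_true applied to the sample’s table; the resulting names can then be written back into the IDA database as comments or renamed offsets with IDAPython (not shown). A None entry signals that the candidate list is missing a function and should be extended.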

Could it be done faster?

In this particular case, REvil deobfuscates its entire table of functions at once. Therefore, after loading the sample into a debugger, you can execute the subroutine that resolves and deobfuscates the WinAPI functions; the debugger will then automatically display the resolved API names in the code. After that, you can dump the process and continue working with the dump, where the resolved addresses are already in place. However, this method does not work in every situation. For instance, if the deobfuscation is performed not for the entire list at once, but for each function separately at the moment it is called, this technique won’t work.

Conclusions

Now you know how to restore WinAPI calls obfuscated by replacing function names with hashes. As you can see, this obfuscation technique is easily defeated with a debugger or a disassembler. Furthermore, mathematical manipulations with the hashes don’t make reverse engineering impossible. Such tricks are easy to spot in the pseudocode; at best, they slow down the examination of a sample by a few minutes. Their main purpose is to fool automated detection systems; against manual analysis, they are practically useless.
