Vulnerable Java. Hacking Java bytecode encryption

Java code is not as simple as it seems. At first glance, hacking a Java app looks like an easy task due to a large number of available decompilers. But if the code is protected by bytecode encryption, the problem becomes much more complicated. In this article, I will explain in detail how to circumvent this protection mechanism.

A novice programmer who has learned something about Java has the false impression that coding in this language is very simple. A novice hacker might get the same impression about cracking Java programs. The job seems to be easy: you take a ZIP archiver, unpack JAR files, choose a decompiler to your taste, and decompile the resultant CLASS files (either one at a time or the entire project at once). Voila! You’ve got the source code of the project on a silver platter!

warning

This article is intended for security specialists operating under a contract; all information provided in it is for educational purposes only. Neither the author nor the Editorial Board can be held liable for any damages caused by improper usage of this publication. Distribution of malware, disruption of systems, and violation of secrecy of correspondence are prosecuted by law.

Of course, sometimes you have to fiddle around with obfuscation or JVM bytecode. Still, it’s much less boring than dealing with native code protected by Themida or even dotnet apps…

But in fact, Java is no easier than the above-mentioned technologies, and such simple cases occur pretty rarely. This article describes one of the techniques used to protect Java code from decompiling and explains how to circumvent it.

Imagine, for instance, that you deal with some graphical program whose license is validated on a remote server at startup. If there is no valid license or the server is unavailable, the program kindly asks you to try again or closes. For training purposes, let’s try to make it work even if the license cannot be validated.

Upon closer examination, you notice that the program’s executable module is a simple Java Runtime Environment loader; it calls javaw using a very long command line that contains a list of JAR modules and libraries. At the end of this list, you see the name of the main class.

You run a JAR search and find the class file. Too bad, all the decompilers at your disposal refuse to work with this file, and even dirtyJOE doesn’t support its format. You open the file in a HEX editor and realize that dirtyJOE is 100% right: instead of a normally compiled CLASS file, you see only the CAFEBABE signature and high-entropy white noise of packed or encrypted data.

Encrypted Main.class file in a HEX editor
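To tell a genuinely compiled class from an encrypted one without eyeballing a hex dump, you can try to walk its constant pool, which starts at offset 8 right after the signature and version fields. The sketch below is only an illustration: the function names are hypothetical, and it assumes that encrypted data will hit an undefined constant-pool tag or run past the end of the file almost immediately.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the fixed part of a constant-pool entry that follows its tag,
   or -1 if the tag is not defined by the class file format.
   For Utf8 (tag 1) this is only the 2-byte length prefix. */
static int cp_entry_size(uint8_t tag) {
    switch (tag) {
    case 1:  return 2;                            /* Utf8 */
    case 3:  case 4:  return 4;                   /* Integer, Float */
    case 5:  case 6:  return 8;                   /* Long, Double */
    case 7:  case 8:  return 2;                   /* Class, String */
    case 9:  case 10: case 11: case 12: return 4; /* Fieldref, Methodref, InterfaceMethodref, NameAndType */
    case 15: return 3;                            /* MethodHandle */
    case 16: return 2;                            /* MethodType */
    case 17: case 18: return 4;                   /* Dynamic, InvokeDynamic */
    case 19: case 20: return 2;                   /* Module, Package */
    default: return -1;
    }
}

/* Returns 1 if buf starts with CAFEBABE and its whole constant pool
   can be parsed; returns 0 otherwise (for example, for encrypted data). */
int class_file_looks_plain(const uint8_t *buf, size_t len) {
    if (len < 10) return 0;
    if (buf[0] != 0xCA || buf[1] != 0xFE || buf[2] != 0xBA || buf[3] != 0xBE) return 0;

    unsigned count = ((unsigned)buf[8] << 8) | buf[9];  /* constant_pool_count */
    if (count == 0) return 0;

    size_t pos = 10;
    for (unsigned i = 1; i < count; i++) {
        if (pos >= len) return 0;
        uint8_t tag = buf[pos++];
        int size = cp_entry_size(tag);
        if (size < 0 || pos + (size_t)size > len) return 0;
        if (tag == 1) {
            /* Utf8: skip the length prefix and the string bytes */
            size_t n = ((size_t)buf[pos] << 8) | buf[pos + 1];
            pos += 2;
            if (pos + n > len) return 0;
            pos += n;
        } else {
            pos += (size_t)size;
            if (tag == 5 || tag == 6) i++;  /* Long and Double take two slots */
        }
    }
    return 1;
}
```

A file for which class_file_looks_plain returns 0 despite a correct CAFEBABE signature is a good candidate for the kind of protection described here.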

Apparently, when the program is executed, something hooks into the JVM bytecode loading process and decrypts the bytecode on the fly. But how can this be implemented in Java?

To answer this question, some JVM theory is required. Java is as cross-platform as .NET (in fact, even more cross-platform in a certain way): a modern JVM such as HotSpot starts by interpreting the bytecode and compiles frequently executed methods into platform-specific native code at run time. This process is called JIT (just-in-time) compilation. So, similar to .NET, you can attach the x64dbg debugger to the javaw.exe process while the program is running; this allows you to debug the compiled native code.

It must be admitted though that this process is very labor-intensive since the compiled code looks like a nightmare (unlike, for instance, the JIT compilation output of the above-mentioned .NET). It's highly optimized, multithreaded, and nimble, but extremely unfriendly from the reversing perspective.

JIT-compiled native code

Of course, some leads are present there. For instance, if you examine jvm.dll, java.dll, jli.dll, and other modules, you'll find many standard basic functions there that simplify the debugging process. Maybe someday I'll address them in another article… A lot of useful materials on this topic can be found on the Internet, for instance, Understanding How Graal Works – a Java JIT Compiler Written in Java. But right now your goal is to understand the bytecode encryption and decryption process and restore the Java source code from the encrypted bytecode. In .NET, it's possible to find the entry point of the JIT compiler; so, let's try to perform this operation in Java as well.

Again, I have to present some theory without getting into specifics. Usually, two main interfaces are used to implement bytecode substitution in the JVM, and each of these interfaces utilizes its own approach. The first interface is called JVMCI (JVM compiler interface); it's used to plug in a JIT compiler that is itself written in Java (such as Graal). For obvious reasons, this variant is useless in this particular case (since all classes, starting with the main one, are encrypted).

The second one, JVM Tool Interface (JVMTI), seems to be exactly what you need, so let's take a closer look at it. JVMTI is a useful interface designed for interaction with the Java virtual machine. It enables you to extend the VM functionality without altering its code. A full description of this interface is beyond the scope of this article; for more information, see the official documentation.

All useful features of this interface are implemented through so-called agents (i.e. external plugins). They have many functions, but the main thing is that they give you full access to the loaded bytecode and control over it. This is exactly what you need. Agents are loaded by javaw; to enable one, you have to specify special parameters in the manifest or on the command line. For instance, the most common agent type is javaagent. Being written in Java, such agents are enabled using the -javaagent:agent.jar command-line option. This agent type also has full access to the bytecode, and bytecode substitution is often used for code obfuscation and modification, but not in this particular case.

The app under investigation uses a native JVMTI agent; to spot it, review the command line and find the following parameter in it: -agentlib:JavaLoader. A native JVMTI agent is a dynamic library (a DLL on Windows, an SO on Linux) that exports the following functions:

  • JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved); – this function is called when the agent starts if the agent is specified in the command line with the -agentpath: or -agentlib: parameter (as in this case);
  • JNIEXPORT jint JNICALL Agent_OnAttach(JavaVM* vm, char* options, void* reserved); – this function is called if the agent isn't loaded at startup; in such a case, you first connect to the target process and then send it a command to load the agent; and
  • JNIEXPORT void JNICALL Agent_OnUnload(JavaVM *vm); – this optional function is called when the agent library is about to be unloaded.

You have already found a dynamic library named JavaLoader.dll in the work directory; so, you load it into the IDA disassembler, find the functions that are of interest to you, and confirm that this is indeed a JVMTI agent that decrypts the bytecode when the required class is loaded. But how does it do this?

Time to open the above-mentioned JVM Tool Interface specification again. The general operation principle of an agent is as follows: it installs its own custom callback handlers for certain events. In this particular case, you are interested in the JVMTI_EVENT_CLASS_FILE_LOAD_HOOK event that is fired immediately after the bytecode array of the required class is loaded from the file, but before the JVM parses this class (and, accordingly, long before any JIT compilation). The installation of such a handler is implemented as follows:

#include <string.h>
#include <jvmti.h>
JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
    jvmtiEnv *jvmtienv = NULL;
    jvmtiEventCallbacks callbacks;
    jvmtiError jvmtierror;
    // Obtain the JVMTI environment from the VM
    if ((*vm)->GetEnv(vm, (void **)&jvmtienv, JVMTI_VERSION_1_2) != JNI_OK)
        return JNI_ERR;
    memset(&callbacks, 0, sizeof(callbacks));
    callbacks.ClassFileLoadHook = &eventHandlerClassFileLoadHook; // New handler for the JVMTI_EVENT_CLASS_FILE_LOAD_HOOK event (defined elsewhere)
    jvmtierror = (*jvmtienv)->SetEventCallbacks(jvmtienv, &callbacks, sizeof(callbacks)); // Install handlers
    if (jvmtierror != JVMTI_ERROR_NONE) return JNI_ERR;
    jvmtierror = (*jvmtienv)->SetEventNotificationMode(jvmtienv, JVMTI_ENABLE,
            JVMTI_EVENT_CLASS_FILE_LOAD_HOOK,
            (jthread)NULL); // Permit handling of the JVMTI_EVENT_CLASS_FILE_LOAD_HOOK event
    return (jvmtierror == JVMTI_ERROR_NONE) ? JNI_OK : JNI_ERR;
}

You examine the code of the Agent_OnLoad procedure in IDA and locate the required place.

Handler of the JVMTI_EVENT_CLASS_FILE_LOAD_HOOK event is installed in the JavaLoader agent

As you can see, the local variable [rsp+1C0h+var_170] holds the callbacks structure that is cleared at the address 18002C0F8. Then the ClassFileLoadHook handler is stored in it at 18002C102, and finally the SetEventCallbacks call is made: call qword ptr [rax+3C8h]. Congrats! You've found a callback function that decrypts the class bytecode before the JVM gets to process it. In this particular case, it's sub_18002BDC0. The prototype of this handler given in the specification is as follows:

void JNICALL
eventHandlerClassFileLoadHook(
    jvmtiEnv *jvmtienv,
    JNIEnv *jnienv,
    jclass class_being_redefined,
    jobject loader,
    const char *name,                 // Class name
    jobject protectionDomain,
    jint class_data_len,              // Class size
    const unsigned char *class_data,  // Encrypted bytecode loaded from file
    jint *new_class_data_len,         // Size of decrypted data
    unsigned char **new_class_data    // Decrypted data
);
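A side note on the call qword ptr [rax+3C8h] instruction seen in Agent_OnLoad: when a JVMTI call goes through the function table like this, you can map the offset back to a function without relying on symbols. The helper below is a hypothetical sketch; it assumes 8-byte pointers and the 1-based position numbering that jvmti.h and the specification use for the function table, as far as I can tell.

```c
#include <stddef.h>

/* Converts a byte offset into the JVMTI function table into the 1-based
   position of the function. Position 1 is a reserved slot at offset 0. */
size_t jvmti_function_position(size_t table_offset, size_t pointer_size) {
    return table_offset / pointer_size + 1;
}
```

For the offset 3C8h, this gives position 122, which is where the specification lists SetEventCallbacks; the offset 8 gives position 2, SetEventNotificationMode.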

So, you run the program in the x64dbg debugger and set a breakpoint at this handler. The breakpoint will be triggered every time a class is loaded and decrypted. At the entry to the handler, the name parameter (register rsi) points to the class name, and at the end, the handler saves the decoded bytecode in new_class_data, which can be dumped and decompiled. The rest is routine: first of all, enable the logging of loaded classes (log text {s:rsi}).

Breakpoint set at the handler

Now the sequence of classes loaded by the program is recorded in the debugger log; you can use it to find the class that sends the license validation request to the remote server. Imagine that you’ve found this class, and it’s called, let’s say, com/coreui/app/license/wizard/licenseWizard. By adding a stop condition to this breakpoint (strcmp(utf8(rsi),"com/coreui/app/license/wizard/licenseWizard")), you can pause the program execution at the required class, manually dump it to a file, and then decompile using any decompiler you like.

Reverse engineering makes it possible to locate the place for a one-byte patch: all you have to do to eliminate the license validation procedure is short-circuit the j() method by writing the return opcode (B1) into the bytecode at the offset 1B36 from the beginning of the class. Generally speaking, the goal is achieved, provided that you know how to edit bytecode without recompiling it (hint: use dirtyJOE). But in this particular case, it's not that simple.

The problem is that you have to edit encrypted code. Analysis of the JavaLoader cryptor in IDA shows that the bytecode is encrypted not with a simple XOR, but with an asymmetric algorithm based on elliptic curves. And since the algorithm is asymmetric, you cannot re-encrypt the edited code without knowing the private key.

The situation is tragicomic: you found a way to bypass the license validation procedure, but it only works in the debugger. In other words, you have to pause the program before it loads the required class, decrypt it, then manually patch the bytecode, and then resume its execution. Of course, this process can be automated with a script, but it still looks unsportsmanlike for a self-respecting hacker. So, how to get out of this unpleasant situation?

The most hardcore way is to decrypt ALL classes and kill the cryptor for good. To do so, you have to write a Java app that loads all the classes from the list and an x64dbg script that dumps them to disk. If you have plenty of free time and a strong financial motivation, you can even reverse the decryption algorithm and write your own decryptor that needs neither a debugger nor Java. This would be an ideal solution: the protector is completely removed, and the project source code is restored (provided that there is no obfuscator, although in this particular case there is one).

Fortunately, such a labor-intensive solution is excessive for your purposes. You need to run the program as simply and quickly as possible. And a simple and elegant way to do so is to inject the patch directly into the agent's body. After examining the code in IDA for some more time, you find the right place for the patch immediately after the bytecode decryption:

// Bytecode decryption; returns the length of the decrypted block in eax, or 0 in case of a failure
18002BF0C E8 0F FA FF FF call sub_18002B920
18002BF11 8B D0 mov edx, eax
18002BF13 85 C0 test eax, eax
18002BF15 7F 10 jg short loc_18002BF27 // Length > 0: success, skip the error path below
18002BF17 48 8B D5 mov rdx, rbp // rbp - class name
18002BF1A 48 8D 0D 57 76 0E 00 lea rcx, aDecryptionFail ; "Decryption failed: %s\n"
18002BF21 E8 CA FC FF FF call sub_18002BBF0 // Exit the program with an error message
18002BF26 CC db 0CCh
18002BF27 loc_18002BF27:
18002BF27 49 8B 0F mov rcx, [r15] // Add the CAFEBABE signature to bytecode
18002BF2A 48 8B 84 24 A0 00 00 00 mov rax, [rsp+A0] // Pointer to the decrypted bytecode
18002BF32 48 89 08 mov [rax], rcx
18002BF35 83 C2 08 add edx, 8
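In C terms, this fragment does roughly the following. This is a hypothetical model rather than decompiler output: the function name is made up, and it assumes that r15 points to the original class data and that the decryption routine has already placed the payload 8 bytes into the output buffer.

```c
#include <stdint.h>
#include <string.h>

#define CLASS_HEADER_SIZE 8  /* the qword copied by mov rcx,[r15] / mov [rax],rcx */

/* original      - class bytes as loaded from the file (r15, by assumption)
   out           - output buffer; the decrypted payload is assumed to start at out + 8
   decrypted_len - value returned by the decryption routine in eax
   Returns the total class size, or -1 where the real agent prints
   "Decryption failed" and exits. */
int finish_decryption(const uint8_t *original, uint8_t *out, int decrypted_len) {
    if (decrypted_len <= 0)
        return -1;                              /* test eax,eax / jg not taken */
    memcpy(out, original, CLASS_HEADER_SIZE);   /* restore CAFEBABE and the bytes after it */
    return decrypted_len + CLASS_HEADER_SIZE;   /* add edx, 8 */
}
```

The point that matters for the patch is the failure branch: everything between 18002BF13 and 18002BF26 exists only to report an error.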

Since you have no doubt that everything is decrypted correctly, you can omit the decryption correctness validation. After all, if anything is wrong, the program simply won't work. So, you can put this 'free space' a few dozen bytes in size to good use. Another assumption: you don't have enough space to check the full class name and have to limit this check to the last 4 bytes (including the terminating zero). Obviously, there is no other class in the project with a 45-byte name that ends with -ard. So, after decrypting the bytecode, you have to check the class name ending with -ard in the given position and, if successful, change the byte at the offset 1B36 in the decrypted bytecode to B1:

18002BF13 | 48:8B8424 A0000000 | mov rax,qword ptr ss:[rsp+A0] // Pointer to the decrypted bytecode
18002BF1B | 817D 2A 61726400 | cmp dword ptr ss:[rbp+2A],647261 // Class name+42=="ard"\0?
18002BF22 | 75 08 | jne javaloader.18002BF2C
18002BF24 | C680 361B0000 B1 | mov byte ptr ds:[rax+1B36],B1 // If yes, change the required byte in the bytecode to return (B1)
18002BF2B | 90 | nop
18002BF2C | 49:8B0F | mov rcx,qword ptr ds:[r15]
18002BF2F | 90 | nop
18002BF30 | 90 | nop
18002BF31 | 90 | nop
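The logic of the patched fragment can be expressed in C as follows. This is only a sketch with hypothetical names; unlike the assembly, it checks the name length and the buffer size explicitly, since in C there is no excuse for reading past the end of a string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_LENGTH   45      /* length of the target class name */
#define SUFFIX_OFFSET 42      /* 2Ah: where "ard" starts in a 45-byte name */
#define PATCH_OFFSET  0x1B36  /* offset of the byte to overwrite */
#define OP_RETURN     0xB1    /* JVM opcode: return from a void method */

/* Returns 1 if the class was recognized and patched, 0 otherwise. */
int patch_if_target(const char *name, uint8_t *bytecode, size_t bytecode_len) {
    if (strlen(name) != NAME_LENGTH)
        return 0;
    /* Compare 4 bytes: 'a', 'r', 'd' and the terminating zero,
       the same dword (00647261h) that the cmp instruction checks */
    if (memcmp(name + SUFFIX_OFFSET, "ard", 4) != 0)
        return 0;
    if (bytecode_len <= PATCH_OFFSET)
        return 0;
    bytecode[PATCH_OFFSET] = OP_RETURN;
    return 1;
}
```

Every other class passes through untouched, so the agent keeps decrypting the rest of the program as before.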

Voila! It's working! You have mastered one of the techniques used to conceal bytecode in Java apps. As you understand, there are plenty of such techniques, and the number of possible variants and modifications depends only on the developer's twisted imagination. Hopefully, this article will give an inquisitive mind a line of thinking in a nonstandard situation. In the future, I intend to address the most interesting of these situations in more detail.

