A novice programmer who has learned something about Java has the false impression that coding in this language is very simple. A novice hacker might get the same impression about cracking Java programs. The job seems to be easy: you take a ZIP archiver, unpack JAR files, choose a decompiler to your taste, and decompile the resultant CLASS files (either one at a time or the entire project at once). Voila! You’ve got the source code of the project on a silver platter!
warning
This article is intended for security specialists operating under a contract; all information provided in it is for educational purposes only. Neither the author nor the Editorial Board can be held liable for any damages caused by improper usage of this publication. Distribution of malware, disruption of systems, and violation of secrecy of correspondence are prosecuted by law.
Of course, sometimes you have to fiddle around with obfuscation or JVM bytecode. Still, it’s much less boring than dealing with native code protected by Themida or even dotnet apps…
But in fact, Java is no easier than the above-mentioned technologies, and such simple cases occur pretty rarely. This article describes one of the techniques used to protect Java code from decompiling and explains how to circumvent it.
Imagine, for instance, that you deal with some graphical program whose license is validated on a remote server at startup. If there is no valid license or the server is unavailable, the program kindly asks you to try again or closes. For training purposes, let’s try to make it work even if the license cannot be validated.
Upon closer examination, you notice that the program’s executable module is a simple Java Runtime Environment loader; it calls javaw using a very long command line that contains a list of JAR modules and libraries. At the end of this list, you see the name of the main class.
You run a JAR search and find the class file. Too bad: all the decompilers at your disposal refuse to work with this file, and even dirtyJOE doesn’t support its format. You open the file in a HEX editor and realize that dirtyJOE is 100% right: instead of a normally compiled CLASS file, you see only the CAFEBABE signature followed by the high-entropy white noise of packed or encrypted data.
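The same observation can be made programmatically. The following is a minimal sketch, not the tool used in this article: it checks whether a buffer that starts with the CAFEBABE magic also has a plausible class-file header (a sane major version and a valid first constant-pool tag). The function names and the upper bound on the major version are assumptions made for illustration; the header layout itself comes from the class-file format.

```c
#include <stddef.h>
#include <stdint.h>

/* Class files are big-endian: read an unsigned 16-bit value. */
static unsigned read_u2(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Returns 1 if the buffer starts with the magic bytes CA FE BA BE. */
int has_class_magic(const uint8_t *buf, size_t len)
{
    return len >= 4 && buf[0] == 0xCA && buf[1] == 0xFE &&
           buf[2] == 0xBA && buf[3] == 0xBE;
}

/* Returns 1 if tag is a constant-pool tag defined by the class-file format. */
static int is_valid_cp_tag(uint8_t tag)
{
    switch (tag) {
    case 1: case 3: case 4: case 5: case 6: case 7: case 8: case 9:
    case 10: case 11: case 12: case 15: case 16: case 17: case 18:
    case 19: case 20:
        return 1;
    default:
        return 0;
    }
}

/*
 * Layout after the magic: minor_version (u2), major_version (u2),
 * constant_pool_count (u2), then the first constant-pool entry (tag is u1).
 * Returns 1 if the header looks like a plain compiled class, 0 otherwise.
 * The upper bound of 99 on the major version is an arbitrary assumption.
 */
int header_looks_plain(const uint8_t *buf, size_t len)
{
    if (len < 11 || !has_class_magic(buf, len))
        return 0;
    unsigned major = read_u2(buf + 6);
    unsigned cp_count = read_u2(buf + 8);
    return major >= 45 && major <= 99 && cp_count >= 2 &&
           is_valid_cp_tag(buf[10]);
}
```

A file for which has_class_magic succeeds but header_looks_plain fails is a good candidate for the kind of protected class described here; random-looking data very rarely passes all three header checks.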
Apparently, when the program is executed, something interferes with the JVM bytecode loading process and decrypts the bytecode on the fly. But how can this be implemented in Java?
To answer this question, some Java machine theory is required. Java is as cross-platform as .NET (in a certain way, even more so), and JVM bytecode is not merely interpreted: code that runs often enough is compiled at run time into platform-specific native code. This process is called JIT (just-in-time) compilation. So, similar to .NET, you can attach the x64dbg debugger to the javaw process while the program is running; this allows you to debug the compiled native code.
It must be admitted though that this process is very labor-consuming since the compiled code looks like a nightmare (unlike, for instance, a JIT compilation output of the above-mentioned .NET). It’s highly optimized, multithreaded, and nimble, but extremely unfriendly from the reversing perspective.
Of course, some leads are present there. For instance, if you examine the jvm, java, jli, and other modules, you’ll find many standard basic functions there that simplify the debugging process. Maybe someday I’ll address them in another article… A lot of useful material on this topic can be found on the Internet, for instance, Understanding How Graal Works – a Java JIT Compiler Written in Java. But right now your goal is to understand the bytecode encryption and decryption process and restore the Java source code from the encrypted one. In .NET, it’s possible to find the entry point of the JIT compiler; so, let’s try to perform this operation in Java as well.
Again, I have to present some theory without getting into specifics. Usually, two main interfaces are used to implement bytecode substitution in the JVM, and each of these interfaces utilizes its own approach. The first interface is called JVMCI (JVM compiler interface); it’s used to connect the native Java JIT compiler (that is written in Java as well). For obvious reasons, this variant is useless in this particular case (since all classes, starting with the main one, are encrypted).
The second one, JVM Tool Interface (JVMTI), seems to be exactly what you need, so let’s take a closer look at it. JVMTI is an interface designed for interaction with the JVM. It enables you to extend the VM functionality without altering its code. A full description of this tool is beyond the scope of this article; for more information, see the official documentation.
All useful features of this interface are implemented through the so-called agents (i.e. external plugins). They have many functions, but the main thing is that they give you full access to the loaded bytecode and control over it. This is exactly what you need. Agents are loaded from javaw, and you have to specify special parameters in the manifest or in the command line. For instance, the most common agent type is javaagent. Being written in Java, such agents are enabled using the -javaagent: command-line option. This agent type also has full access to the bytecode, and bytecode substitution through it is often used for code obfuscation and modification, but not in this particular case.
The app under investigation uses a native JVMTIAgent; to detect its call, you have to review the command line and find the -agentlib: parameter in it. JVMTIAgent is a dynamic library (a DLL in Windows, an SO in Linux), and the following functions are exported from it:
- JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved); – this function is called when an agent starts if this agent is specified in the command-line parameters -agentpath: or -agentlib: (as in this case);
- JNIEXPORT jint JNICALL Agent_OnAttach(JavaVM* vm, char* options, void* reserved); – this function is called if the agent isn’t loaded at startup; in such a case, you first connect to the target process and then send it a command to load the agent; and
- JNIEXPORT void JNICALL Agent_OnUnload(JavaVM *vm); – this function is optionally called to shut down an agent.
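Spotting the agent options mentioned above in a long javaw command line is easy to automate. Below is a small sketch under stated assumptions: the function and enum names are hypothetical, and the code only classifies arguments by the three option prefixes named in the text (-javaagent:, -agentlib:, -agentpath:); it does not parse the rest of the JVM command line.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a single JVM command-line argument. */
typedef enum {
    AGENT_NONE = 0,
    AGENT_JAVA,        /* -javaagent:  agent written in Java  */
    AGENT_NATIVE_LIB,  /* -agentlib:   native agent, by name  */
    AGENT_NATIVE_PATH  /* -agentpath:  native agent, by path  */
} agent_kind;

/* Classify one argument by its agent-option prefix. */
agent_kind classify_agent_option(const char *arg)
{
    if (strncmp(arg, "-javaagent:", 11) == 0)
        return AGENT_JAVA;
    if (strncmp(arg, "-agentlib:", 10) == 0)
        return AGENT_NATIVE_LIB;
    if (strncmp(arg, "-agentpath:", 11) == 0)
        return AGENT_NATIVE_PATH;
    return AGENT_NONE;
}

/* Return the first argument that loads a native agent, or NULL if none. */
const char *find_native_agent(int argc, const char *const *argv)
{
    for (int i = 0; i < argc; i++) {
        agent_kind kind = classify_agent_option(argv[i]);
        if (kind == AGENT_NATIVE_LIB || kind == AGENT_NATIVE_PATH)
            return argv[i];
    }
    return NULL;
}
```

Feeding it the arguments captured from the loader tells you at a glance whether a native library gets a chance to hook class loading before the main class runs.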
You have already found in the work directory a dynamic library with the original name JavaLoader; so, you load it into the IDA disassembler, find the functions that are of interest to you, and confirm that this is indeed a JVMTIAgent that decrypts the bytecode when the required class is loaded. But how does it do this?
Time to open the above-mentioned JVM Tool Interface specification again. The general operation principle of an agent is as follows: it installs its own custom callback handlers for certain events. In this particular case, you are interested in the JVMTI_EVENT_CLASS_FILE_LOAD_HOOK event, which is sent right after the bytecode array of the required class is loaded from the file, but before the VM builds its in-memory representation of that class (and therefore before any JIT compilation). The installation of such a handler is implemented as follows:
    #include <string.h>
    #include <jvmti.h>

    JNIEXPORT jint JNICALL Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
        jvmtiEventCallbacks callbacks;
        jvmtiEnv *jvmtienv = NULL;
        jvmtiError jvmtierror;

        // Obtain the JVMTI environment from the VM
        if ((*vm)->GetEnv(vm, (void **)&jvmtienv, JVMTI_VERSION_1_2) != JNI_OK)
            return JNI_ERR;
        memset(&callbacks, 0, sizeof(callbacks));
        // New handler for the JVMTI_EVENT_CLASS_FILE_LOAD_HOOK event
        callbacks.ClassFileLoadHook = &eventHandlerClassFileLoadHook;
        // Install handlers
        jvmtierror = (*jvmtienv)->SetEventCallbacks(jvmtienv, &callbacks, sizeof(callbacks));
        // Permit handling of the JVMTI_EVENT_CLASS_FILE_LOAD_HOOK event
        jvmtierror = (*jvmtienv)->SetEventNotificationMode(jvmtienv, JVMTI_ENABLE,
            JVMTI_EVENT_CLASS_FILE_LOAD_HOOK, (jthread)NULL);
        return jvmtierror == JVMTI_ERROR_NONE ? JNI_OK : JNI_ERR;
    }
You examine the code of the Agent_OnLoad procedure in IDA and locate the required place.
In the disassembly, a local variable holds the callbacks structure, which is cleared at the address 18002C0F8. Then the ClassFileLoadHook handler is written into it at 18002C102, and finally the SetEventCallbacks call is made. Congrats! You’ve found the callback function that decrypts the class bytecode prior to its compilation. In this particular case, it’s sub_18002BDC0. The description of this handler provided in the specification is as follows:
    void JNICALL eventHandlerClassFileLoadHook(
        jvmtiEnv *jvmtienv,
        JNIEnv *jnienv,
        jclass class_being_redefined,
        jobject loader,
        const char *name,                 // Class name
        jobject protectionDomain,
        jint class_data_len,              // Class size
        const unsigned char *class_data,  // Encrypted bytecode loaded from file
        jint *new_class_data_len,         // Size of decrypted data
        unsigned char **new_class_data)   // Decrypted data
So, you run the program in the x64dbg debugger and set a breakpoint at this handler. The breakpoint will be triggered every time a class is loaded and decrypted. At the entry to the handler, the name parameter (register rsi) will show the class name, and at the end, the handler will save the decoded bytecode in new_class_data, which can be dumped and decompiled. The rest is paperwork: first of all, enable the logging of loaded classes through the breakpoint’s log text.
Now the sequence of classes loaded by the program is recorded in the debugger log; you can use it to find the class that sends the license validation request to the remote server. Imagine that you’ve found this class somewhere in the com/ package tree. By adding a stop condition to this breakpoint (a strcmp comparison of the name parameter with the class name), you can pause the program execution at the required class, manually dump it to a file, and then decompile it using any decompiler you like.
Reverse engineering makes it possible to locate the place for a one-byte patch: all you have to do to eliminate the license validation procedure is short-circuit the j method by writing a return instruction (opcode B1) into the bytecode at the offset 1B36 from the beginning of the class. Generally speaking, the goal is achieved, provided that you know how to edit bytecode without recompiling it (hint: use dirtyJOE). But in this particular case, it’s not that simple.
The problem is that you have to edit encrypted code. Analysis of the JavaLoader cryptor performed in IDA shows that the bytecode is encrypted not with XOR, but with an asymmetric algorithm using elliptic curves. Furthermore, since the algorithm is asymmetric, you cannot even re-encrypt the edited code if you don’t know the private key.
The situation is tragicomic: you found a way to bypass the license validation procedure, but it only works in the debugger. In other words, you have to pause the program before it loads the required class, decrypt it, then manually patch the bytecode, and then resume its execution. Of course, this process can be automated with a script, but it still looks unsportsmanlike for a self-respecting hacker. So, how to get out of this unpleasant situation?
The most hardcore way is to decrypt ALL classes and kill the cryptor for good. To do so, you have to write a Java app that loads all the classes from the list and an x64dbg script that dumps them to disk. If you have plenty of free time and a strong financial motivation, you can even reverse the decryption algorithm and write your own decryptor that needs neither a debugger nor Java. This would be an ideal solution: the protector is completely removed, and the project source code is restored (provided that there is no obfuscator, although in this particular case there is one).
Fortunately, such a labor-consuming solution is excessive for your purposes. You need to run the program as simply and quickly as possible. And a simple and elegant way to do so is to inject the patch directly into the agent’s body. After examining the code in IDA for some more time, you find the right place for the patch immediately after the bytecode decryption:
    // Bytecode decryption, returns to rax the length of the decrypted block or 0 in case of a failure
    18002BF0C E8 0F FA FF FF           call sub_18002B920
    18002BF11 8B D0                    mov edx, eax
    18002BF13 85 C0                    test eax, eax
    18002BF15 7F 10                    jg short loc_18002BF27   // If 0, then error
    18002BF17 48 8B D5                 mov rdx, rbp             // rbp - class name
    18002BF1A 48 8D 0D 57 76 0E 00     lea rcx, aDecryptionFail ; "Decryption failed: %s\n"
    18002BF21 E8 CA FC FF FF           call sub_18002BBF0       // Exit the program with an error message
    18002BF26 CC                       db 0CCh
    18002BF27 loc_18002BF27:
    18002BF27 49 8B 0F                 mov rcx, [r15]           // Add the CAFEBABE signature to bytecode
    18002BF2A 48 8B 84 24 A0 00 00 00  mov rax, [rsp+A0]        // Pointer to the decrypted bytecode
    18002BF32 48 89 08                 mov [rax], rcx
    18002BF35 83 C2 08                 add edx, 8
Since you have no doubt that everything is decrypted correctly, you can omit the decryption correctness validation. After all, if anything is wrong, the program simply won’t work. So, you can put this ‘free space’ of a few dozen bytes to good use. Another assumption: you don’t have enough space to check the full class name and have to limit this check to its last 4 bytes (the terminating zero included). Obviously, there is no other class in the project with a 45-byte name that ends with -ard. So, after decrypting the bytecode, you check whether the class name ends with -ard at the given position and, if it does, change the byte at the offset 1B36 in the decrypted bytecode to B1:
    18002BF13 | 48:8B8424 A0000000 | mov rax,qword ptr ss:[rsp+A0]     // Pointer to the decrypted bytecode
    18002BF1B | 817D 2A 61726400   | cmp dword ptr ss:[rbp+2A],647261  // Class name+42=="ard"\0?
    18002BF22 | 75 08              | jne javaloader.18002BF2C
    18002BF24 | C680 361B0000 B1   | mov byte ptr ds:[rax+1B36],B1     // If yes, change the required byte in the bytecode to return
    18002BF2B | 90                 | nop
    18002BF2C | 49:8B0F            | mov rcx,qword ptr ds:[r15]
    18002BF2F | 90                 | nop
    18002BF30 | 90                 | nop
    18002BF31 | 90                 | nop
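For readers who find the logic easier to follow in C, here is a rough equivalent of the injected check. It is a sketch, not the agent’s code: the function name and constants are hypothetical labels for the values used in the listing above (name offset 2A, bytecode offset 1B36, opcode B1), and unlike the assembly, it verifies the name length and the buffer size before touching memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    NAME_SUFFIX_OFFSET = 0x2A,   /* 42: where "ard" starts in the 45-byte name */
    PATCH_OFFSET       = 0x1B36, /* offset of the byte to overwrite            */
    OPCODE_RETURN      = 0xB1    /* JVM opcode for a void return               */
};

/*
 * If the class name is exactly 45 bytes long and ends with "ard",
 * overwrite one byte of the decrypted bytecode. Returns 1 if the
 * patch was applied and 0 otherwise.
 */
int patch_if_target(const char *name, uint8_t *bytecode, size_t len)
{
    /* The assembly reads name+42 blindly; here the length is checked first. */
    if (strlen(name) != (size_t)NAME_SUFFIX_OFFSET + 3)
        return 0;
    /* Compare 4 bytes: 'a', 'r', 'd' and the terminating zero. */
    if (memcmp(name + NAME_SUFFIX_OFFSET, "ard", 4) != 0)
        return 0;
    /* Never write past the end of the decrypted buffer. */
    if (len <= (size_t)PATCH_OFFSET)
        return 0;
    bytecode[PATCH_OFFSET] = OPCODE_RETURN;
    return 1;
}
```

The extra checks cost bytes that the real patch cannot spare, which is exactly why the assembly version relies on the assumption that only one class in the project matches.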
Voila! It’s working! You have mastered one of the techniques used to conceal bytecode in Java apps. As you understand, there are plenty of such techniques, and the number of possible variants and modifications depends only on the developer’s twisted imagination. Hopefully, this article will give an inquisitive mind a line of thinking in a nonstandard situation. In the future, I intend to address the most interesting of these situations in more detail.