Puzzle solving. Writing a custom JavaScript deobfuscator

Today, I am going to demonstrate that JavaScript obfuscation can be removed even when sophisticated deobfuscators are useless. You will learn an effective technique for researching obfuscated code and write your own deobfuscator.

In my humble opinion, script obfuscation is a much more exciting area than assembler and other low-level magic. Plenty of ready-made deobfuscators are available online, but what if none of them can help you in a specific situation? Under such circumstances, you have no choice but to remove obfuscation yourself, and this article explains how this can be done.

Automatic deobfuscators

Let’s take a JavaScript web app as an example. It’s about three megabytes in size and consists mostly of heavily obfuscated code; its closing fragment is quoted in the “Manual deobfuscation” section below.

The characteristic names of identifiers (_0x58cd18, _0x2f8935, _0x321d33, _0x1e0595) imply that this code has been obfuscated using obfuscator.io. However, all attempts to deobfuscate it using the standard online deobfuscator fail regardless of the settings used: no readable code appears in the right-hand window.

Deobfuscation attempts involving other publicly available tools, including the de4js universal deobfuscator, don’t produce meaningful results either.

It seems that automatic deobfuscators are useless, and you have to do the job manually.

info

Of course, available automatic deobfuscation tools are not limited to the above-mentioned programs. For instance, you could try to optimize the code using Llama or employ the webcrack project that can automatically deobfuscate such code… But let’s pretend that all automatic tools have failed – such a situation is much more interesting!

Manual deobfuscation

First of all, let’s apply JS Beautifier to the raw code to make it more readable. If you go through the now-structured code, you’ll notice numerous function calls with ten hexadecimal constants acting as parameters:

_0x4a0111(0xa0c, 0xc0b, 0x13ef, 0x1e3e, 0x15e2, 0x1a29, 0x1b08, 0x94b, 0x968, 0x753)
_0x114a88(-0x6d, 0x126c, 0x621, 0xa59, -0x5f2, 0x72a, 0x6cf, 0x8a, 0xbf9, -0x4b4)
_0x27e22f(0xbf2, 0x670, 0xe4, 0x132e, 0x1267, 0xbf9, -0xb6, 0x697, 0x51f, 0x6da)
_0x1e51ce(0xe58, 0x1b5e, 0x2457, 0x191a, 0x224a, 0x133c, 0xf61, 0x1c11, 0x128d, 0xc77)
_0x33055f(0x16f4, 0x1704, 0xbaf, 0x231d, 0x163e, 0x161a, 0xca1, 0x15ba, 0x1c3f, 0x1649)
_0x1b164c(0x485, -0x398, 0x1e0, 0xf51, 0xcdd, 0x2de, 0xfea, 0x82f, -0x54a, 0x37)
...

It’s logical to assume that these are encrypted constants that should be somehow translated into a readable form. As usual, let’s start from the end. The code ends with the following fragment:

...
} catch (_0x321d33) {
        console[_0x1e0595(0x4f3, 0x854, 0x1210, 0x19e1, 0x3ca, 0x992, 0x665, 0xf98, 0x185b, 0x1073)](_0x321d33);
    }
});

Logic suggests that this is console.log(_0x321d33), i.e. _0x1e0595(0x4f3, 0x854, 0x1210, 0x19e1, 0x3ca, 0x992, 0x665, 0xf98, 0x185b, 0x1073) == "log". Let’s try to restore meat from mince by searching the code for the string "function _0x1e0595":

function _0x1e0595(_0x4bc581, _0x4ecbba, _0x1d5a39, _0x50dcae, _0x403da8, _0x4ad34e, _0x2b446e, _0x3b51da, _0x44854e, _0x1491c6) {
    return _0x340121(_0x1491c6 - 0x5ff, _0x4ecbba - 0xb8, _0x1d5a39 - 0xb2, _0x50dcae - 0x81, _0x403da8 - 0x80, _0x4ad34e - 0x1b, _0x2b446e - 0x99, _0x44854e, _0x44854e - 0x72, _0x1491c6 - 0x10f);
}

As you can see, it refers to another identifier called _0x340121. Let’s find it, too:

function _0x340121(_0x5ae465, _0x101079, _0x1d662f, _0x55f16c, _0x4029db, _0x3a7a06, _0x1d53e1, _0x5b0eb3, _0x4c47fe, _0x445726) {
    return _0x3a86(_0x5ae465 - -0x26c, _0x5b0eb3);
}

_0x340121, in turn, refers to something called _0x3a86. Fortunately, this is the last (or more precisely, first) link in this chain:

function _0x3a86(_0x37610f, _0x5cbb3a) {
    const _0x1214fd = _0x5e2d();
    return _0x3a86 = function(_0x3aa59b, _0x1ad7b1) {
        _0x3aa59b = _0x3aa59b - (-0x10 * -0x80 + 0x69 * -0x1 + 0xa * -0xb5);
        _0x4072cc = _0x1214fd[_0x3aa59b];
        return _0x4072cc;
    }, _0x3a86(_0x37610f, _0x5cbb3a);
}
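The expression subtracted inside _0x3a86 is just a constant written in an obfuscated way. A quick check (not part of the original code) is to evaluate it on its own:

```javascript
// Offset copied verbatim from the body of _0x3a86.
const offset = -0x10 * -0x80 + 0x69 * -0x1 + 0xa * -0xb5;

console.log(offset);                     // 133
console.log('0x' + offset.toString(16)); // 0x85
```

In other words, in the code shown above, _0x3a86(n) returns element n - 0x85 of the string array and ignores its second argument.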

So far, everything is simple. All that remains is to find the array of string constants returned by _0x5e2d:

function _0x5e2d(){
  const _0x552e21=['t?id=','mjSUq','wcYjB','pljgg','ct:\x20<','accou','nlMqS',
                   ...
                   'se,\x22s','xESCJ'];
  _0x5e2d=function(){
      return _0x552e21;
  };
  return _0x5e2d();
}

It seems that the minimal piece of code needed to decode obfuscated strings via the _0x1e0595 function has been isolated:

function _0x5e2d(){
  const _0x552e21=['t?id=','mjSUq','wcYjB','pljgg','ct:\x20<','accou','nlMqS',
                   ...
                   'se,\x22s','xESCJ'];
  _0x5e2d=function(){
      return _0x552e21;
  };
  return _0x5e2d();
}
function _0x3a86(_0x37610f, _0x5cbb3a) {
    const _0x1214fd = _0x5e2d();
    return _0x3a86 = function(_0x3aa59b, _0x1ad7b1) {
        _0x3aa59b = _0x3aa59b - (-0x10 * -0x80 + 0x69 * -0x1 + 0xa * -0xb5);
        _0x4072cc = _0x1214fd[_0x3aa59b];
        return _0x4072cc;
    }, _0x3a86(_0x37610f, _0x5cbb3a);
}
function _0x340121(_0x5ae465, _0x101079, _0x1d662f, _0x55f16c, _0x4029db, _0x3a7a06, _0x1d53e1, _0x5b0eb3, _0x4c47fe, _0x445726) {
    return _0x3a86(_0x5ae465 - -0x26c, _0x5b0eb3);
}
function _0x1e0595(_0x4bc581, _0x4ecbba, _0x1d5a39, _0x50dcae, _0x403da8, _0x4ad34e, _0x2b446e, _0x3b51da, _0x44854e, _0x1491c6) {
    return _0x340121(_0x1491c6 - 0x5ff, _0x4ecbba - 0xb8, _0x1d5a39 - 0xb2, _0x50dcae - 0x81, _0x403da8 - 0x80, _0x4ad34e - 0x1b, _0x2b446e - 0x99, _0x44854e, _0x44854e - 0x72, _0x1491c6 - 0x10f);
}
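The chain isolated above boils down to simple arithmetic on a single argument. The sketch below retraces it by hand for the console call from the end of the code; the variable names are mine and exist only for illustration:

```javascript
// In this chain, only the tenth argument of _0x1e0595 reaches the index computation.
const tenthArg = 0x1073;

const passedToMiddle = tenthArg - 0x5ff;         // _0x1e0595 -> _0x340121 (first argument)
const passedToDecoder = passedToMiddle - -0x26c; // _0x340121 -> _0x3a86 (first argument)
const offset = -0x10 * -0x80 + 0x69 * -0x1 + 0xa * -0xb5; // constant inside _0x3a86
const index = passedToDecoder - offset;          // position in the string array

console.log(index); // 3163
```

The other nine arguments are decoys here: they are either dropped along the way or end up in the second parameter of _0x3a86, which the function never reads.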

Later, the search for all such functions will have to be automated somehow (there are many of them, but they all follow the same pattern). But for now, let’s make sure that everything is correct. Press F12 in the browser and paste the isolated code fragment into the console to evaluate the expression _0x1e0595(0x4f3, 0x854, 0x1210, 0x19e1, 0x3ca, 0x992, 0x665, 0xf98, 0x185b, 0x1073).

At this point, you can see that the returned string is not log, but quite the opposite: awal (although the log string is also present in the original array). This means that either the above calculations were incorrect, or the authors of this obfuscator are smarter than one could expect…

Let’s examine the code in more detail. The fragment at the very top of the code, which starts with the comment IT IS NOT SAFE TO MAKE CHANGES IN THE CODE BELOW, intricately shuffles the array of string constants _0x552e21 after its initialization.
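The shuffling fragment itself is not reproduced in this text, so the following is only a toy model. It assumes the shuffle is a rotation of the kind usually seen in obfuscator.io-style output: the array is rotated one element at a time until a checksum built from parseInt calls on some of its elements hits a hardcoded value. The array contents, the checksum formula, the target value, and the helper name rotateUntil are all made up for illustration:

```javascript
// Rotates arr in place until checksum(arr) equals target.
// Gives up after one full turn so that a wrong target cannot loop forever.
function rotateUntil(arr, checksum, target) {
  for (let i = 0; i < arr.length; i++) {
    let matches = false;
    try {
      matches = checksum(arr) === target;
    } catch (e) {
      // A failing checksum is treated as a mismatch.
    }
    if (matches) return true;
    arr.push(arr.shift()); // move the first element to the end
  }
  return false;
}

const table = ['awal', '5a', 'log', '2b'];
const before = table[0]; // what index 0 decodes to without the rotation

// Made-up checksum: parseInt reads the leading digits of two elements.
const checksum = a => parseInt(a[1]) * 2 + parseInt(a[3]);
const ok = rotateUntil(table, checksum, 9);
const after = table[0]; // what index 0 decodes to after the rotation

console.log(before, after, ok);
```

In this toy, the same index yields awal before the rotation and log after it, which mirrors the mismatch observed in the console.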

Interestingly, the while condition in this fragment contains the (!![])==true construct. A closer examination reveals that such constants, together with the inverse variant, (![])==false, occur frequently throughout the obfuscated code. Make a note to replace them later using global replacement; then insert the shuffling fragment at the beginning of the ‘core’ code and run the test again. This time, everything matches, and the result is correct: _0x1e0595(0x4f3, 0x854, 0x1210, 0x19e1, 0x3ca, 0x992, 0x665, 0xf98, 0x185b, 0x1073) == "log".
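The global replacement of these boolean idioms can be sketched as follows. The helper name simplifyBooleans is hypothetical, and the approach is purely textual, so it would also touch such sequences inside string literals; treat it as a rough first pass rather than a safe transformation:

```javascript
// Replaces the array-based boolean idioms with plain literals.
function simplifyBooleans(source) {
  return source
    .replace(/!!\[\]/g, 'true')  // must run first: "!![]" contains "![]"
    .replace(/!\[\]/g, 'false');
}

const sampleLoop = 'while(!![]){if(![])break;}';
const simplified = simplifyBooleans(sampleLoop);
console.log(simplified); // while(true){if(false)break;}
```

The order of the two replacements matters: if ![] were replaced first, every !![] would turn into !false instead of true.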

This is where the exciting and breathtaking research stage ends and routine coding begins.

Writing deobfuscator

The plan is as follows.

Similar to the above-described analysis of the _0x1e0595 function, let’s form the ‘core’ of functions that will decode string constants. To do this, you have to search for all functions that match the template:

function _0x??????(_0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????) {
        return _0x3a86(??????, ??????);
    }

Using regular expressions, this can be implemented in JavaScript as follows:

...
var reg = /function (_0x[a-f0-9]*)\(_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*\)\{return _0x3a86\([^)]*\);\}/g;
var functions = [], found,names=[];
while (found = reg.exec(string)) {
    functions.push(found[0]);
    names.push(found[1]);
}
...

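where string is the source code (the pattern contains no whitespace, so it has to be run against the original minified text rather than the beautified one); functions is the output, i.e. the code of the matched functions, which should be added to the ‘core’ and removed from the source code; and names is the list of their names.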
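Writing the same identifier pattern ten times by hand is error-prone, so the regular expression can also be assembled from parts. This is a sketch under my own assumptions: the helper name wrapperPattern and the arity parameter are hypothetical, and the quantifier is + rather than * so that a bare _0x never matches:

```javascript
const ID = '_0x[a-f0-9]+';

// Builds a pattern for `function NAME(p1,...,pN){return CALLEE(...);}` in minified code.
function wrapperPattern(callee, arity = 10) {
  const params = Array(arity).fill(ID).join(',');
  return new RegExp(
    'function (' + ID + ')\\(' + params + '\\)\\{return ' + callee + '\\([^)]*\\);\\}',
    'g'
  );
}

// Minified form of the _0x340121 wrapper quoted earlier in the article.
const sample =
  'function _0x340121(_0x5ae465,_0x101079,_0x1d662f,_0x55f16c,_0x4029db,' +
  '_0x3a7a06,_0x1d53e1,_0x5b0eb3,_0x4c47fe,_0x445726)' +
  '{return _0x3a86(_0x5ae465- -0x26c,_0x5b0eb3);}';

const foundNames = [...sample.matchAll(wrapperPattern('_0x3a86'))].map(m => m[1]);
console.log(foundNames); // a single match: _0x340121
```

The same helper can then be reused for the next levels by passing a wrapper name instead of _0x3a86.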

Let’s search for functions having the following format:

function _0x??????(_0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????, _0x??????) {
        return <Name1>(??????, ??????,??????, ??????,??????, ??????,??????, ??????,??????, ??????);
    }

where Name1 is a function name from the names list obtained at the previous step. The code looks as follows:

var reg = /function (_0x[a-f0-9]*)\(_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*\)\{return _0x3a86\([^)]*\);\}/g;
var functions = [], found,names=[];
while (found = reg.exec(string)) {
    functions.push(found[0]);
    names.push(found[1]);
}
var functions1 = [], names1=[];
for (var i=0;i<names.length;i++)
{
    var reg1 = new RegExp("function (_0x[a-f0-9]*)\\(_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*\\)\\{return "+names[i]+"\\([^)]*\\);\\}", "g");
    while (found = reg1.exec(string)) {
       functions1.push(found[0]);
       names1.push(found[1]);
     }
}

As a result, you get new lists containing functions (functions1) and their names (names1) that are also added to the core and removed from the source code.

This procedure is repeated: at each step, the search uses the names found at the previous step (i.e. names1 takes the place of names), until a step finds no new functions. The final code looks something like this:

// Level 0: wrappers that call the string decoder _0x3a86 directly.
var reg = /function (_0x[a-f0-9]*)\(_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*\)\{return _0x3a86\([^)]*\);\}/g;
var functions = [], names = [], found;
while (found = reg.exec(string)) {
    functions.push(found[0]);
    names.push(found[1]);
}
// names2 holds the wrappers found at the previous level.
var names2 = names.slice();
while (true) {
    var functions1 = [], names1 = [];
    for (var i = 0; i < names2.length; i++) {
        // Wrappers that call a wrapper from the previous level.
        var reg1 = new RegExp("function (_0x[a-f0-9]*)\\(_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*,_0x[a-f0-9]*\\)\\{return " + names2[i] + "\\([^)]*\\);\\}", "g");
        while (found = reg1.exec(string)) {
            functions1.push(found[0]);
            names1.push(found[1]);
            functions.push(found[0]);
            names.push(found[1]);
        }
    }
    // Stop as soon as a level adds no new wrappers.
    if (names1.length == 0) break;
    names2 = names1.slice();
}

As a result, functions contains all the parasitic wrapper functions generated by the obfuscator, and names contains their names.
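The same level-by-level search can be packed into one reusable function. This is a sketch, not the code used in the article: the name collectWrappers, its parameters, and the duplicate check are my own additions, and it still assumes minified input without whitespace:

```javascript
// Collects wrapper functions level by level, starting from the root decoder name.
function collectWrappers(source, rootName, arity = 10) {
  const id = '_0x[a-f0-9]+';
  const params = Array(arity).fill(id).join(',');
  const functions = [];
  const names = [];
  let frontier = [rootName]; // names whose callers are searched at this level

  while (frontier.length > 0) {
    const next = [];
    for (const callee of frontier) {
      const re = new RegExp(
        'function (' + id + ')\\(' + params + '\\)\\{return ' + callee + '\\([^)]*\\);\\}',
        'g'
      );
      for (const m of source.matchAll(re)) {
        if (names.includes(m[1])) continue; // already collected
        functions.push(m[0]);
        names.push(m[1]);
        next.push(m[1]);
      }
    }
    frontier = next;
  }
  return { functions, names };
}

// Minified forms of the two wrappers quoted earlier in the article.
const wrappersSource =
  'function _0x340121(_0x5ae465,_0x101079,_0x1d662f,_0x55f16c,_0x4029db,_0x3a7a06,' +
  '_0x1d53e1,_0x5b0eb3,_0x4c47fe,_0x445726){return _0x3a86(_0x5ae465- -0x26c,_0x5b0eb3);}' +
  'function _0x1e0595(_0x4bc581,_0x4ecbba,_0x1d5a39,_0x50dcae,_0x403da8,_0x4ad34e,' +
  '_0x2b446e,_0x3b51da,_0x44854e,_0x1491c6){return _0x340121(_0x1491c6-0x5ff,' +
  '_0x4ecbba-0xb8,_0x1d5a39-0xb2,_0x50dcae-0x81,_0x403da8-0x80,_0x4ad34e-0x1b,' +
  '_0x2b446e-0x99,_0x44854e,_0x44854e-0x72,_0x1491c6-0x10f);}';

const collected = collectWrappers(wrappersSource, '_0x3a86');
console.log(collected.names); // _0x340121 first, then _0x1e0595
```

Wrappers are returned in the order of their distance from the decoder, so the direct callers of _0x3a86 always come first.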

Now that the ‘core’ is formed, you simply iterate over all the computable expressions of the form:

<Name1>(??????,??????,??????,??????,??????,??????,??????,??????,??????,??????)

where Name1 is a function name from the names list obtained at the previous stages. These expressions can be evaluated using the following code:

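var expressions = [];
var values = [];
for (var i = 0; i < names.length; i++) {
    // Match calls of the wrapper with ten comma-separated arguments.
    var reg1 = new RegExp(names[i] + "\\([^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*,[^)]*\\)", "g");
    while (found = reg1.exec(string)) {
        try {
            // The 'core' functions must already be defined in this scope.
            var value = eval(found[0]);
            expressions.push(found[0]);
            values.push(value);
        } catch (err) {
            // Not a constant call (e.g. the arguments are variables): skip it.
        }
    }
}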

The output consists of the expressions array, which contains computable expressions like _0x4a0111(0xa0c, 0xc0b, 0x13ef, 0x1e3e, 0x15e2, 0x1a29, 0x1b08, 0x94b, 0x968, 0x753), and the values array with the respective string constants. All you have to do is replace the former with the latter in the obfuscated code using global replacement.
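The global replacement step can be sketched like this. The function name substituteConstants is hypothetical, and the sample input is shortened to two arguments for readability; each decoded value is written back as a quoted string literal:

```javascript
// Replaces every occurrence of each expression with its decoded value.
function substituteConstants(source, expressions, values) {
  let result = source;
  for (let i = 0; i < expressions.length; i++) {
    // JSON.stringify turns the decoded value into a quoted, escaped string literal.
    // split/join replaces all occurrences without any regex escaping.
    result = result.split(expressions[i]).join(JSON.stringify(values[i]));
  }
  return result;
}

// Shortened toy input: real calls take ten arguments.
const obfuscated =
  'console[_0x1e0595(0x4f3,0x1073)](e);console[_0x1e0595(0x4f3,0x1073)](f);';
const restored = substituteConstants(obfuscated, ['_0x1e0595(0x4f3,0x1073)'], ['log']);
console.log(restored); // console["log"](e);console["log"](f);
```

Using split and join instead of a regular expression avoids having to escape the parentheses in the matched call.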

As a result, you get partially deobfuscated code where at least string constants and names of standard methods are presented in an explicit form. This code can be analyzed and edited; to make it more readable, some of its parts can be loaded to other deobfuscators.

Of course, this doesn’t mean that the original app has been completely deobfuscated. To refine its code further, expressions like Class["MethodName"] can be converted to Class.MethodName. A slightly more advanced variant of the same conversion turns code of the following form into Object.MethodName:

const _0x17ef7d = {};
_0x17ef7d[MethodName]
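The simple Class["MethodName"] conversion can be sketched with a single regular expression. The helper name bracketToDot is hypothetical, and the replacement is textual, so it knows nothing about string literals or comments; it only rewrites keys that are valid identifiers and only when the bracket directly follows an expression:

```javascript
// Rewrites obj["name"] / obj['name'] to obj.name when name is a plain identifier.
function bracketToDot(source) {
  // (?<=...) requires an identifier character, ")" or "]" right before the bracket,
  // so standalone array literals such as ["x"] are left alone.
  return source.replace(/(?<=[\w$)\]])\[(['"])([A-Za-z_$][\w$]*)\1\]/g, '.$2');
}

const bracketed = 'console["log"](a["b-c"], ["x"], obj[\'name\'])';
const dotted = bracketToDot(bracketed);
console.log(dotted); // console.log(a["b-c"], ["x"], obj.name)
```

Keys that are not valid identifiers, such as "b-c", stay in bracket form because dot notation cannot express them.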

Finally, you can perform a series of dictionary transformations like these:

const _0x45618a = {
            'aJqDi': function(_0x2b031d, _0x1fc5a3) {
                return _0x2b031d(_0x1fc5a3);
            },
            'FDTmk': function(_0x542b4c, _0x55bd01) {
                return _0x542b4c(_0x55bd01);
            },
            ...
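One such dictionary transformation is inlining entries that merely call their first argument with the second. The sketch below is a simplified assumption of how this could be done with regular expressions: the helper name inlineCallProxies is hypothetical, it handles only entries of exactly this shape, and it rewrites only calls whose arguments are plain identifiers or dotted names:

```javascript
// Replaces dict['key'](f, x) with f(x) for entries of the form function(a, b) { return a(b); }.
function inlineCallProxies(source, dictName) {
  // Find keys whose body just calls the first parameter with the second one.
  const entryRe = /'(\w+)':\s*function\s*\((\w+),\s*(\w+)\)\s*\{\s*return \2\(\3\);\s*\}/g;
  const keys = [...source.matchAll(entryRe)].map(m => m[1]);

  let out = source;
  for (const key of keys) {
    const callRe = new RegExp(
      dictName + "\\['" + key + "'\\]\\(([\\w$.]+),\\s*([\\w$.]+)\\)",
      'g'
    );
    out = out.replace(callRe, '$1($2)');
  }
  return out;
}

const withProxies =
  "const _0x45618a={'aJqDi':function(_0x2b031d,_0x1fc5a3){return _0x2b031d(_0x1fc5a3);}};" +
  "_0x45618a['aJqDi'](alert,msg);";
const inlined = inlineCallProxies(withProxies, '_0x45618a');
console.log(inlined.slice(-11)); // alert(msg);
```

The dictionary definition itself is left in place; it can be removed once no references to it remain.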

Ultimately, you’ll get almost readable code missing only the original names of variables and functions.

Conclusions

As usual, I fibbed a little and described the simplest and fastest technique that can be used to partially deobfuscate JavaScript code. Of course, simple regex-based search and replace isn’t enough to write a fully functional deobfuscator. Generally speaking, you have to write your own JavaScript machine emulator (similar to the one implemented in the above-mentioned webcrack deobfuscator). In future articles, I intend to cover this topic; in the meantime, you can examine the source code of this project on your own: it contains plenty of interesting stuff.

Good luck in your endeavors!
