
In times gone by, plenty of promotional articles were written about Ruby. In-depth information about its concepts and internal structure can be found in the book Ruby Under a Microscope. But the purpose of this article is very practical: to reverse engineer applications written in this exotic language.
Problem statement
Imagine that you have an application that reads and validates a binary file, and you have to figure out its validation algorithm. The initial automatic analysis of the application (performed using Detect It Easy) doesn't yield any meaningful results: the application seems to be compiled from C++ in Microsoft Visual Studio 2015. However, among its dynamic libraries there is an interesting file, x64-msvcr100-ruby200, and the executable module contains numerous references to it. The file presents itself as the Ruby interpreter by Yukihiro Matsumoto, the creator of the language, and even contains a link to the respective website.

In addition, the program includes some two and a half thousand *. files; a closer examination shows that they are unprotected Ruby scripts in plain text. As you may have guessed, there is a catch: none of these files references the target file, which means that you have to load the program into the x64dbg debugger and look for the code that accesses this file at run time, inside the interpreted code.
Examining the program
Let's follow the standard procedure (you might be familiar with it from my previous articles) and set a breakpoint on the file-reading function, kernel32.ReadFile. Of course, when the program loads, there are thousands of similar calls for all sorts of configuration files and libraries contained in the package; so, to make the task easier, let's monitor file reading with Process Monitor (ProcMon). This tool shows that, unlike all other files, the target file is read in blocks of 0x10000 bytes. Therefore, to filter out all unwanted read operations, you can set Size==0x10000 as the breakpoint condition.
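The same observation can be reproduced offline by exporting the captured ProcMon events to CSV and filtering them. The sketch below assumes ProcMon's default CSV columns ("Operation", "Path", "Detail") and that the Detail text of a ReadFile event contains something like "Length: 65,536"; the sample rows and paths are made up for illustration.

```python
import csv
import io
import re

# Assumption: ReadFile details look like "Offset: 0, Length: 65,536" (with thousands separators).
LENGTH_RE = re.compile(r"Length:\s*([\d,]+)")

def large_block_reads(csv_text: str, block_size: int = 0x10000) -> list[str]:
    """Return the paths of ReadFile events whose length equals block_size."""
    paths = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("Operation") != "ReadFile":
            continue
        match = LENGTH_RE.search(row.get("Detail") or "")
        if match and int(match.group(1).replace(",", "")) == block_size:
            paths.append(row["Path"])
    return paths

# Hypothetical excerpt of a ProcMon CSV export.
sample = '''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:01","app.exe","1234","ReadFile","C:\\app\\lib\\foo.rb","SUCCESS","Offset: 0, Length: 4,096"
"10:00:02","app.exe","1234","ReadFile","C:\\data\\target.bin","SUCCESS","Offset: 0, Length: 65,536"
'''

print(large_block_reads(sample))
```

Only the 64 KB read survives the filter, which is exactly the property the debugger breakpoint condition exploits.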
Start the program and wait until it reaches the desired point. Now let's analyze the call stack of the triggered ReadFile call.

The top five calls belong to the native file-reading harness and are of no interest. But below them you can clearly see two nested interpreter calls that may be familiar to you from the analysis of virtual machines of other scripting languages. To avoid repeating what I have stated many times in earlier articles, I will be brief. Even the most hardcore scripting language interpreters (e.g. Python) don't re-parse the source text while executing it. To optimize performance, the text is compiled into bytecode or threaded code when a module (class, object, method, etc.) is loaded; this is often loosely called dynamic compilation, although strictly speaking JIT (just-in-time) compilation means generating native machine code at run time.
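CPython makes this easy to see from the inside: source text is compiled once into a code object, and only that bytecode is executed afterwards. The snippet below is a minimal illustration using the standard compile, exec and dis facilities; the function name add_one is arbitrary.

```python
import dis

source = "def add_one(a):\n    return a + 1\n"

# Compile the text once, as an interpreter does when a module is loaded.
module_code = compile(source, "<example>", "exec")

namespace = {}
exec(module_code, namespace)  # runs the compiled module, defining add_one
add_one = namespace["add_one"]

# The function body is kept as bytecode, not as source text.
print(type(add_one.__code__.co_code))  # <class 'bytes'>
dis.dis(add_one)  # opcode names differ between CPython versions
```

YARV follows the same pattern, with its own instruction set in place of CPython's.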
And Ruby, despite its self-proclaimed extravagance, does exactly the same (more information on JIT compilation in different Ruby implementations can be found on patshaughnessy.net). Now let's get back to the virtual machine under investigation. If you examine the nested calls in the stack shown in the above screenshot, you'll identify the main instruction fetch loop for the threaded code (marked with arrows). In IDA, the code of this procedure (sub_18001B6E0; the exported function is rb_vm_get_insns_address_table) looks as follows.

As you can see, each opcode occupies 8 bytes; the virtual machine has 84 instructions (opcodes 0x00 to 0x53), and the above-mentioned function contains the implementation of each of them. It turns out that the interpreter library even contains a disassembler for the compiled bytecode (the exported functions rb_iseq_disasm_insn and rb_iseq_disasm). Analyzing these functions, you can find the instruction mnemonic table located at 180200500:
00 nop
01 getlocal
02 setlocal
03 getspecial
04 setspecial
05 getinstancevariable
06 setinstancevariable
07 getclassvariable
08 setclassvariable
09 getconstant
0A setconstant
0B getglobal
0C setglobal
0D putnil
0E putself
0F putobject
10 putspecialobject
11 putiseq
12 putstring
13 concatstrings
14 tostring
15 toregexp
16 newarray
17 duparray
18 expandarray
19 concatarray
1A splatarray
1B newhash
1C newrange
1D pop
1E dup
1F dupn
20 swap
21 reput
22 topn
23 setn
24 adjuststack
25 defined
26 checkmatch
27 trace
28 defineclass
29 send
2A opt_send_simple
2B invokesuper
2C invokeblock
2D leave
2E throw
2F jump
30 branchif
31 branchunless
32 getinlinecache
33 onceinlinecache
34 setinlinecache
35 opt_case_dispatch
36 opt_plus
37 opt_minus
38 opt_mult
39 opt_div
3A opt_mod
3B opt_eq
3C opt_neq
3D opt_lt
3E opt_le
3F opt_gt
40 opt_ge
41 opt_ltlt
42 opt_aref
43 opt_aset
44 opt_length
45 opt_size
46 opt_empty_p
47 opt_succ
48 opt_not
49 opt_regexpmatch1
4A opt_regexpmatch2
4B opt_call_c_function
4C bitblt
4D answer
4E getlocal_OP__WC__0
4F getlocal_OP__WC__1
50 setlocal_OP__WC__0
51 setlocal_OP__WC__1
52 putobject_OP_INT2FIX_O_0_C_
53 putobject_OP_INT2FIX_O_1_C_
A closer look identifies the bytecode interpreter as YARV (Yet Another Ruby VM), the virtual machine used by the reference Ruby implementation since version 1.9.
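With the mnemonic table in hand, a dumped instruction sequence can be decoded outside the debugger. The sketch below assumes the dump holds little-endian 8-byte words that contain opcode numbers followed by their operands; the operand counts are assumptions based on YARV's insns.def and should be checked against the exact Ruby 2.0 build, and only a subset of the table is listed. If the dump holds handler addresses instead of opcode numbers, as threaded code typically does, they would first have to be mapped back through the address table.

```python
import struct

# Opcode -> (mnemonic, operand count). Mnemonics come from the table above;
# operand counts are assumed and cover only a few instructions.
INSNS = {
    0x00: ("nop", 0),
    0x0D: ("putnil", 0),
    0x0E: ("putself", 0),
    0x0F: ("putobject", 1),
    0x1D: ("pop", 0),
    0x1E: ("dup", 0),
    0x2D: ("leave", 0),
    0x2F: ("jump", 1),
    0x30: ("branchif", 1),
    0x31: ("branchunless", 1),
    0x52: ("putobject_OP_INT2FIX_O_0_C_", 0),
    0x53: ("putobject_OP_INT2FIX_O_1_C_", 0),
}

def disasm(dump: bytes) -> list[str]:
    """Decode a stream of little-endian 8-byte words into text lines."""
    if len(dump) % 8:
        raise ValueError("dump length must be a multiple of 8")
    words = [w for (w,) in struct.iter_unpack("<Q", dump)]
    lines, pc = [], 0
    while pc < len(words):
        opcode = words[pc]
        if opcode not in INSNS:
            lines.append(f"{pc:04d} <unknown 0x{opcode:X}>")
            pc += 1
            continue
        name, argc = INSNS[opcode]
        operands = words[pc + 1 : pc + 1 + argc]
        ops = ", ".join(f"0x{v:X}" for v in operands)
        lines.append(f"{pc:04d} {name} {ops}".rstrip())
        pc += 1 + argc  # skip the opcode word and its operands
    return lines

# 0x15 is the Fixnum 10 in MRI's tagged encoding: (10 << 1) | 1.
code = struct.pack("<4Q", 0x0E, 0x0F, 0x15, 0x2D)
for line in disasm(code):
    print(line)
```

Positions are counted in 8-byte words, matching the fixed opcode width seen in IDA.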
Generating an error
So, you've identified the interpreter and found the bytecode that reads the target file. What's next? Reversing the file handling by analyzing compiled bytecode would be tedious, even with a built-in disassembler. The best way is to find the text source of the Ruby script (although it cannot be ruled out that the interpreter runs bytecode that was produced by other means, with no source behind it). But for now, let's put such extreme options aside and try to determine the name of the method that initiates file reading.
The first quick way to do this without painstaking code analysis that comes to mind is rather primitive: trigger an error in the hope that the interpreter prints the call stack in its error report. A brief examination of the interpreter loop shows that when an invalid opcode (one beyond the end of the mnemonic table) is encountered, the interpreter raises an error and even prints the stack.

Let's simulate such a situation. As soon as you reach the breakpoint on reading the 0x10000-byte block, disable it and set breakpoints on the next threaded-code instruction fetch (the highlighted address 18001B722 in the second-to-last screenshot) and on the printf function (which prints the debug stack). When the program reaches the next instruction, change it to an invalid one (e.g. 84) and see what printf prints. And this approach works: in addition to obscure debugging information about blocks and frames at the time of the error, it outputs the closest class and method.

However, this information seems to be useless: most probably, reading occurs when eval is called from the HTMLExportDialog class, and, as bad luck would have it, the source of this class isn't available as plain text.
Searching for alternatives
Let's approach the issue from a different angle and take another look at the call stack shown in the screenshot at the beginning of the article. The thread under investigation 'stems' from a function call in ntdll, so it's clear that it was started by some other thread. Going through the adjacent threads, you can find a likely candidate: a thread that is quietly waiting for a call to rb_eval_string_protect, whose argument string begins with "PM::", to return.

At first glance, this discovery is also useless: as you might have guessed, neither PM nor RubyBridge is available as plain text. However, now you have a working hypothesis (partially supported by analysis of the application code in IDA): the application communicates with the Ruby interpreter by calling rb_eval_string_protect.
Therefore, let's remove all previous breakpoints and set a new breakpoint at the entry point of this function. After restarting the program, you'll immediately notice plenty of interesting things: multiple calls in the PM. format, where ModuleName refers to obviously encrypted files with the *. extension, including RubyBridge, HTMLExportDialog, and many others. And most importantly, you've finally found the PM. implementation code that decrypts such files.
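At the entry point of rb_eval_string_protect, whose C prototype in ruby.h is VALUE rb_eval_string_protect(const char *str, int *pstate), the Windows x64 calling convention places str in RCX. If you save the memory region that RCX points into, a few lines of Python can pull the evaluated string out of the dump. The base address, the RCX value and the dump contents below are hypothetical.

```python
def read_c_string(dump: bytes, base: int, address: int, limit: int = 1 << 20) -> str:
    """Return the NUL-terminated string at `address` in a dump that starts at `base`."""
    offset = address - base
    if not 0 <= offset < len(dump):
        raise ValueError("address is outside the dumped region")
    end = dump.find(b"\x00", offset, offset + limit)
    if end == -1:
        raise ValueError("no terminating NUL within the limit")
    return dump[offset:end].decode("utf-8", errors="replace")

base = 0x1F0000  # hypothetical start address of the saved region
dump = b"\x90" * 16 + b'puts "hello"\x00' + b"\xcc" * 8
rcx = base + 16  # hypothetical RCX value at the breakpoint
print(read_c_string(dump, base, rcx))
```

Logging every string captured this way gives a complete list of the module calls the host application makes.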

In other words, to protect their creation from evil hackers, the cunning developers have encrypted both the decryption code and the modules it decrypts using the asymmetric RSA algorithm: the scripts are decrypted with the public key embedded in the application, so nothing inside them can be changed and re-encrypted without the secret (private) key. Fortunately, you don't need to change anything. To solve the task, all you have to do is find the validation algorithm for the given file, and this algorithm is hidden in one of the RSA-encrypted scripts.
Furthermore, the major and most difficult part of the task has already been completed: you know how the protected scripts are encrypted, and the rest is paperwork. You can extract the public key from the code and write your own module decryption procedure; or you can set a breakpoint on the decryption routine in the native RSA library and dump the decrypted code… But as usual, let's choose the path of least resistance.
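If you did take the "write your own decryption procedure" route, the core operation is a modular exponentiation with the public exponent. The article does not say which padding or block layout the protection uses, so the sketch below assumes textbook RSA without padding, big-endian fixed-size blocks, and the classic toy key p = 61, q = 53 (n = 3233, e = 17, d = 2753). A real scheme would use a much larger modulus and usually padding such as PKCS#1 v1.5, which would also have to be stripped.

```python
# Toy textbook-RSA key (p = 61, q = 53); real keys are far larger.
N, E, D = 3233, 17, 2753

def _block_sizes(n: int) -> tuple[int, int]:
    k = (n.bit_length() + 7) // 8  # ciphertext block: full modulus width in bytes
    return k, k - 1                 # plaintext block: one byte less, so m < n

def rsa_private_encrypt(plaintext: bytes, n: int, d: int) -> bytes:
    """Protector side: c = m^d mod n for each block (no padding)."""
    k, p = _block_sizes(n)
    if len(plaintext) % p:
        raise ValueError("plaintext length must be a multiple of the block size")
    out = bytearray()
    for i in range(0, len(plaintext), p):
        m = int.from_bytes(plaintext[i:i + p], "big")
        out += pow(m, d, n).to_bytes(k, "big")
    return bytes(out)

def rsa_public_decrypt(ciphertext: bytes, n: int, e: int) -> bytes:
    """Loader side, using only the public key: m = c^e mod n for each block."""
    k, p = _block_sizes(n)
    if len(ciphertext) % k:
        raise ValueError("ciphertext length must be a multiple of the modulus size")
    out = bytearray()
    for i in range(0, len(ciphertext), k):
        c = int.from_bytes(ciphertext[i:i + k], "big")
        out += pow(c, e, n).to_bytes(p, "big")
    return bytes(out)

blob = rsa_private_encrypt(b"puts 42", N, D)
print(rsa_public_decrypt(blob, N, E))  # b'puts 42'
```

This asymmetry is what the protection relies on: anyone holding the public key can read the scripts, but only the holder of d can produce new ciphertext that the loader will accept.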
In the previous screenshot, you can see that evx is implemented via eval; so, let's set a breakpoint on the function x64-msvcr100-ruby200. At its entry point, you get the decrypted code of the modules on a silver platter. After a few simple manipulations, you find the ultimate goal: the file validation algorithm.

Conclusions
You are unlikely to encounter such a strangely implemented and cunningly protected application in real life: scripting languages are rarely used in commercial packages; normally, they quietly do their job on closed servers. Still, the experience described above may be useful should you decide to master virtual machines and interpreters, or even design your own.
Good luck!
