Chum Bucket. How I hacked a 20-billion corporation using a free service

As you are likely aware, data breaches occur on a regular basis in this wild world. Each such incident is preceded by painstaking work: information collection and analysis, identification of security holes, selection of attack tools, etc. Today, I will reveal to our readers how I hacked the $20-billion TUI Group using publicly available free tools and my own wits.

Intelligence collection is the first stage of any pentesting research; it is performed right after target identification. Over time, this phase has grown into an independent discipline encompassing various tactics, methods, tools, and services that simplify the routine procedures. This opens a number of opportunities to hackers, including:

  1. Hunting without a specific target or 'employer';
  2. Searches for new large-scale threats;
  3. Quick prevalence assessments for specific threats; and
  4. Netstalking and studies performed for just fun.

Below, you will find a list of the most popular services that enable full-scale research projects, along with basic techniques for working with them. The main topic of this article, however, is the service that enabled me to hack TUI Group.


Information collection services can be loosely divided into several categories based on the type of data collected and the sphere of application. The most popular ones are:

  • Certificate Transparency – a registry of digital certificates (including new ones) bound to domain and subdomain names;
  • Chaos, Certspotter, and similar services – databases containing domain names, certificates, and other domain-related data;
  • OpenIntel – a project for monitoring of the global domain name system;
  • Internet-Wide Scan Data Repository – a public repository of scan results for Internet protocols and services across the Internet. The repository is hosted by the ZMap team. In addition to its own data sets, the team publishes data received from other similar projects. The resource provides an excellent opportunity to work with raw and comprehensive data sets;
  • Rapid7 OpenData – a similar project implemented by the developers of Metasploit Framework;
  • Shodan, ZoomEye, Censys, and others – search engines able to explore nearly the entire Internet topology; using them, you can search by service and protocol banners, their hashes, and the content of HTML pages. It is also possible to find devices connected to a specific network or running applications of various types. The latest version of Shodan even includes the possibility to search by CVE ID;
  • CommonCrawl – the repository of a multifunctional web crawler that collects plenty of interesting information;
  • GreyNoise, BinaryEdge, and others – reservoirs of knowledge on current threats. If you aren't sure what to research or want to stay aware of the most pressing vulnerabilities, try these resources. GreyNoise trends and top lists provide information about techniques not yet discovered by information security specialists but already exploited by hackers; and
  • GrayHatWarfare – a searchable database that can be used to find public Amazon AWS servers. GrayHatWarfare employs several open-source scanning tools and aggregates the results. To date, GrayHatWarfare has located 279,000 available S3 buckets and 4.5 million files.

Of course, this list is not complete: I purposely omitted some resources (e.g., due to their excessively narrow specialization). Scanning the entire Internet on your own would be very time- and labor-intensive, but GitHub offers a large selection of tools automating interaction with nearly all of the above-listed services. The purpose of this material is to draw your attention to overlooked cases and encourage innovative thinking.

At the end of the article, I will briefly describe my experiments with seemingly trivial Shodan. I strongly believe that I have not violated any laws; so, I am not afraid to disclose my authorship and some sensitive details.

Modern trends

The analysis of high-profile data leaks that have occurred over the last 1.5 years makes it possible to identify the current trends and the most popular targets of cyberattacks:

  • MongoDB servers;
  • Rsync daemons;
  • Elasticsearch;
  • DigitalOcean;
  • Azure Blobs; and
  • Google Storage.

Misconfigured servers and apps that don’t require authentication are exposed to the most severe threats. The same open-source tools available on GitHub can also be used for scanning. Some attackers use Shodan in their searches, while others scan the global network manually.

As you probably noticed, the above list of targets doesn’t include AWS S3 buckets. The analysis of the chronology of information leaks affecting buckets shows a significant reduction in the frequency of such incidents after 2018. A number of factors have contributed to this: media hype, measures taken by Amazon, bug hunters equipped with scanners, etc.

Of course, you can purchase a hosting service, get the latest software, and join the scanner race – but I prefer not to take the easy way! Anyone can tweak the thread count and play with an app's settings, but the really exciting things start when you employ your wits.

Disadvantaged Grayhat

Despite numerous publications on thematic sites, for a long time there were no apps or libraries designed for interaction with the GrayHatWarfare API. All I could find on GitHub was a poorly written web parser based on Python's mechanize library.

The main reason behind this situation was that the service is very expensive, while the functionality of a free account is severely limited. The API query language is so simple and concise that you don't really need to write code to interact with it. Still, I decided to create a tool for working with GrayHatWarfare and, along the way, implement multithreading and bypass the free-tier limits.

Bypassing limitations of the free account

Searches across all indexed files are limited to 2,000 results. However, files in an individual bucket can be browsed with almost no restrictions, especially if you are looking for specific file extensions and use negative keywords. So, I replaced one method with the other and implemented enumeration of all available bucket IDs. As a result, a search for all existing .zip files now takes only 20-30 minutes. This is exactly how long it took me to send 91,000 requests to the API without a single failure!
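The bypass described above can be sketched in a few lines of Python. This is a minimal illustration only: the endpoint layout, page size, and `API_KEY` placeholder are my assumptions, so check the current GrayHatWarfare API documentation before relying on them.

```python
# Sketch of enumerating bucket IDs instead of using the capped global search.
# The URL structure below is an assumption for illustration, not the
# documented GrayHatWarfare API.

API_BASE = "https://buckets.grayhatwarfare.com/api/v1"  # assumed base URL
API_KEY = "YOUR_TOKEN"  # hypothetical placeholder


def bucket_files_url(bucket_id: int, start: int = 0, limit: int = 1000) -> str:
    """Build a per-bucket file-listing URL (layout is an assumption)."""
    return f"{API_BASE}/buckets/{bucket_id}/files/{start}/{limit}?access_token={API_KEY}"


# With a thread pool, walking tens of thousands of bucket IDs looks roughly like:
#   from urllib.request import urlopen
#   from concurrent.futures import ThreadPoolExecutor
#   with ThreadPoolExecutor(max_workers=16) as pool:
#       pages = pool.map(lambda i: urlopen(bucket_files_url(i), timeout=10).read(),
#                        range(1, 90_000))
```

Because each request targets a single bucket rather than the global index, the per-search result cap never kicks in.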

Search logic and instructions

So, you can search for files with any extension. Furthermore, the API allows you to add keywords to queries – but only negative ones; otherwise, the search fails. These words are checked against every part of the full URL of the file you are looking for. Such radicalism is justified: buckets contain plenty of junk – media files, open-source front-end and back-end modules, etc. But don't be afraid to experiment: all discarded URLs are saved to a separate trash.txt file. To add your own negative keywords, save them to the exclude.txt file.
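The negative-keyword filter can be sketched as follows. The file names follow the article (exclude.txt, trash.txt), but the helper itself and the sample URLs are my own illustration:

```python
# Sketch of the negative-keyword filter: a URL containing any excluded
# keyword anywhere in it goes to the trash list, the rest are kept.

def split_by_keywords(urls, negative_keywords):
    """Return (kept, trash) lists; matching is case-insensitive substring search."""
    kept, trash = [], []
    for url in urls:
        target = trash if any(k in url.lower() for k in negative_keywords) else kept
        target.append(url)
    return kept, trash


# Hypothetical usage: the keywords would come from exclude.txt, and the
# trash list would be written to trash.txt for later review.
urls = [
    "http://b1.s3.amazonaws.com/db_backup.zip",
    "http://b2.s3.amazonaws.com/assets/logo.png",
]
kept, trash = split_by_keywords(urls, ["assets", ".png", "node_modules"])
```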

The files returned by the search can be filtered by size, which has to be specified at the start of the program. To run the app, execute the following commands in the terminal:

~$ git clone
~$ cd grayhat2 && python3

With money, no brains are needed

File filtering by size is a built-in feature of the GrayHatWarfare API – but only for commercial accounts. In my implementation of the program, I can get the file sizes natively because I don’t perform a search per se, but simply browse through the content of buckets.

So, it turns out that any user with a paid account can request the 1,000 largest files from the API – such files often contain user data (here they are, the much-wanted 'leaks'!) – and that's it. Does that mean you won't be able to find anything of interest there? Of course not!

Killing mainstream

First, I used GrayHatWarfare to search for .csv files over 500 MB in size. There were some interesting findings, but not enough to celebrate victory.

The second idea that came to my mind was to search for private RSA/SSH keys. And fortune smiled upon me! I checked two file extensions one after another: .priv and .key. To my surprise, within an hour of writing my Python script, I had found three data leaks! Furthermore, on the servers with private keys, I also found the following:

  • user data of a fitness app with one million Google Play users;
  • a secret token of an Amazon AWS account belonging to; and
  • most importantly: a secret Amazon AWS token and a private SSH key to the web app of an Italian startup initially valued at $60 million and currently owned by TUI Group.

With regard to the first two incidents, I was unable to contact the resource owners, but I notified Google and Amazon about the leaks (although I received no meaningful answers from them). TUI Group replied to me the next day and patched the security hole a week later.

Below is a step-by-step description of how I got full access to the production EC2 instance of Musement with superuser rights. Interestingly, this incident still remains unnoticed. This is a warning bell: if a trend becomes mainstream, it's better to avoid it.

First results. What’s next?

The developers have graciously left the keys to me

After getting the access keys graciously left by the developers in their Python script, I successfully authenticated to the Amazon account. After trying various commands, I realized that I had read-only access to a limited set of features. In addition, the server I wanted to connect to over SSH allowed connections only from whitelisted IP addresses.
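The first thing to do with a leaked key pair is to ask STS who it belongs to. A hedged sketch: `summarize_identity()` is my own helper, and the commented-out lines show the real boto3 calls (not executed here, as they need live credentials):

```python
# Summarize the response of AWS STS get-caller-identity, the standard
# "whose keys are these?" check.

def summarize_identity(resp: dict) -> str:
    """One-line summary of an sts get-caller-identity response dict."""
    arn = resp.get("Arn", "unknown")
    account = resp.get("Account", "unknown")
    # The principal name is the last path segment of the ARN.
    return f"account {account}, principal {arn.split('/')[-1]}"


# With real credentials in the environment, usage would look like:
#   import boto3
#   sts = boto3.client("sts")
#   print(summarize_identity(sts.get_caller_identity()))
```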

Collecting information about the infrastructure

So, I found a list of popular AWS security tools for pentesting Amazon on GitHub and started with the simplest tricks. First, I retrieved all public addresses of the EC2 machines and all IP addresses from the NACL (Network Access Control List) policies using the aws_public_ips utility.

Then I used the masscan utility to scan the discovered addresses for open ports in the range 1-64000. Concurrently, I launched two other classic utilities designed to collect more extensive and detailed information about the cloud infrastructure:

  • ScoutSuite (the successor of Scout2) – an AWS security audit tool that checks all sections of the cloud and generates handy reports; and
  • Pacu – an AWS exploitation framework designed to identify and exploit vulnerabilities in a cloud, including privilege escalation, persistence, and post-exploitation.
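The masscan step can be glued to the rest of the pipeline with a short parser. A sketch that extracts open ip:port pairs from masscan's JSON output (`-oJ`); the helper name is mine, and I assume the output has been cleaned into valid JSON (raw `-oJ` files sometimes carry a trailing comma):

```python
import json


def open_services(masscan_json: str) -> list:
    """Extract sorted 'ip:port' strings from masscan -oJ output."""
    found = []
    for record in json.loads(masscan_json):
        for entry in record.get("ports", []):
            # masscan marks responsive ports with status "open".
            if entry.get("status", "open") == "open":
                found.append(f'{record["ip"]}:{entry["port"]}')
    return sorted(found)
```

The resulting list can then be fed to banner-grabbing or brute-forcing tools.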

The results were astonishing – especially considering that the SGP (Security Group Policies) and IAM rights for the leaked account were configured correctly. First, Pacu found a way to escalate my privileges (although I couldn't use this possibility for ethical reasons). The suggested method involved exploitation of the CloudTrail CSV injection vulnerability. Being able to create trails (i.e. events), I could try to create one using a malicious Excel formula as its name. Of course, this attempt would fail, but the name would remain in the logs. If such a log is exported in .csv format and imported into Excel, the malicious code can be executed on the admin's PC.
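The mechanics of that CSV injection are easy to demonstrate. This is my own toy example, not Pacu's actual module: a resource whose name starts with `=` survives a naive CSV export and becomes a live formula once the file is opened in Excel; the standard defense is to prefix dangerous leading characters with a quote.

```python
import csv
import io

# A deliberately harmless formula; real attacks use far nastier payloads.
PAYLOAD_NAME = "=1+1"


def export_log(rows):
    """Naive CSV export, the way a log viewer might do it."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["event", "name"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def neutralize(cell: str) -> str:
    """Standard defense: quote-prefix cells starting with =, +, -, or @."""
    return "'" + cell if cell[:1] in "=+-@" else cell


# The payload lands in the exported log verbatim.
csv_text = export_log([{"event": "CreateTrail", "name": PAYLOAD_NAME}])
```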

ScoutSuite brought even better results. Below are partial examples of what it has retrieved from the cloud.

Superusers’ credentials on the Tomcat server
Some license keys
Brute-forcing Basic HTTP auth

In addition, the S3 bucket itself contained some interesting back-end files. In total, I extracted more than 400 scripts and configs from the user data stored on the EC2 machines.

Bypassing AWS SGP protection

Scan results for EC2 machines

I scanned the external IP addresses of the EC2 machines and found routers with default admin passwords that supported the VPN function; their service banners identified them as "Huawei AR Web Platform".

Importantly, these routers were on the NACL whitelist for incoming and outgoing traffic on all ports, including SSH, and it was possible to route traffic through them.

The routers allowed me to route traffic

At last, I was able to connect to the main production server with root privileges using the SSH key found earlier.

Connecting to the main production server


The company promptly fixed the leak. Too bad I received neither a reward nor even a thank-you letter from it. Instead, I was notified that the corporation wouldn't sue me because, during the testing, I had followed the instructions received from them in response to my initial letter.

For me, this experience became an unpleasant reminder that IT companies serve the interests of industry giants and ignore aspects affecting the security of their products and end users. So, let’s make the world a safer place together!

Not only buckets…

I bet you are already downloading hawkeye. But don't limit the scope of your pentesting experiments to GrayHat; for instance, you may try combining the Google Hacking Database (GHDB) with Shodan or playing with its native tags in trends. This idea might bring some interesting results…
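Combining GHDB with Shodan mostly means translating Google dork operators into Shodan filters. A toy sketch of that idea: `http.title` and `http.html` are real Shodan filters, but the mapping itself is my own simplification, not an official correspondence.

```python
# Naive translator from common Google dork operators to rough Shodan
# equivalents; anything unmapped is passed through as free text.
GOOGLE_TO_SHODAN = {
    "intitle": "http.title",
    "intext": "http.html",
}


def dork_to_shodan(dork: str) -> str:
    """Rewrite operator:term tokens of a dork into a Shodan query string."""
    parts = []
    for token in dork.split():
        op, sep, term = token.partition(":")
        if sep and op in GOOGLE_TO_SHODAN:
            parts.append(f"{GOOGLE_TO_SHODAN[op]}:{term}")
        else:
            parts.append(token)
    return " ".join(parts)
```

The resulting string could then be fed to the Shodan search API with a valid key.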

Have you ever heard about vulnerable LILIN digital video recorders? It was me who discovered them and started reversing their firmware. Therefore, I solemnly declare: Qihoo 360 blatantly lies about the number of vulnerable devices: it’s not 5K, but more than 300K. Here is the original dork:


Ultimately, I managed to sell the identified bugs, although when I tried legal platforms like Zerodium, they weren't always in demand. Apparently, information about this vulnerability leaked to the public from one of these platforms. More information about the incident can be found on my GitHub page.

Maybe you have also heard about the smart but vulnerable Sonos speakers? Yes, I was involved in the identification of that vulnerability, too. As you can see, the potential of publicly available data is truly unlimited! All you need is some wits and commonly used OSINT tools. Good luck!

