Advanced OSINT Techniques: Exploring Modern Network Intelligence Methods

What do competitive intelligence, pentesting, and cyber incident investigations have in common? They all involve information gathering, primarily from open sources. But what do you do when the information found is insufficient, and traditional tools are ineffective? Let’s discuss hacks and techniques closely related to accessing restricted materials.

According to representatives of U.S. intelligence, about eighty percent of useful information comes from open sources. Only a small portion consists of intelligence gathered directly by agents. The difference is significant, but it is important to remember that the details obtained through intelligence efforts are crucial for confirming and verifying the broader picture.

This is well illustrated by the story of how the CIA, during the Soviet era, reconstructed the electrification scheme of the Urals and nuclear industry facilities from a magazine photograph. Intelligence analysts started from a single retouched image of a monitoring panel and then undertook a massive effort, collecting data from Soviet newspapers, magazines, and diplomatic reports. However, the decisive evidence came from aerial reconnaissance photographs, which verified the conclusions about the location of power lines and factories.

Equally important is the ability to obtain “intel” when it comes to gathering information about individuals. This could involve investigating an infection to find the culprit or identifying a person to refine the attack surface. Let’s explore unusual cases of obtaining information when open source intelligence doesn’t yield the desired results.

Cracking a Telegram Bot

As you know, Telegram offers extensive functionality for working with bots. It’s quite simple to create a new bot, obtain a token to manage it, and use it with a ready-made script from GitHub. Just as easily, bot tokens can be compromised: you can find them through basic dorking. Sometimes, malware ends up using Telegram functions as a control panel—if only you knew how many stealer archives are stored in the messenger’s cloud!

What prevents someone from extracting all information from a bot using a leaked token? The primary safeguard is the Telegram Bot API itself, which does not let several clients receive a bot’s updates at the same time. Even if the bot is inactive or its owner ignores connection errors, we can only access new, previously unprocessed messages. Additionally, the Bot API exposes no record of past interactions, so there’s no way to read message history. Therefore, one must quietly wait for someone to send a message in order to intercept it.

Fortunately, the HTTP API for working with bots is just a wrapper around the original MTProto API, which allows you to interact with bots as if they were human accounts. And there are already libraries available to work with this protocol! Personally, I recommend using Telethon, which recently implemented a layer for managing bots.

As you might expect, the capabilities with this method increase significantly. Now, we can connect without the bot owner’s knowledge and extract the message history. The first conversation, naturally, will be with the bot’s owner. I’ve written a simple script that retrieves conversations, saves media content, and stores the names and photos of correspondents.

By the way, you probably remember that recently Telegram introduced a feature allowing you to delete your messages on the recipient’s side. However, if you delete a message from a chat with a bot, it will remain with the bot.

Discovering a Phone Number Without Direct Contact

Imagine needing to find out the phone number tucked away in someone else’s pocket without drawing any attention. There are, of course, numerous ways to do this, ranging from using an IMSI catcher for a brute-force interception to employing social engineering. Before 2018, the Wi-Fi in the Moscow metro was even capable of revealing a phone number and the advertising profile of its owner through the MAC address.

Researchers from Hexway recently published information on how Apple devices communicate with each other via Bluetooth LE. For mutual identification, they utilize hashes of user account data: three bytes from the SHA-256 hash of the phone number, Apple ID, and email address. Yes, iPhones, MacBooks, and even headphones are essentially broadcasting their identity and operating mode in the radio space around them.

Fortunately, hashes aren’t sent every time, but only in specific cases, such as when opening password entry screens for wireless networks. However, that’s enough: suggest someone connect to a Wi-Fi network, and you can capture the hashes of their data.

Additionally, remember that a local phone number consists of eleven digits, so with only three bytes of the hash, each captured value matches just a few dozen candidate numbers. This number shrinks even further if we know the region and the provider. The researchers have created a convenient script that generates hashes for the required ranges. Some enthusiasts have already built a ready-made database by cleaning out nonexistent provider codes. Once you have the list of numbers, you just need to narrow down the “suspects” by using HLR requests to check subscriber activity and verify whether they have messenger accounts. A few iterations, and you’ll have the desired mobile number in your hands.
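The “few dozen” figure can be checked with simple arithmetic. The sketch below is only an estimate under stated assumptions: hash prefixes are treated as uniformly distributed, and the number of freely varying digits (nine, then seven) is my own illustration rather than a value taken from the research.

```python
def expected_candidates(free_digits: int, prefix_bytes: int = 3) -> float:
    """Average number of phone numbers that share one truncated hash value."""
    number_space = 10 ** free_digits    # how many numbers are possible
    hash_space = 256 ** prefix_bytes    # how many distinct prefixes exist
    return number_space / hash_space


# Assumption: the country code and the leading mobile digit are fixed,
# so nine of the eleven digits vary.
print(round(expected_candidates(9)))      # 60

# Assumption: a three-digit provider code is also known, so seven digits vary.
print(round(expected_candidates(7), 2))   # 0.6
```

In other words, once the provider code is known, a three-byte prefix will on average point to fewer than one number in that range.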

Also, don’t forget about the Apple ID and email inbox! Now you have all the necessary data to verify the account information of a person that you find in open sources.

Tracing Emails with a Network Honeypot

Now, let’s assume we’re in the opposite situation: we can only contact a person through the network, for example, using the Jabber messenger. We know that they are involved in malware development but lead a double life and have an official job. We are eager to uncover their identity, but they use only Tor and VPN. What can we latch onto in this case?

Just as each person has a unique handwriting, different people’s code often differs in small details. What if we have a script from a person of interest? We can search for key strings in public repositories in the hopes that they have written or reused their code elsewhere. This way, we might find their account, download all available repositories from it, and obtain a list of people and email addresses using a simple command.

$ git log --pretty="%an %ae%n%cn %ce" | sort | uniq

You wouldn’t believe how often people, out of sheer carelessness, commit under one account and push under another!
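If you need to run this over many cloned repositories, the same idea can be scripted. This is a minimal sketch, not the author’s tooling: the function names `unique_identities` and `repo_identities` are hypothetical, and it assumes `git` is available on the PATH.

```python
import subprocess


def unique_identities(log_text: str) -> list[str]:
    """Return sorted, de-duplicated 'name email' lines from git log output."""
    return sorted({line.strip() for line in log_text.splitlines() if line.strip()})


def repo_identities(repo_path: str) -> list[str]:
    """Run git log in repo_path and collect author and committer identities."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=%an %ae%n%cn %ce"],
        capture_output=True, text=True, check=True,
    )
    return unique_identities(result.stdout)
```

Calling `repo_identities(".")` inside a clone should give the same list as the shell pipeline above.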

Searching this way is often imprecise, and there are surely other methods. What can reveal an active Git repository user? Here’s a hint: a public key, which the server uses to identify you! Often, this is the same SSH key generated in id_rsa, but living a double life means multiple identities and several accounts. Surely, one of them is used to connect to servers. Do you see where I’m going with this?

To complete the picture, let’s clarify two more things. First: all users’ public keys on GitHub and GitLab are stored openly and are publicly accessible, simply by adding .keys to the profile’s URL. Second: when connecting, the SSH client offers, one by one, all the keys that are explicitly specified for the server or added to the agent. This is the crux of our exploration.
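To compare keys from different sources, it helps to reduce each one to a fingerprint. The sketch below builds the .keys URL and computes the SHA-256 fingerprint in the form that `ssh-keygen -l` prints. The helper names are hypothetical, and downloading the key list is deliberately left out.

```python
import base64
import hashlib


def keys_url(username: str, host: str = "github.com") -> str:
    """Public URL that lists a user's SSH public keys."""
    return f"https://{host}/{username}.keys"


def fingerprint(public_key_line: str) -> str:
    """SHA-256 fingerprint of one OpenSSH public key line."""
    # Fields in the line: key type, base64 blob, optional comment.
    blob_b64 = public_key_line.split()[1]
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    # OpenSSH prints the digest as base64 without the trailing padding.
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
```

Two key lines with the same blob but different comments yield the same fingerprint, which is what makes matching across accounts possible.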

We let a person connect to our server, which runs a patched SSH server that logs every key the client offers. Then we look the person up in a database of keys from all users of public repositories! The idea is simple yet brilliant and has already been implemented in some form. Type in the console:

$ ssh whoami.filippo.io

The server will attempt to find your GitHub profile and simultaneously check for CVE-2016-0777.

The server code is open-source and is based on a framework for researching the reliability of GitHub users’ keys. While the data itself wasn’t published, others have replicated the data collection independently. However, it’s best to collect up-to-date user data yourself, since accounts get renamed, new accounts get created, and some users change their keys.

Uncovering Secrets in Telegram

How often have you tried to find information about a Telegram account? For me, it’s been far too often. Sometimes the origin becomes clearer with metadata such as a username (especially when it resembles a real person’s name) or photos from the account and the date they were set. However, sometimes even this information is missing, leaving only the First and Last name, which can be entirely uninformative.

Here’s how identity verification works in real-world searches. First of all, we determine the account’s identifier. It is hidden in official clients, but some unofficial ones display the ID when opening a profile. It’s more convenient to forward a person’s message to the @userinfobot, which will provide the desired digits.

However, sometimes the method may not work due to account privacy settings. But there’s a solution: using scripts, you can utilize APIs to track system fields in messages and events. This way, we can identify a person if they change their name, leave group chats, or go off the grid temporarily.
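As a rough illustration of tracking by system fields: updates delivered through the Bot API carry the sender’s numeric `id` next to `first_name`, `last_name`, and `username`. The sketch below assumes updates have already been received as plain dictionaries. The helper names are hypothetical and this is not the script the author used.

```python
from collections import defaultdict


def sender_snapshot(update: dict):
    """Extract (user_id, display_name, username) from a Bot API style update."""
    sender = update.get("message", {}).get("from")
    if not sender:
        return None
    parts = (sender.get("first_name"), sender.get("last_name"))
    name = " ".join(p for p in parts if p)
    return sender["id"], name, sender.get("username")


def name_history(updates: list[dict]) -> dict[int, list[str]]:
    """Map each stable user ID to the distinct display names seen, in order."""
    history = defaultdict(list)
    for update in updates:
        snap = sender_snapshot(update)
        if snap and snap[1] not in history[snap[0]]:
            history[snap[0]].append(snap[1])
    return dict(history)
```

Because the numeric ID stays the same when the name changes, the history is keyed by ID rather than by name.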

Next, we search for data using specialized search engines. For Telegram, search.buzz.im is the most useful as it indexes all public chats and even allows searching by the message author’s name. However, this won’t help us if the account owner recently changed their name. Searching by ID requires a lot of luck—it might have appeared as text in public chats and been indexed, but in practice, the likelihood of this is quite low.

We also perform searches by ID. Unfortunately, Telegram does not allow access to account data with just an identifier—this is a basic privacy feature. Therefore, there are no straightforward tools for direct identification, except for databases that perform reverse searches by ID for accounts whose data has been collected “legally” (meaning when data can be obtained through the API by having common chats with the person).

Who has access to such account databases? It’s clear that these are bots used to manage a vast number of public chats, private chats, and direct messages. There’s nothing stopping their developers from storing all available account data. 🙂

Now it’s time to discuss the practical part. For instance, in one of my cases, tracing the source of information led to a group bot designed for accumulating karma. Naturally, a database was used to update and synchronize values for each account. And, of course, the developers couldn’t resist the temptation to create a simple web interface using PHP to check karma based on a nickname, name, or user ID.

Finding the necessary information turned out to be even easier than expected: the search displayed a list of suitable accounts with names and IDs (autocompletion was working). Upon entering an identifier, the first entry from the database appeared, that is, the first username under which the bot had encountered the account. Bingo!

But if you think that’s all, you’re quite mistaken. The autocomplete request had a SQL injection vulnerability that could have been exploited to extract the entire user database—accounts that had at least once joined groups with the bot. This included name change history, joining dates, and other information. I believe the lesson here speaks for itself.
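For completeness, here is what the safe version of such an autocomplete lookup might look like. The original interface was written in PHP. This sketch swaps in Python’s sqlite3 module purely for illustration, and the `karma` table and its columns are hypothetical.

```python
import sqlite3


def autocomplete(conn: sqlite3.Connection, prefix: str, limit: int = 10) -> list[tuple]:
    """Return (user_id, username) rows whose username starts with prefix."""
    # Escape LIKE wildcards so that user input is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        # Placeholders keep the input out of the SQL text entirely.
        "SELECT user_id, username FROM karma WHERE username LIKE ? ESCAPE '\\' "
        "ORDER BY username LIMIT ?",
        (escaped + "%", limit),
    )
    return rows.fetchall()
```

With bound parameters, a string such as `' OR 1=1 --` is treated as an ordinary (and nonexistent) username prefix instead of being executed as SQL.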

Hidden Data from the Forum

What if you only know one account of a person on a site, and searching by username yields nothing? Registration implies that someone has left their personal information there and at the very least entered an email address, which is what we need for further investigation. The problem is that the privacy settings are at maximum, and only the user can see their data.

Let’s take our stealth level to the maximum: it’s not advisable to use social engineering techniques on a person, as they might get suspicious about the decoy account and erase traces in other important areas. Of course, one could try to swipe cookies from a forum administrator or moderator to gain elevated privileges and direct access to the database…

However, let me present a more elegant solution to this problem. But before I proceed, I must caution you: I do not recommend repeating these actions, if only because they go beyond passive information gathering.

In popular forum engines today, it’s becoming increasingly difficult to find XSS vulnerabilities. But does that mean they are completely absent? The author of this case study wondered the same and was proven right: a vulnerability was discovered in the phone number field of a user profile page, which was accessible to everyone. What does this mean for us?

As you might have guessed, this allows for user session hijacking, but even that might be unnecessary. Why log into a forum to view data that’s already obtained? Exactly, we’ll simply inject a JS script into the page, which will use AJAX to request information about the active user and then send it back to our server.

But remember, we agreed not to draw attention to the subject of our investigation. So let’s create conditions in which our XSS reaches them through someone else’s hands, such as through the accounts of their acquaintances. One more small touch is needed: besides the direct request for information, we will send an additional request via JS to store the XSS in the user’s field. Can you see the whole picture now?

We will create a highly attention-grabbing account and post a provocative message on a forum. I’ll leave the details of implementation to your imagination. It’s not crucial who reads it or if we get banned, as the account will act as the “patient zero.” Someone will check out our profile to find out more information and will unknowingly begin spreading a data-collection virus. This script will propagate in a wide-reaching manner, infecting friends, friends of friends, and even random contacts.

That’s exactly what happened. Shortly after, data about the targeted user, including the desired email, arrived at the command server, along with information on a good portion of the other forum members. It’s a pity that there was no calculation of how many degrees of separation it took for “the reward to find its hero.” Otherwise, it could have been possible to write a study on social graphs in forums.

By the way, it’s worth recalling that a similar method was recently used in VK, and it was done openly rather than for data collection purposes. Now imagine how long attackers could exploit such vulnerabilities covertly, gathering our data.

Conclusion

In this article, we discussed only five unconventional methods to obtain information that can’t be reached through open sources. There are always tricks and hacks, but keep in mind that someone may have already traveled this path before you: uploaded a database online, found a leak on a website, or created a Telegram bot for verification. Therefore, I recommend you compile a list of the most useful open information sources and go through it before digging deeper. For OSINT within the Russian internet, I recommend a bot with a comprehensive set of tools for all occasions — HowToFind bot.

And don’t forget to organize the data you’ve gathered. Often, a simple mind map can help reveal not-so-obvious connections and filter out the unnecessary. You might find that you won’t even need to use complex techniques to get the information you desire.
