Advanced OSINT Techniques: Exploring Modern Network Intelligence Methods

Date: 21/07/2025

What do competitive intelligence, pentesting, and cyber incident investigations have in common? They all involve information gathering, primarily from open sources. But what do you do when the information found is insufficient, and traditional tools are ineffective? Let’s discuss hacks and techniques closely related to accessing restricted materials.

According to representatives of U.S. intelligence, about eighty percent of useful information comes from open sources. Only a small portion consists of intelligence gathered directly by agents. The difference is significant, but it is important to remember that the details obtained through intelligence efforts are crucial for confirming and verifying the broader picture.

This is well illustrated by the story of how, in the Soviet era, the CIA reconstructed the electrification scheme of the Urals and the layout of nuclear industry facilities from a magazine photograph. Analysts started from a single retouched image of a monitoring panel and then undertook a massive effort, collecting data from Soviet newspapers, magazines, and diplomatic reports. The decisive evidence, however, came from aerial reconnaissance photographs, which made it possible to verify the conclusions about the location of power lines and factories.

Equally important is the ability to obtain “intel” when it comes to gathering information about individuals. This could involve investigating an infection to find the culprit or identifying a person to refine the attack surface. Let’s explore unusual cases of obtaining information when open source intelligence doesn’t yield the desired results.

Cracking a Telegram Bot

As you know, Telegram offers extensive functionality for working with bots. It’s quite simple to create a new bot, obtain a token to manage it, and use it with a ready-made script from GitHub. Just as easily, bot tokens can be compromised: you can find them through basic dorking. Sometimes, malware ends up using Telegram functions as a control panel—if only you knew how many stealer archives are stored in the messenger’s cloud!
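
Before publishing a bot's source code, it is worth checking whether a token has slipped into the repository. Below is a minimal sketch of such a check. The regular expression is an assumption based on the commonly seen token shape (a numeric bot ID, a colon, and a few dozen URL-safe characters), not an official format definition, and `find_bot_tokens` and `mask` are hypothetical helper names.

```python
import re

# Assumed token shape: numeric bot ID, colon, 30-40 URL-safe characters.
# This is a heuristic, not an official specification of the format.
TOKEN_RE = re.compile(
    r"(?<![A-Za-z0-9_-])\d{6,12}:[A-Za-z0-9_-]{30,40}(?![A-Za-z0-9_-])"
)


def find_bot_tokens(text: str) -> list[str]:
    """Return every substring of `text` that looks like a bot token."""
    return TOKEN_RE.findall(text)


def mask(token: str) -> str:
    """Hide most of the secret part so a finding can be logged safely."""
    bot_id, _, secret = token.partition(":")
    return f"{bot_id}:{secret[:4]}..."


if __name__ == "__main__":
    # Fake value built for the demo; it is not a working token.
    sample = 'BOT_TOKEN = "123456789:' + "A" * 35 + '"'
    for hit in find_bot_tokens(sample):
        print(mask(hit))
```

Running a check like this over the files staged for a commit catches the most obvious leaks before they become searchable.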

What prevents someone from extracting all the information from a bot using a leaked token? The primary safeguard is the Telegram Bot API, which does not allow several clients to receive a bot’s updates at the same time. Even if the bot is inactive or its owner ignores connection errors, only new, previously unprocessed messages are available. In addition, the Bot API keeps no record of past interactions, so there is no way to read the message history through it. All that remains is to wait quietly for someone to send a message and intercept it.

Fortunately, the HTTP API for working with bots is just an extension of the original MTProto API, which allows you to interact with them as if they were human accounts. And there are already libraries available to work with this protocol! Personally, I recommend using Telethon, which recently implemented a layer for managing bots.

As you might expect, the capabilities with this method increase significantly. Now, we can connect without the bot owner’s knowledge and extract the message history. The first conversation, naturally, will be with the bot’s owner. I’ve written a simple script that retrieves conversations, saves media content, and stores the names and photos of correspondents.

By the way, you probably remember that recently Telegram introduced a feature allowing you to delete your messages on the recipient’s side. However, if you delete a message from a chat with a bot, it will remain with the bot.

Discovering a Phone Number Without Direct Contact

Imagine needing to find out the phone number tucked away in someone else’s pocket without drawing any attention. There are, of course, numerous ways to do this, ranging from using an IMSI catcher for a brute-force interception to employing social engineering. Before 2018, the Wi-Fi in the Moscow metro was even capable of revealing a phone number and the advertising profile of its owner through the MAC address.

Researchers from Hexway recently published information on how Apple devices communicate with each other via Bluetooth LE. For mutual identification, they utilize hashes of user account data: three bytes from the SHA-256 hash of the phone number, Apple ID, and email address. Yes, iPhones, MacBooks, and even headphones are essentially broadcasting their identity and operating mode in the radio space around them.

Fortunately, hashes aren’t sent every time, but only in specific cases, such as when opening password entry screens for wireless networks. However, that’s enough: suggest someone connect to a Wi-Fi network, and you can capture the hashes of their data.

Additionally, remember that a local phone number consists of eleven digits, so a three-byte hash fragment matches only a few dozen candidate numbers. This figure drops even further if we know the region and the provider. The researchers have created a convenient script that generates hashes for the required ranges. Some enthusiasts have already built a ready-made database by cleaning out nonexistent provider codes. Once you have the list of numbers, you just need to narrow down the “suspects” by using HLR requests to check subscriber activity and verifying whether they have messenger accounts. A few iterations, and you’ll have the desired mobile number in your hands.
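
The “few dozen” estimate can be checked with simple arithmetic. A three-byte fragment takes 2^24 possible values, so a pool of N candidate numbers leaves about N / 2^24 of them per value. The pool sizes below are illustrative assumptions (an eleven-digit number with a fixed country code and a mobile prefix starting with 9, then a single known provider code), not figures from the Hexway research.

```python
def expected_candidates(pool_size: int, hash_bytes: int = 3) -> float:
    """Average count of numbers in the pool that share one truncated hash."""
    return pool_size / 2 ** (8 * hash_bytes)


if __name__ == "__main__":
    # Assumption: country code fixed, prefix 9XX, nine free digits in total.
    all_mobile = 10 ** 9
    # Assumption: one known three-digit provider code, seven free digits.
    one_provider_code = 10 ** 7

    print(f"all mobile numbers: {expected_candidates(all_mobile):.1f}")        # roughly 60
    print(f"one provider code:  {expected_candidates(one_provider_code):.2f}")  # below 1
```

Under these assumptions, knowing the provider code alone brings the average below one candidate per hash value, which is why the remaining filtering steps converge so quickly.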

Also, don’t forget about the Apple ID and email inbox! Now you have all the necessary data to verify the account information of a person that you find in open sources.

Tracing Emails with a Network Honeypot

Now, let’s assume we’re in the opposite situation: we can only contact a person through the network, for example, using the Jabber messenger. We know that they are involved in malware development but lead a double life and have an official job. We are eager to uncover their identity, but they use only Tor and VPN. What can we latch onto in this case?

Just as each person has unique handwriting, different people’s code often differs in small details. What if we have a script written by the person of interest? We can search for key strings in public repositories in the hope that they have written or reused their code elsewhere. This way, we might find their account, download all available repositories from it, and obtain a list of names and email addresses with a simple command:

$ git log --pretty="%an %ae%n%cn %ce" | sort | uniq

You wouldn’t believe how often people, out of sheer carelessness, commit under one account and push under another!
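
The same listing can be done from a script when several clones need to be compared. This is a minimal sketch, not the author's tooling: `parse_identities` and `repo_identities` are hypothetical helper names, and the tab-separated format string is my own choice so that names containing spaces split cleanly.

```python
import subprocess


def parse_identities(log_text: str) -> set[tuple[str, str]]:
    """Turn tab-separated 'name<TAB>email' lines into unique (name, email) pairs."""
    pairs = set()
    for line in log_text.splitlines():
        if "\t" not in line:
            continue
        name, email = line.rsplit("\t", 1)
        # Emails are lowercased so the same address is not counted twice.
        pairs.add((name.strip(), email.strip().lower()))
    return pairs


def repo_identities(repo_path: str) -> set[tuple[str, str]]:
    """Collect author and committer identities from a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all",
         "--pretty=format:%an%x09%ae%n%cn%x09%ce"],
        capture_output=True, text=True, check=True,
    )
    return parse_identities(result.stdout)


if __name__ == "__main__":
    # Made-up sample output, so the demo runs without a repository.
    sample = "Alice\ta@example.com\nAlice\tA@Example.com\nBob\tb@example.org\n"
    for name, email in sorted(parse_identities(sample)):
        print(name, email)
```

Calling `repo_identities(".")` inside a clone returns the same pairs as the one-liner, with author and committer entries merged into one set.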

Searching this way is often imprecise, and there are surely other methods. What else can give away an active Git repository user? Here’s a hint: the public key that the server uses to identify you! Often it is the default SSH key stored in id_rsa, but living a double life means multiple identities and several accounts. Surely, one of those keys is also used to connect to servers. Do you see where I’m going with this?

To complete the picture, let’s clarify two more things. First: all users’ public keys on GitHub and GitLab are stored openly and publicly accessible, simply by adding .keys to your profile’s URL. Second: when connecting, the SSH client cycles through all the keys that are explicitly specified for the server or added to the agent. This is the crux of our exploration.
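
To see what such a comparison works with, here is a minimal sketch that computes the SHA256 fingerprint of an OpenSSH public key line (the same form that `ssh-keygen -l` prints) and reports which of your own local public keys also appear in a published list. `shared_fingerprints` is a hypothetical helper name, and the key in the demo is a dummy blob rather than a real key.

```python
import base64
import hashlib


def ssh_fingerprint(pubkey_line: str) -> str:
    """SHA256 fingerprint of one public key line: '<type> <base64> [comment]'."""
    fields = pubkey_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 without the trailing padding.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def shared_fingerprints(local_keys: list[str], published_keys: list[str]) -> set[str]:
    """Fingerprints present both among local keys and in a published list."""
    local = {ssh_fingerprint(line) for line in local_keys}
    published = {ssh_fingerprint(line) for line in published_keys}
    return local & published


if __name__ == "__main__":
    # Dummy blob for the demo; a real line comes from a *.pub file.
    dummy = "ssh-ed25519 " + base64.b64encode(b"dummy-key-blob").decode("ascii") + " me@laptop"
    print(ssh_fingerprint(dummy))
```

The comment field is ignored, so the same key matches regardless of the label attached to it. Setting `IdentitiesOnly yes` for a host in the SSH client configuration limits the client to the identity files configured for that host, so unrelated keys are not offered.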

We let the person connect to our server, which runs a patched SSH daemon that records every public key the client offers. Then we look those keys up in a database of keys belonging to users of public repositories! The idea is simple yet brilliant, and it has already been implemented in some form. Type this in the console:

$ ssh whoami.filippo.io

The server will attempt to find your GitHub profile and simultaneously check for CVE-2016-0777.

The server code is open-source and is based on a framework for researching the reliability of GitHub users’ keys. While the data itself wasn’t published, others have replicated the data collection independently. However, it’s best to collect up-to-date user data yourself, since accounts get renamed, new accounts are created, and some users change their keys.

Uncovering Secrets in Telegram

How often have you tried to find information about a Telegram account? For me, it’s been far too often. Sometimes the origin becomes clearer with metadata such as a username (especially when it resembles a real person’s name) or photos from the account and the date they were set. However, sometimes even this information is missing, leaving only the First and Last name, which can be entirely uninformative.

Here’s how identity verification works in real-world searches. First of all, we determine the account’s identifier. It is hidden in official clients, but some unofficial ones display the ID when opening a profile. It’s more convenient to forward a person’s message to the @userinfobot, which will provide the desired digits.

However, sometimes the method may not work due to account privacy settings. But there’s a solution: using scripts, you can utilize APIs to track system fields in messages and events. This way, we can identify a person if they change their name, leave group chats, or go off the grid temporarily.

Next, we search for data using specialized search engines. For Telegram, search.buzz.im is the most useful as it indexes all public chats and even allows searching by the message author’s name. However, this won’t help us if the account owner recently changed their name. Searching by ID requires a lot of luck—it might have appeared as text in public chats and been indexed, but in practice, the likelihood of this is quite low.

We also perform searches by ID. Unfortunately, Telegram does not allow access to account data with just an identifier—this is a basic privacy feature. Therefore, there are no straightforward tools for direct identification, except for databases that perform reverse searches by ID for accounts whose data has been collected “legally” (meaning when data can be obtained through the API by having common chats with the person).

Who has access to such account databases? It’s clear that these are bots used to manage a vast number of public chats, private chats, and direct messages. There’s nothing stopping their developers from storing all available account data. 🙂

Now it’s time to discuss the practical part. For instance, in one of my cases, tracing the source of information led to a group bot designed for accumulating karma. Naturally, a database was used to update and synchronize values for each account. And, of course, the developers couldn’t resist the temptation to create a simple web interface using PHP to check karma based on a nickname, name, or user ID.

Finding the necessary information turned out to be even easier than expected: the search displayed a list of matching accounts with names and IDs (autocompletion was working). Upon entering an identifier, the first database entry for it appeared, that is, the earliest username under which the bot had seen the account. Bingo!

But if you think that’s all, you’re quite mistaken. The autocomplete request had a SQL injection vulnerability that could have been exploited to extract the entire user database—accounts that had at least once joined groups with the bot. This included name change history, joining dates, and other information. I believe the lesson here speaks for itself.
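
For comparison, this is roughly what an autocomplete lookup looks like when user input is passed only as a bound parameter. It is a minimal sketch in Python with SQLite rather than the PHP interface from the case, and the `users` table, its columns, and the function names are assumptions made for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical `users` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(1, "alice"), (2, "alina"), (3, "bob")],
    )
    return conn


def suggest_users(conn: sqlite3.Connection, prefix: str, limit: int = 10) -> list[tuple[int, str]]:
    """Autocomplete by name prefix; the input never becomes part of the SQL text."""
    # Escape LIKE wildcards so the prefix is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT user_id, name FROM users "
        "WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(suggest_users(db, "al"))
    print(suggest_users(db, "' OR 1=1 --"))
```

With placeholders, a payload such as `' OR 1=1 --` is treated as an ordinary name prefix and simply matches nothing.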

Hidden Data from the Forum

What if you only know one account of a person on a site, and searching by username yields nothing? Registration implies that someone has left their personal information there and at the very least entered an email address, which is what we need for further investigation. The problem is that the privacy settings are at maximum, and only the user can see their data.

Let’s take our stealth level to the maximum: it’s not advisable to use social engineering techniques on a person, as they might get suspicious about the decoy account and erase traces in other important areas. Of course, one could try to swipe cookies from a forum administrator or moderator to gain elevated privileges and direct access to the database…

However, let me present a more elegant solution to this problem. But before I proceed, I must caution you: I do not recommend repeating these actions, if only because they go beyond passive information gathering.

In popular forum engines today, it’s becoming increasingly difficult to find XSS vulnerabilities. But does that mean they are completely absent? The author of this case study wondered the same and was proven right: a vulnerability was discovered in the phone number field of the user profile page, which is visible to everyone. What does this mean for us?

As you might have guessed, this allows for user session hijacking, but even that might be unnecessary. Why log into a forum to view data that’s already obtained? Exactly, we’ll simply inject a JS script into the page, which will use AJAX to request information about the active user and then send it back to our server.

But remember, we agreed not to draw attention to the subject of our investigation. So let’s create conditions in which our XSS reaches them through someone else’s hands, such as through the accounts of their acquaintances. One more small touch is needed: besides the direct request for information, we will send an additional request via JS to store the XSS in the user’s field. Can you see the whole picture now?

We will create a highly attention-grabbing account and post a provocative message on a forum. I’ll leave the details of implementation to your imagination. It’s not crucial who reads it or if we get banned, as the account will act as the “patient zero.” Someone will check out our profile to find out more information and will unknowingly begin spreading a data-collection virus. This script will propagate in a wide-reaching manner, infecting friends, friends of friends, and even random contacts.

That’s exactly what happened. Shortly after, data about the targeted user, including the desired email, arrived at the command server, along with information on a good portion of the other forum members. It’s a pity that there was no calculation of how many degrees of separation it took for “the reward to find its hero.” Otherwise, it could have been possible to write a study on social graphs in forums.

By the way, it’s worth recalling that a similar method was recently used in VK, and it was done openly rather than for data collection purposes. Now imagine how long attackers could exploit such vulnerabilities covertly, gathering our data.
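
On the forum side, a hole like the one in the phone number field is closed by validating the value on input and escaping it on output. The sketch below assumes a hypothetical profile renderer; the function names, the accepted character set, and the markup are illustrative and not taken from any particular forum engine.

```python
import html
import re
from typing import Optional

# Assumed rule: optional leading plus, then digits, spaces, brackets, or hyphens.
PHONE_RE = re.compile(r"\+?[0-9 ()-]{5,20}")


def clean_phone(raw: str) -> Optional[str]:
    """Return the value if it looks like a phone number, otherwise None."""
    value = raw.strip()
    return value if PHONE_RE.fullmatch(value) else None


def render_phone(raw: str) -> str:
    """Render the profile field with HTML escaping applied on output."""
    return f'<span class="phone">{html.escape(raw, quote=True)}</span>'


if __name__ == "__main__":
    for candidate in ["+7 (900) 123-45-67", "<script>alert(1)</script>"]:
        print(clean_phone(candidate), render_phone(candidate))
```

Escaping on output matters even when input validation exists, because values stored before the validation was introduced are still rendered on the page.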

Conclusion

In this article, we discussed only five unconventional methods to obtain information that can’t be reached through open sources. There are always tricks and hacks, but keep in mind that someone may have already traveled this path before you: uploaded a database online, found a leak on a website, or created a Telegram bot for verification. Therefore, I recommend you compile a list of the most useful open information sources and go through it before digging deeper. For OSINT within the Russian internet, I recommend a bot with a comprehensive set of tools for all occasions — HowToFind bot.

And don’t forget to organize the data you’ve gathered. Often, a simple mind map can help reveal not-so-obvious connections and filter out the unnecessary. You might find that you won’t even need to use complex techniques to get the information you desire.
