How to get sensitive data using social network APIs


OSINT (open source intelligence) is an intelligence discipline concerned with searching for, collecting, and selecting information from publicly available sources. Social networks are among the largest suppliers of public information, because almost all of us have an account (sometimes more than one) in one or more of them. Here we share news, private photos, and preferences (e.g., when you "like" something or start following a community), as well as friend lists. And we do it of our own free will, rarely thinking about the possible consequences. In previous articles we have already analyzed ways of extracting interesting data from social networks. Usually it had to be done manually, but for better results it makes more sense to use dedicated utilities. There are several open source utilities that can extract user information from social networks.


Among the most popular is Creepy. It collects a user's geolocation information from his or her Twitter, Instagram, Google+ and Flickr accounts. The advantages of this tool, which ships with Kali Linux, include a simple interface, a very convenient process for obtaining API service tokens, and display of the found results as placemarks on a map (which, in turn, lets you follow all of the user's movements). Its main disadvantage is limited functionality: the utility can collect geotags from the listed services and display them on Google Maps, show who the user has retweeted and how many times, statistics on the devices used for tweeting, and posting times. But since it is an open source tool, you can always extend its functionality yourself.
We won't cover how to use the program here — the clear and detailed explanation in the official video should be enough for anyone to start working with the tool.

Example of Creepy use


Two more tools that are less widespread but have strong functionality and deserve your attention are fbStalker and geoStalker.

fbStalker collects the following information about a user from his or her Facebook account:

— the user's videos, photos, and posts;
— the number of "likes" received from other users;
— geotags of photos;
— statistics on the comments received on posts and photos;
— the time of day when the user is usually online.

To run the tool you need Google Chrome and ChromeDriver, which are installed as follows:

You also need Python 2.7 and pip to install the following packages:

And finally, you need a library for parsing GraphML files:

After that, edit the script, specifying your own e-mail address, password, and username, and start the search. Running the tool is rather simple:


geoStalker is much more interesting. It collects information based on the coordinates you supply. For example:

— local Wi-Fi access points (including their ESSID, BSSID, and geodata);
— Foursquare check-ins;
— Instagram and Flickr accounts from which photos with the given coordinates have been posted;
— all tweets posted from this location.

To run the tool, just as in the previous case, you need Chrome and ChromeDriver, Python 2.7, and pip to install the following packages: google, python-instagram, pygoogle, geopy, lxml, oauth2, python-linkedin, pygeocoder, selenium, termcolor, pysqlite, TwitterSearch, foursquare, as well as pygraphml and gdata:

After that, edit the script and specify all the necessary API keys and access tokens (if you don't provide them for a particular network, that network will not be searched). Then run the tool with sudo python2.7 and indicate an address or coordinates. As a result, all the data are collected, placed on a Google Map, and saved to an HTML file.

Let’s take action

So far we have looked at ready-made tools. In most cases their functionality will not be enough, and you will need to extend them or write your own — all popular social networks provide APIs. Usually they live on dedicated subdomains — this is the case for both Instagram and Vkontakte — where you send a GET request and receive an XML/JSON response. Of course, most such APIs have their own wrapper libraries, but we want to understand how things work; besides, bloating a script with external libraries for the sake of a couple of functions is not the best idea. So let's write our own tool that can search for photos on VK and Instagram by given coordinates and time period.

Using the VK and Instagram API documentation, we'll form requests for a list of photos tied to geographic coordinates and a time period.

Instagram API Request:
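A request to the media search endpoint of Instagram's (since retired) v1 API looked roughly like this; the helper name below is illustrative, and the parameter names follow the old v1 documentation, so treat this as a historical sketch:

```python
# Build the Instagram /v1/media/search request URL (old v1 API, now retired).
try:
    from urllib import urlencode          # Python 2.7, as used in the article
except ImportError:
    from urllib.parse import urlencode    # Python 3

def instagram_search_url(location_latitude, location_longitude, distance,
                         timestamp, date_increment, access_token):
    params = {
        "lat": location_latitude,
        "lng": location_longitude,
        "distance": distance,                         # search radius, meters
        "min_timestamp": timestamp,                   # interval start (Unix time)
        "max_timestamp": timestamp + date_increment,  # interval end
        "access_token": access_token,
    }
    return "https://api.instagram.com/v1/media/search?" + urlencode(params)
```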

Vkontakte API Request:
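The corresponding VK request goes to the photos.search method; as noted below, at the time it could be called without a token. Again, the helper name is illustrative:

```python
# Build the VK photos.search request URL (old un-versioned API).
try:
    from urllib import urlencode          # Python 2.7
except ImportError:
    from urllib.parse import urlencode    # Python 3

def vk_search_url(location_latitude, location_longitude, distance,
                  timestamp, date_increment):
    params = {
        "lat": location_latitude,
        "long": location_longitude,
        "radius": distance,                       # search radius, meters
        "start_time": timestamp,                  # interval start (Unix time)
        "end_time": timestamp + date_increment,   # interval end
        "count": 100,                             # per-request result limit
    }
    return "https://api.vk.com/method/photos.search?" + urlencode(params)
```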

The variables used here:

  • location_latitude — geographic latitude;
  • location_longitude — geographic longitude;
  • distance — search radius;
  • timestamp — start of the time interval (a Unix timestamp);
  • date_increment — the number of seconds from the start to the end of the interval;
  • access_token — developer token.

As it turns out, you need an access_token to use the Instagram API. Getting one isn't hard, but it takes some time (see the sidebar). Vkontakte is more welcoming to strangers, which suits us well.

Process automation

So now we know how to make the necessary requests; however, manually picking apart the server's JSON/XML responses is not an entertaining task. It's far more convenient to write a small script that does it for you. Again, we'll use Python 2.7. The procedure is as follows: search for all photos within the given radius of the coordinates and within the time interval. Note that the number of photos returned per request is limited, so for a large time interval you will need to make several requests over intermediate sub-intervals (that's what date_increment is for). Also keep in mind the inaccuracy of coordinates and don't use a radius of just a few meters. And don't forget that times must be in timestamp format.

Let's start coding. First, import all the necessary libraries:
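The exact import list depends on which helpers you write below; a sketch covering HTTPS requests, JSON parsing, and date handling might look like this (the try/except keeps it working on both Python 2.7 and 3):

```python
# Libraries for HTTPS requests, JSON parsing, and date/time handling.
import json
import time
import calendar
try:
    import urllib2 as urlrequest          # Python 2.7, as used in the article
    from urllib import urlencode
except ImportError:
    import urllib.request as urlrequest   # Python 3 equivalent
    from urllib.parse import urlencode
```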

Next, write a function for getting data from the APIs over HTTPS. From the arguments passed in, it forms a GET request and returns the server's response as a string.
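A minimal version of such a helper might look like the following (the function name is illustrative):

```python
# Fetch a URL over HTTP(S) and return the response body as a string.
try:
    import urllib2 as urlrequest          # Python 2.7
except ImportError:
    import urllib.request as urlrequest   # Python 3

def get_data(url):
    """Send a GET request and return the response body as a string."""
    response = urlrequest.urlopen(url)
    data = response.read()
    if isinstance(data, bytes):           # Python 3 returns bytes
        data = data.decode("utf-8")
    return data
```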

Let's also write a function for converting a human-readable date into a timestamp, for convenience:
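One possible sketch (the date format string is an assumption; the original may have used a different one, and interpreting dates as UTC here keeps the result independent of the local timezone):

```python
import calendar
import time

def date_to_timestamp(date_string, fmt="%d.%m.%Y %H:%M:%S"):
    """Convert a date string like '21.05.2014 00:00:00' to a Unix timestamp (UTC)."""
    return calendar.timegm(time.strptime(date_string, fmt))
```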

Now let's split the time interval, save the results to an HTML file, and write the main image search logic. The function looks bulky, but the trickiest part is splitting the time interval into blocks. The rest is ordinary JSON parsing and saving the data to HTML.
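A condensed sketch of that logic, with the interval splitting factored out. All names are illustrative, the JSON layout follows the old Instagram v1 API, and the fetch function (a callable mapping a URL to a response string) is passed in so the logic can be exercised without network access:

```python
import json
try:
    from urllib import urlencode          # Python 2.7
except ImportError:
    from urllib.parse import urlencode    # Python 3

def split_interval(start, end, step):
    """Split [start, end] into consecutive chunks of at most `step` seconds."""
    chunks = []
    t = start
    while t < end:
        chunks.append((t, min(t + step, end)))
        t += step
    return chunks

def search_instagram(lat, lng, distance, start, end, step, token, fetch):
    """Collect photo URLs chunk by chunk; `fetch` maps a URL to a JSON string."""
    links = []
    for t_from, t_to in split_interval(start, end, step):
        params = urlencode({
            "lat": lat, "lng": lng, "distance": distance,
            "min_timestamp": t_from, "max_timestamp": t_to,
            "access_token": token,
        })
        reply = json.loads(fetch("https://api.instagram.com/v1/media/search?" + params))
        for item in reply.get("data", []):            # v1 API put results in "data"
            links.append(item["images"]["standard_resolution"]["url"])
    return links

def save_html(links, path):
    """Write the collected image URLs as <img> tags; the browser loads them itself."""
    with open(path, "w") as out:
        out.write("<html><body>\n")
        for url in links:
            out.write('<img src="%s">\n' % url)
        out.write("</body></html>\n")
```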

The HTML format was chosen for a reason: it lets us store links to the images instead of saving them separately. When you open the results page in a browser, the images load automatically.
Let's write the same function for Vkontakte.
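A sketch of the VK counterpart, under the same caveats: names are illustrative, the response layout follows the old un-versioned VK API (where "response" was a list whose first element was the total count), and the fetch callable is injected for testability:

```python
import json
try:
    from urllib import urlencode          # Python 2.7
except ImportError:
    from urllib.parse import urlencode    # Python 3

def search_vkontakte(lat, lng, distance, start, end, step, fetch):
    """Chunked search against VK photos.search (no access token needed back then)."""
    links = []
    t = start
    while t < end:
        params = urlencode({
            "lat": lat, "long": lng, "radius": distance,
            "start_time": t, "end_time": min(t + step, end), "count": 100,
        })
        reply = json.loads(fetch("https://api.vk.com/method/photos.search?" + params))
        # Skip the leading count element; each photo object carried src_* URLs.
        for item in reply.get("response", [])[1:]:
            links.append(item.get("src_big", item.get("src", "")))
        t += step
    return links
```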

And, of course, the function invocations:
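A driver for the whole thing could look like this. The coordinates, dates, radius, and the commented-out call signatures are all illustrative; note that the timestamps here are computed in UTC, while the article's own figures used local time:

```python
# Convert a human-readable interval to timestamps and derive the parameters.
import calendar
import time

def ts(s):
    """'DD.MM.YYYY HH:MM:SS' -> Unix timestamp (interpreted as UTC)."""
    return calendar.timegm(time.strptime(s, "%d.%m.%Y %H:%M:%S"))

location_latitude = 55.740701          # example coordinates
location_longitude = 37.609161
distance = 100                         # search radius, meters
timestamp = ts("21.05.2014 00:00:00")  # interval start
date_increment = ts("23.05.2014 00:00:00") - timestamp  # interval length, seconds

# With helpers along the lines sketched earlier, the calls would be roughly:
# search_instagram(location_latitude, location_longitude, distance,
#                  timestamp, timestamp + date_increment, 3600, ACCESS_TOKEN, get_data)
# search_vkontakte(location_latitude, location_longitude, distance,
#                  timestamp, timestamp + date_increment, 3600, get_data)
```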

The results of the script's work in the console



One of the Instagram parsing results



Vkontakte parsing result



“Trial by fire”

The script is ready; time to try it out. And here an idea came to mind. Those who attended PHD'14 surely remember the very attractive Mail.Ru promo girls. Well, let's make up for lost time — we'll find them and get acquainted.

What we actually know about PHD'14:

  • venue — Digital October — 55.740701,37.609161;
  • date held — May 21–22, 2014 — 1400619600–1400792400.

This gives us the following data set:
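Mapping the venue and dates above onto the script's parameters gives roughly the following (the radius is an assumption; the rest comes straight from the coordinates and timestamps listed):

```python
# PHD'14 search parameters, from the venue coordinates and dates above.
location_latitude = 55.740701
location_longitude = 37.609161
distance = 100                             # meters; assumed radius around Digital October
timestamp = 1400619600                     # May 21, 2014 (local time)
date_increment = 1400792400 - 1400619600   # the two conference days, in seconds
```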

Launch the script and start studying the results. Aha, one of the girls posted a selfie from a restroom, complete with coordinates! Naturally, the API didn't miss it, and soon the pages of all the other promo girls were found. As it turned out, there was even a pair of twins among them :).

The mentioned restroom selfie of the promo-girl from PHD’14



Instructive example

As a second example, I'd like to mention one of the tasks from the PHD'14 CTF final. In fact, it was this very case that got me into the topic. The task was as follows.

There is an evil hacker who has developed a piece of malware. You are given a set of coordinates and corresponding timestamps of his Internet connections, and you need to find his name and photo. The coordinates were the following:

Naturally, the first thing we checked was which places correspond to these coordinates. They turned out to be railway stations: the first was Kazansky Rail Terminal (Moscow) and the last Zeleny Dol (Zelenodolsk); the rest were stations between Moscow and Zelenodolsk. So he was getting online from a train. The train itself was identified by its departure time, and Kazan turned out to be the arrival station. Then the question arose: where to look for his name and photo? The logic was as follows: if we need a photo, it makes sense to look in social networks. The main candidates were Vkontakte, Facebook, Instagram, and Twitter. Besides the Russian teams, foreign teams also took part in the competition, so we figured the organizers wouldn't have chosen Vkontakte. We decided to start with Instagram.

At the time we had no scripts for searching photos by coordinates, so we used public services capable of doing that. It turned out there aren't many of them, and those available have rather poor interfaces. After checking hundreds of photos from every station along the train's route, we finally found the one we needed.

In the end, it took no more than an hour to identify the train and the missing stations and to work out the further search logic. But finding the photo took a really long time. This once again shows how important it is to have, and to use, the right, convenient tools.


The article is over; time to draw a conclusion, and it's a simple one: upload geotagged photos with caution. Competitive intelligence specialists will grab any opportunity to obtain new information, and social network APIs can be a real help to them. While writing this article, I checked a few more services, including Twitter, Facebook, and LinkedIn, for similar functionality. The good news: Twitter gave positive results. Facebook and LinkedIn disappointed me, but perhaps all is not lost and they will extend their APIs. So be careful posting geotagged photos — they may end up in the wrong hands :).
