Kung fu enumeration. Data collection in attacked systems

In penetration testing, there’s a world of difference between reconnaissance (recon) and data collection (enum). Recon involves passive actions; while enum, active ones. During recon, you use only open sources (OSINT), and the target system is not affected in any way (i.e. all actions are performed anonymously). By contrast, at the enumeration (data collection) stage, you interact with the target. This article discusses the data collection stage as an integral component of any pentesting study.

The phrase “Enumeration is the key” is very popular among pentesters. In the very beginning, you usually scan ports, identify services, and run specific tests for each of them. At this stage, rookie pentesters often make severe mistakes since port scanning can be treated by the other side as a network attack.

In fact, enumeration is quite an art. Acting strictly legitimately, without exploiting any vulnerabilities, you have to extract as much information as possible from a service or an open port based solely on the app behavior. The more such information a pentester collects, the easier their next steps will be.

There are so many different services running on servers nowadays that a whole book is required to describe all tricks used against each of them. Therefore, this article examines just the basics present in any service: the TCP/IP protocol stack. You’ll be surprised how much information you can actually collect at the L2/L3/L4 levels even without access to the remote server: you just have to know its IP address and send certain packets to it. Most of the techniques described below are applicable to all Internet nodes.

Using only the network and transport layer protocols (IP, TCP, and ICMP), you can determine:

  • operating system;
  • network traffic rate;
  • operating system uptime;
  • time on the remote server;
  • number of servers actually located behind an IP address;
  • number of balancing servers on a port;
  • whether the ports belong to the same server or not;
  • how many IP addresses the server has;
  • topology of the packet’s route to the destination; and 
  • fake open ports.

In the article, I will present all examples in two ways: (1) using basic tools; and (2) using my own set of scripts. These scripts contain useful examples showing how the scapy packet crafting library can be used to customize packets for your pentesting purposes.

ARP

ARP is usually associated with traffic interception, but I suggest looking at this protocol from a different perspective.

Every ARP packet has a field with src_mac.

Based on the contents of this field, you can answer a number of questions.

Detecting hosts behind a firewall

If a remote host doesn’t respond to ping, do you really think it’s not there? When you initiate any call to a node in the current network segment (be it a ping command or a browser request), your OS first makes an ARP request to find out the MAC address of the remote host. And firewalls (even if they are configured to completely block all incoming traffic) usually don’t restrict data link protocols (Ethernet) and ARP because such actions can disrupt the entire network.

For instance, there is a host of the current network segment that pretends to be nonexistent since it doesn’t respond to ping. However, its MAC address is present in the cache, which means that the host actually exists.

To detect such hidden hosts in a network segment, you can use the script that automatically performs the above-described steps.
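The comparison behind such a check can be sketched in a few lines. This is only an illustration and not the script mentioned above: it assumes you have already collected the ARP cache as an IP-to-MAC mapping and the set of addresses that answered ping. The function name find_hidden_hosts and all addresses are hypothetical.

```python
def find_hidden_hosts(arp_cache, ping_responders):
    """Return IPs that resolved over ARP but never answered ping."""
    hidden = []
    for ip, mac in arp_cache.items():
        # An entry without a MAC means nobody answered the ARP request either.
        if mac and ip not in ping_responders:
            hidden.append(ip)
    return sorted(hidden)


# Illustrative data only (addresses are made up).
arp_cache = {
    "192.168.8.1": "00:11:22:33:44:55",
    "192.168.8.73": "00:50:56:aa:bb:cc",
    "192.168.8.99": None,  # no ARP reply: most likely nothing is there
}
print(find_hidden_hosts(arp_cache, {"192.168.8.1"}))
```

A host that shows up here exists on the segment but drops ICMP, which is exactly the "hidden behind a firewall" case.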

This information can also be useful in networks without DHCP, where you have to quietly pick a free IP address for yourself. In addition, it tells you that a particular server isn’t turned off but merely hidden behind a firewall.

Detecting hosts with multiple IPs

Since you can see MAC addresses of remote nodes in the given network segment, you can also detect systems with multiple IP addresses.

It’s simple: if one MAC address corresponds to several IP addresses, then, with a high probability, there is a node whose network card has several IP addresses (aliases) assigned to it. Unfortunately, this method cannot detect dual-homed hosts (i.e. those equipped with several network cards). Later, I will explain how to discover them.
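The grouping itself is trivial once the ARP table is available. Below is a minimal sketch that assumes the table is an IP-to-MAC dictionary; the function name and the sample addresses are hypothetical. Keep in mind that a router answering on behalf of other hosts (proxy ARP) produces the same picture, which is why the result is only a strong hint.

```python
from collections import defaultdict


def hosts_with_multiple_ips(arp_table):
    """Group IPs by MAC and keep only the MACs that own more than one IP."""
    by_mac = defaultdict(list)
    for ip, mac in arp_table.items():
        by_mac[mac.lower()].append(ip)
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}


# Illustrative data only.
table = {
    "192.168.8.10": "AA:BB:CC:00:00:01",
    "192.168.8.11": "aa:bb:cc:00:00:01",  # same card, second address (alias)
    "192.168.8.20": "aa:bb:cc:00:00:02",
}
print(hosts_with_multiple_ips(table))
```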

Determining remote host hardware

The first three octets of a MAC address (the OUI) are assigned to equipment manufacturers and therefore identify the vendor of the network interface, which often reveals the type of hardware.

As you can see, there are four IP cameras and four network devices next to you on the network. Information about them is stored in the OUI databases:

  • /usr/share/arp-scan/ieee-oui.txt; and 
  • /usr/share/airgraph-ng/oui.txt.

The MAC address of the remote host also indicates fairly accurately whether the server is physical or virtual. For instance, VMware, which is popular in the enterprise sector, by default uses MAC addresses in the following format: 00:50:56:xx:xx:xx.
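A prefix lookup of this kind might look as follows. This is a sketch, not a complete OUI database: only the 00:50:56 prefix comes from the text above, while the other three entries are well-known defaults added here as assumptions. The function names are hypothetical.

```python
# Prefix table for illustration. Only 00:50:56 is taken from the article;
# the remaining entries are assumptions added for this sketch.
VIRTUAL_PREFIXES = {
    "00:50:56": "VMware",
    "00:0c:29": "VMware",
    "08:00:27": "VirtualBox",
    "52:54:00": "QEMU/KVM",
}


def mac_prefix(mac):
    """Return the first three octets of a MAC address, normalized to aa:bb:cc."""
    return mac.strip().lower().replace("-", ":")[:8]


def virtual_platform(mac):
    """Return the hypervisor name for a known virtual prefix, otherwise None."""
    return VIRTUAL_PREFIXES.get(mac_prefix(mac))


print(virtual_platform("00:50:56:12:34:56"))
```

A None result doesn’t prove that the host is physical: the administrator can assign any MAC address to a virtual machine.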

Stale network address configurations

Another not-so-obvious attack involving ARP is called SNAC (stale network address configurations); it detects stale information flows. As you know, direct communication between nodes on a local network requires a MAC address, which is requested over ARP; and since this request is broadcast (i.e. everyone can hear it), you can find out whether all the requested nodes actually exist.

The ARP response is not broadcast but sent directly to the requester, so you won’t hear it. Therefore, the script makes another ARP request to check if the requested host actually exists in the current network segment. If yes, it’s highlighted in green (as an active information flow); if not, in red (as an inactive one). In the course of its execution, the script continuously updates the graph visualizing these information flows.
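The classification step can be illustrated separately from sniffing and drawing. The sketch below is not the script described above; it assumes the overheard ARP requests are already available as (requester, requested) pairs and that you know which requested addresses answered your own ARP probe. All names and addresses are hypothetical.

```python
def classify_flows(arp_requests, alive):
    """Label every overheard 'who-has' request as an active or a stale flow."""
    flows = {}
    for requester, requested in arp_requests:
        # A flow is stale when somebody keeps asking for a host that no longer exists.
        flows[(requester, requested)] = "active" if requested in alive else "stale"
    return flows


# Illustrative data only.
overheard = [
    ("192.168.8.82", "192.168.8.73"),
    ("192.168.8.1", "192.168.8.200"),
]
answered_our_probe = {"192.168.8.73"}
print(classify_flows(overheard, answered_our_probe))
```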

Simply by analyzing ARP, you can see that someone located in another network segment tries to connect to nonexistent hosts via gateway 192.168.8.1. Or, for instance, that three nodes (82, 79, and 83) are contacting host 192.168.8.73 at once.

The logical next step is to take over a requested but unoccupied IP address by assigning as many addresses to your network card as necessary. This can redirect any information flow to you, including the disclosure of passwords (e.g. if the assigned address belonged to an MS SQL server) or hashes (if the address belonged to a network drive). But this is a different story…

A SNAC attack is extremely difficult to detect because it succeeds thanks to a stale config that everybody has forgotten about. At the same time, the risks arising when a server’s IP address changes are not so obvious to security staff: who would expect an attacker to take over the old address?

SNAC is very effective in attacks on so-called test stands since tested nodes are simply pulled out of production networks, which breaks information flows.

Analyzing IP.ttl

The ttl field is present in any IP packet.

Its distinct feature is as follows: each network device that forwards your IP packet towards the target subtracts one from the value of this field. The longer the packet’s route, the more its IP.ttl value decreases. As soon as IP.ttl reaches zero, the packet is dropped. This is how protection against packet looping is implemented. The sender can set any ttl value, while the remote side will respond with a packet having its own IP.ttl value.

OS identification

The sender (your PC) sets a certain initial IP.ttl value. In the same way, the remote party will send you a packet in response with its initial IP.ttl value (that may be reduced depending on the route length). And this value is OS-specific.

Below are four common initial IP.ttl values that indicate the OS family on the remote host:

  • 255 – Cisco;
  • 128 – Windows;
  • 64 – Linux, Android, BSD, macOS; and 
  • 32 – embedded systems.

In other words, you can determine the OS type with a single ping command.
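The guess can be expressed as a small function. The sketch assumes that the route is shorter than the gap between neighboring initial values, so the observed ttl is rounded up to the nearest initial value from the list above; the function name is hypothetical.

```python
# Initial values and OS families as listed in the article.
INITIAL_TTLS = {
    32: "embedded system",
    64: "Linux, Android, BSD, macOS",
    128: "Windows",
    255: "Cisco",
}


def guess_os(observed_ttl):
    """Return (OS family, estimated hop count) for the ttl seen in a reply."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("IP.ttl must be between 1 and 255")
    for initial in sorted(INITIAL_TTLS):
        if observed_ttl <= initial:
            return INITIAL_TTLS[initial], initial - observed_ttl


print(guess_os(57))   # a reply that lost 7 hops on its way from a Unix-like host
print(guess_os(128))  # a Windows host in the same segment
```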

On a local network, where the number of routers between your PC and the target cannot be large, this method is pretty accurate. The received information enables the attacker to immediately determine the device type and compile a list of ports to be scanned for suitable exploits.

The defending side, in turn, should immediately raise the alarm if Linux (especially Kali) suddenly appears in the user segment where all computers run Windows.

By contrast, on the global network, ping (ICMP) is of little use. On its way, the packet passes through many firewalls with their own rules and may simply not reach the target, even if ports on the target IP are listening and services are running. This is why Nmap should always be run with the -Pn flag if you are confident that the IP is ‘alive’.

The ttl field belongs to the IP protocol, and your favorite TCP and UDP transports (as well as ICMP, i.e. ping) run on top of it. Accordingly, you can read IP.ttl from the response of any TCP port.

If a remote IP address has several open ports, their IP.ttl values can be different. On the Internet, an IP address doesn’t necessarily correspond to a single server. In reality, it can belong to some kind of network device that forwards some ports to one server in the DMZ and some to other servers.

In this particular case, the IP.ttl values indicate that four systems are actually hidden behind the IP address. One of them is Windows-based and three are Unix-based. A bit later, I will show how to determine systems behind ports more accurately.

Detecting scan interference

The fact that IP.ttl gets lower as the route gets longer can be used to detect scan interference (i.e. false positives). When you run a scan from ‘censored’ networks (e.g. corporate networks, VPNs, etc.), part of the traffic often doesn’t go beyond such networks. In most cases, you won’t even notice this, but sometimes this is manifested in the presence of fake open ports. This happens because some connections forward traffic to one or another internal host (e.g. Proxy, DNS, or SMTP server). Below is an illustrative example of such a situation.

But if you look at IP.ttl values in response packets, you’ll notice something strange.

Remember the initial OS-specific IP.ttl values (i.e. 255, 128, 64, and 32)? The screenshot indicates that two ports have changed their IP.ttl values by 6 (which seems to be true), while one of the ports has changed its IP.ttl value by only 1. In other words, there is only one network device between you and the target (Google server) – as if both of you were in Silicon Valley, not on different continents. Based on this fact, you conclude that network equipment of your provider, corporate office, or VPN forwards this part of traffic.
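This reasoning is easy to automate. The sketch below assumes that most of the open ports are genuine, estimates the hop count for each port from its response ttl, and flags the ports whose hop count deviates from the typical one. The function names, the tolerance, and the sample values are hypothetical.

```python
from collections import Counter

INITIAL_TTLS = (32, 64, 128, 255)


def hop_count(observed_ttl):
    """Estimate the route length by rounding ttl up to the nearest initial value."""
    initial = min(t for t in INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


def suspicious_ports(port_ttls, tolerance=2):
    """Return ports whose hop count differs from the most common one."""
    if not port_ttls:
        return []
    hops = {port: hop_count(ttl) for port, ttl in port_ttls.items()}
    typical = Counter(hops.values()).most_common(1)[0][0]
    return sorted(port for port, h in hops.items() if abs(h - typical) > tolerance)


# Two ports answer from 6 hops away, one from just 1 hop away.
print(suspicious_ports({80: 58, 443: 58, 25: 63}))
```

A flagged port is most likely answered by equipment located much closer to you than the real target.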

This knowledge will greatly simplify your life when you explore a network.

Traceroute

A sophisticated, but well-known trick called “traceroute” exploits the fact that every network device forwarding your traffic reduces IP.ttl by 1 and that you can set an arbitrary initial IP.ttl value.

The idea is as follows. First, you send a packet with a very small IP.ttl=1. Such a packet won’t pass beyond the first hop and will be dropped by it; but when a packet is dropped for this reason, the network device usually notifies the sender by sending a special ICMP packet (Time Exceeded). By recording the sender’s IP address of such a packet, you determine the address of the first hop. Then you send the same packet with IP.ttl=2 and get a response from the second hop, and so on… In the end, you find out IP addresses of all network devices located along the packet’s route to the destination.
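The logic can be demonstrated without touching the network. The following sketch only simulates the mechanism: the route is given as a hypothetical list of router addresses, every router subtracts one from ttl, and the device where ttl reaches zero is the one that ‘answers’. A real implementation would send packets and listen for ICMP replies instead.

```python
def simulate_traceroute(routers, destination, max_ttl=30):
    """Return the addresses that would answer probes with ttl = 1, 2, 3, ..."""
    discovered = []
    for initial_ttl in range(1, max_ttl + 1):
        ttl = initial_ttl
        responder = destination  # answers if the packet survives every router
        for router in routers:
            ttl -= 1  # each forwarding device subtracts one
            if ttl == 0:
                responder = router  # packet dropped here, the router reports back
                break
        discovered.append(responder)
        if responder == destination:
            break
    return discovered


# Hypothetical route with two routers in front of the target.
print(simulate_traceroute(["10.0.0.1", "10.0.1.1"], "8.8.8.8"))
```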

But the problem is that, by default, Windows and Linux implement traceroute differently: Windows uses ICMP for this purpose, while Linux uses UDP. Accordingly, you can get different results on different systems since different packets can take different routes. To check this, you have to perform traceroute to the same IP address three times using TCP, UDP, and ICMP.

The above screenshots indicate that the actual route of the packet to the same node can slightly change.

Note that the same packet has several possible routes at once and can ‘choose’ between them with varying degrees of probability. This is indicated by changes in IP.ttl under the same conditions.

The packet’s route periodically changes by 13 hops. Such changes in IP.ttl indicate changes in the route (initial IP.ttl=128). Sometimes this means that traffic is intercepted globally.

If IP.ttl periodically changes to ttl<64 or 128<ttl<255, this indicates that the server you are interacting with has changed (i.e. balancing is performed). A little later, I will show how to determine balancing servers more accurately.

To analyze traffic of numerous packets, some visualization is required:

msf> services -c port,proto
sudo ./path_discover.py
8.8.8.8 53 tcp
8.8.8.8 53 udp
^D

This simple example shows the node where the routes diverge. The tool conveniently groups information by autonomous systems (i.e. responsible companies).

Information about the route can be used, for instance, to understand which company’s equipment is blocking traffic or to prepare an attack on intermediate network equipment in order to intercept desired traffic.

To make traceroute even more informative, the IP addresses can be georeferenced. As a result, the traffic path becomes clearer.

Interestingly, according to geoip, the packet could have avoided leaving the country. Later, I will show how to add some useful features to geoip traceroute.

Analyzing IP.id

Every IP packet has the identification field.

This is a kind of serial number. A large packet may be split into fragments en route, and this field enables the receiver to tell which fragments belong to the same packet when it reassembles them.

Determining traffic usage

Many operating systems, including some Unix versions and, of course, Windows, have a distinct feature: the IP.id counter can be global. In other words, if a remote node sends packets not only to you, but also to someone else, then the IP.id field in its packets will change by more than one. This simple feature can be used to determine the host’s network activity.

If you see that IP.id in the response has changed by more than one, this indicates that between your request and the response to it, the remote node has downloaded something (or something was downloaded from it).
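The arithmetic is simple, but the 16-bit wraparound is easy to forget. The sketch below assumes a host with a global, sequentially incremented IP.id and less than one full wrap between the two probes; the function name is hypothetical.

```python
def foreign_packets(ipid_before, ipid_after):
    """Estimate how many packets the host sent to others between two replies."""
    delta = (ipid_after - ipid_before) % 65536  # the 16-bit counter wraps around
    return max(delta - 1, 0)  # one increment belongs to the reply to our own probe


print(foreign_packets(100, 101))   # idle host: nothing but our own reply
print(foreign_packets(100, 150))   # busy host
print(foreign_packets(65535, 2))   # the counter wrapped around between the probes
```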

Analysis of host’s network activity can help in many situations. For instance, an attacker can find out when the target PC isn’t in use to perform some malicious actions on it without alerting the user. Or estimate traffic on the website to decide whether to compromise it or not. Or just spy on the user.

In fact, the ability to see the activity of any host on the Internet is a great power, especially if the host is idle (IP.id increases by exactly 1); this enables you to detect moments when it’s accessed. For instance, if this is an SMTP server, then you can detect the delivery of an email by a small spike in network activity. Of course, this is not a fully functional network sniffer, but it operates quietly and is applicable almost everywhere. This enables you to perform so-called correlation attacks representing a sophisticated but powerful deanonymization technique. But that’s another story… Overall, depending on the situation and your imagination, the above-described technique makes it possible to find answers to many questions.

But what prevents you from looking at the network activity of a number of hosts on a specific TCP port at once?

As you can see, some servers are idle, while others receive requests. Some are running Windows (ttl=128); others, Linux (ttl=64); and one gateway is a Cisco device (ttl=255). Traffic fluctuations on the gateway are several times lower than fluctuations on all devices combined; this indicates that the major portion of network interaction with servers occurs within the same network segment (i.e. the gateway isn’t involved in it).

Since id and ttl are located in the IP layer of the packet, you can analyze them in combination with the traceroute technique (TCP, UDP, and ICMP). This enables you to see the network activity of each device passed by the packet en route to its destination.

Quite predictably, IP.ttl indicates that the traffic comes mainly through Cisco devices. Based on changes in IP.id, you can analyze the workload of network devices and find out how much foreign traffic goes along these routes. However, not every device behaves this way.

Since the IP.id field consists of only two bytes, it wraps around to zero after 65,535. Therefore, if you see that the IP.id value has changed in a seemingly random way, it’s most likely because the counter has completed a full cycle (perhaps even more than once). With probes sent one second apart, this indicates that the remote node sends more than 65 thousand packets per second. In such a situation, you can send packets at shorter intervals to measure changes in IP.id more accurately.

Now you can see the IP.id increment more distinctly; it indicates how many packets the remote host sends in 100 ms.
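The relation between the probing interval and the measurable packet rate can be written down explicitly. This is a sketch under the same assumption of a global sequential IP.id; both function names are hypothetical.

```python
def packets_per_second(first_id, second_id, interval_ms):
    """Convert the IP.id change between two probes into a packet rate."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    delta = (second_id - first_id) % 65536
    return delta * 1000 / interval_ms


def rate_ceiling(interval_ms):
    """Highest rate that can be measured before the 16-bit counter wraps."""
    return 65535 * 1000 / interval_ms


print(packets_per_second(1000, 1250, 100))  # 250 packets in 100 ms
print(rate_ceiling(1000))                   # probes one second apart
print(rate_ceiling(100))                    # probes 100 ms apart
```

Shortening the interval tenfold raises the ceiling tenfold, which is why the increment becomes readable again.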

Using basic tools, it would be difficult to implement such a mechanism for the traceroute mode: you have to set a specific IP.ttl value for each hop, send two packets almost immediately after each other, and then compare the IP.id delta. However, this technique is implemented in my traceroute.py script.

Idle scan

A predictable IP.id value can even be used to interact with the target when it’s scanned via an intermediate host.

This technique makes it possible to anonymously ‘probe’ a remote host by replacing the sender’s IP address with the address of a third host that has a predictable IP.id (provided that this host is ‘resting’, i.e. idle).

You can see that IP.id didn’t change by more than one until another packet was sent. This is how you can search for zombie hosts; this scan type is implemented in Nmap as the idle scan mode.
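The decision rule of an idle scan fits into one function. The sketch assumes the classic sequence: you read the zombie’s IP.id, send the target a SYN with the zombie’s address as the source, and read the zombie’s IP.id again. If the target’s port is open, its SYN/ACK makes the zombie send an extra RST, so the counter grows by two instead of one. The function name is hypothetical.

```python
def interpret_idle_probe(zombie_id_before, zombie_id_after):
    """Interpret the change of the zombie's IP.id around one spoofed SYN."""
    delta = (zombie_id_after - zombie_id_before) % 65536
    if delta == 1:
        # Only the reply to our second probe: the zombie sent nothing to the target.
        return "closed|filtered"
    if delta == 2:
        # One extra packet: the zombie reset an unexpected SYN/ACK from the target.
        return "open"
    # The zombie is not idle (or its IP.id is not sequential): result is meaningless.
    return "unreliable"


print(interpret_idle_probe(31337, 31339))
```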

Unfortunately, the sender’s IP address cannot be spoofed in all networks.

TCP.options

The TCP transport protocol has the options field.

Determining uptime

One of the options in this field, the TCP timestamp, enables you to estimate uptime for a remote host. The point is that you can 100% legitimately request it from any remote host that supports this option.
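The usual way to turn this option into uptime relies on the TCP timestamp value, and it can be sketched as follows. The sketch makes two loud assumptions: the remote counter started from zero at boot, and it ticks at a constant frequency. Systems that randomize the timestamp offset break the first assumption, so treat the result as an estimate. The function names are hypothetical.

```python
from datetime import timedelta


def estimate_hz(tsval_1, seen_at_1, tsval_2, seen_at_2):
    """Estimate the tick frequency from two timestamp values and local arrival times (seconds)."""
    ticks = (tsval_2 - tsval_1) % 2**32  # the timestamp is a 32-bit counter
    return ticks / (seen_at_2 - seen_at_1)


def estimate_uptime(tsval, hz):
    """Convert a timestamp value into uptime, assuming the counter started at boot."""
    return timedelta(seconds=tsval / hz)


hz = estimate_hz(1000, 0.0, 2000, 1.0)  # two replies received one second apart
print(hz)
print(estimate_uptime(86_400_000, hz))
```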

For instance, on a local network, you can see when users last turned off their PCs or how long the servers have been running.

This is how the attacker can determine the role of a particular host: either a server (high uptime) or a workstation (low uptime).

If servers have similar uptime, you can assume that there was a power outage or a massive restart for some reason. Just use deduction.

OS identification

As you remember, on the Internet, some of the ports located at the same IP address are often forwarded to specific DMZ servers. For an attacker, such information is very useful: it’s always better when you have a network topology map in front of your eyes.

To find out which ports belong to which system, you can request uptime from each port and group the scan results by it. For instance, analysis of IP.ttl makes it possible to conclude that two systems are ‘hidden’ behind the examined IP address since the packets contain different ttl values.

Uptime analysis confirms that more than one server is hidden behind this IP address: in total, three servers have been identified.

This analysis is useful if ports whose functions are unknown are present on a remote host, and you have information about at least some of its ports. If there is only one IP address, it won’t be difficult to perform this analysis, but if you are dealing with multiple targets, additional visualization is required:

sudo ./path_discover_group.py
8.8.8.8 53 tcp
8.8.8.8 443 tcp

The ability to identify such features enables you to detect common patterns in different addresses. For instance, the same server may be located at different IP addresses, but on the same port, etc.

On the Internet, several servers can be hidden behind the same IP address; but on local networks, the situation is quite the opposite: the same server may have several addresses (aliases or dual-homed). To identify such nodes, scan results should be grouped based on uptime.

As you can see, the two marked IP addresses in the screenshot most likely represent the same server that either has several IP addresses (aliases) on the network card or is equipped with several network cards (dual-homed). On a local network, information about multiple IP addresses of network nodes can also be useful (e.g. when you carry out relay attacks).

Determining the number of balancing servers

On the Internet, not only can many systems be hidden behind the same IP address, but several servers can be hidden behind the same port as well. This is called ‘balancing’; it usually occurs on highly loaded services. If you keep connecting to a remote port for some time and analyze its uptime, you can group the results on the basis of this parameter and estimate the number of balancing servers.

In the above screenshot, you can see that the response came 36 times from the server whose uptime is 29 days, 19 times from the server whose uptime is 11 days, etc. Given that servers with identical uptime values occur very rarely, you can conclude that this site is maintained by at least five servers.
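Such grouping might be sketched like this. Since uptime keeps growing while you collect samples, the sketch converts each sample to a boot time first and then merges boot times that differ by less than a tolerance. The input format, the tolerance, and the function name are assumptions made for this illustration.

```python
def cluster_backends(samples, tolerance_s=60):
    """Group (observed_at_s, uptime_s) samples by boot time.

    Returns a list of (boot_time_s, hits) pairs, one per presumed server.
    """
    clusters = []
    for boot in sorted(seen - uptime for seen, uptime in samples):
        if clusters and boot - clusters[-1][0] <= tolerance_s:
            clusters[-1][1] += 1  # same server as the previous sample
        else:
            clusters.append([boot, 1])  # a boot time not seen before
    return [(boot, hits) for boot, hits in clusters]


# Illustrative data only: three answers from one server, one from another.
samples = [(0, 1000), (10, 1010), (20, 500000), (30, 1030)]
print(cluster_backends(samples))
```

The number of returned pairs is a lower bound on the number of servers behind the port.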

Information on the number of servers that process requests can be very useful, for instance, to estimate the power required for a DDoS attack. In addition, if you find RCE on a remote host, knowing that it is balanced enables you to increase the number of servers under your control.

ICMP.type=13

The ICMP network protocol supports many types of requests.

Determining local UTC time

Using one of these requests, you can get a timestamp and use it to determine the local time at the remote host. In an ICMP packet, it’s contained in the Transmit timestamp field and represents the number of milliseconds since midnight UTC.

As you can see, there is a discrepancy between originate (current system) and transmit (remote system). It indicates that your local clock and the clock on the remote server don’t match.

Windows returns the timestamp in the reversed (little-endian) byte order, while Unix uses the standard network (big-endian) order. The icmp_ts.py script uses the difference between Windows and Linux in IP.ttl to pick the right byte order, enabling you to determine the time on any node on the network.
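Decoding the field takes only a few lines. The sketch below is not icmp_ts.py: it assumes you already have the four raw bytes of the Transmit timestamp field, and it treats network (big-endian) order as the default, with a byte_swapped switch for hosts that return the reversed order. The function and parameter names are hypothetical.

```python
import struct
from datetime import timedelta

MS_PER_DAY = 24 * 60 * 60 * 1000


def decode_transmit_timestamp(raw, byte_swapped=False):
    """Convert the 4-byte Transmit timestamp into time elapsed since midnight UTC."""
    fmt = "<I" if byte_swapped else ">I"  # network byte order is big-endian
    (ms,) = struct.unpack(fmt, raw)
    if ms >= MS_PER_DAY:
        # Either the byte order is wrong or the host uses a non-standard value.
        raise ValueError("not a valid number of milliseconds since midnight")
    return timedelta(milliseconds=ms)


noon = (43_200_000).to_bytes(4, "big")  # 12:00:00 UTC in milliseconds
print(decode_transmit_timestamp(noon))
```

A ValueError for one byte order and a sane value for the other is itself a hint about the OS family.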

Information about time will help both in relative OS identification (similarly to uptime) and in situations when you use a protocol that requires synchronization (e.g. SSL/TLS or Kerberos).

TCP.flags

The TCP transport protocol has the flags field.

This field is used to manage the connection status.

Interference detection

Interference manifested in the presence of fake open ports was discussed earlier in this article. However, the opposite situation (i.e. the target has an open port, but your network or the firewall located directly in front of the target filters out this port) occurs much more frequently. In such situations, you won’t see this open port on the target host.

There are certain rules regulating how operating systems should handle unusual flag sequences in TCP connections. There are plenty of such rules; some of them are OS-specific. But one rule is common: the reaction of the TCP stack to an unexpected ACK packet. Any system must respond to it (with an RST), and it doesn’t matter whether the port is open or closed.

Firewalls don’t always reproduce the OS behavior precisely, and in most situations they don’t have to. Accordingly, if a firewall separates you from your target, you won’t get a response to such special packets.

This is how you can determine the presence of a firewall between the attacker and the target system.
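The interpretation of such a probe can be sketched as a tiny classifier. It assumes that the reply to your ACK packet has already been captured and reduced to a set of TCP flag names, or None if nothing came back within the timeout; the function name and the labels are hypothetical.

```python
def classify_ack_probe(reply_flags):
    """Interpret the reply to a bare ACK packet sent to a port."""
    if reply_flags is None:
        # A real TCP stack would have answered, so something dropped the packet.
        return "filtered"
    if "RST" in reply_flags:
        # The packet reached a TCP stack; the port may be open or closed.
        return "unfiltered"
    return "unexpected"


print(classify_ack_probe({"RST"}))
print(classify_ack_probe(None))
```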

OS identification

The well-known p0f tool fingerprints the OS in the same way as the IP.ttl technique does, but it uses a wider set of properties and, accordingly, identifies the OS more accurately. The IP and TCP layers of the packet contain a number of additional fields whose default values are OS-specific. Based on IP.ttl, you can only determine the OS family (i.e. Windows, Unix, or Cisco), while TCP analysis makes it possible to determine even the OS version on the remote host. In fact, p0f relies on an entire fingerprint database, which can be found at /usr/share/p0f/p0f.fp or /etc/p0f/p0f.fp. It operates completely passively: p0f just listens to traffic, so you have to initiate the connection yourself.

IP.options

The IP network protocol also has the options field.

Hops disclosure

In the past, active network equipment had small amounts of memory, and the reverse route of the packet was written directly to the IP header. For this purpose, the record route IP option was introduced. Information extracted from record route contains more data compared to the classic traceroute.

Not only does record route record addresses of hops located along the route of a network packet, but it also records addresses of all network interfaces, both incoming and outgoing, passed by the packet on its way to the destination. To compare: traceroute will only show you the IP address on the outgoing interface ‘looking’ towards the sender.

The examination of the above screenshot makes it possible to draw the following conclusions:

  • the packet arrived at the network device from 10.0.1.4; then it left this device from the interface 10.0.1.6 and went to 192.168.255.17;
  • then it left the device from the interface 192.168.27.1;
  • ultimately, the packet reached the target 192.168.27.61, ‘turned around’, and a response was sent from the same address 192.168.27.61 (as indicated by two identical records); and 
  • on the return path of the packet, the route was recorded according to the same logic up to the maximum possible number of entries (nine) in an IP packet.
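To read such a list programmatically, you have to parse the raw option. The sketch below assumes the standard layout of the record route option: one byte with the option type (7), one byte with the total length, one byte with a 1-based pointer to the next free slot, and then 4-byte address slots. The function name and the sample bytes are hypothetical.

```python
import ipaddress


def parse_record_route(option):
    """Extract the recorded addresses from a raw IP record route option."""
    if len(option) < 3 or option[0] != 7:
        raise ValueError("not a record route option")
    length, pointer = option[1], option[2]
    if length > len(option) or pointer < 4:
        raise ValueError("malformed record route option")
    # The pointer is a 1-based offset of the next free slot, so everything
    # before it (but not beyond the option itself) has already been filled in.
    filled_end = min(pointer - 1, length)
    return [str(ipaddress.IPv4Address(option[i:i + 4])) for i in range(3, filled_end, 4)]


# A two-slot option with both slots filled (illustrative bytes).
raw = bytes([7, 11, 12]) + bytes([10, 0, 1, 6]) + bytes([192, 168, 27, 1])
print(parse_record_route(raw))
```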

The same result can be obtained using Nmap.

For further experiments with record route, you can use my scapy-based example (rr_test.py). Using record route or above-described traceroute, you can restore hops and reconstruct the network topology; this function is also implemented in the open-source zenmap tool and in a number of commercial products.

Conclusions

Now you are aware how much nonobvious information about the target can be obtained by examining just the basic network and transport protocols. Imagine the amount of data that can be extracted at the application level!

Enumeration is not about searches for security flaws: you neither run active tests to detect vulnerabilities and misconfigs, nor do you check passwords or passively check versions. But still, you managed to collect a fairly large amount of information about the targets. After all, you never know what vulnerabilities could be identified at the next stage and what kind of puzzle you would eventually have to solve. A tiny missing piece of information can suddenly turn into a workable attack chain that successfully penetrates system defenses.

Even if one or another feature doesn’t pose a risk to the target by itself, the attacker may need this information later to deliver more attacks.

