The price paid for the absolute universality and flexibility of IPsec is the excessive complexity of the protocol (not to mention its debugging and configuring). In addition, the first version of IPsec was developed prior to the invention of dynamic addresses, NAT, and mobile connections. It was not possible to completely revamp the protocol architecture; so, numerous extensions were added to it, making IPsec even more complicated.
One might ask: why not forget IPsec for good and switch to more convenient solutions (e.g. the above-mentioned OpenVPN)? The problem is that IPsec is the only generally accepted standard and the only protocol supported by network equipment from all manufacturers.
VPN services that improve your anonymity and security are designed for end users who can choose their OS and the programs installed on it. Users of a corporate network may not have such freedom of choice. And if you connect two networks using equipment from different manufacturers, your only option is IPsec.
In other words, if you are involved with network administration, you have to deal with IPsec despite all its complexity and associated issues. And the problem is not limited to the complexity of the protocol! Different implementations and default settings (as well as incompetent admins on the other side who never share essential debugging information with you) make your life a true nightmare. In such situations, configuring and debugging a tunnel turns into divination by logs.
To make things worse, IPsec logs are often full of specific terminology and are unclear to novice admins who are not familiar with all the protocol details. This article covers the most common configuration errors and how they show up in the logs.
I will use strongSwan as an example. This free IPsec (to be specific, IKE) implementation is very popular, and many Linux and FreeBSD distributions (OpenWRT, pfSense, Sophos, VyOS, etc.) use it for connection to other network devices.
Theory
The IPsec protocol consists of two components: the IKE (Internet Key Exchange) protocol on the one hand and the AH and ESP (Authentication Header and Encapsulating Security Payload) protocols on the other hand.
AH and ESP are responsible for traffic encryption and authentication; these protocols are implemented in the OS kernel or in the router hardware. To make them work, the encryption parameters must be negotiated on both sides. A set of traffic selectors and encryption parameters that tell the kernel what to do with specific packets is called a Security Association (SA). In many systems, SAs can be configured manually. For instance, in Linux, you can do this with the ip xfrm command. But in real life, this method is almost never used due to its complexity.
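To make the idea of an SA more tangible, here is a minimal Python sketch of it as a data structure: traffic selectors plus encryption parameters, and a lookup that decides which SA covers a packet. The class and function names are hypothetical, and the model is deliberately simplified (a real Linux kernel keeps policies and states in separate tables).

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class SecurityAssociation:
    """Toy model of an SA: traffic selectors plus encryption parameters."""
    local_net: str    # traffic selector: source network
    remote_net: str   # traffic selector: destination network
    protocol: str     # "esp" or "ah"
    cipher: str       # e.g. "aes128"
    integrity: str    # e.g. "sha1"

    def matches(self, src: str, dst: str) -> bool:
        """Return True if a packet from src to dst falls under this SA."""
        return (ip_address(src) in ip_network(self.local_net)
                and ip_address(dst) in ip_network(self.remote_net))


def select_sa(sas, src: str, dst: str) -> Optional[SecurityAssociation]:
    """Pick the first SA whose selectors cover the packet, if any."""
    for sa in sas:
        if sa.matches(src, dst):
            return sa
    return None


# Hypothetical SA using private networks and algorithms chosen for illustration.
sa = SecurityAssociation("10.20.30.0/24", "10.40.50.0/24", "esp", "aes128", "sha1")
print(select_sa([sa], "10.20.30.5", "10.40.50.7"))    # covered by the SA
print(select_sa([sa], "10.20.30.5", "198.51.100.1"))  # None: no SA covers it
```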
The IKE protocol is used to automate the key exchange and negotiate the settings. It is normally implemented in the user space: the protocol itself doesn’t encrypt traffic but creates Security Associations in the kernel (in Linux, the netlink protocol is used for communication between user space processes and the kernel). The IKE protocol is usually configured by the system admins, and all errors and incompatibilities in its settings pop up during the negotiation of the tunnel parameters.
Practice
Prior to examining logs, I have to deploy a test system.
Creating a tunnel
First of all, I need two computers (hereinafter East and West) with strongSwan v. 5.2 or newer installed on them; their symbolic addresses are 192.0.2.10 and 203.0.113.10.
info
The best way is to take symbolic public addresses from networks specifically reserved in RFC 5737 for examples and documentation:
- 192.0.2.0/24
- 198.51.100.0/24
- 203.0.113.0/24
For the sake of simplicity, I use a static pre-shared key, even though strongSwan supports RSA keys and X.509 certificates.
First, I create /
and /
for East.
config setup

conn tunnel-west
    left=192.0.2.10
    right=203.0.113.10
    leftsubnet=10.20.30.0/24
    rightsubnet=10.40.50.0/24
    ike=aes128-sha1-modp2048!
    keyexchange=ikev2
    reauth=no
    ikelifetime=28800s
    closeaction=none
    esp=aes128-sha1!
    keylife=3600s
    rekeymargin=540s
    type=tunnel
    compress=no
    authby=secret
    auto=start
    keyingtries=%forever
192.0.2.10 203.0.113.10 : PSK "qwerty"
Then I create /
and /
for West.
config setup

conn tunnel-east
    left=203.0.113.10
    right=192.0.2.10
    leftsubnet=10.40.50.0/24
    rightsubnet=10.20.30.0/24
    ike=aes128-sha1-modp2048!
    keyexchange=ikev2
    reauth=no
    ikelifetime=28800s
    closeaction=none
    esp=aes128-sha1!
    keylife=3600s
    rekeymargin=540s
    type=tunnel
    compress=no
    authby=secret
    auto=route
    keyingtries=1
203.0.113.10 192.0.2.10 : PSK "qwerty"
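Since most of the errors discussed in this article boil down to the two configs not mirroring each other, a quick sanity check can save some log reading. The Python sketch below is only an illustration: the helper names are hypothetical, and the list of options that must match is my assumption for this simple one-proposal setup, not a general rule.

```python
def parse_conn(text: str) -> dict:
    """Turn whitespace-separated key=value options into a dict."""
    return dict(item.split("=", 1) for item in text.split() if "=" in item)


# Options that must be swapped between the two sides.
MIRRORED = [("left", "right"), ("leftsubnet", "rightsubnet")]
# Options assumed to be identical on both sides in this simple setup.
IDENTICAL = ["ike", "esp", "keyexchange"]


def find_mismatches(a: dict, b: dict) -> list:
    """List human-readable differences between two conn sections."""
    problems = []
    for x, y in MIRRORED:
        for mine, theirs in ((x, y), (y, x)):
            if a.get(mine) != b.get(theirs):
                problems.append(
                    f"{mine}={a.get(mine)} does not mirror {theirs}={b.get(theirs)}")
    for key in IDENTICAL:
        if a.get(key) != b.get(key):
            problems.append(f"{key} differs: {a.get(key)} vs {b.get(key)}")
    return problems


east = parse_conn(
    "left=192.0.2.10 right=203.0.113.10 leftsubnet=10.20.30.0/24 "
    "rightsubnet=10.40.50.0/24 ike=aes128-sha1-modp2048! "
    "esp=aes128-sha1! keyexchange=ikev2")
west = parse_conn(
    "left=203.0.113.10 right=192.0.2.10 leftsubnet=10.40.50.0/24 "
    "rightsubnet=10.20.30.0/24 ike=aes128-sha1-modp2048! "
    "esp=aes128-sha1! keyexchange=ikev2")

print(find_mismatches(east, west))                              # prints []
print(find_mismatches(east, dict(west, keyexchange="ikev1")))   # one problem
```

Lifetimes are deliberately left out of the comparison: they may differ between the two sides.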
To check the status of the tunnels, I use the sudo
command. If everything was done correctly, the output should be something like this:
Security
tunnel-west[
tunnel-west[
tunnel-west[
info
To deploy or disable a tunnel manually, use commands in the format: sudo
.
Enabling debugging messages
By default, strongSwan doesn’t save any details in the logs. This is understandable: if everything works well, a detailed record of the IKE dialogue is unnecessary noise.
If you are able to edit the config files, the selection of available options is very large. To apply changes without restarting the daemon, use the command swanctl
or send a SIGHUP to the charon process.
Unfortunately, many specialized network distributions do not tolerate changes made outside their own interface and do not let you edit the config files. But if you have SSH access, you can use the command sudo
to change the log detail level for a running daemon.
The detail levels vary from -1 (nothing is recorded) to 4 (everything is recorded, including secret keys). At Level 3, packet dumps are recorded, but no secret information is saved in the logs, which is optimal for my purposes.
info
Unfortunately, IKE doesn’t generate detailed error reports: the initiator normally receives something like “no proposal chosen”. Therefore, debugging messages must be enabled and reviewed on the receiver side. If you have to debug a nonfunctioning tunnel, switch your router to the passive mode (the auto=route
option in strongSwan).
Incompatible IKE versions
There are two IKE versions: an old one (IKEv1) and a new one (IKEv2). Many issues of the first version have been fixed in the second one: NAT traversal identification and configuration work smoothly, several local and remote networks can be combined in one tunnel, and the fully featured keepalive mechanism allows both sides to see whether the tunnel is still operational.
But the problem is that IKEv1 and IKEv2 are, in fact, different and not entirely compatible protocols. In early versions of strongSwan (prior to the release of version 5.0), different daemons (pluto and charon) were responsible for them. In newer versions, all the functionality is implemented in charon, but the differences between the protocols still exist, making it impossible to automatically downgrade from IKEv2 to IKEv1.
The protocol version in strongSwan is set by the keyexchange
option. I add the keyexchange=ikev1
option to the config on the East side, restart the tunnel, and see what happens. On the East side, I get only the vague “no proposal chosen” message regardless of the log detail level.
rtr-east
rtr-east
rtr-east
rtr-east
rtr-east
However, on the West side, I can see the message “no IKE config found”.
rtr-west
Solution: tell the admin on the other side which IKE version you use and ask them to check that the two versions match.
Incorrect key
If the pre-shared keys on the two sides differ and the routers’ addresses are specified explicitly in your settings, you will get the message “MAC mismatched”. MAC in this case means Message Authentication Code, not a hardware address.
rtr-east
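Why a wrong key surfaces as a MAC error becomes clearer with a small sketch. In IKEv2, pre-shared key authentication is defined in RFC 7296 as AUTH = prf(prf(PSK, "Key Pad for IKEv2"), signed octets). The Python below assumes HMAC-SHA1 as the prf and uses placeholder bytes instead of a real IKE exchange, so treat it as an illustration of the principle and not as what charon does internally.

```python
import hashlib
import hmac

KEY_PAD = b"Key Pad for IKEv2"  # constant string defined in RFC 7296


def psk_auth(psk: bytes, signed_octets: bytes) -> bytes:
    """AUTH = prf(prf(PSK, KEY_PAD), signed_octets), with HMAC-SHA1 as prf."""
    inner = hmac.new(psk, KEY_PAD, hashlib.sha1).digest()
    return hmac.new(inner, signed_octets, hashlib.sha1).digest()


# Placeholder bytes (assumption); a real daemon signs parts of the exchange.
octets = b"placeholder for the signed part of the exchange"

sent = psk_auth(b"qwerty", octets)          # computed by the initiator
expected_ok = psk_auth(b"qwerty", octets)   # responder with the same key
expected_bad = psk_auth(b"qwertz", octets)  # responder with a mistyped key

print(hmac.compare_digest(sent, expected_ok))   # True
print(hmac.compare_digest(sent, expected_bad))  # False
```

The responder never sees the other side's key, only the resulting code, which is why the log can report nothing more specific than a mismatch.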
Incompatible encryption and PFS options
PFS (Perfect Forward Secrecy) is a mechanism ensuring long-term cryptographic security. The point is that the long-term key is never used to encrypt traffic directly: when a connection is established, a temporary session key is generated immediately. This temporary key is changed on disconnection or according to a schedule. As a result, even if attackers manage to obtain a temporary key, they get access only to the traffic of the current session. Once this temporary key expires, they have to attack the new key from scratch, which is impractical.
The session key is generated using the Diffie-Hellman key exchange algorithm, either the classical or elliptic-curve one.
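The classical variant of the exchange fits in a few lines of Python. The sketch below uses a toy group (p = 23, g = 5) that is far too small for real use; groups such as modp2048 work the same way with a 2048-bit prime. The function names are hypothetical.

```python
import secrets

# Toy parameters for illustration only; never use numbers this small.
P = 23  # prime modulus
G = 5   # generator


def dh_keypair():
    """Return (private, public) for the toy group."""
    private = secrets.randbelow(P - 2) + 1  # random value in 1 .. P-2
    public = pow(G, private, P)
    return private, public


def dh_shared(private: int, peer_public: int) -> int:
    """Derive the shared secret from my private and the peer's public value."""
    return pow(peer_public, private, P)


east_priv, east_pub = dh_keypair()
west_priv, west_pub = dh_keypair()

# Only the public values cross the wire, yet both sides get the same secret.
print(dh_shared(east_priv, west_pub) == dh_shared(west_priv, east_pub))  # True
```

Each rekeying with PFS repeats this exchange with fresh private values, so an old session key says nothing about the next one.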
Various network devices use different approaches to the PFS configuration. Some systems even have the PFS Enable/Disable option, which is as meaningless as the Enable Encryption option. There is no ‘generic’ encryption – only a specific cipher and its specific operation mode (e.g. AES-128-CBC). Accordingly, there is no ‘generic’ session key generation algorithm.
In reality, the PFS Enable option usually means DH group 2 (modp1024), which is currently considered obsolete and unsafe. Other variants are possible as well, and you can check the documentation to find out which one is used. But you can also retrieve this information from the logs.
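Vendors tend to show DH groups as bare numbers, while strongSwan configs use keywords, so a small lookup table helps when comparing settings with the other side. The numbers below are the IANA-registered group numbers for the most common MODP and ECP groups; the helper name is hypothetical, and the table is not exhaustive.

```python
# DH group number -> strongSwan keyword (common groups only).
DH_GROUPS = {
    1: "modp768",
    2: "modp1024",
    5: "modp1536",
    14: "modp2048",
    15: "modp3072",
    16: "modp4096",
    17: "modp6144",
    18: "modp8192",
    19: "ecp256",
    20: "ecp384",
    21: "ecp521",
}


def keyword_for_group(number: int) -> str:
    """Translate a vendor-style 'DH group N' into a config keyword."""
    try:
        return DH_GROUPS[number]
    except KeyError:
        raise ValueError(f"group {number} is not in this table") from None


print(keyword_for_group(2))   # modp1024
print(keyword_for_group(14))  # modp2048
```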
At high detail levels, strongSwan shows you the ‘received proposals’ (i.e. what the initiator proposes) and the ‘configured proposals’ (i.e. what is configured on your side).
For experimental purposes, I change the esp=aes128-sha1!
option in the config on the West side to esp=aes128-sha1-modp2048!
, and the logs show me the following information:
rtr-west
rtr-west
rtr-west
The order of the options is as follows: cipher, hash, and PFS. I see that the cipher (AES-128) and hash (SHA-1) match each other, and the only difference is in the MODP_2048 option: it’s present in ‘received proposals’ and absent in ‘configured proposals’. All options with prefixes MODP
and ECP
refer to groups for the DH Protocol. The correlation between group numbers and MODP/ECP notations can be found in the strongSwan documentation.
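Comparing two proposals by eye gets tedious when they are long, so the comparison is easy to script. The Python sketch below assumes that a proposal is written as a protocol name, a colon, and slash-separated transforms; the two sample strings are my own illustration built from the options discussed above, not lines copied from a real log.

```python
def parse_proposal(proposal: str) -> set:
    """Split 'ESP:AES_CBC_128/HMAC_SHA1_96/...' into a set of transforms."""
    _, _, transforms = proposal.partition(":")
    return set(transforms.split("/"))


def diff_proposals(received: str, configured: str) -> dict:
    """Show which transforms appear on one side only."""
    r, c = parse_proposal(received), parse_proposal(configured)
    return {"only_received": sorted(r - c), "only_configured": sorted(c - r)}


# Illustrative strings (assumed format), matching the PFS mismatch above.
received = "ESP:AES_CBC_128/HMAC_SHA1_96/MODP_2048/NO_EXT_SEQ"
configured = "ESP:AES_CBC_128/HMAC_SHA1_96/NO_EXT_SEQ"

print(diff_proposals(received, configured))
# {'only_received': ['MODP_2048'], 'only_configured': []}
```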
Incompatibilities between the cipher and the hash occur more rarely because these must be plainly specified in all systems (i.e. you cannot omit them). But now you know where to look. The cipher option format is: $name_$mode_$keyLength
. In the above example, AES in the CBC mode is used; the key length is 128 bits.
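The naming scheme is regular enough to split mechanically. Here is a minimal Python sketch with hypothetical names; it handles only the plain three-part form described above and rejects anything else instead of guessing.

```python
from typing import NamedTuple


class Cipher(NamedTuple):
    name: str      # algorithm, e.g. "AES"
    mode: str      # operation mode, e.g. "CBC"
    key_bits: int  # key length in bits


def parse_cipher(transform: str) -> Cipher:
    """Split a NAME_MODE_KEYLENGTH string into its three parts."""
    parts = transform.split("_")
    if len(parts) != 3 or not parts[2].isdigit():
        raise ValueError(f"not in NAME_MODE_KEYLENGTH form: {transform!r}")
    return Cipher(parts[0], parts[1], int(parts[2]))


print(parse_cipher("AES_CBC_128"))
# Cipher(name='AES', mode='CBC', key_bits=128)
```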
Tunnel is ‘alive’, but there is no traffic
Such things happen pretty often, especially with IKEv1. The output of sudo
claims that everything is OK, but traffic disappears and cannot be resumed. The problem is that IKEv1 doesn’t include a mandatory two-way exchange mechanism for keepalive packets. As a result, in the intervals between expirations of the key lifetime, the IKE process doesn’t control anything, and the AH/ESP implementation (i.e. the OS kernel or hardware cryptoprocessor) runs on its own. In other words, packets are encrypted and sent even if no one can accept them on the other side.
The purpose of the DPD (Dead Peer Detection) options is to prevent such situations. However, the timeout values are neither included in the proposal nor negotiated by the parties; so, even if DPD is enabled, one party can consider the tunnel alive much longer than the other one – up to the end of the IKE timeout, which can reach several hours.
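A back-of-the-envelope model shows how large the gap can get. The Python sketch below is a deliberate simplification with hypothetical names: each side notices a dead peer either when its own DPD timeout fires or, without DPD, only when the IKE lifetime runs out. The 30-second DPD timeout is an assumed value; 28800 seconds is the ikelifetime from the test configs.

```python
from typing import Optional


def detection_time(dpd_timeout: Optional[int], ike_lifetime: int) -> int:
    """Worst-case seconds before a side notices that its peer is gone."""
    if dpd_timeout is None:          # DPD disabled on this side
        return ike_lifetime
    return min(dpd_timeout, ike_lifetime)


IKE_LIFETIME = 28800  # seconds, as in ikelifetime=28800s

east = detection_time(30, IKE_LIFETIME)    # DPD with an assumed 30 s timeout
west = detection_time(None, IKE_LIFETIME)  # no DPD at all

# How long West keeps sending traffic into a tunnel East already dropped.
print(west - east)  # 28770
```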
If you encounter such a problem, make sure to request values of all DPD timers from the admin on the other side. An even better solution is to persuade everybody to upgrade to IKEv2 where such problems occur much more rarely.
Conclusions
At first glance, analyzing IPsec logs seems to be a sophisticated art. But once you learn the basics of the IKE protocol, the log messages become much clearer, and debugging becomes much faster and easier.
Daniil Baturin
Project Coordinator at VyOS (https://vyos.io), ‘linguist’, functional programmer, sometimes network admin