Securing cloud servers with IPSec and Ansible

Back in the 1990s, the IPSec suite of protocols was originally invented to provide host-to-host encrypted communications, but as a result of a number of factors it ended up being used almost exclusively as a gateway-to-gateway protocol for VPN tunnels. While exploring options to secure my cloud infrastructure, I found out that IPSec can be quite successful at this task, greatly reducing the burden of transport encryption at the application layer.

Background

Having a farm of cloud servers for my personal websites (webcookies.org, cspbuilder.info, this one and some others), managed pretty much exclusively with Ansible, I started wondering how to secure an increasingly complex jungle of server-to-server communications:

  • SQL between application, database and worker servers
  • AMQP between broker (RabbitMQ and Redis) and workers
  • Cache protocol (Redis)
  • Syslog
  • NTP
  • SMTP
  • HTTP between internal APIs

Most of these have their own application-layer transport protection mechanisms, some of them using TLS and some not (NTP). But that's complex and involves maintenance of X.509 certificates, very different configuration syntaxes (LISP-ish RabbitMQ) and often incomplete support for transport encryption in the client libraries for Python 3 in which most of my applications are written.

In this scenario, IPSec has the advantage of moving all the transport security one layer down in the first place, and getting rid of it from the services' configuration in the second place. All the complexity is delegated to the IPSec configuration, obviously, but it's a single configuration that handles all transports and it's easily deployed using Ansible.

The full source code repo can be found on GitHub: https://github.com/kravietz/ansible-ipsec

Brief intro to IPSec

IPSec is a suite of protocols that together do pretty much the same as TLS: they protect the confidentiality, authenticity and integrity of a network connection, just one network layer lower. TLS runs over TCP, while IPSec runs over IP, and they are independent (you could run a TLS connection inside an IPSec connection without either of them knowing about the other).

IPSec really consists of two protocols: ESP, which is the actual workhorse providing symmetric encryption and MAC-based authentication, and IKE, which exchanges symmetric keys between the parties and refreshes them periodically. ESP runs over bare IP and can encapsulate any other protocol such as TCP or UDP. Each ESP connection (called an SA, or security association) is unidirectional (so you need two for two-way traffic) and has a unique, static key for encryption and another one for authentication.

IKE on the other hand runs over UDP on port 500. All the nice packet diagrams are in these Wikipedia articles, so go there if you need to understand more.

In Linux and BSD the ESP protocol is implemented as part of the kernel and can be controlled with the setkey command. IKE is implemented in the racoon daemon, which is quite lightweight in terms of memory and CPU consumption:

$ ps axuw|grep racoon
root     20823  0.0  0.0  84068  1820 ?        Ss    2015   0:25 /usr/sbin/racoon

The setkey command also controls the kernel's table describing which traffic needs to be encrypted (SPD). Now, this is actually important because it's one of the main configuration points in IPSec: the kernel, hitting a matching entry in SPD, will look for an existing ESP session in another table (SAD). If it's not there, it will ask racoon to establish one. The latter will speak to the racoon on the other side over 500/udp, they will exchange symmetric keys and establish a unique ESP session. This sounds complicated, but it's no different from what happens during a TLS handshake and takes a fraction of a second in real life.
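The lookup sequence described above can be modelled in a few lines of Python. This is only a toy sketch of the decision flow, not how the kernel is actually implemented: the Kernel class, the negotiate callback standing in for racoon, and the made-up SPI values are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Pair = Tuple[str, str]  # (source IP, destination IP)


@dataclass
class Kernel:
    """Toy model of the SPD/SAD decision flow (hypothetical, for illustration)."""
    negotiate: Callable[[Pair], int]                    # stands in for racoon/IKE
    spd: Set[Pair] = field(default_factory=set)         # pairs that require ESP
    sad: Dict[Pair, int] = field(default_factory=dict)  # pair -> SPI of its SA

    def send(self, src: str, dst: str) -> str:
        pair = (src, dst)
        if pair not in self.spd:
            return "plain"                  # no policy: send unencrypted
        if pair not in self.sad:
            # No SA yet: ask the IKE daemon to establish one.
            self.sad[pair] = self.negotiate(pair)
        return f"esp spi={self.sad[pair]:#x}"


calls = []


def fake_racoon(pair: Pair) -> int:
    """Pretend to run IKE and return a made-up SPI."""
    calls.append(pair)
    return 0x1000 + len(calls)


kernel = Kernel(negotiate=fake_racoon)
kernel.spd.add(("98.143.148.144", "98.143.148.161"))

print(kernel.send("98.143.148.144", "98.143.148.161"))  # first packet triggers IKE
print(kernel.send("98.143.148.144", "98.143.148.161"))  # second packet reuses the SA
print(kernel.send("98.143.148.144", "192.0.2.1"))       # no SPD entry: plain
```

The point of the sketch is that the IKE negotiation happens only once per pair; every later packet finds the SA already in the SAD.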

In Ubuntu Linux you add SPD entries to /etc/ipsec-tools.conf and racoon configuration is stored in /etc/racoon.

IPSec based on IKE and Racoon

The following configuration files are taken from my Ansible templates and use the Jinja2 templating language, both of them generally Pythonic by nature.

First, the pre-shared secret configuration for racoon, stored in /etc/racoon/psk.txt. The file is a sequence of IP addresses and secrets used to authenticate the link between this server and that IP address:

98.143.148.144 2cadcc0ddf6235b828e5bffb822d899d878331d9e6a686395f3587ef40bf0687
2607:fcd0:0:33:1234:1234:1201:fa 29ff7dbad975fc6f5e7e2a65e3024cfeec85372caed05257fe637d86a29ad7d1

Creating this manually for each server doesn't make sense and guarantees mistakes, but that's where Ansible comes in handy: all these files can be generated during deployment from an inventory of hosts. Each per-tunnel secret is generated as a SHA256 of the endpoint hostnames (hostname, inventory_hostname) and a global secret (ipsec_secret):

{% set secret=(hostname, inventory_hostname, ipsec_secret) %}
{{hostvars[hostname]['ansible_default_ipv4']['address']}} {{ secret | sort | join | hash('sha256') }}

Note: the hostnames need to be sorted so that they're always hashed in the same order and the resulting hash is the same on both hosts.
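To see why the sorting matters, here is a rough Python equivalent of the filter chain. It is a sketch under a few assumptions: as far as I know, Jinja2's sort filter is case-insensitive by default, join uses an empty separator, and Ansible's hash('sha256') returns a hex digest. The function name tunnel_psk and the example hostnames are made up.

```python
import hashlib


def tunnel_psk(host_a: str, host_b: str, global_secret: str) -> str:
    """Derive a per-tunnel pre-shared key roughly the way the template does."""
    # Sort all three parts (case-insensitively) so both ends hash the same string.
    parts = sorted((host_a, host_b, global_secret), key=str.lower)
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


# Hypothetical hostnames and secret, for illustration only.
on_web = tunnel_psk("db.example.com", "web.example.com", "Your IPSec Secret")
on_db = tunnel_psk("web.example.com", "db.example.com", "Your IPSec Secret")

print(on_web == on_db)  # both ends derive the same key regardless of argument order
```

Without the sort, each host would put its peer's name first and the two ends would compute different secrets.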

The whole template for generating psk.txt is as follows (including debugging output):

{% for hostname in play_hosts %}
{% if hostname != inventory_hostname %}
{% set secret=(hostname, inventory_hostname, ipsec_secret) %}
# secret for connections from {{inventory_hostname}} (this host) and {{hostname}}, generated using
# SHA256{{secret}}
{{hostvars[hostname]['ansible_default_ipv4']['address']}} {{ secret | sort | join | hash('sha256') }}
 
{% endif %}
{% endfor %}

The racoon.conf file is essentially a sequence of remote IP statements, each initiating an IKE key exchange with that IP. There's also one sainfo section configuring cryptographic parameters for all the low-level ESP connections established by IKE.

log info;
path pre_shared_key "/etc/racoon/psk.txt";
 
sainfo anonymous  {
    pfs_group modp1536;
    encryption_algorithm aes;
    authentication_algorithm hmac_sha256;
    compression_algorithm deflate;
}
 
remote 2607:fcd0:0:33:1234:1234:1201:f5 {
    exchange_mode main,aggressive;
    proposal {
        lifetime time 24 hour;
        dh_group modp1536;
        encryption_algorithm aes;
        hash_algorithm sha256;
        authentication_method pre_shared_key;
    }
}

Note: the encryption parameters in the sainfo and remote sections look similar, but they are for different protocols. Whatever you configure in sainfo will be used for bulk data transfer over ESP. The parameters in remote will be used only for protecting the rare IKE packets over 500/udp, so you can use larger DH groups and longer keys here, as well as more sophisticated authentication schemes (including X.509).

And a matching Ansible template to generate racoon.conf:

log info;
path pre_shared_key "/etc/racoon/psk.txt";
 
{% macro remote(ip) %}
remote {{ip}} {
    exchange_mode main,aggressive;
    proposal {
        lifetime time 24 hour;
        dh_group modp1536;
        encryption_algorithm aes;
        hash_algorithm sha256;
        authentication_method pre_shared_key;
    }
}
{% endmacro %}
 
sainfo anonymous  {
    pfs_group modp1536;
    encryption_algorithm aes;
    authentication_algorithm hmac_sha256;
    compression_algorithm deflate;
}
 
{% for hostname in play_hosts %}
 {% if hostname != inventory_hostname %}
  #  {{hostname}}
   {{ remote(hostvars[hostname]['ansible_default_ipv4']['address']) }}
 {% endif %}
{% endfor %}

The last configuration file is the setkey configuration stored in /etc/ipsec-tools.conf. This sample entry, generated for host 98.143.148.144, means that any packets going out to 98.143.148.161, and coming back in, are required to be sent over encrypted ESP and optionally compressed.

spdadd 98.143.148.144 98.143.148.161 any -P out ipsec ipcomp/transport//use esp/transport//require;
spdadd 98.143.148.161 98.143.148.144 any -P in ipsec ipcomp/transport//use esp/transport//require;

The template to generate that is really simple:

{% macro spd(ip1, ip2, dir) %}
spdadd {{ip1}} {{ip2}} any -P {{dir}} ipsec ipcomp/transport//use esp/transport//require;
{% endmacro %}
 
{% for hostname in play_hosts %}
 
{% if hostname != inventory_hostname %}
# {{hostname}}
{% set ip41=hostvars[inventory_hostname]['ansible_default_ipv4']['address'] %}
{% set ip42=hostvars[hostname]['ansible_default_ipv4']['address'] %}
{{ spd(ip41, ip42, "out") }}
{{ spd(ip42, ip41, "in") }}
{% endif %}
 
{% endfor %}
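For readers less familiar with Jinja2, the same loop can be sketched in plain Python. This is not part of the playbook; the spd_entries function and the two-host inventory dictionary (a stand-in for Ansible's hostvars) are hypothetical.

```python
from typing import Dict, List

POLICY = "ipcomp/transport//use esp/transport//require"


def spd_entries(this_host: str, addresses: Dict[str, str]) -> List[str]:
    """Return setkey SPD lines for `this_host` against every other host.

    `addresses` maps inventory hostnames to IPv4 addresses.
    """
    local = addresses[this_host]
    lines = []
    for name, remote in addresses.items():
        if name == this_host:
            continue  # no policy for traffic to ourselves
        lines.append(f"# {name}")
        lines.append(f"spdadd {local} {remote} any -P out ipsec {POLICY};")
        lines.append(f"spdadd {remote} {local} any -P in ipsec {POLICY};")
    return lines


inventory = {"web": "98.143.148.144", "db": "98.143.148.161"}
print("\n".join(spd_entries("web", inventory)))
```

Run for the host "web", this prints the same pair of spdadd lines as the sample entry shown earlier for 98.143.148.144.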

Manual ESP keying

The main advantage of using Racoon for key management is that it will not only securely establish symmetric keys for ESP but also replace them periodically to thwart various attacks based on elapsed time or traffic sent over static ESP. But in continuous integration environments, where builds are deployed frequently (even a few times per day), manual keying of ESP could be sufficiently secure if the keys were different each time.

In such a case the whole IPSec configuration would be stored in the /etc/ipsec-tools.conf file but, in addition to the SPD configuration, it would also need to contain the actual cryptographic configuration for the ESP connections (SAD). A sample configuration block for one pair of hosts would look like this:

# ESP 98.143.148.161 <-> 155.94.254.149
add 98.143.148.161 155.94.254.149 esp 0x47dfce -E aes-cbc 0xa8bc3b14cee3861dd7af724a4a8b9219 -A hmac-sha256 0x504432381558e249cf9ba1d7fa8667c3f17c4170804324ca8b04fac57aa5eb4b ;
add 155.94.254.149 98.143.148.161 esp 0xdd307b -E aes-cbc 0xce157961c787efb38e03f059dfad5ea4 -A hmac-sha256 0x0bac4148cf14bf277f21b65b596c15c6e821150d0ac1a8498c9b15d56be4ebe3 ;
# IPComp 98.143.148.161 <-> 155.94.254.149
add 98.143.148.161 155.94.254.149 ipcomp 0x0226d3 -C deflate;
add 155.94.254.149 98.143.148.161 ipcomp 0x76485f -C deflate;
# SPD for incoming and outgoing traffic with 155.94.254.149
spdadd 98.143.148.161 155.94.254.149 any -P out ipsec ipcomp/transport//use esp/transport//require;
spdadd 155.94.254.149 98.143.148.161 any -P in ipsec ipcomp/transport//use esp/transport//require;

Note: in the manual ESP scenario we have to define everything that Racoon previously took care of. This includes the actual cryptographic keys for AES encryption and HMAC, their connection identifiers (SPI), as well as the IPComp connections and their SPIs. This is a lot of text, but as long as it's generated automatically I don't care.

The template to generate the manual configuration would be quite simple:

{% for hostname in play_hosts %}
 
{% if hostname != inventory_hostname %}
# {{hostname}}
{% set local=hostvars[inventory_hostname]['ansible_default_ipv4']['address'] %}
{% set remote=hostvars[hostname]['ansible_default_ipv4']['address'] %}
# ESP {{local}} <-> {{remote}}
add {{local}} {{remote}} esp {{ spi(local,remote) }} -E {{cipher}} {{ enc(local,remote) }} -A {{mac}} {{ auth(local, remote) }} ;
add {{remote}} {{local}} esp {{ spi(remote,local) }} -E {{cipher}} {{ enc(remote, local) }} -A {{mac}} {{ auth(remote, local) }} ;
# IPComp {{local}} <-> {{remote}}
add {{local}} {{remote}} ipcomp {{ comp(local, remote) }} -C deflate;
add {{remote}} {{local}} ipcomp {{ comp(remote, local) }} -C deflate;
# SPD for incoming and outgoing traffic with {{remote}}
spdadd {{local}} {{remote}} {{proto}} -P out ipsec ipcomp/transport//use esp/transport//require;
spdadd {{remote}} {{local}} {{proto}} -P in ipsec ipcomp/transport//use esp/transport//require;
{% endif %}
{% endfor %}

But let's have a more thorough look at the macros used to generate actual keying material:

{% set proto="any" %}
{% set cipher="aes-cbc" %}
{% set mac="hmac-sha256" %}
 
{% macro enc(host1, host2) -%}  0x{{ ( ansible_managed ~ host1 ~ host2 ~ ipsec_secret ~ "ESP KEY" ) | hash('sha256') | truncate(32,end='')  }} {%- endmacro %}
{% macro auth(host1, host2) -%} 0x{{ ( ansible_managed ~ host1 ~ host2 ~ ipsec_secret ~ "MAC KEY" )  | hash('sha256') | truncate(64,end='')  }} {%- endmacro %}
{% macro spi(host1, host2) -%}  0x{{ ( ansible_managed ~ host1 ~ host2 ~ ipsec_secret ~ "SPI" )     | hash('sha256') | truncate(6,end='')  }} {%- endmacro %}
{% macro comp(host1, host2) -%} 0x{{ ( ansible_managed ~ host1 ~ host2 ~ ipsec_secret ~ "IPCOMP" )  | hash('sha256') | truncate(6,end='')  }} {%- endmacro %}

The actual keys are generated by concatenating the Ansible signature variable ansible_managed, the IP addresses of both ends and a global secret that ensures that the generated key is not predictable. At the end we add a static string (like ESP KEY), different for each key type. The concatenated strings are then passed through SHA256 and truncated to the desired length. The hashing input will look like this:

Ansible managed: /home/kravietz/ansible-ipsec/templates/ipsec-tools-manual.conf modified on 2016-01-07 00:51:24 by kravietz on pax98.143.148.161155.94.254.149Your IPSec SecretESP KEY

Because the keys are generated based on time, they will be different on each Ansible run. Because they depend on the endpoints' IP addresses, each connection will have a different key. Because they have the textual descriptor, each key type (auth, enc) will be different.
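The derivation can be reproduced outside Ansible with a short Python sketch. The derive function, its parameter names and the shortened marker string are assumptions for illustration; marker plays the role of ansible_managed, and slicing the hex digest stands in for the truncate filter.

```python
import hashlib


def derive(run_marker: str, src: str, dst: str, secret: str,
           label: str, hex_chars: int) -> str:
    """SHA-256 the concatenated inputs and keep the first `hex_chars` hex digits."""
    material = f"{run_marker}{src}{dst}{secret}{label}"
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return "0x" + digest[:hex_chars]


# Hypothetical values: `marker` plays the role of ansible_managed.
marker = "Ansible managed: example.conf modified on 2016-01-07 00:51:24"
a, b, secret = "98.143.148.161", "155.94.254.149", "Your IPSec Secret"

enc_ab = derive(marker, a, b, secret, "ESP KEY", 32)   # 128-bit AES key
auth_ab = derive(marker, a, b, secret, "MAC KEY", 64)  # 256-bit HMAC key
enc_ba = derive(marker, b, a, secret, "ESP KEY", 32)   # reverse direction

print(len(enc_ab) - 2, len(auth_ab) - 2)  # 32 64
print(enc_ab != enc_ba)                   # True: each direction gets its own key
```

Changing any single input (the timestamp in the marker, the order of the addresses, or the label) yields an unrelated key, which is exactly the property the three "because" statements above rely on.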

Remember, however, that manual keying is definitely not a solution for production environments. It will work well between development machines, where you might want to avoid the configuration burden and traffic noise generated by Racoon. For production environments, where deployment may be occasional, IKE would be a much better choice.

What can go wrong?

IPSec works at a low level of the networking stack and its default behaviour is just to drop packets without any warning. And packets may be dropped at many stages, most notably in the IPSec stack, iptables, forwarding and fragmentation. The recommended approach is, as usual, to gradually introduce specific elements of the puzzle, testing if things work as expected at each step.

SPD

The SPD table is the key decision point for all IPSec-related processing in the kernel. If you esp/transport//require IPSec, the kernel will neither output nor consume unencrypted packets, but you can also opportunistically esp/transport//use IPSec, in which case the worst thing to happen will be to send unencrypted packets. But we've been doing this all the time until now, so we can live with that for a moment longer...

Second, in SPD you can tell the kernel very precisely which traffic should be encrypted. In the ipsec-tools.conf change this:

spdadd 155.94.222.55 98.143.148.144 any -P out ipsec ipcomp/transport//use esp/transport//require;

to this:

spdadd 155.94.222.55 98.143.148.144 icmp -P out ipsec ipcomp/transport//use esp/transport//require;

and the kernel will only require IPSec for ICMP packets, passing all the other traffic in plain-text.

When you run into trouble, especially with manual keying, you might also want to disable IPComp, which is done by simply removing the ipcomp section:

spdadd 155.94.222.55 98.143.148.144 any -P out ipsec esp/transport//require;

In SPD and SAD definitions pay attention to the in and out keywords. Remember that you can't just copy these lines between hosts, because what is outgoing traffic from host A will be incoming traffic on host B (the Ansible templates take care of that already).
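If you do need to derive one host's SPD line from its peer's by hand, the transformation is small enough to sketch in Python. The helper name mirror_direction is made up, and it only handles the in and out keywords (it will raise KeyError on anything else, such as fwd).

```python
def mirror_direction(spd_line: str) -> str:
    """Rewrite one spdadd line as it should appear on the peer host.

    The source and destination addresses stay the same; only the direction
    keyword flips, because host A's outgoing traffic is host B's incoming.
    """
    swap = {"in": "out", "out": "in"}
    words = spd_line.split()
    idx = words.index("-P") + 1   # the direction follows the -P flag
    words[idx] = swap[words[idx]]
    return " ".join(words)


line_on_a = ("spdadd 155.94.222.55 98.143.148.144 any -P out ipsec "
             "esp/transport//require;")
print(mirror_direction(line_on_a))
```

For the line above, written for host 155.94.222.55, the result is the same policy with -P in, which is what 98.143.148.144 needs.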

Iptables

Assuming that your hosts are fully firewalled, with all INPUT traffic blocked, you will need to open the firewall for two types of traffic: protocol esp (protocol 50) and UDP traffic on port 500 (IKE). The simplest, stateless configuration would be like this:

iptables -A INPUT -p esp -j ACCEPT
iptables -A INPUT -p udp --dport 500 -j ACCEPT

If you forget that, you will see funny things, especially if using the state module. Some connections will work, but not all of them, in a not very intuitive pattern. It's because the state module tries to be stateful even for stateless protocols such as UDP and ESP: a single outgoing packet creates a "state" and incoming responses will be allowed. Packets will leave the host initiating the connection, but not necessarily reach their destination, where they will be seen dropped on input.

Forwarding (Docker)

All the above examples use IPSec in transport mode, which silently assumes we're dealing with host-to-host connections and no forwarding of packets for third parties is involved (in which case we would use tunnel mode). However, if you start using Docker you will quickly discover that while it's not really a typical tunnelling scenario, it also involves packet forwarding from the kernel's point of view, which requires a surprisingly simple fix.

In SPD, each out/in pair will need to be duplicated with the fwd keyword. So for each pair of hosts we will now have four SPD entries:

spdadd 155.94.222.55 98.143.148.144 any -P out ipsec ipcomp/transport//use esp/transport//require;
spdadd 98.143.148.144 155.94.222.55 any -P in ipsec ipcomp/transport//use esp/transport//require;
spdadd 155.94.222.55 98.143.148.144 any -P fwd ipsec ipcomp/transport//use esp/transport//require;
spdadd 98.143.148.144 155.94.222.55 any -P fwd ipsec ipcomp/transport//use esp/transport//require;

This tells the kernel that traffic which is, strictly speaking, forwarded between the virtual docker0 interface and the real network interface is also subject to IPSec processing. When you add these lines, reload SPD with service setkey restart and reload Docker with service docker reload; the traffic should then start flowing between the Docker containers and the world.

You might also want to ensure that the Docker traffic is actually forwarded (forwarding) and not blocked by the IP spoofing filter (rp_filter). That's one of the things that the Docker reload script does:

net.ipv4.conf.default.rp_filter=0
net.ipv6.conf.all.forwarding=1

Tcpdump is your best friend

Tcpdump (or Wireshark) is very useful in all kinds of low-level network debugging, including IPSec problems. First, you can actually confirm that packets are sent encrypted (ESP) or not (anything else):

# tcpdump -ni em1 esp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on em1, link-type EN10MB (Ethernet), capture size 65535 bytes
01:32:46.635207 IP6 2001:470:1f09:1008::20 > 2607:fcd0:0:33:1234:1234:1201:f5: ESP(spi=0x014434c1,seq=0x201), length 116
01:32:46.788267 IP6 2607:fcd0:0:33:1234:1234:1201:f5 > 2001:470:1f09:1008::20: ESP(spi=0x075a1bd4,seq=0x21b), length 116

Second, you will see if one of the parties is sending packets but no response is coming back. This can be caused by any of the reasons mentioned above (networking unrelated to IPSec) or by the IPSec stack. The latter will drop packets if there's an encryption or authentication key mismatch, or if there are no matching SPD or SAD entries in the kernel.
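On a busy host it can help to summarise a capture rather than read it line by line. The following Python sketch counts ESP packets per source, destination and SPI. It assumes the one-line tcpdump format shown above; the esp_flows helper and its regular expression are my own and may need adjusting for other tcpdump versions or options.

```python
import re
from collections import Counter

# Matches the ESP summary lines shown above, e.g.
# "01:32:46.635207 IP6 A > B: ESP(spi=0x014434c1,seq=0x201), length 116"
ESP_LINE = re.compile(
    r"^\S+ IP6? (?P<src>\S+) > (?P<dst>\S+): "
    r"ESP\(spi=(?P<spi>0x[0-9a-f]+),seq=(?P<seq>0x[0-9a-f]+)\)"
)


def esp_flows(lines):
    """Count ESP packets per (source, destination, SPI) triple."""
    flows = Counter()
    for line in lines:
        m = ESP_LINE.match(line)
        if m:  # silently skip anything that is not an ESP summary line
            flows[(m["src"], m["dst"], m["spi"])] += 1
    return flows


capture = [
    "01:32:46.635207 IP6 2001:470:1f09:1008::20 > 2607:fcd0:0:33:1234:1234:1201:f5: "
    "ESP(spi=0x014434c1,seq=0x201), length 116",
    "01:32:46.788267 IP6 2607:fcd0:0:33:1234:1234:1201:f5 > 2001:470:1f09:1008::20: "
    "ESP(spi=0x075a1bd4,seq=0x21b), length 116",
]
for (src, dst, spi), count in esp_flows(capture).items():
    print(src, "->", dst, spi, count)
```

A healthy pair of hosts shows two flows, one per direction, each with its own SPI. If only one direction appears, the other side is probably dropping the packets.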

Third, you can confirm whether IPComp is working: just try to send a big packet with repetitive data to the other host, for example using ping -c1 -s 4096. If IPComp is working, you will only see two small packets (as above) even though the data sent was much bigger. If there is no IPComp you will see as many packets as required by the data size divided by the MTU.
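As a rough guide to how many packets to expect without compression, the arithmetic can be sketched in Python. The sketch assumes IPv4 with a 20-byte header and no options, an 8-byte ICMP header and a 1500-byte MTU, and it ignores the ESP overhead, so treat the result as an estimate only. The function name is hypothetical.

```python
import math


def ipv4_fragments(icmp_payload: int, mtu: int = 1500) -> int:
    """Rough number of IPv4 packets needed for one ICMP echo request."""
    ip_header, icmp_header = 20, 8
    # Fragment payloads must be multiples of 8 bytes (except the last one).
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((icmp_payload + icmp_header) / per_fragment)


print(ipv4_fragments(4096))  # 3 with a 1500-byte MTU
print(ipv4_fragments(56))    # 1, the default ping size fits in one packet
```

So for ping -s 4096 you would expect roughly three packets in each direction when IPComp is not in effect, against a single small one when it is.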