3. Using TCP keepalive under Linux

Linux has built-in support for keepalive. You need to enable TCP/IP networking in order to use it. You also need procfs support and sysctl support to be able to configure the kernel parameters at runtime.

The procedures involving keepalive use three user-driven variables:

tcp_keepalive_time

the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further

tcp_keepalive_intvl

the interval between subsequential keepalive probes, regardless of what the connection has exchanged in the meantime

tcp_keepalive_probes

the number of unacknowledged probes to send before considering the connection dead and notifying the application layer

Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface. There are relatively few programs implementing keepalive, but you can easily add keepalive support for most of them following the instructions explained later in this document.

3.1. Configuring the kernel

There are two ways to configure keepalive parameters inside the kernel via userspace commands:

We mainly discuss how this is accomplished on the procfs interface because it's the most used, recommended and the easiest to understand. The sysctl interface, particularly regarding the sysctl(2) syscall and not the sysctl(8) tool, is only here for the purpose of background knowledge.

3.1.1. The procfs interface

This interface requires both sysctl and procfs to be built into the kernel, and procfs mounted somewhere in the filesystem (usually on /proc, as in the examples below). You can read the values for the actual parameters by "catting" files in /proc/sys/net/ipv4/ directory:


  # cat /proc/sys/net/ipv4/tcp_keepalive_time
  7200

  # cat /proc/sys/net/ipv4/tcp_keepalive_intvl
  75

  # cat /proc/sys/net/ipv4/tcp_keepalive_probes
  9
        

The first two parameters are expressed in seconds, and the last is the pure number. This means that the keepalive routines wait for two hours (7200 secs) before sending the first keepalive probe, and then resend it every 75 seconds. If no ACK response is received for nine consecutive times, the connection is marked as broken.

Modifying this value is straightforward: you need to write new values into the files. Suppose you decide to configure the host so that keepalive starts after ten minutes of channel inactivity, and then send probes in intervals of one minute. Because of the high instability of our network trunk and the low value of the interval, suppose you also want to increase the number of probes to 20.

Here's how we would change the settings:


  # echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time

  # echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl

  # echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes
        

To be sure that all succeeds, recheck the files and confirm these new values are showing in place of the old ones.

Remember that procfs handles special files, and you cannot perform any sort of operation on them because they're just an interface within the kernel space, not real files, so try your scripts before using them, and try to use simple access methods as in the examples shown earlier.

You can access the interface through the sysctl(8) tool, specifying what you want to read or write.


  # sysctl \
  > net.ipv4.tcp_keepalive_time \
  > net.ipv4.tcp_keepalive_intvl \
  > net.ipv4.tcp_keepalive_probes
  net.ipv4.tcp_keepalive_time = 7200
  net.ipv4.tcp_keepalive_intvl = 75
  net.ipv4.tcp_keepalive_probes = 9
        

Note that sysctl names are very close to procfs paths. Write is performed using the -w switch of sysctl (8):


  # sysctl -w \
  > net.ipv4.tcp_keepalive_time=600 \
  > net.ipv4.tcp_keepalive_intvl=60 \
  > net.ipv4.tcp_keepalive_probes=20
  net.ipv4.tcp_keepalive_time = 600
  net.ipv4.tcp_keepalive_intvl = 60
  net.ipv4.tcp_keepalive_probes = 20
        

Note that sysctl (8) doesn't use sysctl(2) syscall, but reads and writes directly in the procfs subtree, so you will need procfs enabled in the kernel and mounted in the filesystem, just as you would if you directly accessed the files within the procfs interface. Sysctl(8) is just a different way to do the same thing.

3.1.2. The sysctl interface

There is another way to access kernel variables: sysctl(2 ) syscall. It can be useful when you don't have procfs available because the communication with the kernel is performed directly via syscall and not through the procfs subtree. There is currently no program that wraps this syscall (remember that sysctl(8) doesn't use it).

For more details about using sysctl(2) refer to the manpage.

3.2. Making changes persistent to reboot

There are several ways to reconfigure your system every time it boots up. First, remember that every Linux distribution has its own set of init scripts called by init (8). The most common configurations include the /etc/rc.d/ directory, or the alternative, /etc/init.d/. In any case, you can set the parameters in any of the startup scripts, because keepalive rereads the values every time its procedures need them. So if you change the value of tcp_keepalive_intvl when the connection is still up, the kernel will use the new value going forward.

There are three spots where the initialization commands should logically be placed: the first is where your network is configured, the second is the rc.local script, usually included in all distributions, which is known as the place where user configuration setups are done. The third place may already exist in your system. Referring back to the sysctl (8) tool, you can see that the -p switch loads settings from the /etc/sysctl.conf configuration file. In many cases your init script already performs the sysctl -p (you can "grep" it in the configuration directory for confirmation), and so you just have to add the lines in /etc/sysctl.conf to make them load at every boot. For more information about the syntax of sysctl.conf(5), refer to the manpage.