5. Accurate Global Time Synchronization

Accurate time on all your systems is as important as a solid network security strategy (which takes much more than simple firewall boxes). It is one of the primary components of system administration based on good practices, which leads to organization and security. Especially when administering distributed applications, web services, or a distributed security monitoring tool, accurate time is a must.

5.1. NTP: The Network Time Protocol

We won't discuss the protocol itself here, only how this wonderful invention, combined with the pervasiveness of the Internet, can be useful to us. You can find more about it at www.ntp.org.

Once your system is properly set up, NTP keeps its time accurate by making very small adjustments, so that running applications are not affected.

Exact time can be obtained from hardware based on the oscillation frequency of atoms (atomic clocks). There is also a method based on GPS (Global Positioning System). The first is more accurate, but the second is also quite good. Atomic clocks require very special and expensive equipment, but their maintainers (usually universities and research labs) connect them to computers that run an NTP daemon. Some of those computers are connected to the Internet, which lets us access them for free. This is how we'll synchronize our systems.

5.2. Building a Simple Time Synchronization Architecture

You will need:

  1. A direct or indirect (through a firewall) connection to the Internet.

  2. Some NTP servers to synchronize with. You can use the public server pool.ntp.org, or choose some from the list of public stratum 2 time servers on the NTP website. If you don't have Internet access, your WAN administrator can provide you with some internal addresses.

  3. The NTP package installed on all systems you want to synchronize. You can find RPMs on your favorite Linux distribution CD, or search for them on rpmfind.net.

Here is an example of a good architecture:

Figure 1. Local Relay Servers for NTP

If you have several machines to synchronize, do not make them all access the remote NTP servers you chose. Only 2 machines in your server farm should access the remote NTP servers, and the other machines will sync with these 2. We will call them the Relay Servers.

Your Relay Servers can be any machines already available in your network. NTP consumes little memory and CPU, so you don't need a dedicated machine for it.

Tip

It is a good idea to create hostname aliases for your local Relay Servers like ntp1.my.com and ntp2.my.com, and use only these names when configuring the client machines. This way you can move the NTP functionality to a new Relay Server (with a different IP and hostname), without having to reconfigure the clients. Ask your DNS administrator to create such aliases.

5.3. NTP Configurations

For Your Relay Servers

Edit /etc/ntp.conf and add the remote servers you chose:

Example 5. Relay machines' /etc/ntp.conf


.
.
server  otherntp.server.org	# A stratum 1 server at server.org
server  ntp.research.gov	# A stratum 2 server at research.gov
.
.

Again, you can use the public server pool.ntp.org, or get a list of public stratum 2 time servers from the NTP website.

For Your Clients

Edit /etc/ntp.conf and add your Relay Servers with a standard name:

Example 6. Client machines' /etc/ntp.conf


.
.
server  ntp1.my.com		# My first local relay
server  ntp2.my.com		# My second local relay
.
.

If your machine's UTC time differs too much from that of the NTP servers, NTP will not work: by default, ntpd exits when the offset exceeds 1000 seconds (roughly 17 minutes). So you must do a first full sync, and I recommend doing it outside production hours. You only need to do this once, during the initial NTP setup:

Example 7. First sync

bash# ntpdate otherntp.research.gov	(1)
24 Mar 18:16:36 ntpdate[10254]: step time server 200.100.20.10 offset -15.266188 sec
bash# ntpdate otherntp.research.gov	(2)
24 Mar 18:16:43 ntpdate[10255]: adjust time server 200.100.20.10 offset -0.000267 sec
(1)
First full sync. The negative offset means our clock was about 15 seconds ahead of the server.
(2)
Second full sync, just to be sure. The offset is now virtually 0 seconds, which is good.
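
The two runs above print "step" and then "adjust". According to the ntpdate manual page, ntpdate steps the clock when the error is larger than 0.5 second and slews it gradually otherwise. The following is only a sketch of that rule: sync_mode is a hypothetical helper name, not part of the NTP package.

```shell
#!/bin/bash
# sync_mode: hypothetical helper, not part of NTP.
# Given an offset in seconds (as printed by ntpdate), print "step" when
# its absolute value is above 0.5 second, and "slew" otherwise.
# The 0.5 second limit is taken from the ntpdate manual page.
sync_mode() {
    awk -v off="$1" 'BEGIN {
        if (off < 0) off = -off                  # absolute value
        print ((off > 0.5) ? "step" : "slew")
    }'
}

sync_mode -15.266188   # first run in Example 7: prints "step"
sync_mode -0.000267    # second run in Example 7: prints "slew"
```

A slew corresponds to the "adjust time server" message shown in the second run.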

The last step is to start or restart the NTP daemon on each machine:


bash# service ntpd restart
			

5.4. Watching Your Box Synchronizing

Now you have everything set up. NTP will gradually keep your machine's time synchronized. You can watch this process using the NTP query program (the ntpq command):

Example 8. A time synchronization status

bash# ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-jj.cs.umb.edu   gandalf.sigmaso  3 u   95 1024  377   31.681  -18.549   1.572
 milo.mcs.anl.go ntp0.mcs.anl.go  2 u  818 1024  125   41.993  -15.264   1.392
-mailer1.psc.edu ntp1.usno.navy.  2 u  972 1024  377   38.206   19.589  28.028
-dr-zaius.cs.wis ben.cs.wisc.edu  2 u  502 1024  357   55.098    3.979   0.333
+taylor.cs.wisc. ben.cs.wisc.edu  2 u  454 1024  347   54.127    3.379   0.047
-ntp0.cis.strath harris.cc.strat  3 u  507 1024  377  115.274   -5.025   1.642
*clock.via.net   .GPS.            1 u  426 1024  377  107.424   -3.018   2.534
 ntp1.conectiv.c 0.0.0.0         16 u    - 1024    0    0.000    0.000 4000.00
+bonehed.lcs.mit .GPS.            1 u  984 1024  377   25.126    0.131  30.939
-world.std.com   204.34.198.40    2 u  119 1024  377   24.229   -6.884   0.421

The meaning of each column

remote

Is the name of the remote NTP server. If you use the -n switch, you will see the IP addresses of these servers instead of their hostnames.

refid

Indicates where each server is getting its time right now. It can be a server hostname or something like .GPS., indicating a Global Positioning System source.

st

Stratum is a number from 1 to 16 that indicates how far the remote server is from a reference clock. 1 is the closest (and usually the most accurate), 16 means 'server unreachable'. Your Stratum will be equal to that of the remote server you are synchronized to, plus 1. Avoid connecting to Stratum 1 servers; use Stratum 2 servers instead. They are accurate enough for our purposes, and this policy reduces the traffic to the Stratum 1 servers.

poll

The polling interval (in seconds) between time requests. The value will range between the minimum and maximum allowed polling values. Initially the value will be smaller to allow synchronization to occur quickly. After the clocks are 'in sync' the polling value will increase to reduce network traffic and load on popular time servers.
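
The minimum and maximum polling values are set in /etc/ntp.conf with the minpoll and maxpoll options of a server line, and they are written as powers of two (the defaults are 6 and 10). A small sketch to convert such an exponent to seconds, where poll_seconds is a hypothetical helper name:

```shell
#!/bin/bash
# poll_seconds: hypothetical helper, not part of NTP.
# Converts a minpoll/maxpoll exponent into a polling interval in seconds.
poll_seconds() {
    echo $(( 1 << $1 ))
}

poll_seconds 6    # default minpoll: prints 64
poll_seconds 10   # default maxpoll: prints 1024
```

The value 1024 is what the poll column shows in Example 8, meaning those servers are polled at the default maximum interval.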

reach

This is an octal representation of an array of 8 bits, representing the last 8 times the local machine tried to reach the server. A bit is set if the remote server was reached on that attempt, so 377 means that all of the last 8 attempts succeeded.
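
To read a reach value without doing the octal conversion in your head, it can be expanded into its 8 bits. This is a minimal sketch, assuming bash; decode_reach is a hypothetical helper name. The most recent attempt is the rightmost bit.

```shell
#!/bin/bash
# decode_reach: hypothetical helper, not part of NTP.
# Expands the octal "reach" value shown by ntpq -p into 8 bits.
# Leftmost bit = oldest attempt, rightmost bit = most recent attempt.
decode_reach() {
    local value=$(( 8#$1 ))   # interpret the argument as an octal number
    local bits="" i
    for (( i = 7; i >= 0; i-- )); do
        bits+=$(( (value >> i) & 1 ))
    done
    echo "$bits"
}

decode_reach 377   # prints 11111111: all 8 attempts succeeded
decode_reach 125   # prints 01010101: every other attempt failed
decode_reach 0     # prints 00000000: server never reached
```

In Example 8, milo.mcs.anl.go shows reach 125, so only half of its last 8 polls were answered.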

delay

The round-trip time (in milliseconds) needed to receive a response to a "what time is it" request.

offset

The most important value. The time difference (in milliseconds) between the local machine and the remote server. In the course of synchronization, the offset shrinks, indicating that the local machine time is getting more accurate.

jitter

Dispersion, also called Jitter, is a measure (in milliseconds) of the statistical variance of the offset across several successive request/response pairs. Lower values are preferred over higher ones, because they allow more accurate time synchronization.

The meaning of the signs before the server hostname

-

Means the local NTP service doesn't like this server very much (it was discarded as an outlier)

+

Means the local NTP service likes this server (it is a good candidate for synchronization)

x

Marks a bad host (a falseticker, whose time disagrees with the other servers)

*

Indicates the current favorite (the server the local clock is currently synchronized to)
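
For monitoring scripts it can be handy to extract only the current favorite from the ntpq output. The following is a sketch under the assumption that the output has the column layout shown in Example 8; current_peer is a hypothetical helper name that reads the ntpq -p output from standard input.

```shell
#!/bin/bash
# current_peer: hypothetical helper, not part of NTP.
# Reads "ntpq -p" output on stdin and prints the name, stratum and offset
# of the server marked with "*" (columns 1, 3 and 9 of Example 8).
current_peer() {
    awk '/^\*/ { print substr($1, 2), $3, $9; exit }'
}

# Demonstration with a few lines copied from Example 8.
current_peer <<'EOF'
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-jj.cs.umb.edu   gandalf.sigmaso  3 u   95 1024  377   31.681  -18.549   1.572
*clock.via.net   .GPS.            1 u  426 1024  377  107.424   -3.018   2.534
+bonehed.lcs.mit .GPS.            1 u  984 1024  377   25.126    0.131  30.939
EOF
# prints: clock.via.net 1 -3.018
```

On a live machine the same function would be fed with "ntpq -p | current_peer". It prints nothing when no server has been selected yet.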

5.5. Configure to Automatically Run NTP at Boot

You will want NTP to start automatically every time a machine reboots. On each machine, do the following:


bash# chkconfig --level 2345 ntpd on
			

This will ensure autostart.

If your machine stays up for a long time (months, years) without rebooting, you'll find a big discrepancy between the inaccurate hardware clock and the (now very accurate) system time. Modern Linux distributions copy the OS time to the hardware clock every time the system is shut down, using a mechanism similar to the setclock command. This way, at the next boot, you'll get a date and time almost as accurate as when you shut down the machine.