I thought my watch had stopped. Or was running wild. It was happening again. Just a few months after I cracked the case of the slipped time reference site in the case I called "Pssst, ABAP Detective, you got a second?", I found systems with differing times. My first clue was a message saying there were 7 seconds missing from a job. This time (no pun intended), the fault wasn't directly seen in an SAP system. I started digging further.
In the last time drift episode, I found clues inside of SAP job logs that led me to review what sources of time synchronization were defined. I couldn't just ask the railroad conductor where he got his timetable from before he blew the train whistle, I needed to discover it for myself. The first pattern was Windows systems having one time, yet UNIX systems having another. When I first checked, it was seven seconds; when I looked later, it was up to 10 seconds and climbing. Not a lot in the grand scheme of things, but if some timestamp got whacky the shippers could lose packages, or something.
Here were the lessons from the last drift:
- Don't use just 2 reference sites. Use 3.
- Use the NTP pools as reference.
- Make sure your reference sites exist.
I could sense these didn't solve everything. To keep this tale short (and maybe sweet), I found out the Windows systems get their time from Active Directory domain controllers. That's fine, if those controllers are straight. If they're crooked, well, the whole continent was drifting out to sea with them, climate change or not. My first new clue was to use a little-known Window utility, since "ntpdate" isn't part of the standard OS.
Rap sheet on remote reference system
$ w32tm /stripchart /computer:0.us.pool.ntp.org /samples:5 /dataonly
Tracking 0.us.pool.ntp.org [184.108.40.206].
Collecting 5 samples.
The current time is 11/13/2012 4:50:01 PM (local time).
Rap sheet on domain controller
$ w32tm /stripchart /computer:domcol /samples:5 /dataonly
Tracking domcol [220.127.116.11].
Collecting 5 samples.
The current time is 11/13/2012 4:47:48 PM (local time).
- MS information on w32tm. "A tool used to diagnose problems occurring with Windows Time"
To make sure my head was on straight, my pocket watch was intact, and I wasn't looking at things in a mirror, I also found a distribution of NTP client for Windows, which included the same "ntpdate" command I had used to run down the culprits in the last chase.
C:\Program Files\NTP\bin>.\ntpdate.exe -d 2.north-america.pool.ntp.org
15 Nov 02:40:24 ntpdate.exe: ntpdate firstname.lastname@example.org Jul 30 11:53:32 (UTC
+02:00) 2012 (1)
15 Nov 02:40:24 ntpdate.exe: Raised to realtime priority class
server 18.104.22.168, port 123
stratum 2, precision -21, leap 00, trust 000
refid [22.214.171.124], delay 0.10367, dispersion 0.00319
transmitted 4, in filter 4
reference time: d44ed3b3.36942e53 Thu, Nov 15 2012 2:36:35.213
originate timestamp: d44ed493.c8d76175 Thu, Nov 15 2012 2:40:19.784
transmit timestamp: d44ed49f.329708c5 Thu, Nov 15 2012 2:40:31.197
filter delay: 0.15057 0.11934 0.10367 0.11923
0.00000 0.00000 0.00000 0.00000
filter offset: -11.4821 -11.4584 -11.4598 -11.4600
0.000000 0.000000 0.000000 0.000000
delay 0.10367, dispersion 0.00319
Pretty much the same result, though now, a day or so later, the delta is over 11 seconds. I'm sure some of you are saying what difference does a few seconds make? In my world, a lot. It's not keeping me awake at night, but it's my mission to set the servers straight. And if their controllers are crooked, well, we can vote them out of office, or we can take them downtown.
- NTP Client For Windows ("NT"): http://www.meinberg.de/english/sw/ntp.htm
- General NTP (Network Time Protocol) Server and Client information: http://www.eecis.udel.edu/~mills/ntp
- NTP and Windows details: http://www.eecis.udel.edu/~mills/ntp/html/hints/winnt.html
- "Time synchronization may not succeed when you try to synchronize with a non-Windows NTP server in Windows Server 2003" Microsoft Knowledge base: 875424
Crude purple crayon drawings