of one mean solar day is slightly longer than 86,400 seconds (a UTC day). The purpose of a leap second is to compensate for this drift by scheduling days with 86,401 or 86,399 SI seconds.
Because the Earth's rotation speed varies in response to natural events, UTC leap seconds are irregularly spaced and unpredictable. The last leap second was inserted at the end of 31 December 2008, as 23:59:60 UTC. Leap seconds are scheduled in UTC, so they are timezone independent and occur around the world at the same moment, regardless of local time.
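To make the arithmetic concrete: POSIX `time_t`, which Linux uses for the system clock, counts every UTC day as exactly 86,400 seconds, so it has no slot for an inserted 23:59:60. The short Python sketch below is an illustration only: it measures the POSIX length of 30 June 2012, a day that actually contained a positive leap second, and shows that Python's `datetime` also refuses the 23:59:60 label.

```python
import calendar
from datetime import datetime

# POSIX time treats every UTC day as exactly 86,400 seconds.
start = calendar.timegm((2012, 6, 30, 0, 0, 0))
end = calendar.timegm((2012, 7, 1, 0, 0, 0))
posix_day = end - start   # length of 30 June 2012 as POSIX sees it
si_day = 86_400 + 1       # that day really had one inserted (positive) leap second
print(posix_day, si_day)  # 86400 86401

# The inserted second's label, 23:59:60, cannot be represented either.
try:
    datetime(2012, 6, 30, 23, 59, 60)
except ValueError as exc:
    print("no 23:59:60:", exc)
```

Because the clock has nowhere to put the extra second, the kernel instead replays an existing one, which is the behavior described in the next section.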
* How the event is triggered *
1. The system inherits a leap second flag from an upstream NTP server that knows about the upcoming leap second. (This happens on the day of the leap second event, and the flag cannot be unset.)
2. At 23:59:59 UTC on the leap second day, the kernel sees the leap second flag and causes 23:59:59 UTC to occur twice.
3. To process the leap second event, the kernel acquires a lock that protects the current time.
4. While processing the leap second, the kernel issues a printk to notify the user that the leap second has occurred.
5. The printk wakes klogd so that it can process the new kernel message.
6. klogd attempts to acquire the same lock to read the current kernel time.
If the kernel's lock acquisition (step 3) and klogd's attempt to take the same lock (step 6) happen on the same core during the same tick, a deadlock occurs on xtime_lock.
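The interaction between the steps above can be modeled in a few lines. This is a toy Python model, not kernel code: it assumes, as in 2.6-era kernels, that xtime_lock is a seqlock, and `SeqLock`, `read_time`, `leap_second_tick`, and `klogd_wakeup` are hypothetical names. A single thread stands in for "same core, same tick": the writer holds the lock while it replays 23:59:59 and emits its message, and the nested reader (standing in for klogd) never sees an even sequence number, so a real reader would spin forever. The `max_spins` bound exists only so the demo terminates.

```python
class SeqLock:
    """Toy stand-in for a kernel seqlock such as xtime_lock (not the real API)."""

    def __init__(self):
        self.seq = 0

    def write_begin(self):
        self.seq += 1  # odd sequence number: a write is in progress

    def write_end(self):
        self.seq += 1  # even again: the write has finished


def read_time(lock, clock, max_spins=1000):
    """Seqlock reader: retry until no write overlapped the read.

    A real reader spins without limit; returning None after max_spins is
    only a way to report "this would hang" in the demo.
    """
    for _ in range(max_spins):
        start = lock.seq
        if start % 2:
            continue  # writer active, try again
        value = clock["tv_sec"]
        if lock.seq == start:
            return value
    return None


def leap_second_tick(lock, clock, on_printk):
    """Hypothetical model of steps 2 to 5, run under the write lock."""
    lock.write_begin()
    try:
        # Midnight reached; the insert flag is assumed to be set.
        if clock["tv_sec"] % 86400 == 0:
            clock["tv_sec"] -= 1  # replay 23:59:59
            on_printk()           # message issued while still holding the lock
    finally:
        lock.write_end()


lock = SeqLock()
clock = {"tv_sec": 1341100800}  # 2012-07-01T00:00:00 UTC
seen = []


def klogd_wakeup():
    # Step 6 on the same core in the same tick: the reader runs nested inside the writer.
    seen.append(read_time(lock, clock))


leap_second_tick(lock, clock, klogd_wakeup)
print(seen)                    # [None]: the reader never got a stable value
print(read_time(lock, clock))  # 1341100799 once the lock is released
```

Outside that nested context the same read succeeds immediately, which is why the hang needs the narrow same-core, same-tick timing described above.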
* Likelihood of Occurrence *
It is exceptionally unlikely that the triggering events would line up as required to cause a hang. The issue is extremely difficult to trigger even in deliberate reproduction attempts, including attempts that artificially introduced high printk loads.
* Workarounds *
Updating to kernel-2.6.9-89.EL (RHEL 4), kernel-2.6.18-164.el5 (RHEL 5), or any later RHEL kernel is the most reliable way to avoid any impact from this bug, because it is patched in those kernel versions. If your environment includes systems running a kernel older than those patched kernels, and you remain concerned despite the low probability of encountering this issue, several workarounds can further reduce the risk.
- Manually adjust the system time so that 2012-06-30T23:59:59 UTC never occurs.
- Disable NTP clients on the affected system at least a full day before the leap second so that the leap second flag is never inherited.
Then re-enable NTP on those systems after the leap second has occurred. Make sure that the tzdata package installed on the system has not been updated to include the 2012-06-30 leap second, because the
system can also inherit the leap second flag from the tzdata file, even if NTP is disabled.
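To find systems that still run a kernel older than the patched builds named above, the release string reported by `uname -r` can be compared against those two thresholds. The Python sketch below is hypothetical (`PATCHED`, `parse_rhel_kernel`, and `has_leap_second_fix` are illustrative names, not a Red Hat tool) and assumes the usual RHEL release format `<upstream>-<build>.<dist tag>`. It only knows the RHEL 4 (2.6.9) and RHEL 5 (2.6.18) kernel lines and returns `None` for anything else.

```python
import platform

# Thresholds copied from the patched builds named in the workaround text.
PATCHED = {
    (2, 6, 9): (89,),    # kernel-2.6.9-89.EL (RHEL 4)
    (2, 6, 18): (164,),  # kernel-2.6.18-164.el5 (RHEL 5)
}


def _leading_ints(text):
    """Collect dot-separated numeric fields until the first non-numeric one."""
    nums = []
    for part in text.split("."):
        if not part.isdigit():
            break  # e.g. the dist tag 'EL', 'ELsmp', or 'el5xen'
        nums.append(int(part))
    return tuple(nums)


def parse_rhel_kernel(release):
    """Split e.g. '2.6.18-164.11.1.el5' into ((2, 6, 18), (164, 11, 1))."""
    upstream, _, build = release.partition("-")
    return _leading_ints(upstream), _leading_ints(build)


def has_leap_second_fix(release):
    """True/False for RHEL 4/5 kernels; None if this check does not apply."""
    base, build = parse_rhel_kernel(release)
    threshold = PATCHED.get(base)
    if threshold is None:
        return None
    # Tuple comparison: (164, 11, 1) >= (164,) and (128,) < (164,).
    return build >= threshold


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r` on Linux
    print(release, has_leap_second_fix(release))
```

For example, `has_leap_second_fix("2.6.18-128.el5")` returns `False`, flagging a host where the workarounds above are worth considering.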