
How time drift can slow down servers – NTP, Chrony, and time synchronization

NTP Chrony Hosting stops time drift, which slows down servers, by synchronizing clocks quickly, keeping log times consistent, and keeping authentication reliable. I'll show you how Chrony, NTP, and systemd-timesyncd interact, why drift occurs, and which settings prevent outages and security risks in hosting environments.

Key points

  • Time drift: Causes, consequences, and why milliseconds matter
  • NTP hierarchy: Stratum design for internal time sources
  • Chrony vs. ntpd vs. systemd-timesyncd in data centers
  • NTS & Hardware timestamps: Security and accuracy
  • Monitoring & Troubleshooting for lasting consistency

How server time drift occurs and what its effects are

Time drift occurs because the RTC of a host may run slightly too fast or too slow, and the error accumulates with each passing day. Even small deviations can produce contradictory timestamps, which disrupt transactions, caches, and replication. Certificates can suddenly appear "too early" or "too late," and authentication fails. In distributed systems, the order of events is lost and debugging becomes difficult or impossible. In hosting environments, I regularly see that a lack of synchronization leads to failures that could be avoided with solid time design.

NTP stratum explained briefly

The stratum model organizes time sources hierarchically and reduces dependence on the Internet. Stratum 0 consists of reference clocks such as GPS or radio receivers; Stratum 1 servers are directly connected to them; Stratum 2 obtains its time from Stratum 1. In hosting environments, it is worth running an internal Stratum 3 server that supplies all nodes and reduces external load. This allows me to distribute a uniform time to hosts and containers without sending each node to the Internet. The architecture enables consistent logs, matching certificate windows, and replicated databases with clean sequencing.

NTP, Chrony, or systemd-timesyncd? The comparison

I use Chrony in production setups because it locks on faster and tracks cleanly on unstable networks. The classic ntpd works reliably but takes longer to settle in. systemd-timesyncd is lightweight and sufficient for simple hosts, but it cannot act as a server. For clusters or hosting, I recommend a uniform implementation on all nodes to avoid mixed operation and side effects. The following table summarizes the most important differences.

Implementation | Strengths | Weaknesses | Suitable for
Chrony | Fast synchronization, tolerant of packet loss, server and client mode, good offline handling | Wide range of options requires clean configuration | Production servers, clouds, VMs, containers
ntpd | Tried and tested over many years, widely available | Slow to settle, less flexible with mobile hosts | Legacy environments, conservative setups
systemd-timesyncd | Lean, SNTP client, virtually "zero config" | No server mode, limited features | Small servers, appliances, simple VMs

Role design: Clearly separate time clients and internal servers

In practice, I make a strict distinction between client-only hosts and internal NTP servers. Clients only query defined sources and do not offer an NTP port themselves. Internal servers aggregate multiple sources, check their quality, and distribute time to the environment. This reduces the attack surface and keeps the dependency chain short.

It is important to set polling intervals and preferences correctly. I mark a reliable internal source with prefer and keep external providers as a fallback. In networks with latency fluctuations, I occasionally lower minpoll to measure corrections more quickly, but raise maxpoll again once stability has been achieved in order to keep network load low.
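A minimal sketch of this tuning in chrony.conf, with illustrative server names:

# internal source preferred; external fallback polled less aggressively
server ntp-internal.example iburst prefer minpoll 4 maxpoll 8
server ntp-external-1.example iburst minpoll 6 maxpoll 10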

Chrony in practice: Configuration for hosting

I start with a clear chrony.conf that defines the drift file, local stratum, and access rules. A minimal basis includes:

driftfile /var/lib/chrony/drift
local stratum 8
manual
allow 192.168.0.0/16

The drift file remembers the clock error and speeds up correction after reboots. With local stratum 8, the internal server remains low priority as long as external sources are available. allow controls which networks may obtain time and prevents abuse. I activate the service with systemctl start chronyd and systemctl enable chronyd and then check the status and sources.
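In practice, that looks like this:

systemctl enable --now chronyd
chronyc sources -v
chronyc tracking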

Client-only and server profiles

On pure clients, I disable the server port and keep the configuration lean:

# Client-only profile
server ntp-internal.example iburst prefer
server ntp-external-1.example iburst
server ntp-external-2.example iburst
port 0
makestep 1.0 3
rtcsync
leapsectz right/UTC

port 0 prevents the host from offering time itself. makestep 1.0 3 allows a hard correction of more than one second during the first three measurements; after that, the clock is only slewed (adjusted gradually). rtcsync keeps the RTC synchronized at reasonable intervals so that reboots start without major jumps.

On internal NTP servers, I consolidate sources and finely control access:

# Internal NTP server
pool 0.pool.example iburst maxsources 4
server ref1.example iburst prefer nts
server ref2.example iburst nts
allow 10.0.0.0/8
allow 192.168.0.0/16
bindaddress 0.0.0.0
bindcmdaddress 127.0.0.1
cmdallow 127.0.0.1
driftfile /var/lib/chrony/drift
makestep 0.5 5
local stratum 8
leapsectz right/UTC

I bind the command socket to 127.0.0.1 and allow it only locally. pool automatically keeps multiple sources fresh. prefer marks the desired primary source. In larger setups, I point bindaddress at a management VLAN.
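That targeted binding is a one-liner; the address here is illustrative:

# serve NTP only on the management VLAN address
bindaddress 10.20.30.5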

Polling, source quality, and stability

On unstable networks, I initially increase the measurement density and back off again after stabilization:

server ntp-external-1.example iburst minpoll 6 maxpoll 10

With minsamples, maxsamples, and maxdistance, I cut off bad sources early; a sketch of these limits follows the next snippet. For asymmetric paths or routing, hwtimestamp helps reduce jitter on suitable NICs:

hwtimestamp eth0
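The source-quality limits mentioned above might look like this; the values are illustrative and need tuning per environment:

# drop sources whose root distance exceeds 1 second
maxdistance 1.0
# base the regression on 6 to 16 samples per source
minsamples 6
maxsamples 16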

Security and accuracy: NTS, hardware timestamps, leap seconds

I protect NTP connections with NTS so that an attacker cannot inject false time data. An entry such as server time.cloudflare.com iburst nts provides a quick start via iburst combined with cryptographic security. Where the network card allows it, I activate hardware timestamping to work around latency fluctuations in the kernel. For leap seconds, I use leapsectz right/UTC so that services do not experience hard time jumps. This combination keeps services reliable and prevents errors in sensitive applications.
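A minimal NTS client sketch; ntsdumpdir lets Chrony cache the NTS cookies so that the secured association survives restarts:

server time.cloudflare.com iburst nts
ntsdumpdir /var/lib/chrony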

Hardening and network design

I restrict UDP/123 strictly to the designated networks, both inbound (clients → internal server) and outbound (server → external sources). On clients, I set port 0 so that they cannot be misused as a time source in the first place. Chrony's allow/deny rules provide additional filtering. In segmented networks, I place the internal servers in a segment with low latency to the workers and keep the path deterministic (no asymmetric routes, no excessive shaping).
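As a sketch with nftables, assuming an existing inet filter table and input chain, and using the subnets from the configuration above:

# accept NTP queries only from internal subnets, drop the rest
nft add rule inet filter input udp dport 123 ip saddr 10.0.0.0/8 accept
nft add rule inet filter input udp dport 123 ip saddr 192.168.0.0/16 accept
nft add rule inet filter input udp dport 123 drop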

NTS requires an initial key exchange (NTS-KE) over a dedicated port, typically TCP/4460. I allow this destination port only toward trusted providers. If NTS fails, I define deliberate fallback behavior (a strict alarm instead of silently switching to unsecured sources). This way, I avoid a silent decay of security.

Leap second strategies and smearing

I decide per environment: classic leap handling (UTC with a leap second) or leap smearing, where the second is smoothed out over a window. Important: do not mix the two. If some sources smear and others do not, permanent offsets occur. In critical clusters, I keep the entire fleet on the same line and document the choice. Chrony allows clean leap handling via leapsectz; if you smear, you must plan it consistently for all nodes.
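On the serving side, smearing can be sketched with Chrony's smoothing directives; the values follow the example in the chrony documentation and must be identical fleet-wide:

# slew the leap second instead of stepping, smear it toward clients
leapsecmode slew
maxslewrate 1000
smoothtime 400 0.001 leaponly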

Monitoring and troubleshooting: Making drift visible

I check status and offsets with timedatectl as well as Chrony tools such as chronyc sources and chronyc tracking. Deviations between RTC and system time are normal at first but should quickly shrink. For long-term monitoring, I feed metrics and alarms into a monitoring stack. This allows me to identify trends, peaks, and outliers before users notice anything. Alerts fire automatically when offsets exceed defined thresholds.

Key figures and alarm thresholds

  • System offset (tracking last/avg offset): warning from 5 ms, critical from 25 ms in web/DB stacks; a check sketch follows this list.
  • Root dispersion: indicates the uncertainty of the source. If it rises permanently, I respond by changing sources.
  • Reachability and jitter per source: early detection of packet loss and instability.
  • Stratum: unexpected stratum increases indicate isolation or source loss.
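A minimal offset check as a shell sketch; the 5 ms threshold matches the warning level above, and the parsing assumes chronyc's standard English output:

#!/bin/sh
# read the absolute system offset in seconds from chronyc
offset=$(chronyc tracking | awk '/^System time/ {print $4}')
# warn when the offset exceeds 0.005 s (5 ms)
if awk -v o="$offset" 'BEGIN {exit !(o > 0.005)}'; then
  echo "WARN: system offset ${offset}s exceeds 5 ms"
fi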

For ad hoc diagnoses, I also use:

chronyc sourcestats -v
chronyc ntpdata
chronyc rtcdata
chronyc activity

If chronyc activity shows many invalid sources, I check the firewall, MTU/fragmentation, and asymmetric paths. After large jumps following reboots, makestep is often not set or is blocked by thresholds that are too narrow.

Best practices for consistent time in clusters

I keep the time source redundant, typically with at least three servers, so that one can fail. An internal Stratum 3 server supplies the fleet and itself draws from several Stratum 2 sources. I avoid mixed operation of ntpd and Chrony, as different algorithms can cause unexpected offsets. I store the RTC in UTC with timedatectl set-local-rtc 0 so that daylight saving time changes bring no surprises. I document every change so that I can quickly reconstruct the history in the event of a malfunction.
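The RTC setting is applied and verified with standard systemd commands:

timedatectl set-local-rtc 0
timedatectl status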

Kubernetes and orchestration

In Kubernetes and similar orchestrations, I run Chrony only on the nodes, not in individual pods. Containers inherit the host time; duplicate corrections lead to drift. Components such as etcd are sensitive to time errors – even double-digit milliseconds can affect election timeouts. I make sure that the control plane and workers use the same internal source and that no mix of smeared and unsmeared nodes is in use.

Cloud specifics

Many cloud providers offer internal time servers out of the box. I like to use these as the primary source (low latency) and supplement them with external NTS sources as a fallback. For instances with hibernation or stops, I allow initial steps via makestep. I disable host-to-guest time synchronization via agents when Chrony is active, to avoid double corrections.
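On a VMware guest, for example, the agent-side sync can be inspected and switched off (assuming VMware Tools are installed; other hypervisors offer equivalent switches):

vmware-toolbox-cmd timesync status
vmware-toolbox-cmd timesync disable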

Special scenarios: VMs, containers, and cloud

In VMs, I pay attention to host-to-guest time, because duplicate corrections (hypervisor and guest) create chaos. Containers draw their time from the host, so maintenance focuses on the underlying infrastructure. In elastic environments where instances start frequently, Chrony's fast convergence pays off. At edge locations with poor connectivity, you benefit from Chrony's behavior during packet loss and temporary offline phases. For performance analyses with a time and latency reference, response time analysis helps me.

Performance effects: databases, logs, and certificates

Clean time reduces strange deadlocks in databases because transaction sequences remain consistent. Caches invalidate correctly, and CRLs and OCSP responses work within real time windows. In practice, many "ghost errors" disappear when offsets are under control. For correct correlation of events, I rely on central log analysis with an identical time source. Certificates behave more reliably because validity windows match the system time.

Migration path to Chrony without interruptions

I plan the change in waves so that services keep running. I build an internal Chrony server first and point some staging hosts at it. Once the sources run smoothly, I gradually switch over the production nodes. During the migration, I measure offsets and wait times to spot deviations early. Once everything is consistent, I deactivate the old ntpd instances and clean up leftover configuration.
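The switchover on an individual node then comes down to a few standard commands:

systemctl disable --now ntpd
systemctl enable --now chronyd
chronyc sources -v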

Rollback and contingency plan

I keep a rollback ready: I version old configurations and document a sequence for returning to ntpd or systemd-timesyncd if necessary. For emergencies, I write a short runbook: pause services, stop chronyd, set the time manually (only if absolutely necessary), restart the service, check sources, monitor offsets. It is critical to limit manual intervention in order to avoid jumps in applications.
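Condensed into commands, the emergency sequence looks roughly like this (the timestamp is purely illustrative):

systemctl stop chronyd
# step the clock manually only if absolutely necessary
date -s "2025-01-15 12:00:00"
systemctl start chronyd
chronyc tracking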

Checklist for implementation

I start by defining clear time sources and the target hierarchy with an internal Stratum 3 server. I then create a uniform configuration for all hosts, test it in staging, and document it. I activate NTS where appropriate and verify hardware timestamping on suitable network cards. Next, I wire metrics into alarms and set offset thresholds. Finally, I schedule regular checks so that time errors never become significant in the first place.

Runbook: 10-minute health check

When something seems "strange," I proceed as follows:

  1. System status: timedatectl (NTP active? RTC in UTC?)
  2. Sources: chronyc sources -v (reach, stratum, jitter)
  3. Tracking: chronyc tracking (offset, skew, root dispersion)
  4. Network: check firewalls/ACLs for UDP/123, measure latency/loss
  5. Drift: observe chronyc sourcestats for several minutes
  6. RTC: chronyc rtcdata; activate rtcsync if applicable
  7. Security: check NTS status, no silent degradation

Costs and benefits in euros

An incorrect clock quickly costs time and money: failed deployments, support cases, and SLA penalties add up. Setting up an internal Chrony server with monitoring is inexpensive, often only a few hundred euros. Avoided downtime, on the other hand, can easily be worth four to five figures in euros. Especially in clusters with many transactions, synchronization pays off day after day. I therefore see NTP/NTS and Chrony as a must rather than an option.

Summary

Time drift slows down servers, confuses logs, and throws certificates out of sync. With Chrony, NTP, and an internal stratum design, I keep clocks synchronized and services reliable. NTS protects the source, hardware timestamping smooths latency, and correct leap second handling prevents jumps. Monitoring with metrics and alarms shows deviations before users notice them. Those who set up NTP Chrony Hosting cleanly enjoy consistent time windows, fewer disruptions, and measurable benefits in euros.
