Interrupt handling on servers: How CPU interrupts affect performance

CPU interrupts determine how quickly my server responds to network packets, storage events and timers - interrupts that are poorly distributed or too frequent slow applications down measurably. A server with clean interrupt handling reduces context switches, lowers latencies and stabilizes response times under peak load.

Key points

Before going into detail, I summarize the key aspects:

  • Understand interrupt load: when percentages become critical
  • Manage parallelism: simultaneous interrupts and worst-case latencies
  • Use MSI-X: more messages, better distribution
  • RSS & affinity: place NIC interrupts on the right cores
  • Establish monitoring: read the numbers, act on them

What triggers CPU interrupts on servers

An interrupt is a signal that immediately pulls the CPU out of its current task and starts a handler. Network cards report new packets, storage controllers signal completed I/O, timers fire on schedule - each of these interrupts costs CPU time. Under high activity, these events add up to many context switches and cache misses. I therefore observe how often and for how long the CPU spends kernel time in ISRs and DPCs. If you understand these dynamics, you can control response times reliably and keep applications running noticeably more smoothly.

Why high interrupt times cost performance

In healthy environments, interrupt time usually sits between 0.1-2% CPU; 3-7% is possible in short bursts. If interrupt time regularly stays above 5-10%, a driver problem, faulty hardware or incorrect tuning is often behind it. From 30% it gets serious; beyond 50%, bottlenecks and slow response times loom. Applications lose throughput, latencies jump and predictability suffers. I then first check driver versions, firmware, affinities and the interrupt moderation settings on the NICs.

Simultaneous interrupts: Understanding latencies

A single interrupt rarely remains a problem; it becomes difficult when several events collide. If a high-priority interrupt arrives while a low-priority handler is running, the low-priority handler's processing is extended by the preemption. An example: if the high-priority path takes 75 cycles and the low-priority path 50, the latency of the low-priority path easily grows to 125 cycles, and further overlaps drive it up even more. Worst-case latency increases quickly, and this behavior makes systems unpredictable. I therefore plan core affinities and priorities so that hotpaths do not block each other.
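The cycle arithmetic above can be sketched in a few lines; the function name and the assumption of full preemption (the low-priority handler pays the entire cost of each high-priority handler that interrupts it) are mine, with the cycle counts taken from the example:

```python
# Minimal sketch of the worst-case latency arithmetic from the example
# above. Assumes each preemption adds the full high-priority handler cost.

def worst_case_latency(low_cycles: int, high_cycles: int, preemptions: int = 1) -> int:
    """Latency of the low-priority path when it is preempted
    `preemptions` times by the high-priority handler."""
    return low_cycles + preemptions * high_cycles

# One preemption: 50 + 75 = 125 cycles, as in the example.
print(worst_case_latency(50, 75))     # 125
# A second overlap pushes the worst case further out:
print(worst_case_latency(50, 75, 2))  # 200
```

The point of the sketch: worst-case latency grows linearly with the number of overlapping higher-priority events, which is why collision avoidance through affinity planning pays off.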

MSI and MSI-X in everyday life

Modern hosts use MSI or MSI-X instead of classic line signals (IRQ lines). MSI delivers the interrupt as a memory write, reducing latency and susceptibility to interference. MSI-X extends the concept: more messages, separate queues, more precise distribution to cores. This reduces interrupt collisions and improves scaling at high throughput. I enable MSI-X for NICs and NVMe controllers as long as the drivers and firmware support it stably.

Mechanism  | Max. messages     | Addressing                 | Distribution to cores | Typical effect
Legacy IRQ | 1 per device/line | Line signal                | Restricted            | Higher latency, more collisions
MSI        | Up to 32          | Memory write (16-bit data) | Good                  | Less overhead, more stable paths
MSI-X      | Up to 2048        | Memory write (32-bit data) | Very good             | Finer distribution, higher parallelism

DMA, DPCs and the right data path

With DMA, devices write data directly into memory; the CPU only triggers the processing routines. This saves interrupts because fewer intermediate states have to be signaled. I make sure that DPCs bundle the actual work instead of doing too much in the ISR. This keeps time in the critical section short and latency more predictable. All in all, the CPU gains more time for application logic.

Configure RSS and CPU affinity specifically

Receive Side Scaling distributes network queues and their interrupts across several cores. I bind each queue, including its interrupt, DPC and user thread, to the same core or core cluster to avoid cross-core wake-ups. If different cores handle the same flow, cache misses and context switches increase. A structured affinity plan noticeably prevents such friction losses. If you want to dig deeper, a compact CPU affinity overview for hosting setups is a good starting point.
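On Linux, pinning an IRQ means writing a CPU bitmask in hex to /proc/irq/&lt;n&gt;/smp_affinity. A minimal sketch of the mask arithmetic, assuming a simple flat core numbering (the IRQ number in the comment is hypothetical):

```python
# Minimal sketch: build the hex bitmask that /proc/irq/<n>/smp_affinity
# expects for a given set of cores. Core numbers are illustrative.

def affinity_mask(cores) -> str:
    """One bit per CPU core, returned as a hex string without prefix."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return format(mask, "x")

# Pin a queue's IRQ to cores 2 and 3 -> bits 2 and 3 set -> 0xc:
print(affinity_mask([2, 3]))  # c
# Applying it would then look like (IRQ number 42 is hypothetical):
#   echo c > /proc/irq/42/smp_affinity
```

Keeping the same mask for the queue's IRQ, its softirq work and the consuming user thread is what makes the flow core-local.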

Defuse storage interrupts and I/O paths

Storage also generates many interrupts, especially with many small IOPS. I use MSI-X on NVMe controllers and assign queues to fixed cores so that input and output stay local. In addition, I choose a suitable I/O scheduler to smooth the load per queue. Deadline, BFQ and the MQ variants react very differently depending on the workload. Careful testing here reduces jitter and increases throughput.

Network storms, SYN floods and interrupt moderation

Sudden packet floods drive up the ISR rate and starve the CPU. I enable interrupt moderation on the NIC so that packets arrive in reasonable bursts without generating latency spikes. For DoS scenarios, a resilient SYN flood defense protects the connection table at an early stage. At the same time, I measure whether the moderation itself reacts too slowly and adjust the values if it does. The goal is a smooth packet stream that feeds DPCs evenly.

Monitoring: reading and acting on figures

I start with a few clear metrics: total CPU utilization, interrupt time, DPC time, context switches and processor queue length. If the CPU mostly stays below 50%, I react calmly; at 50-80% I watch for peaks and hotspots; above 80% I plan scaling or tuning. If the interrupt time rises above 30%, I check drivers, firmware and affinities. A latency check for audio/video indirectly shows how deterministically the kernel reacts. Important: I change only one variable per test run and then measure again.
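The thresholds above can be condensed into a small decision helper; the numbers come from the paragraph, while the function name and return strings are illustrative:

```python
# Minimal sketch of the monitoring thresholds described above.
# Inputs are percentages; thresholds taken from the text.

def cpu_action(total_util: float, interrupt_time: float) -> str:
    """Map total CPU utilization and interrupt time (both in %) to an action."""
    if interrupt_time > 30:
        return "check drivers, firmware and affinities"
    if total_util > 80:
        return "plan scaling or tuning"
    if total_util >= 50:
        return "watch peaks and hotspots"
    return "stay calm"

print(cpu_action(40, 5))   # stay calm
print(cpu_action(65, 10))  # watch peaks and hotspots
print(cpu_action(60, 35))  # check drivers, firmware and affinities
```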

NUMA topology and PCIe locality

On multi-socket hosts, I always decide interrupt affinities in the context of the NUMA topology. A NIC or NVMe controller is physically attached to a PCIe root complex and therefore to a NUMA node. If I pin its queues and interrupts to distant cores, data travels across UPI/QPI links: latencies rise, bandwidth drops. I therefore check which NUMA node a device belongs to, bind its queues to local cores and make sure the associated user threads use the same node. On Windows, I pay attention to processor groups and the device setting for the preferred NUMA node; on Linux, I consistently tie IRQs, softirqs and application threads to the local node. The result: less cross-node traffic, more stable jitter values and predictable worst-case latencies.
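A sketch of the core-selection step, assuming a two-socket layout of my own invention; on a real Linux host the device's node would come from /sys/class/net/&lt;nic&gt;/device/numa_node instead of a hardcoded dictionary:

```python
# Minimal sketch: pick cores local to a device's NUMA node.
# The node-to-core mapping is an assumed two-socket layout.

NODE_CPUS = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}  # assumed topology

def local_cores(device_node: int, node_cpus: dict) -> list:
    """Return the cores on the device's own NUMA node (empty if unknown)."""
    return node_cpus.get(device_node, [])

# A NIC attached to node 1 gets its queues pinned to cores 4-7:
print(local_cores(1, NODE_CPUS))  # [4, 5, 6, 7]
```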

Using offloads, NAPI and coalescing correctly

Offloads are powerful levers against interrupt floods, but they must fit the workload. Roughly summarized: TSO/GSO move segmentation to the NIC, LRO/GRO coalesce incoming segments, and RSC on Windows hosts has a similar effect to LRO. For bulk transfers (backup, replication), these features increase throughput and significantly reduce the ISR rate. For latency-critical flows (RPCs, trading, VoIP), however, large aggregations can stretch response times. I therefore choose moderate settings: GRO yes, but don't overdo it; LRO only if no mid-path devices or firewalls cause problems; TSO/GSO left active as a rule.

NAPI on Linux switches from pure interrupt mode to poll mode under load. This smooths peaks and keeps the CPU busy in the DPC path instead of triggering thousands of short ISRs. Together with interrupt moderation (coalescing), this yields a plan: short timers for interactive profiles, longer timers for bulk. I test intervals in microsecond increments and watch drops, ring fill levels and latencies to find the sweet spot. In the storage stack, analogous knobs (queue depth, NCQ, blk-mq tuning) deliver the same effect: less staccato, more efficiency.
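The trade-off behind the coalescing timer is simple arithmetic: with an rx-usecs timer the NIC fires at most one interrupt per interval, regardless of packet rate. A minimal sketch (the `ethtool -C eth0 rx-usecs 50` command in the comment is a real knob, but the device name is assumed):

```python
# Minimal sketch of the coalescing arithmetic: an rx-usecs timer caps
# the interrupt rate at one interrupt per interval.

def max_interrupt_rate(coalesce_usecs: float) -> float:
    """Upper bound on interrupts per second for a given coalescing timer."""
    return 1_000_000 / coalesce_usecs

# Set e.g. via `ethtool -C eth0 rx-usecs 50` (device name assumed):
print(max_interrupt_rate(50))   # 20000.0 -> at most 20k interrupts/s
print(max_interrupt_rate(200))  # 5000.0  -> longer timer, fewer interrupts
```

The cost of the longer timer is added latency of up to one interval per packet, which is why interactive profiles get short timers and bulk profiles long ones.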

IRQ balancing vs. static pinning

Automatic IRQ balancing distributes load acceptably, but not perfectly. In homogeneous web environments, I often leave it running and only rein in hotspots. In latency-critical or asymmetric setups, static pinning is superior: I define fixed CPU sets per queue and device, keep them consistent across reboots and minimize wandering softirqs. In addition, I reserve „housekeeping“ cores for background work (timers, kthreads) so that performance cores remain free. On Windows, I use interrupt steering and affinity masks per queue; on Linux, I work with per-IRQ affinity and softirq control. The motto: as much automation as necessary, as much determinism as possible.

Virtualization and SR-IOV/virtio

VMs add extra costs: virtual interrupts mean VM exits, scheduling delays and shared queues. I pin I/O-intensive vCPUs to appropriate pCPUs, avoid overcommit on I/O hosts and separate dataplane threads from management load. Where possible, I use SR-IOV: virtual functions bring MSI-X into the guest and relieve the hypervisor path. For generic workloads, virtio with vhost acceleration delivers solid results; in high-throughput scenarios, I map queues 1:1 to vCPUs and keep affinities consistent from guest to host. Important: the same rules for RSS, coalescing and NUMA apply inside VMs too; only the transparency is lower, so I measure more closely.

Power management and deterministic latencies

Power-saving features are good for the energy budget but bad for hard latency budgets. Deep C-states extend wake-up times, and aggressive frequency changes cause jitter. On hosts with strict SLOs, I set performance profiles, limit deep package C-states and allow turbo only where the thermal reserve is large enough. Timer decisions (high-resolution timers vs. a lower interrupt frequency) also influence the amount and rate of kernel work. In near-real-time setups, tickless modes and isolated cores help: application threads on isolated cores, system work on dedicated „housekeeping“ cores - this keeps the critical hotpath free of interference.

Tools and measurement methodology per OS

I keep my diagnostic chain lean and reproducible. On Linux I start with /proc/interrupts and /proc/softirqs, check per-queue counters via ethtool and look at the coalescing and offload settings. mpstat, vmstat and sar show macro trends; perf uncovers hotspots in ISRs/DPCs. I correlate packet and drop counters with kernel times and flow metrics. On Windows, performance counters for interrupt/DPC time, interrupts/sec and DPCs/sec give a clean picture; traces show which drivers set the pace. What matters is a common time scale: I log everything with synchronized timestamps so that peaks, drops and latency jumps line up.
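Reading /proc/interrupts by eye gets tedious with many queues; a small parser over its column layout (header row of CPU names, then one row of per-CPU counts per IRQ) makes hotspots obvious. The sample text below is illustrative, not from a real host:

```python
# Minimal sketch: parse the /proc/interrupts column layout to spot
# per-CPU hotspots. The sample is illustrative, not real output.

sample = """\
           CPU0       CPU1
  42:    1200000       3000   PCI-MSI  eth0-rx-0
  43:       2500    1150000   PCI-MSI  eth0-rx-1
"""

def per_irq_counts(text: str) -> dict:
    """Map IRQ number -> list of per-CPU counts."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row names the CPUs
    rows = {}
    for line in lines[1:]:
        parts = line.split()
        irq = parts[0].rstrip(":")
        rows[irq] = [int(x) for x in parts[1:1 + ncpus]]
    return rows

counts = per_irq_counts(sample)
print(counts["42"])  # [1200000, 3000] -> queue 0 lands almost entirely on CPU0
```

Sampling this twice and diffing the counts gives the per-queue interrupt rate, which is usually the first number I want.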

Troubleshooting playbook and anti-pattern

My procedure is consistent: observe first, then form a hypothesis, then make one change. Typical causes: a queue or device with an escalating ISR rate, faulty firmware, coalescing values that are too high (sluggish system) or too low (ISR storm), offloads that aggregate too much, or threads that drag queues across NUMA nodes. I isolate the affected device, test conservative defaults, update drivers/BIOS and distribute load cleanly. Anti-patterns: changing everything at once, messy rollbacks, no baseline, readings without context. If you persistently change one variable after another, you quickly arrive at a stable configuration.

Blueprints for 10/25/100G hosts and NVMe

For 10G NICs, I plan 4-8 RSS queues, depending on CPU generation and packet profile. I start coalescing moderately (e.g. low double-digit microseconds), GRO on, LRO carefully. At 25G I scale to 8-16 queues and keep affinity strictly NUMA-local. From 40/100G, queue architecture becomes the core task: many queues, clean per-core allocation, active offloads, NAPI kicking in under load. For NVMe storage, I map at least one queue per core and keep the queue depth suited to the workload - small I/Os benefit from more parallelism, large sequential transfers from a stable coalescing policy and a scheduler that smooths bursts. The goal remains the same: constant latencies, no hot cores, no overflowing rings.
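The queue-count ranges above can be turned into a starting-point helper; the ranges for 10G and 25G come from the blueprint, while the clamping logic and function name are my own sketch:

```python
# Minimal sketch: turn the blueprint's queue-count ranges into a
# starting suggestion, clamped by the available cores.

def suggest_queues(link_gbit: int, cores: int) -> int:
    """Suggest an RSS queue count within the blueprint's range."""
    ranges = {10: (4, 8), 25: (8, 16)}  # ranges from the text
    lo, hi = ranges.get(link_gbit, (cores, cores))  # beyond 25G: per-core
    return min(hi, max(lo, cores))

print(suggest_queues(10, 16))  # 8: capped at the top of the 10G range
print(suggest_queues(25, 6))   # 8: raised to the bottom of the 25G range
```

As with every number in this section, the suggestion is only a baseline; the packet profile decides where within the range the host ends up.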

Practice checklist for quick success

First I update drivers and BIOS/firmware, because faulty states often drive up interrupt load. Then I switch to MSI-X where possible and distribute queues cleanly across cores. I set up RSS so that flow affinities are correct and hotpaths stay consistent. On the NIC, I adapt moderation to the traffic profile and observe the effect on latencies. If outliers remain, I hunt for defective hardware, wrong options or problem devices by process of elimination and dedicated profiling.

Realistically assess costs and benefits

Not every system needs maximum fine tuning. I prioritize hosts with high packet load, many small IOPS or tight latency targets. A few hours of tuning pay off greatly there, because less interrupt overhead immediately frees up CPU for the application. On non-critical servers, a solid basic configuration with current drivers and MSI-X is enough. The measurements guide me, not gut feeling or assumptions.

Summary: What I include in daily maintenance

I consistently watch interrupt and DPC times, keep drivers and firmware up to date and use MSI-X where possible. I plan RSS and affinities per workload so that flows, DPCs and threads stay local. I adapt NIC moderation to traffic patterns, distribute storage queues cleanly and use suitable I/O paths. If monitoring shows outliers, I work straight through drivers, hardware and configuration. This keeps the server's interrupt handling predictable, and my workloads run with stable performance.
