
HTTP Keep-Alive Timeout: Optimal configuration for server performance

Focusing on the HTTP Keep-Alive timeout, I'll show you how to set idle windows so that connections are reused without blocking threads. I explain concrete values, point out typical pitfalls and provide tried-and-tested configurations for nginx, Apache and the operating system.

Key points

  • Balance: Too short increases handshakes, too long blocks threads.
  • Values: Mostly 5–15 s and 100–500 requests per connection.
  • Coordination: Coordinate client, LB and firewall timeouts.
  • Special cases: Treat WebSockets, SSE and long polling separately.
  • Monitoring: Monitor open sockets, FDs and latencies.

HTTP Keep-Alive briefly explained

I keep TCP connections open with Keep-Alive so that several requests use the same line. This saves repeated TCP and TLS handshakes and noticeably reduces CPU overhead. This is particularly beneficial for many small files such as icons, JSON or CSS. Every new connection that is avoided reduces context switches and relieves kernel routines. In benchmarks with a high proportion of GETs, the overall duration drops significantly because fewer SYN/ACK packets are generated and more computing time flows into the application logic.

I quickly measure the effect: moving-average latencies become smoother and the number of new TCP connections per second drops. I don't achieve this by magic, but by connection reuse and sensible limits. It is important to note that Keep-Alive is not a substitute for fast rendering or caching. It shortens waiting times at the network boundary, while the app itself must continue to respond efficiently. Both together increase performance noticeably.

Understanding the right timeout

The timeout defines how long an inactive connection stays open before the server closes it. If I set it too short, clients constantly open new TCP connections, which raises overhead. If I set it too long, idle connections park precious workers or threads. The trick is to strike a balance between reuse and resource consumption. I test pragmatically: first set it roughly, then fine-tune with load tests.

I also pay attention to the relationship between response times and idle windows. If the typical user interaction between two clicks is 2–4 seconds, a 5–15 second timeout usually covers the real pattern. Short API calls easily tolerate 5–10 seconds, media workloads 10–15 seconds. It is important not to exaggerate: overlong timeouts rarely yield more throughput, but often lead to blocked resources. I can quickly see this from the growing number of open sockets and high FD counts.

Separate timeout types cleanly

I make a strict distinction between the idle timeout (Keep-Alive), the read/header timeout (how long the server waits for incoming requests) and the send/write timeout (how long sending toward the client is tolerated). These categories fulfill different tasks:

  • Idle timeout: Controls the reuse and parking duration of inactive connections.
  • Read/header timeout: Protects against slow clients (slow loris) and half-sent headers.
  • Send/write timeout: Prevents the server from waiting endlessly for a slow reception at the client.

In nginx, alongside keepalive_timeout, I deliberately set client_header_timeout, client_body_timeout and send_timeout per context (http/server/location). Since newer versions (1.19.10+), I optionally set keepalive_time to cap the maximum lifetime of a connection even while it remains active. In Apache, I likewise use RequestReadTimeout (mod_reqtimeout) and keep Timeout (global) separate from KeepAliveTimeout. This separation is an important building block against tying up resources without any real benefit.
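As a sketch of this separation, a minimal nginx snippet might look like the following (the values are illustrative starting points, not prescriptions; keepalive_time requires nginx 1.19.10 or newer):

```nginx
http {
    # Idle window for keep-alive connections
    keepalive_timeout      10s;
    # Cap on the total lifetime of a connection, even while active (nginx >= 1.19.10)
    keepalive_time         1h;

    # Read timeouts: protect against slow clients and half-sent headers
    client_header_timeout  10s;
    client_body_timeout    10s;

    # Write timeout: time allowed between two successive writes to the client
    send_timeout           10s;
}
```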

Recommended values in practice

For productive environments, I set a keep-alive timeout of 5–15 seconds and 100–500 requests per connection. This range achieves good connection reuse rates and keeps the number of dormant connections low. On nginx I use keepalive_timeout 10s as the starting value and keepalive_requests 200. If there is a lot of traffic, I increase it moderately if I see too many new TCP connections. If traffic is sparse, I lower it again to avoid a flood of idle connections.
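Assuming a generic http context, this starting profile translates to:

```nginx
http {
    keepalive_timeout   10s;   # starting value; raise if new connections/s stay high
    keepalive_requests  200;   # requests served per connection before it is closed
}
```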

Those who go deeper benefit from a clear tuning process with measuring points. To this end, I summarize my guidelines in a practical guide that describes the path from measurement through configuration to verification. For a quick start, follow the steps in Keep-Alive Tuning to control reuse and limits and avoid surprises. In the end, what counts is low latency with stable throughput.

Risks of long timeouts

A long timeout keeps connections artificially open and blocks workers even though no request follows. Sockets pile up and file descriptor counts climb. If the process hits its limits, I see rejected accepts or queues when establishing connections. Memory grows, garbage collectors or allocators cost additional time, and latency increases. In the error case, clients then send to sockets that are already closed and receive cryptic errors.

I avoid this by setting moderate values and checking metrics regularly. If idle connections increase too much under low load, I lower the timeout. If I see many new connections per second during traffic peaks, I carefully increase it in small steps. This is how I keep capacity usable and prevent dead connections. The result is a quieter system with fewer spikes in the curves.

Configuration: nginx, Apache and OS layer

I start at the web server level and set timeouts and limits. On nginx I set keepalive_timeout to 5–15 s and keepalive_requests to 100–500. In Apache with the event MPM I combine KeepAlive On, KeepAliveTimeout 5–15 and MaxKeepAliveRequests 100–500. Then I calibrate worker or thread pools according to the expected load. This prevents idle keep-alives from binding productive slots.
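A hedged Apache sketch for the event MPM might look like this (the pool sizes are placeholders to be calibrated against the measured load):

```apache
KeepAlive            On
KeepAliveTimeout     10
MaxKeepAliveRequests 200

<IfModule mpm_event_module>
    # Illustrative pool sizing -- tune with load tests
    StartServers         3
    ThreadsPerChild      25
    MaxRequestWorkers    150
</IfModule>
```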

At the operating system level I raise limits and queues. I set ulimit -n to at least 100,000, adjust net.core.somaxconn and tcp_max_syn_backlog and check TIME_WAIT handling. This ensures that kernel and process have enough resources. Finally, I verify the path from the NIC via IRQ balancing to the app. This lets me identify bottlenecks early and keep latency low.
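As an illustrative sketch (the file names and the `www-data` user are my own assumptions; defaults vary per distribution), these OS-level settings could be persisted like this:

```ini
# /etc/security/limits.d/webserver.conf -- FD ceiling for the server user
www-data  soft  nofile  100000
www-data  hard  nofile  100000

# /etc/sysctl.d/99-network.conf -- accept queues (illustrative values)
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 8192
```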

Component         Directive/Setting    Recommendation  Note
nginx             keepalive_timeout    5–15 s          Shorter with little traffic, longer with many small requests
nginx             keepalive_requests   100–500         Recycles connections and reduces leaks
Apache (event)    KeepAliveTimeout     5–15 s          The event MPM manages idle connections more efficiently than prefork
Operating system  ulimit -n            ≥ 100,000       More open FDs for many sockets
Operating system  net.core.somaxconn   increase        Fewer rejected connections under peak load

Reverse proxy and upstream reuse

I always think keep-alive end-to-end. Behind the edge server there is often a chain of reverse proxy → app servers. In nginx, I activate dedicated keep-alive pools (upstream keepalive, keepalive_requests, keepalive_timeout), set proxy_http_version 1.1 and clear the "Connection" header toward the upstream. This saves internal handshakes too and relieves the app backends (Node.js, Java, PHP-FPM). In Apache with mod_proxy, I likewise keep persistent connections to backend servers and limit them per destination so that a hotspot does not monopolize the pools.
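A minimal nginx sketch of such an upstream pool, assuming a hypothetical backend on 127.0.0.1:8080:

```nginx
upstream app_backend {
    server 127.0.0.1:8080;        # placeholder backend address
    keepalive          32;        # idle connections cached per worker process
    keepalive_requests 200;
    keepalive_timeout  30s;
}

server {
    location /api/ {
        proxy_pass         http://app_backend;
        proxy_http_version 1.1;             # required for upstream keep-alive
        proxy_set_header   Connection "";   # clear the Connection header
    }
}
```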

I measure separately: Reuse rate Client→Edge and Edge→Backend. If I see good reuse at the edge, but many new connections to the backend, I selectively increase the upstream pools. This allows me to scale without globally increasing the frontend timeouts.

Workers, threads and OS limits

I do not dimension workers, events and threads by wishful thinking, but by load profile. To do this, I monitor active requests, idle workers, event-loop utilization and context switches. If threads are parked in idle mode, I lower the timeout or the max-idle-per-thread limits. If I see 100 percent CPU all the time, I check accept queues, IRQ distribution and the network stack. Small corrections to FD limits and backlogs often have big effects.

I plan headroom realistically. A 20–30 percent reserve in threads and FDs provides safety for peaks. If I overdo it, cache locality suffers and waste increases. If I underdo it, requests end up in queues or time out. The right balance of capacity and efficiency keeps latencies low and protects stability.

Coordinate client, load balancer and firewall timeouts

I set time limits along the entire path so that no dead connections arise. Clients ideally close slightly earlier than the server. The load balancer must not cut off sooner, otherwise I see unexpected resets. I include NAT and firewall idle values so that connections do not silently disappear along the network path. This tuning prevents retransmits and smooths the load curves.

I use clear diagrams to keep the chain understandable: client → LB → web server → app. I document idle timeouts, read/write timeouts and retry strategies for each link. If I change a value, I check its neighbors. This keeps the path consistent and gives me reproducible measurement results. This discipline saves time in troubleshooting and increases reliability.

Security: Protection against slow loris and idle abuse

Timeouts that are too generous open attack surfaces. I therefore set limits that allow legitimate reuse but make malicious holding-open harder. In nginx, client_header_timeout and client_body_timeout, header size limits (large_client_header_buffers) and a hard upper bound for keepalive_requests help. In Apache, I use mod_reqtimeout and limit parallel connections per IP. Rate limits and limit_conn in nginx additionally protect against floods of idle sockets. For long-running endpoints, I separate dedicated pools so that attacks on streams do not bind regular API workers.
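A hedged sketch of such limits in nginx (zone names and thresholds are my own examples, not recommendations):

```nginx
# In the http context: shared-memory zones keyed by client IP
limit_conn_zone $binary_remote_addr zone=per_ip:10m;
limit_req_zone  $binary_remote_addr zone=req_per_ip:10m rate=20r/s;

server {
    client_header_timeout 5s;    # cut off half-sent headers quickly (slowloris)
    client_body_timeout   5s;
    keepalive_requests    300;   # hard upper bound on reuse per connection

    limit_conn per_ip 20;                   # concurrent connections per IP
    limit_req  zone=req_per_ip burst=40;    # request rate per IP
}
```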

Special cases: Long Polling, SSE, and WebSockets

Long-lived streams collide with short timeouts and need their own rules. I technically separate these endpoints from classic API and asset routes. For SSE and WebSockets, I set higher timeouts, dedicated worker pools and hard limits per IP. I use heartbeats or ping/pong to keep the connection alive and detect disconnections quickly. This way, streams do not block threads needed for regular short requests.
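One way to separate such endpoints in nginx, sketched with placeholder paths and a hypothetical `app_backend` upstream:

```nginx
# SSE: long read timeout, no buffering so events flush immediately
location /events {
    proxy_pass          http://app_backend;
    proxy_http_version  1.1;
    proxy_set_header    Connection "";
    proxy_read_timeout  3600s;
    proxy_buffering     off;
}

# WebSocket: protocol upgrade plus a long idle window
location /ws {
    proxy_pass          http://app_backend;
    proxy_http_version  1.1;
    proxy_set_header    Upgrade $http_upgrade;
    proxy_set_header    Connection "upgrade";
    proxy_read_timeout  3600s;
}
```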

I limit simultaneous connections and measure actively. Limits that are too high consume FDs and RAM. Limits that are too tight cut off legitimate users. I find the sweet spot with clean metrics for open, idle, active and dropped connections. This separation spares me global timeout increases and protects capacity.

HTTP/2, multiplexing and keep-alive

HTTP/2 multiplexes several streams over a single connection, but remains dependent on timeouts. I keep the idle window moderate because sessions can park under HTTP/2 as well. High keepalive_requests values matter less here, but recycling remains useful. Head-of-line blocking moves to the frame level, so I continue to measure latency per stream. If you want a deeper comparison, you will find background information on HTTP/2 multiplexing.

Under HTTP/2, I pay particular attention to the number of active streams per connection. Too many parallel streams can overload app threads. Then I tighten limits or increase server workers. The same applies here: measure, adjust, measure again. This keeps response times low and conserves resources.

TLS, session resumption and HTTP/3/QUIC

TLS handshakes are expensive. I use session resumption (tickets/IDs) and OCSP stapling so that reconnects are faster if a connection does end. Under HTTP/3, QUIC takes over the transport layer: here, the QUIC idle timeout works similarly to Keep-Alive, but on a UDP basis. Here too, I keep the windows moderate and measure retransmits, since packet loss behaves differently than with TCP. For mixed environments (H1/H2/H3), I choose uniform guideline values and fine-tune per protocol.

Monitoring, metrics and load tests

I trust measurement data more than gut feeling and start with clear KPIs. Important are: open sockets, FD utilization, new connections/s, latencies (P50/P90/P99), error rates and retransmits. I run realistic load profiles: warmup, plateau, ramp-down. I then compare curves before and after changes to the timeout. A look at server queuing helps to interpret waiting times clearly.

I document every adjustment with a timestamp and measured values. In this way, I preserve the history and recognize correlations. I take negative effects seriously and roll them back quickly. Small, comprehensible steps save a lot of time. What counts in the end is stable latency and a low error rate under load.

Measurement methods and tools in practice

  • Rapid tests: I use tools such as wrk, ab or vegeta to check reuse rates (-H "Connection: keep-alive" vs. "Connection: close"), connections/s and latency percentiles.
  • System view: ss/netstat show statuses (ESTABLISHED, TIME_WAIT), lsof -p the FD consumption, dmesg/syslog indications of drops.
  • Web server metrics: nginx stub_status/VTS and Apache mod_status provide active/idle/waiting and requests/s. From this I can recognize idle peaks or worker bottlenecks.
  • Traces: I use distributed tracing to monitor whether waiting times occur at the network boundary or in the app.
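For the web server metrics above, a minimal stub_status endpoint in nginx can look like this (port and path are my own choice); the reuse rate can then be estimated as requests divided by handled connections:

```nginx
server {
    listen 127.0.0.1:8081;        # internal access only
    location /nginx_status {
        stub_status;              # exposes accepts, handled, requests, active/waiting
        access_log off;
        allow 127.0.0.1;
        deny  all;
    }
}
```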

Configure step-by-step

First, I determine the real usage pattern: how many requests per session, what intervals between clicks, how big the responses are. Then I set an initial profile: timeout 10 s, keepalive_requests 200, moderate worker counts. Next come load tests with representative data. I evaluate the number of new connections per second and the FD occupancy. Then I adjust the values in 2–3 second increments.

I repeat the cycle until latencies remain stable under load and FD peaks stay clear of the limit. With heavy traffic, I only increase the timeout if I clearly see fewer new connections and workers still remain free. If the load is low, I reduce the timeout to avoid idling. In special cases such as SSE, I set up dedicated server blocks with higher limits. This path leads to a resilient setting without guesswork.

Kubernetes, containers and auto-scaling

In container environments I keep an eye on conntrack limits, pod FD limits and node backlogs. I ensure consistent idle timeouts between ingress, service mesh/proxy and app. For auto-scaling, I pay attention to drain times: when pods are terminated, they should reject new connections via "Connection: close" and serve existing ones cleanly. Keep-alive values that are too long unnecessarily lengthen drains, while values that are too short generate handshake storms when scaling out.

Graceful shutdown and rolling deployments

I also plan for shutdown. Before a rollout, I gradually reduce keep-alive or send targeted "Connection: close" on responses so that clients do not open fresh idle connections. In nginx, worker_shutdown_timeout caps how long running requests may linger. In Apache, I use graceful mechanisms and keep an eye on MaxConnectionsPerChild so that recycling happens automatically over time. This keeps deployments smooth without hard caps on open sockets.
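In nginx, the drain window can be capped with a single main-context directive (the value is illustrative):

```nginx
# Give old worker processes at most 30 s to finish open connections after a reload
worker_shutdown_timeout 30s;
```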

OS tuning: ports, timeouts, kernel parameters

  • Ephemeral ports: Choose a wide net.ipv4.ip_local_port_range so that short-lived connections do not run into shortages.
  • TIME_WAIT: I watch TIME_WAIT peaks. Modern stacks handle this well; I avoid questionable tweaks such as tcp_tw_recycle.
  • tcp_keepalive_time: Not to be confused with HTTP Keep-Alive. It is a kernel mechanism for detecting dead peers: useful behind NAT, but no replacement for the HTTP idle window.
  • Backlogs and buffers: Dimension somaxconn, tcp_max_syn_backlog and rmem/wmem sensibly so as not to throttle under load.
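Persisted as a sysctl fragment, the points above might look like this (illustrative values and file name; check your distribution's defaults first):

```ini
# /etc/sysctl.d/99-ports.conf
net.ipv4.ip_local_port_range = 10240 65000

# Kernel dead-peer detection -- NOT the HTTP Keep-Alive idle window
net.ipv4.tcp_keepalive_time = 600
```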

Troubleshooting checklist

  • Many new connections/s despite keep-alive: Timeout too short or clients/LB cut off earlier.
  • High idle figures and full FDs: Timeout too long or worker pools too large for the traffic pattern.
  • RST/Timeout error for longer sessions: NAT/firewall idle too short in the path, asymmetry between links.
  • Long tail latencies (P99): Check send/read timeouts, slow clients or overfilled backlogs.
  • Backends overloaded despite low edge load: Upstream keep-alive pool is missing or too small.

Practice profiles and starting values

  • API-first (short calls): Keep-Alive 5-10 s, keepalive_requests 200-300, tight header/read timeouts.
  • E-commerce (mixed): 8-12 s, 200-400, slightly more generous for product images and caching hits.
  • Assets/CDN-like (many small files): 10-15 s, 300-500, strong upstream pools and high FD limits.
  • Intranet/low load: 5-8 s, 100-200, so that idle does not dominate.

Briefly summarized

I set the HTTP keep-alive timeout so that connections are reused without blocking threads. In practice, 5–15 seconds and 100–500 requests per connection deliver very good results. I coordinate client, load balancer and firewall timeouts, separate long-running connections such as WebSockets and tune OS limits. With clean monitoring, realistic load tests and small steps, I achieve low latencies and high throughput. Those who maintain this discipline get measurable performance out of existing hardware.
