I show how HTTP connection reuse and structured keep-alive tuning reduce the overhead of TCP and TLS handshakes, so that pages respond faster and servers do less work. With suitable timeouts, limits and protocol features, I reduce latency, smooth out load peaks and significantly increase throughput.
Key points
- Keep-Alive reduces handshakes and shortens loading times.
- Timeouts and request limits keep resource usage efficient.
- HTTP/2 and HTTP/3 reinforce reuse through multiplexing.
- Client pooling lowers backend latency.
- Monitoring makes tuning gains measurable.
What does HTTP Connection Reuse mean?
I use connection reuse to send multiple HTTP requests over a single TCP connection and thus avoid expensive reconnections. Each new connection costs a three-packet TCP handshake plus a possible TLS handshake, which eats time and CPU. If the connection stays open, subsequent requests run over the same socket and save round trips. Sites with many small resources such as CSS, JS and images benefit in particular, because the waiting time per object drops. In HTTP/1.1, the “Connection: keep-alive” header signals reuse, which noticeably reduces latency and stabilizes throughput.
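How little is involved becomes visible at socket level. A minimal sketch using Python's standard library (example.com is a placeholder host): both requests travel over the same TCP/TLS connection as long as the server keeps it open.

# Python: two requests over one connection (placeholder host)
import http.client

conn = http.client.HTTPSConnection("example.com")  # one object = one TCP/TLS connection
for path in ("/", "/style.css"):
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()  # drain the body so the socket can be reused
    print(path, resp.status)
conn.close()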
Why Keep-Alive improves web server performance
I rely on keep-alive tuning because it reduces overhead in the kernel and in TLS, letting more payload per second pass over the line. In tests, effective throughput often increases by up to 50 percent because handshakes are eliminated and the CPU performs fewer context switches. At the same time, pages respond faster because browsers can fetch additional objects without delay. Short timeouts prevent idle connections from eating RAM, and limits on keepalive_requests ensure stability. This keeps the number of active sockets in the green zone and avoids bottlenecks under peak load.
Server-side configuration: Nginx, Apache and proxies
I configure Nginx so that timeouts are short enough to save RAM, but long enough for browsers to fetch several objects in succession. For typical websites, 60–120 seconds of idle timeout and 50–200 requests per connection work well, which I validate against real traffic patterns. An example shows how I start and then fine-tune. Via the link Configure keep-alive timeout I go into details such as open file descriptors and accept queues. For reverse proxies, I activate proxy_http_version 1.1 so that keep-alive is passed on cleanly and backends benefit from reuse.
# Nginx (frontend / reverse proxy)
keepalive_timeout 65s;      # idle timeout per client connection
keepalive_requests 100;     # requests served before the connection closes
# Proxy to upstream
proxy_http_version 1.1;
proxy_set_header Connection "";  # clear the header so upstream keep-alive works
# Apache (example)
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5          # seconds
TLS, HTTP/2 and HTTP/3: protocols that strengthen reuse
I combine keep-alive with TLS 1.3, session resumption and OCSP stapling, so that connections become available more quickly. With HTTP/2, I bundle many streams onto a single connection, which eliminates head-of-line blocking at the application level. The effect grows with multiplexing, because browsers request resources in parallel without opening new sockets. For a well-founded classification, see HTTP/2 multiplexing, which clearly shows the differences from HTTP/1.1. HTTP/3 with QUIC additionally offers 0-RTT starts for idempotent requests and reacts noticeably faster to packet loss.
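Whether resumption actually works can be verified with Python's ssl module. A rough sketch, assuming example.com as a placeholder host (with TLS 1.3, the session ticket only arrives after the handshake, hence the small data exchange):

# Python: check TLS session resumption (placeholder host)
import socket, ssl

HOST = "example.com"
ctx = ssl.create_default_context()

def handshake(session=None):
    s = ctx.wrap_socket(socket.create_connection((HOST, 443)),
                        server_hostname=HOST, session=session)
    # Exchange a little data so a TLS 1.3 ticket can arrive.
    s.sendall(b"HEAD / HTTP/1.1\r\nHost: " + HOST.encode() + b"\r\nConnection: close\r\n\r\n")
    s.recv(1024)
    return s

first = handshake()
ticket = first.session
first.close()
second = handshake(session=ticket)
print("resumed:", second.session_reused)  # True => the handshake was shortened
second.close()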
Client-side optimization: Node.js and Python
I activate keep-alive on the client side as well, so that API and backend calls need less connection setup. In Node.js, I use an https.Agent with connection pooling, which lowers latencies and improves time to first byte. Python's requests.Session() does the same in a simple way and makes services more stable. This keeps transport paths short and saves round trips in both directions, resulting in more consistent response times and measurably lower server load.
// Node.js
const https = require('https');
const httpsAgent = new https.Agent({
keepAlive: true,
keepAliveMsecs: 60000,
maxSockets: 50
});
// Usage: fetch / axios / native https with httpsAgent
# Python
import requests
session = requests.Session() # Reuse & Pooling
r = session.get('https://api.example.com/data') # fewer handshakes
Typical values and their effect
I start with conservative values and measure whether connections tend to sit idle or close too early. If I expect load peaks, I shorten timeouts to keep RAM free without forcing browsers to reconnect constantly. With high parallelism, I set the maximum number of file descriptors high enough to avoid accept bottlenecks. The following table gives a quick overview of how I get started and what the settings do. After that, I tune in small steps and watch the metrics closely for corrections.
| Parameter | Nginx | Apache | Typical start value | Effect |
|---|---|---|---|---|
| Idle timeout | keepalive_timeout | KeepAliveTimeout | 60–120 s | Balances reuse against RAM consumption |
| Requests per connection | keepalive_requests | MaxKeepAliveRequests | 50–200 | Stabilizes utilization per socket |
| Proxy version | proxy_http_version | – | 1.1 | Enables keep-alive to be passed upstream |
| Open descriptors | worker_rlimit_nofile | ulimit -n | >= 65535 | Prevents socket shortage |
| Accept queue | net.core.somaxconn | ListenBacklog | 512–4096 | Reduces drops at peaks |
Monitoring and load testing: metrics that count
I measure reuse gains with wrk or ApacheBench and correlate them with logs and system metrics. Important are open sockets, free sockets, pending requests and error codes that point to bottlenecks. If the number of idle connections grows, I lower timeouts or moderately reduce keepalive_requests. If connections are dropped too often, I raise limits or check whether backends respond too slowly. This lets me quickly find the point at which latency, throughput and resource usage fit together.
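Before a full wrk run, a quick sanity check makes the handshake savings tangible; a sketch assuming a placeholder endpoint:

# Python: per-request latency with and without connection reuse
import time
import requests

URL = "https://api.example.com/health"  # placeholder endpoint
N = 20

def avg_latency_ms(do_request):
    start = time.perf_counter()
    for _ in range(N):
        do_request()
    return (time.perf_counter() - start) / N * 1000

cold = avg_latency_ms(lambda: requests.get(URL))  # new connection each time
session = requests.Session()                      # pooled, reused connection
warm = avg_latency_ms(lambda: session.get(URL))

print(f"fresh: {cold:.1f} ms/req, pooled: {warm:.1f} ms/req")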
WordPress practice: Fewer requests, faster first paint
I reduce HTTP requests by bundling CSS/JS, using icons as SVG sprites and serving fonts locally. Combined with browser caching, the number of network transfers on repeat visits drops drastically. This creates more room for reuse, because browsers need fewer new sockets. If you want to dig deeper, you can find practical steps in the Keep-Alive Tuning Guide, which explains tuning paths from timeouts to the worker setup. In the end, what counts is that pages load noticeably faster and the server load remains predictable.
Scaling and system resources
I check CPU profiles, the memory footprint per worker and the network card before I raise limits. Higher parallelism only helps if every layer has enough buffers and descriptors. NUMA affinity, IRQ distribution and fast TLS implementations provide additional reserves. With containers, I watch the open file limits and hard limits of the host, which otherwise throttle reuse. This way I avoid bottlenecks that become noticeable quickly as traffic grows and carry a real performance cost.
Error patterns and troubleshooting
I often see the same error patterns: too many TIME_WAIT sockets, rising 502/504 counts or abrupt RPS drops. Then I check whether backends accept keep-alive and whether proxy headers are set correctly. Mismatched idle timeouts on individual hops often trigger chain reactions, which I fix by setting consistent values. TLS problems show up as handshake-time spikes, which session resumption or TLS 1.3 optimizations alleviate. With targeted adjustments, I stabilize the chain from the edge to the app server and keep response times reliable.
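To watch the TIME_WAIT trend during tuning, a small Linux-only helper that reads /proc/net/tcp is enough:

# Python: count TCP connection states on Linux (IPv4)
from collections import Counter

STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT"}

counts = Counter()
with open("/proc/net/tcp") as f:
    next(f)  # skip the header line
    for line in f:
        st = line.split()[3]  # 4th column holds the hex state code
        counts[STATES.get(st, st)] += 1

for state, n in counts.most_common():
    print(state, n)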
Keep timeouts consistent across layers
I align idle and activity timeouts across all hops: CDN/WAF, load balancer, reverse proxy and application. An origin timeout that is too short cuts connections while the browser is still loading; an edge timeout that is too long fills RAM with idle sockets. I therefore plan in cascades: the edge a little shorter than the browser's idle, the proxy in the middle, the backend with the longest timeout. This way I avoid RSTs and prevent expensive TLS connections from being torn down pointlessly.
# Nginx: precise timeouts & upstream reuse
client_header_timeout 10s;
client_body_timeout 30s;
send_timeout 15s;
proxy_read_timeout 60s;
proxy_send_timeout 60s;
proxy_socket_keepalive on; # Detect dead peer faster
upstream backend_pool {
    server app1:8080;
    server app2:8080;
    keepalive 64;            # cache idle upstream connections
    keepalive_timeout 60s;   # upstream context: available since Nginx 1.15.3
    keepalive_requests 1000;
}
I distinguish HTTP keep-alive from TCP keepalive (SO_KEEPALIVE). The latter I use specifically on proxy sockets to detect hanging peers without needlessly ending HTTP reuse.
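At socket level, the difference looks like this in Python; the TCP_KEEP* options are Linux-specific and mirror the sysctl values shown further below:

# Python: TCP keepalive on a single socket (Linux-specific options)
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)     # enable probing
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 600)  # idle seconds before first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before the kernel gives up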
HTTP/2 and HTTP/3 fine-tuning: using multiplexing correctly
I configure HTTP/2 so that streams run efficiently in parallel without creating head-of-line blocking on the server. To do this, I limit the maximum number of streams per session and keep idle timeouts short so that forgotten sessions don't linger. I use prioritization for critical assets, and with HTTP/3 I make sure 0-RTT is set up cleanly for idempotent requests only.
# Nginx HTTP/2 optimization
http2_max_concurrent_streams 128;
http2_idle_timeout 30s; # inactivity at H2 level
http2_max_field_size 16k; # header protection (see Security)
http2_max_header_size 64k;
# Note: since Nginx 1.19.7 the http2 idle/header-size directives are deprecated;
# keepalive_timeout and large_client_header_buffers apply instead.
With connection coalescing (H2/H3), a browser can reuse one connection for multiple hostnames if the certificate SANs and the IP/configuration match. I take advantage of this by consolidating static subdomains and choosing certificates that cover several hosts. This saves additional handshakes and port contention.
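Whether two hosts even qualify for coalescing can be checked quickly; a sketch with a placeholder host that prints the certificate's SAN list:

# Python: print certificate SANs to check coalescing eligibility
import socket, ssl

host = "static.example.com"  # placeholder host
ctx = ssl.create_default_context()
with ctx.wrap_socket(socket.create_connection((host, 443)),
                     server_hostname=host) as s:
    cert = s.getpeercert()

sans = [value for key, value in cert.get("subjectAltName", ()) if key == "DNS"]
print(sans)  # coalescing additionally requires matching IPs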
Kernel and socket parameters at a glance
I also secure reuse at the kernel level so that port and socket shortages don't occur. Ephemeral port ranges, FIN/TIME_WAIT behavior and keepalive probing directly influence stability and the handshake rate.
# /etc/sysctl.d/99-tuning.conf (examples, test with caution)
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 5
net.core.netdev_max_backlog = 4096
I avoid risky tweaks such as thoughtlessly enabling tcp_tw_reuse on publicly reachable servers. More important is a high reuse ratio, so that few short-lived connections arise in the first place. Under heavy load, I also spread IRQs and pin CPU affinity so that network interrupts are not bundled onto one core and cause latency spikes.
Security and abuse protection without slowing down reuse
Keep-alive invites attackers to Slowloris variants or HTTP/2 abuse if limits are missing. I harden header sizes and request rates without interfering with legitimate reuse patterns. Against Rapid-Reset patterns in H2, I set limits on concurrent streams and RST rates and log conspicuous clients.
# Nginx: Protection rules
large_client_header_buffers 4 8k;
client_body_buffer_size 128k;
limit_conn_zone $binary_remote_addr zone=perip:10m;
limit_conn perip 50;
limit_req_zone $binary_remote_addr zone=periprate:10m rate=20r/s;
limit_req zone=periprate burst=40 nodelay;
# H2-specific already above: http2_max_concurrent_streams, header limits
I also use graceful shutdowns so that keep-alive connections drain cleanly during deployments and clients see no errors.
# Nginx: Cleanly clear connections
worker_shutdown_timeout 10s;
Load balancers, CDN and upstreams: reuse throughout the chain
I make sure that reuse also takes place between LB/proxy and backend. To this end, I run upstream pools with enough slots and use sticky or consistent-hashing strategies if the backend needs sessions. Towards CDNs, I rely on a few long-lived origin connections and cap the maximum connections per POP, so that the app servers don't drown in many small sockets.
Homogeneous idle timeouts along the path are important: the edge must not cut connections earlier than the origin, otherwise multiplexing sessions get reestablished unnecessarily. With HTTP/3, I account for notebooks and mobile clients changing IPs more often and therefore plan tolerant but bounded idle times.
Client pooling in depth: Node.js, Python, gRPC
On the client side, I ensure sensible pooling and clear limits so that neither stampedes nor socket leaks occur. In Node.js, I set free-socket limits and idle timeouts so that connections stay warm but don't stay open forever.
// Node.js agent fine-tuning
const https = require('https');
const agent = new https.Agent({
keepAlive: true,
keepAliveMsecs: 60000,
maxSockets: 100,
maxFreeSockets: 20
});
// axios/fetch: httpsAgent: agent
# Python requests: larger pool per host
import requests
from requests.adapters import HTTPAdapter
session = requests.Session()
adapter = HTTPAdapter(pool_connections=50, pool_maxsize=200, max_retries=0)
session.mount('https://', adapter)
session.mount('http://', adapter)
For async workloads (aiohttp), I cap the maximum number of sockets and use DNS caching to keep latencies low. With gRPC (H2), I set keep-alive pings moderately so that long idle phases don't lead to disconnects without flooding the network with pings.
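For the aiohttp case, a sketch along these lines (the endpoint is a placeholder):

# Python: aiohttp with a capped pool and DNS caching
import asyncio
import aiohttp

async def main():
    connector = aiohttp.TCPConnector(
        limit=100,          # total sockets across all hosts
        limit_per_host=20,  # avoid stampedes against a single backend
        ttl_dns_cache=300,  # cache DNS lookups for five minutes
    )
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get("https://api.example.com/data") as resp:
            print(resp.status)

asyncio.run(main())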
Metrics and target values for tuning loops
I control tuning iteratively with key figures that make reuse visible:
- Reuse quota (requests per connection), separately for frontend and upstream.
- TLS handshakes/s vs. requests/s; the goal is to shrink the handshake share.
- p95/p99 latency for TTFB and total response time.
- Idle connections and their lifetime.
- Error profiles (4xx/5xx), resets, timeouts.
- TIME_WAIT/FIN_WAIT counters and ephemeral port utilization.
A simple target picture: TLS handshakes/s stable well below requests/s, a reuse rate in the H1 range of >= 20–50 requests per connection depending on object size, and for H2/H3 several simultaneous streams per session without congestion.
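The reuse quota can be read straight from access logs; this sketch assumes a custom Nginx log_format whose last two fields are the $connection and $connection_requests variables:

# Python: requests per connection from access logs
# (assumes "... $connection $connection_requests" as the last two log fields)
import sys

last_count = {}
for line in sys.stdin:
    fields = line.split()
    conn_id, conn_requests = fields[-2], int(fields[-1])
    last_count[conn_id] = max(last_count.get(conn_id, 0), conn_requests)

connections = len(last_count)
requests = sum(last_count.values())
print(f"connections: {connections}, requests/connection: {requests / max(connections, 1):.1f}")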
Front-end strategies that favor reuse
I avoid domain sharding with H2/H3, consolidate hosts and use preload/preconnect selectively to save expensive handshakes where they are unavoidable. I deliver large images in modern, compressed formats so that bandwidth doesn't become a bottleneck that blocks keep-alive slots unnecessarily. I trim cookies to the bare minimum to keep headers small and push more objects efficiently over the same sessions.
Consider mobile and NAT networks
In mobile and NAT environments, idle timeouts are often shorter. I therefore keep the server-side idle timeout moderate and accept that clients reconnect more often. With session resumption and 0-RTT (H3), reconnects still stay fast. On the server side, TCP keepalive probes on proxy sockets help to discard dead paths quickly.
Rollouts and high availability
For deployments, I drain connections softly: stop accepting new ones, wait for existing keep-alive sockets, and only then terminate processes. I place connection draining behind LBs so that multiplexing sessions are not cut off mid-stream. I keep health checks aggressive but idempotent, to detect errors early and restructure pools in time.
Summary for quick success
I rely on HTTP connection reuse, short timeouts and sensible limits so that connections stay productive and don't tie up resources when idle. Modern protocols such as HTTP/2 and HTTP/3 reinforce the effect, while client pooling relieves the backends. With monitoring, I spot early where sockets sit idle or run short and adjust values iteratively. For WordPress and similar stacks, I combine reuse with caching, asset bundling and locally hosted fonts. The result is fast pages, smooth load curves and web server performance that shows in every metric.


