
Connection limits in web hosting: optimize simultaneous connections and server load

Connection limits in web hosting control how many simultaneous requests a server can reliably process before latency climbs and errors occur. In this article I show how to measure and optimize those limits, how simultaneous connections drive server load, and how to keep both under control through targeted tuning.

Key points

The following key points provide a compact overview of the content and benefits of this article.

  • Limiting simultaneous connections protects against overload and error messages.
  • Resources such as CPU, RAM and I/O determine the effective limit.
  • Tuning with sysctl, Nginx/Apache and DB parameters raises capacity.
  • Monitoring recognizes bottlenecks early and prevents breakdowns.
  • Scaling and caching reduce server load during peak traffic.

What do connection limits mean?

A connection limit sets a threshold for the number of simultaneous TCP connections a host accepts before new requests are rejected or queued. Behind every connection are a TCP handshake, buffers and a processing unit, all of which cost resources. Without a limit, the system quickly runs into timeouts during peaks or reports "Connection refused". Typical default values lie between 128 and 4096, depending on kernel and setup, which is too low for many projects. I therefore first check how many open sockets, files and processes the system can reliably handle and then set a limit that absorbs load peaks without unnecessarily blocking legitimate traffic.

Simultaneous connections and server load

Every open connection consumes resources: CPU, RAM, network buffers and possibly database sessions. Under high load, context switches increase, kernel queues fill up and the server pauses accepting new requests. Keep-alive reduces handshakes but increases memory per socket when timeouts are long. Backlogs that are too small (SYN and accept) cause drops before the application is even reached. I therefore monitor active connections, backlog fill levels and retransmits, and tune timeouts so that connections avoid idling but are released quickly after use.

Performance tuning for more capacity

For more concurrent users, I first raise kernel limits and tune network buffers. The parameter net.core.somaxconn defaults to 128 on many systems and throttles the acceptance of new connections, so I set it much higher depending on the system, often to 4096 or more. I enlarge the queue for half-open connections with net.ipv4.tcp_max_syn_backlog so that peaks pass through cleanly. I size the receive and send buffers (rmem_max, wmem_max) to bandwidth times RTT so that packets do not pile up. With coordinated timeouts and a clean accept queue, the number of stably processed requests rises noticeably without sacrificing response time.
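The kernel parameters above can be set persistently via sysctl. A minimal sketch with example starting values (the file path and numbers are illustrative assumptions, not universal recommendations):

```
# /etc/sysctl.d/99-connections.conf — example starting values, adjust per system
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 8192
# ~16 MB, sized roughly to bandwidth x RTT (e.g. 1 Gbps x ~130 ms)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
```

Apply the file with `sysctl --system` and verify individual values with `sysctl net.core.somaxconn`.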

Configuring the web server: Nginx and Apache

With Nginx I increase worker_connections and set worker_rlimit_nofile to match the system limit so that file descriptor limits are not hit early. A keepalive_timeout of around one minute keeps connections open efficiently without holding idle sockets too long. In Apache I use the event MPM and size MaxRequestWorkers so that the RAM reservation matches the size of the PHP processes. Understanding the differences between prefork, worker and event makes a noticeable difference in throughput. For an overview of the strengths of the respective models, see Event MPM and worker models, which helps me choose the right approach.
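A minimal sketch of the corresponding Nginx settings; the numbers are illustrative and should be matched to the system's file descriptor limit:

```nginx
# nginx.conf — example values, tune against the system's nofile limit
worker_processes auto;
worker_rlimit_nofile 65536;    # per-worker descriptor ceiling

events {
    worker_connections 4096;   # concurrent connections per worker
}

http {
    keepalive_timeout 60s;     # reuse connections without hoarding idle sockets
}
```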

Database connections and timeouts

In the database, I cap connections with max_connections and reserve enough memory in the InnoDB buffer pool so that active records stay in RAM. I monitor aborts, lock wait times and the application's connection queues, because a limit that is too high overloads the CPU with too many active sessions. I keep transaction durations and pool timeouts short so that connections return to the pool quickly. For typical web stacks, moderate values go much further than blindly high maximums. If you want to dig deeper into error patterns such as 500s caused by too many DB sessions, see Database connection limits, which often speeds up my diagnosis.
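For MySQL/MariaDB, a conservative starting configuration along these lines can serve as a sketch; all values are example assumptions to be validated against measurements:

```ini
# my.cnf — illustrative starting values for a typical web stack
[mysqld]
max_connections          = 500
wait_timeout             = 60     # drop idle sessions quickly
innodb_buffer_pool_size  = 4G     # keep the hot working set in RAM
innodb_lock_wait_timeout = 10     # fail fast instead of queueing forever
```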

Caching, HTTP/2/3 and Keep-Alive

Clean caching reduces dynamic load immediately because fewer PHP and DB calls are required. Page, fragment and object caches take a large share of the pressure off the database, depending on the application. With HTTP/2 or HTTP/3, a browser bundles many requests over a few connections, which drastically reduces the number of sockets per client. Compression (Gzip/Brotli) saves bandwidth and shortens transfer times as long as CPU reserves are available. With sensible keep-alive timeouts I collect the gains from reused connections without tying up memory in overly long idle phases, which increases efficiency further.
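As one way to add a full-page cache in front of PHP, Nginx's fastcgi cache can be sketched as follows; paths, zone name and cache times are illustrative assumptions:

```nginx
# Full-page cache in front of PHP-FPM — example paths and durations
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=pages:50m inactive=10m;

server {
    location ~ \.php$ {
        fastcgi_cache pages;
        fastcgi_cache_valid 200 5m;   # cache successful responses for 5 minutes
        fastcgi_cache_key "$scheme$request_method$host$request_uri";
        fastcgi_pass unix:/run/php/php-fpm.sock;
        include fastcgi_params;
    }
}
```

Cache invalidation rules for logged-in users or carts still have to be added per application.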

Hardware and network tuning

High concurrency benefits from more CPU threads, RAM and fast NVMe SSDs, because waiting times for I/O shrink. From around 16 threads and 64 GB RAM, large peaks can be handled with clean latency. In the network, 10 Gbps pays off, especially combined with modern congestion control such as BBR. I minimize background services, choose appropriate I/O schedulers and keep kernel and drivers up to date. A clear separation of data and log volumes avoids "noisy neighbor" effects and keeps response times stable.

PHP-FPM and process limits

Many websites depend on PHP-FPM, so I size pm.max_children according to the process size and the available RAM. Too high a number blocks RAM and leads to swapping, which massively increases latencies. Too low a number causes 503s during load peaks even though CPU capacity is available. I tune the start, spare and max values so that queues stay short and processes run smoothly. If you want to tune this module more precisely, you can find practical tips at PHP-FPM pm.max_children, which considerably simplifies troubleshooting.
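The sizing rule above can be sketched as simple arithmetic: RAM left after the OS and other services, divided by the average PHP-FPM process size. All numbers below are example assumptions, not measured values:

```python
# Rough sizing of pm.max_children from the RAM budget (illustrative numbers).

def max_children(total_ram_mb: int, reserved_mb: int, avg_process_mb: int) -> int:
    """RAM left after OS/services, divided by the average PHP-FPM process size."""
    usable = total_ram_mb - reserved_mb
    return max(1, usable // avg_process_mb)

# Example: 8 GB host, 2 GB reserved for OS, web server and cache,
# PHP-FPM processes averaging 60 MB each.
print(max_children(8192, 2048, 60))  # 102
```

In practice the average process size comes from observing resident set sizes under real load, not from guesses.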

Monitoring and load testing

I achieve lasting stability through monitoring and reproducible load tests. I look at CPU utilization, steal time in virtual environments, RAM usage, disk latencies and network errors. Accept queues, SYN backlogs and retransmits show whether the limit is too tight or whether the application itself is the brake. For load tests I use tools such as "hey" or "wrk" and gradually increase the number of users until I find the kink in the curve. On this basis I change limits, measure again and verify stability under realistic patterns.
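Finding the "kink" can be automated: it is the load level beyond which more concurrency stops paying off in throughput. A minimal sketch with made-up measurements (the data points and the 25 % threshold are illustrative assumptions):

```python
# Detect where doubling users stops adding meaningful throughput.
# (concurrent users, requests/second) from a fictional step-load test:
results = [(50, 480), (100, 950), (200, 1800), (400, 2100), (800, 2150)]

def knee(points, gain_threshold=0.25):
    """Return the first load level after which the next step adds less
    than gain_threshold (here 25 %) extra throughput."""
    for (u1, r1), (u2, r2) in zip(points, points[1:]):
        if (r2 - r1) / r1 < gain_threshold:
            return u1
    return points[-1][0]

print(knee(results))  # 200: beyond ~200 users throughput barely grows
```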

Practical guide values and table

For initial configurations I use standard values, which I fine-tune later with measurements. With Nginx, I often start with 2048 worker_connections and set the open file limit correspondingly higher. With Apache, I choose the event model and keep MaxRequestWorkers within a range that matches the size of the PHP processes. In the database, I start conservatively and only increase if latencies remain stable. I raise kernel limits, then test under peak loads and check the impact on queues and response times.

Parameter                     | Component      | Starting value         | Impact
------------------------------|----------------|------------------------|------------------------------------------
net.core.somaxconn            | Kernel         | 4096+                  | Increases acceptance of new connections
net.ipv4.tcp_max_syn_backlog  | Kernel         | high four-digit value  | Reduces drops of half-open sockets
rmem_max / wmem_max           | Kernel         | bandwidth x RTT        | Prevents congestion on fast networks
worker_connections            | Nginx          | 2048                   | Increases concurrency per worker
MaxRequestWorkers             | Apache (event) | 150-400                | Keeps processes within the RAM budget
keepalive_timeout             | Nginx/Apache   | ~60 s                  | Reduces handshake overhead
max_connections               | Database       | ~1000                  | Balances session load

Operating system limits: descriptors, ports and states

In addition to the obvious network parameters, file descriptors and process limits are critical. I set nofile (ulimit) for users and services so that the web server, PHP-FPM and the database can open enough sockets and files. The kernel-wide value fs.file-max must match; otherwise processes hit the ceiling early despite correct service settings. Equally important is the number of permitted processes/threads (nproc) so that no unexpected fork errors occur under load.
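For login sessions these limits live in limits.d; note that systemd services ignore this file and need LimitNOFILE= in their unit instead. A sketch with example values (user name and numbers are illustrative assumptions):

```
# /etc/security/limits.d/web.conf — example per-user limits
www-data  soft  nofile  65536
www-data  hard  nofile  65536
www-data  soft  nproc   8192

# The kernel-wide ceiling must accommodate all of this (sysctl):
# fs.file-max = 2097152
```

For a systemd-managed Nginx or PHP-FPM, the equivalent is a drop-in with `[Service]` and `LimitNOFILE=65536`.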

A second look goes to ephemeral ports (ip_local_port_range) and TCP states such as TIME_WAIT. With many outbound connections (e.g. as a proxy or with microservices), the available port range can become a bottleneck. I choose a wide, sensible range and set timeouts so that inactive connections are released quickly, without resorting to aggressive or insecure kernel switches. The key is to minimize idle time and promote reuse (keep-alive, HTTP/2/3, database pooling) instead of constantly establishing new connections.
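The corresponding sysctl settings can be sketched as follows; the range and timeout are example values for an outbound-heavy host, not universal defaults:

```
# /etc/sysctl.d/99-ports.conf — widen ephemeral ports for outbound-heavy hosts
net.ipv4.ip_local_port_range = 10240 65000
# Release sockets in FIN-WAIT-2 faster so the port range recycles sooner
net.ipv4.tcp_fin_timeout = 15
```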

Reverse proxy and load balancer level

Between client and app there is often a reverse proxy or load balancer. There, too, I set sensible backlogs, timeouts and keep-alive on the upstream side. In Nginx, an upstream keepalive pool ensures that connections to the application are reused, which relieves both ports and CPU. I use connection throttling (limit_conn) and request-based rate limiting (limit_req) in measured doses to tame individual clients without curtailing legitimate load. A clear error response (429 instead of 503 for rate limiting) helps with root-cause analysis during operation.
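Upstream keepalive and dosed rate limiting in Nginx can be sketched like this; the upstream address, zone sizes and rates are illustrative assumptions:

```nginx
upstream app {
    server 127.0.0.1:9000;
    keepalive 32;                      # pool of reusable upstream connections
}

limit_conn_zone $binary_remote_addr zone=perip:10m;
limit_req_zone  $binary_remote_addr zone=req:10m rate=20r/s;

server {
    location / {
        limit_conn perip 20;
        limit_req zone=req burst=40 nodelay;
        limit_req_status 429;          # clear signal instead of a generic 503
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # required for upstream keepalive
        proxy_pass http://app;
    }
}
```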

For connection handling during deployments or scale-downs, I use connection draining or graceful shutdown: new requests are no longer accepted, existing ones are completed cleanly. This avoids latency spikes and error bursts when replacing versions or reducing the number of instances.

TLS termination, HTTP/2/3 details and CPU utilization

TLS handshakes cost CPU and latency. I terminate TLS as close to the client as possible (e.g. on the edge proxy) and use session resumption, OCSP stapling and modern, high-performance cipher suites. This saves handshakes and shortens the time to first byte. Under HTTP/2/3 it is worth keeping an eye on header compression and prioritization: incorrectly prioritized streams can increase latencies even though concurrency is high. I also make sure that keep-alive timeouts and per-origin limits are chosen so that head-of-line blocking cannot occur.
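Session reuse and stapling in Nginx can be sketched as follows; cache size and timeout are example values, and stapling additionally requires a resolver and the certificate chain to be configured:

```nginx
# TLS termination with session reuse and OCSP stapling — example values
ssl_protocols TLSv1.2 TLSv1.3;
ssl_session_cache shared:SSL:10m;   # roughly 40k sessions per 10 MB
ssl_session_timeout 1h;
ssl_stapling on;
ssl_stapling_verify on;
```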

Especially with CPU-heavy ciphers or high Brotli levels, I use benchmarks to find the point at which compression helps instead of hurts. During peak traffic I temporarily lower the compression level when CPU is the bottleneck and raise it again under normal traffic.

Real-time traffic: WebSockets, SSE and long polling

Connections that stay open for a long time (WebSockets, server-sent events, long polling) strongly influence capacity planning. I separate such long-lived connections from classic request/response paths, dimension dedicated workers and set tighter limits. The goal is low resource usage per connection: light protocol stacks, tight buffers and conservative keep-alive strategies are mandatory here. I measure per connection type so that classic page views do not suffer from permanent connections.
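Separating WebSocket traffic into its own Nginx location can be sketched like this; the path, upstream name and timeout are illustrative assumptions:

```nginx
# Dedicated path for long-lived WebSocket connections
location /ws/ {
    proxy_pass http://app;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;   # pass the protocol upgrade through
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 3600s;                 # tolerate long idle phases
    proxy_buffering off;                      # stream frames instead of buffering
}
```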

Containers and cloud: Conntrack, pod limits and warmup

In container environments I often run into conntrack limits. nf_conntrack_max and the hash size must match the expected number of connections, otherwise packets are already dropped in the kernel. Pod limits (CPU/memory requests and limits) also determine how many simultaneous requests an instance can actually handle. I schedule warm-up phases so that freshly started pods can fill their caches before receiving full traffic. At node level I make sure that ulimit and sysctl values actually reach the containers (e.g. via initContainer or DaemonSets) instead of remaining at host defaults.
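Raising conntrack capacity on a busy container host can be sketched as follows; the sizes are illustrative and should be derived from the expected connection count:

```
# /etc/sysctl.d/99-conntrack.conf — example sizes for a busy node
net.netfilter.nf_conntrack_max = 262144

# The hash table is a module parameter, commonly set to about max/4 buckets:
# echo 65536 > /sys/module/nf_conntrack/parameters/hashsize
```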

For horizontal scaling I use p95/p99 latencies as triggers, not just CPU. This way I react to real user experience and prevent single "loud" pods from distorting the average. Connection draining in Ingress/Service ensures smooth transitions when scaling up and down.

Error images and quick diagnosis

I recognize typical symptoms by clear patterns:

  • High retransmits / SYN drops: Backlog too small, packet losses or accept queues too short.
  • Many 502/504: Upstream timeouts, PHP FPM/DB pools that are too small or blocking application calls.
  • 503 under load: Worker or process pools exhausted, RAM limit reached, limits too tight.
  • Spikes in TIME_WAIT: Excessive connection churn instead of reuse; check keep-alive/pooling.
  • Increasing p99 latencies with stable p50: Queuing effects, hotspots, lock competition.

For quick diagnosis I combine metrics (backlogs, connection states, latencies) with short profiling runs and log samples. I write access logs buffered or selectively so that I/O does not become a bottleneck. If logs do become a bottleneck, I ship them asynchronously and aggregate them centrally.

Capacity planning: headroom, SLOs and test profiles

I plan with 20-40% headroom above the typical daily load so that short peaks don't immediately break limits. For business-critical applications I run N-1 reserves: if one instance fails, the capacity of the remaining instances still suffices for acceptable SLOs. I define measurable targets (e.g. 99% of requests under 300 ms, error rate < 0.1%) and test against them.
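The headroom and N-1 rules combine into simple sizing arithmetic. A sketch with illustrative numbers (peak load and per-instance capacity are example assumptions):

```python
import math

# Instances needed for a peak load with headroom plus an N-1 reserve.

def instances_needed(peak_rps: float, per_instance_rps: float,
                     headroom: float = 0.3) -> int:
    """Capacity target = peak * (1 + headroom); one extra instance so the
    fleet still covers the target if a single instance fails (N-1)."""
    target = peak_rps * (1 + headroom)
    n = math.ceil(target / per_instance_rps)
    return n + 1  # N-1 reserve

# Example: 3000 req/s peak, 800 req/s per instance, 30 % headroom.
print(instances_needed(3000, 800))  # 3900 rps target -> 5 instances + 1 = 6
```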

I switch between profiles during load tests:

  • Step load: Increase in 1-5 minute increments to see kink points cleanly.
  • Soak tests: Several hours under constant, high load to detect leaks and drift.
  • Burst tests: Simulate short-term peaks to validate backlog reserves and limits.

I measure not only throughput but also waiting times in queues, CPU steal in VMs, disk latency and network errors. Only the combination shows whether the system is systemically stable or merely fast in the short term.

Scaling and traffic peaks

For sudden peaks, I combine load balancing, caching and content offloading. Round-robin or weighted methods distribute requests across multiple instances. I move static files to a CDN so that the origin server has CPU free for dynamic responses. Autoscaling at application or container level complements these measures and shortens the reaction to load jumps. I use quotas and rate limiting to protect the platform against backlog floods and keep availability high.

My core roadmap: This is how I proceed

First I determine the current limit, measure latencies, error rates and queue lengths, and log hard bottlenecks. Then I gradually raise kernel and web server limits, adjust keep-alive and buffers and check the effect under load. In the third step I integrate caching, enable HTTP/2 or HTTP/3 and optimize database parameters. In the fourth step I match PHP-FPM processes and file descriptor limits to the RAM budget. Finally, I establish continuous monitoring, repeat load tests regularly and thus keep my connection limits permanently in the green.

Closing: Stable with reserves instead of on edge

Connection limits are not a single switch but the interaction of kernel queues, web server settings, process pools, database tuning, network paths and hardware. Raising limits in isolation often only postpones the problem. I therefore rely on a holistic approach: first measure, then increase in a targeted manner, always test against real load patterns and secure with monitoring. In this way, throughput and reliability grow together, and the server delivers stable, predictable performance even under peak loads.
