HTTP requests can stall even though CPU, RAM and bandwidth look free, because invisible limits, filters and queues take effect along the entire chain. I explain where these boundaries sit, how they work and which settings I adjust so that requests flow smoothly again.
Key points
Before I go into detail, I will summarize the most important causes and name what I look at first. These points cover the typical bottlenecks that lead to congestion despite free resources. I have deliberately kept the list compact so that you can check the starting points immediately. The key point is that each layer has its own rules that apply independently of CPU and RAM. If you know these rules, you can quickly resolve many "inexplicable" waiting times.
- Worker limits: Too few processes/threads block new connections despite free CPU.
- Security layer: WAF/web filters block patterns, methods or clients, often without high load.
- Concurrency: PHP-FPM, the database and proxies limit simultaneous sessions.
- Keep-alive/timeouts: Long connections tie up slots, and requests end up in queues.
- Client filters: Browser extensions stop requests before they reach the server.
These key points are often enough to check behavior in a targeted manner. In the following, I will show you how I derive concrete measures from this and resolve blockages cleanly.
Why HTTP requests block despite free resources
A request passes through several layers: client, network, filters, web server, runtime environment and database. Each layer brings its own limits, which take effect regardless of CPU, RAM or bandwidth. If worker slots are occupied or rules are active, the request waits in a queue or is rejected immediately. This waiting time often does not appear at all in classic resource charts. This is precisely what leads to the misconception that the server is "empty", even though requests are not being answered.
Security layer: WAF, filters and provider rules
Many blockages occur before the application even runs. Web application firewalls, IDS/IPS and provider-side filters recognize patterns and slow down or block matching requests [1][5][9]. Suspicious parameters, outdated protocols or unusual method combinations are enough to trigger a block. From the operator's point of view this looks like a server error, but the decision is made upstream. I therefore check the WAF logs and note the request ID, IP, time and status code. With this data, the rule can be identified and adjusted in a targeted manner without compromising security.
Client side: browser extensions and local blockers
Not every request reaches the server. Adblockers, password managers and script blockers already stop URLs in the browser; the DevTools then show "Requests to the server have been blocked by an extension" [3][7]. I test in a private window, deactivate extensions and check whether the request was sent at all. It also helps to control priorities in the front end, for example with clean request prioritization for critical assets. This prevents non-critical third-party calls from delaying important routes.
Understanding method and routing: 405, 403, 429
A 405 "Method Not Allowed" clearly shows that the server knows the resource but does not allow the method used [5]. Similarly, a 403 points to filters or permissions and a 429 to active rate limiting. In the logs I can quickly see whether a global rule blocks methods such as PUT or DELETE or whether an endpoint was never implemented. I then adjust the routing, the controller or the WAF rule. In this way, supposed "blocking" dissolves into a clean correction of methods and paths.
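To narrow this down from the client side, I sometimes probe an endpoint with each method and look at the status code and the Allow header. A minimal Python sketch, assuming the requests package and a hypothetical URL:

```python
# Probe which methods an endpoint accepts and how filters respond.
# The URL is a placeholder; run this against your own endpoint.
import requests

url = "https://example.com/api/items"  # hypothetical endpoint

for method in ("GET", "POST", "PUT", "DELETE", "OPTIONS"):
    resp = requests.request(method, url, timeout=10)
    # 405 responses should carry an Allow header listing the permitted methods;
    # 429 responses often carry Retry-After.
    print(method, resp.status_code,
          "Allow:", resp.headers.get("Allow", "-"),
          "Retry-After:", resp.headers.get("Retry-After", "-"))
```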
Web server architecture and worker limits
Apache, NGINX, LiteSpeed and OpenLiteSpeed handle connections differently [4]. The decisive factors are the number of worker processes and threads and how keep-alive sockets occupy slots. If all workers are tied up by long connections, new requests move into a queue, although CPU and RAM appear free. I therefore evaluate connection states and adjust workers, backlogs and keep-alive times. Background knowledge of queues helps, for example on the topic of server queuing and latency.
| Layer | Relevant limit | Typical symptom | Diagnostic note |
|---|---|---|---|
| Web server | Worker/thread count | Queues, 503 under load | Status modules, check connection states |
| PHP-FPM/FastCGI | max_children / pm | Hanging requests, high time-to-first-byte | FPM logs, slow log, number of processes |
| Database | max_connections | "Too many connections" errors | SHOW PROCESSLIST, connection peaks |
| WAF/filter | Signatures, methods | 403/405, broken form posts | WAF logs, rule hit IDs |
| Load balancer | Per-backend connection limit | Inconsistent response times | LB stats, backend health |
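To see which states actually occupy the connection slots, a quick count helps. A minimal sketch, assuming a Linux host with the iproute2 "ss" tool installed:

```python
# Count TCP connection states to see whether established keep-alive sockets
# or TIME_WAIT entries are hogging the slots.
import subprocess
from collections import Counter

out = subprocess.run(["ss", "-tan"], capture_output=True, text=True, check=True).stdout
# First column of every data line is the state (ESTAB, TIME-WAIT, LISTEN, ...).
states = Counter(line.split()[0] for line in out.splitlines()[1:] if line.strip())

for state, count in states.most_common():
    print(f"{state:12s} {count}")
```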
Concurrency in PHP-FPM, database and proxies
Concurrent processing often bursts first in the runtime environment. If all PHP-FPM workers are busy, there is no slot available for new scripts; the requests wait even though the CPU is hardly working. The situation is similar for databases with max_connections or for proxies with connection limits per backend. I first optimize the duration of individual requests before increasing limits. In this way, I shorten the occupancy time per slot and reduce the probability of queues growing.
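The relationship between memory, worker count and request duration is simple arithmetic. A rough sizing sketch in Python; all figures are assumptions and should be replaced with measured values from your own host:

```python
# Rough sizing of pm.max_children from available memory.
# Every number here is an assumption; measure the real per-worker RSS first.
ram_total_mb = 8192          # RAM of the (hypothetical) host
ram_reserved_mb = 2048       # OS, database, cache, monitoring
avg_worker_rss_mb = 60       # typical resident size of one PHP-FPM worker

max_children = (ram_total_mb - ram_reserved_mb) // avg_worker_rss_mb
print("pm.max_children =", max_children)          # here: 102

# Throughput estimate: with 102 workers and an average request time of 200 ms,
# the pool can absorb roughly 102 / 0.2 = 510 requests per second. Shortening
# the request time raises that ceiling without touching any limit.
avg_request_s = 0.2
print("approx. requests/s =", max_children / avg_request_s)
```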
Slow backends and PHP session locking
Long database queries, external APIs or file I/O tie up workers significantly longer. Session locking can also slow down entire chains, such as WordPress logins or shopping carts. I check whether parallel requests with the same session ID run consecutively instead of simultaneously. If so, I rely on targeted early unlocking, reduce critical write accesses and follow tried-and-tested guidance on PHP session locking. This frees up slots more quickly and reduces waiting times noticeably.
Timeouts, keep-alive and connection strategies
Keep-alive times that are too long tie up resources, while those that are too short generate extra handshakes and latency. I choose values that match the traffic profile and set limits for header, body and backend timeouts. It is important to set timeouts not only at the web server but uniformly along the chain: proxy, app, database. In addition, I prevent idle blocking through finer HTTP/2/HTTP/3 settings and prioritization. This keeps slots available without clients having to constantly reconnect.
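To keep the chain consistent, I note the timeout of each layer and check that every outer layer waits slightly longer than the one behind it, so errors surface cleanly instead of connections being cut mid-response. A tiny sketch with assumed values:

```python
# Timeout chain sanity check. The values are assumptions; use your real configs.
timeouts_s = {                # ordered from the innermost to the outermost layer
    "database":      10,
    "app / PHP-FPM": 30,
    "reverse proxy": 35,
    "load balancer": 40,
}

layers = list(timeouts_s.items())
for (inner, t_in), (outer, t_out) in zip(layers, layers[1:]):
    status = "ok" if t_out > t_in else "check!"
    print(f"{inner} ({t_in}s) < {outer} ({t_out}s): {status}")
```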
Hosting models: Shared, VPS, Dedicated
Shared hosting sets early filters and hard quotas so that the platform remains fair [1]. On a VPS, providers isolate CPU and RAM but maintain limits for I/O, network or security; differences in performance and monitoring are clear [10]. On dedicated servers, I bear full responsibility for the web server, database and WAF configuration. Comparisons show that modern stacks with HTTP/3, NVMe and DDoS protection offer clear advantages [2][6][11][8]. Those who need high parallelism benefit from clearly documented limits and support, which helps with rule adjustments.
Systematic analysis: step by step
I start at the source: is the browser actually sending the request, or is an extension blocking it [3][7]? Then I look at status codes: 403/405/429/503 give strong indications of filters, methods or capacity [5]. At the same time, I check logs from the web server, app and WAF to find patterns and recurring signatures [1][9]. I then check worker numbers, FPM parameters, keep-alive and database connections and raise limits on a test basis, with measurement points before and after. Finally, I simulate load, observe bottlenecks in real time and verify that the queues shrink.
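For the status code step, a quick tally over the access log is often enough. A sketch that assumes the common/combined log format and a hypothetical log path:

```python
# Count status codes in an access log to spot 403/405/429/503 clusters.
import re
from collections import Counter

LOG = "/var/log/nginx/access.log"        # adjust to your environment
status_re = re.compile(r'"\s(\d{3})\s')  # status code right after the quoted request line

codes = Counter()
with open(LOG, encoding="utf-8", errors="replace") as fh:
    for line in fh:
        m = status_re.search(line)
        if m:
            codes[m.group(1)] += 1

for code, count in codes.most_common(10):
    print(code, count)
```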
Best practices against blockages
I formulate concurrency targets per layer and set limits so that load peaks are cushioned. The web server must match the traffic pattern; benchmarks help with selection and configuration [4]. I optimize backends first: faster queries, shorter transactions, fewer serial sections. I keep security rules strict enough against attacks, but with exceptions for legitimate patterns. Monitoring does not end with CPU/RAM: I look at connections, queues, response times and error codes so that bottlenecks remain visible [6][11].
Practice notes: request blocking by hosting type
In shared environments, blockages often occur before the actual web space; support then needs concrete request data to adjust rules [1]. On a VPS, I scale gradually: more workers, better-suited keep-alive values and closer monitoring of the database [10]. On my own hardware, I decide on load balancing, WAF rules and limits per backend. Projects with highly parallel access benefit from a clean HTTP/2/HTTP/3 configuration and clear reserves for peak loads. If you expect growth, plan the switch to more powerful plans early on and save a lot of tuning effort later [2][6][10][11].
Network and kernel limits: backlog, ports and descriptors
In addition to the web server and app, the kernel limits how many connections can arrive, be established and managed at the same time. I first check the listen backlog: even if the web server has many workers, the accept queue can be short. The interplay between the application (listen backlog), the kernel (somaxconn) and the SYN backlog (tcp_max_syn_backlog) determines whether connections wait in the queue or are discarded. Symptoms are increasing connect times and retransmits combined with low CPU utilization. I compare the configured values and measure the actual utilization of the queues to avoid drops.
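A quick way to read the relevant sysctls and the kernel's overflow counters, as a Linux-only Python sketch:

```python
# Read backlog sysctls and the TcpExt counters for accept-queue overflows.
def read_int(path):
    with open(path) as fh:
        return int(fh.read().strip())

print("somaxconn:          ", read_int("/proc/sys/net/core/somaxconn"))
print("tcp_max_syn_backlog:", read_int("/proc/sys/net/ipv4/tcp_max_syn_backlog"))

# /proc/net/netstat holds header/value line pairs per group (TcpExt, IpExt, ...).
with open("/proc/net/netstat") as fh:
    lines = fh.read().splitlines()
for head, vals in zip(lines[::2], lines[1::2]):
    if head.startswith("TcpExt:"):
        stats = dict(zip(head.split()[1:], map(int, vals.split()[1:])))
        print("ListenOverflows:", stats.get("ListenOverflows"))
        print("ListenDrops:    ", stats.get("ListenDrops"))
```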
Another classic is the conntrack table for NAT/firewall setups. If it is full, connections disappear "without a trace"; the application never sees a request. I can recognize this by messages in the system log and abrupt timeouts during peak loads. Countermeasures are: a suitable table size, realistic idle timeouts for protocols, fewer unnecessary NAT paths and efficient keep-alives that reuse connections sensibly.
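Whether the conntrack table is the culprit can be read directly from /proc, assuming the nf_conntrack module is loaded:

```python
# Check how full the conntrack table is; the paths only exist with nf_conntrack loaded.
def read_int(path):
    with open(path) as fh:
        return int(fh.read().strip())

count = read_int("/proc/sys/net/netfilter/nf_conntrack_count")
limit = read_int("/proc/sys/net/netfilter/nf_conntrack_max")
print(f"conntrack: {count}/{limit} ({100 * count / limit:.1f}% used)")
```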
I also check the number of open file descriptors (ulimit -n). If many simultaneous sockets and files hit restrictive limits, accept() fails ("too many open files") and new requests pile up in front of it. The fix is usually trivial: set the nofile limits for the web server, proxy and database to a healthy level, and make them persistent rather than only for the current shell.
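Comparing the limit with the descriptors a process actually holds takes only a few lines; the PID below is a placeholder and reading another process's fd directory may require matching privileges:

```python
# Compare the nofile limit with the descriptors actually in use. Linux-only.
import os
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("own process nofile soft/hard:", soft, hard)

pid = 1234  # hypothetical PID of nginx/php-fpm/haproxy
open_fds = len(os.listdir(f"/proc/{pid}/fd"))
print(f"PID {pid}: {open_fds} open descriptors")
```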
In highly parallel setups, I keep an eye on the ephemeral port range and TIME_WAIT states. Especially behind NAT gateways, the available source ports are exhausted when short connections are opened en masse. I therefore rely on connection reuse (keep-alive, HTTP/2/3), reduce unnecessary short-lived connections and tune TIME_WAIT handling carefully without risking stability. The result: less port exhaustion and more stable connect times under load.
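Both figures are easy to check, again assuming a Linux host with ss available:

```python
# How many ephemeral ports exist, and how many sockets sit in TIME_WAIT?
import subprocess

with open("/proc/sys/net/ipv4/ip_local_port_range") as fh:
    low, high = map(int, fh.read().split())
print(f"ephemeral port range: {low}-{high} ({high - low + 1} ports)")

out = subprocess.run(["ss", "-tan", "state", "time-wait"],
                     capture_output=True, text=True, check=True).stdout
time_wait = max(len(out.splitlines()) - 1, 0)   # minus the header line
print("sockets in TIME_WAIT:", time_wait)
```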
On the network card, I check queue lengths, offloading settings and IRQ distribution. Unevenly distributed interrupts or overloaded queues generate latency spikes that do not show up in application logs. With balanced IRQ distribution and sensible qdisc settings (keyword: bufferbloat) I reduce latency without restricting bandwidth.
HTTP/2 and HTTP/3: Using multiplexing correctly
Multiplexing solves many problems but introduces new limits: maximum stream counts, flow-control windows and idle timeouts apply per connection. If the value for concurrent streams is too low, new requests "hang" even though the TCP or QUIC connection is established. I therefore check how many critical resources need to be loaded in parallel and carefully adjust the stream limits. At the same time, I pay attention to reasonable flow-control windows so that large responses are not throttled.
HTTP/2 multiplexing over TCP can suffer from head-of-line blocking when packets are lost; HTTP/3 over QUIC avoids this but requires clean TLS/ALPN settings and stable path handling. I test both paths and select the protocols that match the traffic profile. Important: do not trust prioritization blindly, because browsers and servers interpret it differently. I focus on critical routes and check whether priorities actually take effect and slots are not taken up by long-running secondary streams.
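To confirm which protocol is actually negotiated, I make a few requests over one client and print the protocol version. A sketch assuming the httpx package with its http2 extra (pip install "httpx[http2]") and a placeholder URL:

```python
# Check whether ALPN negotiates HTTP/2 or falls back to HTTP/1.1.
import httpx

url = "https://example.com/"  # hypothetical origin or CDN edge

with httpx.Client(http2=True) as client:
    responses = [client.get(url) for _ in range(5)]

for resp in responses:
    # http_version is "HTTP/2" when h2 was negotiated, otherwise "HTTP/1.1".
    # Requests within one client reuse the pooled connection where possible.
    print(resp.http_version, resp.status_code)
```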
CORS, preflights and header/body limits
Not every 4xx error comes from the server. CORS violations arise in the browser and show up in the console, not in the access log. I verify whether preflight requests (OPTIONS) are answered correctly and whether WAF/proxies allow this method. If headers such as Access-Control-Allow-Methods/-Headers are missing, the browser "blocks" the response without any server load.
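A preflight can be reproduced outside the browser to see whether the server or an intermediate layer strips the CORS headers. A sketch with placeholder URL and origin, using the requests package:

```python
# Reproduce a browser preflight by hand and inspect the CORS response headers.
import requests

resp = requests.options(
    "https://api.example.com/v1/orders",          # hypothetical API endpoint
    headers={
        "Origin": "https://app.example.com",      # hypothetical frontend origin
        "Access-Control-Request-Method": "POST",
        "Access-Control-Request-Headers": "content-type,authorization",
    },
    timeout=10,
)
print("status:", resp.status_code)
for h in ("Access-Control-Allow-Origin", "Access-Control-Allow-Methods",
          "Access-Control-Allow-Headers", "Access-Control-Max-Age"):
    print(h, "=", resp.headers.get(h, "<missing>"))
```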
Another bottleneck: header and cookie sizes. Overgrown cookies, many Vary headers or large Referer lines lead to 431 errors or silent drops due to buffer limits. I limit cookie ballast, consolidate headers and set buffer sizes consistently along the chain. For uploads, I pay attention to body limits, 100-continue handling and consistent chunked-encoding support across all proxies. If body and upload limits do not match, clients wait for a go-ahead that never comes and requests seem to "hang".
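To get a feel for how close a request is to typical buffer limits, I measure the serialized header size. The 8 KB threshold in this sketch is only a common default, not a guaranteed limit, and the URL is a placeholder:

```python
# Measure the size of the request headers that were actually sent.
import requests

resp = requests.get("https://example.com/", timeout=10)   # placeholder URL
sent = resp.request.headers
size = sum(len(k) + len(v) + 4 for k, v in sent.items())   # +4 for ": " and CRLF
print(f"request header size: {size} bytes")
print(f"cookie size: {len(sent.get('Cookie', ''))} bytes")
if size > 8 * 1024:
    print("warning: above a typical 8 KB header buffer")
```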
DNS and TLS: Handshakes as hidden latency
DNS resolution and TLS negotiation are frequent blind spots. Long CNAME chains, slow resolvers or IPv6/IPv4 mismatches extend the startup time without using CPU. I reduce unnecessary DNS hops, set sensible TTLs and ensure fast resolver paths. On the TLS side, I check certificate chains, enabled cipher suites, OCSP stapling and session resumption. A clean ALPN handshake prevents downgrades to HTTP/1.1, which put a greater strain on keep-alive slots. The result: shorter time-to-first-byte and more stable parallelism, especially on mobile networks.
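The standard library is enough to split the time-to-first-byte into its phases and show the ALPN result. A sketch with a placeholder hostname:

```python
# Split TTFB into DNS, TCP, TLS and server time, and show the negotiated ALPN protocol.
import socket
import ssl
import time

host, port = "example.com", 443   # placeholder host
t0 = time.perf_counter()
addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4][0]
t_dns = time.perf_counter()

sock = socket.create_connection((addr, port), timeout=10)
t_tcp = time.perf_counter()

ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["h2", "http/1.1"])
tls = ctx.wrap_socket(sock, server_hostname=host)   # handshake happens here
t_tls = time.perf_counter()
alpn = tls.selected_alpn_protocol()

tls.sendall(f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode())
tls.recv(1)                                          # wait for the first response byte
t_ttfb = time.perf_counter()
tls.close()

print("ALPN:", alpn)
print(f"DNS {t_dns - t0:.3f}s | TCP {t_tcp - t_dns:.3f}s | "
      f"TLS {t_tls - t_tcp:.3f}s | server {t_ttfb - t_tls:.3f}s")
```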
CDN/Edge: Caching, rate limits and IP reputation
Between client and origin, CDNs, reverse proxies and DDoS protection systems decide what passes through and what is throttled. I check whether critical routes are cached correctly (stale-while-revalidate, stale-if-error) and whether negative caches hold errors longer than necessary. Rate limits, bot management and IP reputation can dampen legitimate traffic, especially with shared networks or heavy API access. I segment traffic (e.g. API vs. assets), define clear cache keys and selectively relax rules for trusted clients. This relieves the origin and prevents CDN queues from growing while the server looks "underutilized".
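Which caching decision the edge actually made can be read from the response headers; the header names differ per provider, so the list below is only a starting point and the URL is a placeholder:

```python
# Inspect caching and CDN status headers for a critical route.
import requests

resp = requests.get("https://www.example.com/api/critical", timeout=10)  # placeholder
for name in ("cache-control", "age", "x-cache", "cf-cache-status",
             "x-served-by", "retry-after"):
    if name in resp.headers:
        print(f"{name}: {resp.headers[name]}")
print("status:", resp.status_code)
```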
Containers and orchestration: cgroups, Ingress and conntrack
In containers, cgroup limits apply to CPU, RAM, PIDs and files. A CPU quota that is too tight leads to throttling: processes wait for CPU time even though the host is free. I check quotas and make sure that ingress/proxy pods have enough file descriptors and buffers. In Kubernetes, I check ingress timeouts, readiness/liveness probes and the service implementation (IPVS), because faulty probes or timeouts generate zigzag latency and unnecessary restarts.
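Whether a container is being throttled shows up in the cgroup statistics. A sketch for cgroup v2; on cgroup v1 the equivalent file lives under /sys/fs/cgroup/cpu/cpu.stat:

```python
# Check CPU throttling inside a container (cgroup v2).
# nr_throttled > 0 means the workload hit its CPU quota even if the host looked idle.
CPU_STAT = "/sys/fs/cgroup/cpu.stat"

stats = {}
with open(CPU_STAT) as fh:
    for line in fh:
        key, value = line.split()
        stats[key] = int(value)

print("periods:       ", stats.get("nr_periods"))
print("throttled:     ", stats.get("nr_throttled"))
print("throttled time:", stats.get("throttled_usec", 0) / 1e6, "s")
```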
An often overlooked bottleneck is the NAT/conntrack capacity per node. Many short-lived connections (e.g. egress to external APIs) fill the conntrack table, and then requests "disappear" in the network. I scale the table, set realistic timeouts and bundle external calls so that fewer new connections are created. I plan PodDisruptionBudgets, rolling updates and HPA scaling so that no capacity is withdrawn at peak times; otherwise queues form even though the app would theoretically have enough workers.
Observability: correlation, tracing and meaningful metrics
To find blockages quickly, I need continuous correlation. I assign request IDs (e.g. traceparent) at the edge, web server, app and database and write them to the logs. This allows me to see whether a request fails at the WAF, waits at the web server, is stuck in the FPM queue or is blocked in the database. I work with histograms instead of plain averages and monitor P95/P99 latency, open connections, the accept queue, the FPM queue length, active DB sessions and backend error codes. I also use synthetic checks to clearly separate client-side effects from server-side effects.
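The difference between averages and tail latencies is easy to demonstrate; the sample values below are made up:

```python
# Percentiles instead of averages: a few slow outliers barely move the mean
# but dominate P95/P99.
import statistics

latencies_ms = [42, 45, 47, 51, 48, 43, 46, 44, 52, 49, 950, 1200]  # two outliers

q = statistics.quantiles(latencies_ms, n=100)          # 99 cut points
mean, p95, p99 = statistics.mean(latencies_ms), q[94], q[98]
print(f"mean {mean:.0f} ms | P95 {p95:.0f} ms | P99 {p99:.0f} ms")
```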
For anomalies, I use a drill-down procedure: first the edge/WAF logs, then the load balancer, then the web server access/error logs, then the app and FPM logs and finally the database and system logs. This path shows me exactly where the time is lost and at which limit the request stops. With targeted metrics per layer, I avoid gut feeling and drastically reduce the time to the root cause.
Tuning playbook and checklist
In practice, I have a compact playbook that I adapt to the environment:
- Reproducibility: Nail down the scenario (route, method, size, client), log timestamps and IDs.
- Check layer by layer: Browser/extensions, CORS/preflight, WAF hits, LB stats, web server status, FPM queue, DB activity/locks.
- Make queues visible: Accept/SYN backlog, FPM listen queue, proxy backlog, DB connection pool.
- Adjust limits: Workers/threads, somaxconn, nofile, max_connections, stream limits for H2/H3, body/header limits, timeouts.
- Reduce occupancy time: Accelerate queries, avoid session locks, reduce I/O, compress responses and cache them sensibly.
- Harmonize strategies: Keep-alive duration, HTTP/2/3 parameters, prioritization of critical routes.
- Adjust security: Targeted exclusion of WAF rules instead of global weakening; logging with hit IDs.
- Scaling: Define concurrency per layer, run load tests, measure reserves, increase limits only after optimization.
- Fallbacks: Circuit breakers for slow backends, a retry policy with jitter (see the sketch after this list), "stale-if-error" for critical assets.
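For the fallback point, a retry policy with exponential backoff and full jitter can look like the sketch below; fetch() stands in for the real request function:

```python
# Retry with exponential backoff and full jitter, so clients hitting a
# temporarily saturated backend do not all retry in the same instant.
import random
import time

def retry_with_jitter(fetch, attempts=4, base=0.2, cap=5.0):
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise
            # full jitter: sleep a random time between 0 and the capped backoff
            backoff = min(cap, base * (2 ** attempt))
            time.sleep(random.uniform(0, backoff))

# usage sketch (placeholder URL):
# result = retry_with_jitter(lambda: requests.get("https://example.com/api", timeout=5))
```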
Briefly summarized
Blocked requests with free CPU and RAM are usually caused by limits, filters and connection strategies, not by a lack of raw performance. I first check where the request stops: browser, WAF, web server, runtime or database. Then I minimize occupancy times per slot, remove unnecessary locks and set realistic timeouts. I keep security high, adjust rules against false alarms and collect evidence in logs. With this approach, HTTP requests remain reliably answered, even when traffic jumps and every second counts.


