TCP congestion control determines how connections adjust their data rate under changing network load and how the different algorithms behave in real hosting setups. In this comparison, I show concrete effects on throughput, delay, and fairness for web hosting, streaming, and cloud workloads.
Key points
To help you read the article in a targeted manner, I briefly summarize the key aspects here and put everything into context later. I make a clear distinction between loss-based and model-based methods because the two families react very differently. I explain why cwnd, RTT, and pacing decide performance and fairness. I organize measurement results so that you don't have to rely on gut feeling. I conclude with pragmatic recommendations for hosting, containers, and global users.
- AIMD controls cwnd and reacts to losses
- CUBIC delivers consistent performance at high RTTs
- BBR uses a model, reduces queues and latency
- BIC scores in networks with drops
- Hybla accelerates long paths (satellite links)
What congestion control governs: cwnd, RTT, loss signals
I'll start with the basics, because they influence every subsequent choice. The congestion window (cwnd) limits how many bytes are in transit without acknowledgment, and AIMD gradually increases cwnd until losses occur. RTT determines how quickly acknowledgments return and therefore how aggressively an algorithm can grow. Loss signals used to come only from timeouts and three duplicate ACKs; today, queue depth, minimum RTT, and measured bottleneck bandwidth are also taken into account. Those who understand cwnd, RTT, and loss interpretation can immediately assess the impact on throughput, delay, and jitter much more accurately.
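As a compact reference, the classic Reno-style AIMD rules can be written as follows (W is the window in segments; exact constants vary by implementation):

$$
\text{per ACK (congestion avoidance): } W \leftarrow W + \frac{1}{W} \;\; (\approx +1\ \text{MSS per RTT}), \qquad \text{on loss: } W \leftarrow \frac{W}{2}
$$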
Review: Reno, New Reno, and Vegas in everyday life
Reno uses slow start at the beginning and then transitions to linear growth, but halves the window drastically after losses. NewReno repairs multiple losses per window more efficiently, which is particularly helpful at moderate error rates. Vegas measures expected versus actual throughput over the RTT and slows down early to avoid losses. This caution keeps queues small, but costs throughput when loss-based flows send more aggressively in parallel. If you see unexplained timeouts or duplicate ACKs, you should first analyze packet loss and then select an algorithm that suits the topology.
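Vegas' early braking can be sketched with its well-known diff calculation (α and β are small thresholds, typically on the order of one to a few segments; exact values depend on the implementation):

$$
\text{Expected} = \frac{cwnd}{BaseRTT}, \quad \text{Actual} = \frac{cwnd}{RTT}, \quad \text{Diff} = (\text{Expected} - \text{Actual}) \cdot BaseRTT
$$

If Diff stays below α, the window grows linearly; above β it shrinks; in between it is left alone, which is exactly the caution described above.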
High-BW-RTT: BIC and CUBIC in comparison
BIC uses a binary search to find the highest lossless cwnd and keeps the rate surprisingly constant even with minor errors. In labs with low latency and drop rates around 10^-4, BIC delivered throughput above 8 Gbit/s while classic algorithms fluctuated. CUBIC replaced BIC as the default because it controls its cwnd via a cubic time function and thereby makes better use of long RTTs. After a loss event, the curve first rises steeply, flattens out near the last maximum, and only then probes aggressively beyond it. On networks with long paths, I often achieve higher utilization with CUBIC while keeping latencies predictable.
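The cubic time function is specified in RFC 8312; in simplified form (W_max is the window before the last reduction, β_cubic ≈ 0.7, C ≈ 0.4):

$$
W_{\text{cubic}}(t) = C\,(t - K)^3 + W_{\max}, \qquad K = \sqrt[3]{\frac{W_{\max}\,(1 - \beta_{\text{cubic}})}{C}}
$$

The curve is steep far away from W_max and flat close to it, which produces the fast recovery, the plateau near the old maximum, and the subsequent probing described above.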
Model-based: BBR, pacing, and buffer bloat
BBR builds a model from the minimum RTT and the measured bottleneck bandwidth, sets cwnd to roughly twice the bandwidth-delay product, and paces packets evenly. This strategy prevents overflowing queues and keeps delays short even when minor losses occur. In setups with fluctuating radio quality or mixed paths, goodput remains high because BBR does not react reflexively to every drop. Version 1 is considered very fast, but when it competes with CUBIC in small buffers, the distribution of capacity can be somewhat unfair. In hosting landscapes, I prefer BBR for primary flows and run CUBIC as a compatible fallback.
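BBR's model boils down to two continuously updated estimates; the sizing below follows the published BBRv1 design (the gains are approximate and cycle during probing):

$$
BDP = BtlBw \cdot RT_{prop}, \qquad cwnd \approx 2 \cdot BDP, \qquad \text{pacing rate} = \text{pacing gain} \cdot BtlBw
$$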
Satellite and radio: Hybla, Westwood, and PCC
Hybla compensates for the disadvantages of high RTTs by scaling cwnd as if a short reference RTT applied. This lets long paths, such as satellite links, reach usable rates much faster and stay there reliably. Westwood estimates available bandwidth from the ACK rate and reduces cwnd less harshly after losses, which helps in wireless networks with random errors. PCC goes one step further and optimizes a utility value through short experiments, which can yield strong throughput-latency combinations. In heterogeneous networks with mobility, I prefer to test Hybla and Westwood before tackling complex QoS rules.
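Hybla's compensation normalizes against a reference RTT₀ (25 ms in the original proposal); as a rough sketch of the per-ACK increments from that paper:

$$
\rho = \frac{RTT}{RTT_0}, \qquad \text{slow start: } W \leftarrow W + 2^{\rho} - 1, \qquad \text{congestion avoidance: } W \leftarrow W + \frac{\rho^2}{W}
$$

The longer the path relative to RTT₀, the larger ρ and the more aggressively the window grows, which is what removes the ramp-up penalty.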
Performance comparison: measured values and fairness
Comparisons show clearly different profiles as latency and error rates vary. At low RTT, BIC often dominates the pure throughput race, while CUBIC offers a reliable mix of rate and behavior over time. With very long RTTs, CUBIC clearly outperforms Reno and NewReno because it translates waiting time into growth more effectively. BBR keeps queues empty and delivers consistently low delays, but depending on buffer size it generates more retransmissions. With parallel flows, CUBIC usually shows a fair distribution, while BIC tends to keep the link close to capacity.
| Algorithm | Principle | Strength | Weakness | Recommended for |
|---|---|---|---|---|
| Reno / NewReno | Loss-based, AIMD | Simple behavior | Slow at high RTT | Legacy stacks, troubleshooting |
| Vegas | RTT-based | Early congestion avoidance | Lower throughput | Constant latency |
| BIC | Binary search | High goodput despite drops | Aggressive towards others | Low RTT, known drop patterns |
| CUBIC | Cubic time function | Good fairness | Variance under drops | Long paths, data centers |
| BBR | Model-based, pacing | Low latency | Unfair in small buffers | Hosting, global users |
| Hybla | RTT compensation | Fast ramp-up | Special-case character | Satellite, maritime |
Practical guide: Selection based on latency, loss, and target
I start every decision with clear goals: low latency, maximum throughput, or balance across many flows. With tightly limited buffers and sensitive latency requirements, I reach for BBR first. When long paths dominate and many hosts coexist, there is no way around CUBIC. In networks with easily observable drop patterns, BIC continues to deliver impressive rates, provided fairness is secondary. For satellite links and other very high RTT paths, Hybla eliminates the typical ramp-up disadvantage and gets payload flowing quickly.
Linux, Windows, and containers: Activation and tuning
On Linux, I check the active algorithm with sysctl net.ipv4.tcp_congestion_control and switch it explicitly with sysctl -w net.ipv4.tcp_congestion_control=bbr. For CUBIC, I keep the kernel defaults but adjust net.core.default_qdisc and pacing parameters when I need to defuse host queues. In containers, I apply the settings on the host because namespaces do not isolate every queue. On Windows, BBR can be activated in suitable editions starting with current versions, while older systems continue to use CUBIC or Compound TCP. Without solid system and queue settings, every algorithm loses a noticeable amount of its effectiveness.
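A minimal sketch of these checks and changes on a typical Linux host (interface-independent; persist the sysctls under /etc/sysctl.d/ so they survive a reboot):

```bash
# Show the active and the available congestion-control algorithms
sysctl net.ipv4.tcp_congestion_control
sysctl net.ipv4.tcp_available_congestion_control

# Load the BBR module if it is not built in, then switch to BBR with fq pacing
modprobe tcp_bbr
sysctl -w net.core.default_qdisc=fq
sysctl -w net.ipv4.tcp_congestion_control=bbr

# Verify that established connections report BBR, a cwnd, and a pacing rate
ss -tin | head -n 20
```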
Web hosting perspective: Multi-flow fairness and goodput
In hosting clusters, it is the sum of many simultaneous flows that counts, not the best value of a single transfer. CUBIC keeps connections predictable and usually distributes capacity fairly, which favors dense multi-tenant scenarios. BBR reduces queues and keeps response times short for APIs and dynamic websites. If you are weighing protocol overhead, test the transport together with the HTTP version; my starting point is HTTP/3 vs. HTTP/2 in conjunction with the selected CC method. For global users, I optimize for low latency peaks because response time directly shapes perceived speed.
QUIC and BBR: Influence beyond TCP
QUIC brings its own UDP-based congestion control and uses principles similar to BBR, such as pacing and RTT observation. In modern stacks, performance work is gradually shifting from TCP into the application layer. This increases the degrees of freedom, but requires accurate measurement at the path, host, and application levels. For planning purposes, I recommend benchmarking QUIC against CUBIC/BBR over TCP under real load profiles. This lets me identify early where queuing occurs and how I can smooth out bottlenecks through pacing or shaping.
AQM, ECN, and buffer discipline: Interaction with algorithms
Congestion control only really comes into its own in combination with the queue management of the network devices. Classic tail drop fills buffers to the brim and then abruptly discards packets, resulting in latency spikes and synchronization effects. Active queue management (AQM) such as CoDel or FQ-CoDel marks or discards packets early and distributes capacity more fairly across flows. All methods benefit: CUBIC loses less cwnd to burst drops, and BBR keeps its pacing strategy more stable because queues do not suddenly overflow.
ECN (Explicit Congestion Notification) completes the picture. Instead of discarding packets, routers mark congestion with the CE bit; endpoints throttle without the need for retransmits. Loss-based algorithms thus react earlier and more gently, while model-based methods such as BBR interpret the signals in the context of the minimum RTT. In data centers, DCTCP with consistent ECN enables very low queuing delays at high utilization. In the WAN, I use ECN selectively: only when paths pass the markings through consistently and middleboxes do not interfere. In mixed public networks, it remains more important to configure AQM cleanly than to simply enlarge buffers.
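A hedged example of the AQM and ECN knobs discussed here (eth0 is a placeholder; verify ECN behavior on your real paths before enabling it broadly):

```bash
# Use FQ-CoDel as the root qdisc and let it mark with ECN instead of dropping
tc qdisc replace dev eth0 root fq_codel ecn

# Enable ECN for TCP (1 = request on outgoing and accept incoming, 2 = accept incoming only)
sysctl -w net.ipv4.tcp_ecn=1

# Check whether marks or drops are actually occurring at the qdisc
tc -s qdisc show dev eth0
```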
Bursts, offloads, and host-side pacing
Many performance drops are caused by transmission bursts on the host. Large offloads (TSO/GSO) bundle segments into very large frames; without pacing, these packets are emitted in short bursts and fill switch queues. I therefore set sch_fq or FQ-CoDel as default_qdisc under Linux and use the pacing rates specified by the CC algorithm. BBR benefits from this in particular, but CUBIC also becomes more stable. Excessively large NIC ring buffers and an overly high txqueuelen lengthen queues in the host; I choose moderate values and use tc -s qdisc to check whether drops or ECN marks occur there.
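A sketch of the host-side pacing setup and buffer checks described above (device name and sizes are illustrative values, not recommendations):

```bash
# sch_fq enforces the per-flow pacing_rate set by the CC algorithm
tc qdisc replace dev eth0 root fq

# Keep host queues moderate: NIC transmit ring and device transmit queue
ethtool -G eth0 tx 512
ip link set dev eth0 txqueuelen 1000

# Watch for drops, requeues, or ECN marks accumulating at the qdisc
tc -s qdisc show dev eth0
```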
On the receiving end, GRO/LRO affect the latency of small flows. For API-heavy workloads, it is worth testing or throttling these offloads depending on the NIC and kernel so that ACKs are processed faster. In container setups, I check veth pairs and host Qdiscs: the queue lives on the host interface, not in the namespace. If you use cgroups for bandwidth management, you should set limits consistent with CC and AQM, otherwise unpredictable interference between flows will occur.
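For the receive-side offloads and container queues, a hypothetical check could look like this (eth0 and veth0 are placeholder interface names):

```bash
# Inspect receive offloads and, if latency tests justify it, disable them
ethtool -k eth0 | grep -E 'generic-receive-offload|large-receive-offload'
ethtool -K eth0 gro off lro off

# The container's queue lives on the host-side veth peer, so inspect and shape it there
tc -s qdisc show dev veth0
tc qdisc replace dev veth0 root fq_codel
```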
Workload profiles: short flows, elephants, and streaming
Not every application needs the same congestion control. "Mice" flows (small transfers) dominate web APIs and dynamic pages. What matters here is how quickly a connection ramps up to a useful rate and how low the tail latencies remain. BBR keeps queues flat and favors short-lived flows, while CUBIC delivers solid averages with fair capacity allocation. The initial window (initcwnd) and delayed ACK settings influence the first RTTs: conservative defaults protect against bursts, but should not slow down the first kilobytes too much.
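On Linux, the initial window can be set per route; a hypothetical example (gateway address and device are placeholders, 10 segments matches the common modern default):

```bash
# Show current routes, then raise the initial congestion and receive windows on the default route
ip route show
ip route change default via 192.0.2.1 dev eth0 initcwnd 10 initrwnd 10
```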
"Elephant" flows (backups, replication, large downloads) need stable utilization over long periods. CUBIC scales robustly across different RTTs and shares fairly with parallel flows. BIC delivers maximum rates in controlled networks with known drop patterns, but has disadvantages when coexisting. For live streaming and real-time interaction (VoIP, gaming), I consistently avoid standing queues: BBR remains the first choice as long as buffers stay small and AQM takes effect. Nagle interactions (TCP_NODELAY) and application batching also come into play: if you generate many small writes, disable Nagle specifically and leave fine-grained pacing to the congestion-control algorithm.
Measurement methodology: realistic tests and meaningful metrics
Good decisions require reproducible measurements. I combine synthetic load with real path conditions: controlled emulation of RTT, jitter, and loss (e.g., on test links) plus real destination locations. I measure bandwidth as goodput and correlate it with RTT histories, retransmits, and out-of-order proportions. P50/P95/P99 latencies tell more than averages—especially for API response times and interactivity. For TCP, I look at cwnd curves and pacing_rate and check host-side Qdisc statistics and CPU saturation so I can correctly identify bottlenecks.
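The per-connection and per-queue counters I rely on are available from ss, tc, and nstat; a minimal measurement pass might look like this (the destination address is a placeholder):

```bash
# Per-socket cwnd, rtt, retransmits, delivery rate, and pacing rate
ss -tin dst 203.0.113.10

# Host-side queue statistics: drops, requeues, backlog
tc -s qdisc show dev eth0

# Kernel-wide TCP counters (retransmits, out-of-order queueing) sampled as a time series
nstat -az | grep -E 'TcpRetransSegs|TcpExtTCPOFOQueue'
```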
Individual tests can be misleading: parallel flows per host and cross-traffic create realistic competitive situations. Time of day, peering routes, and radio conditions vary; I repeat measurements in time series and check sensitivity to small drop rates. For QUIC, I mirror identical workloads against TCP so that the application and transport layers are evaluated separately. Only when measurements remain stable under interference do I commit to a choice in production.
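For the controlled emulation and competition tests, netem plus parallel iperf3 flows is a common combination; a sketch with example values on a dedicated test link:

```bash
# Emulate a 50 ms delay with 5 ms jitter and a small loss rate on the egress path
tc qdisc add dev eth0 root netem delay 50ms 5ms loss 0.1%

# Run several parallel flows against a test server to create realistic competition
iperf3 -c 203.0.113.10 -P 8 -t 60

# Remove the emulation afterwards
tc qdisc del dev eth0 root
```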
Common errors and quick fixes
A steady rise in RTT under load combined with a simultaneous drop in throughput points to bufferbloat. The fix: activate AQM, tighten host pacing, and switch to BBR if necessary. Many retransmits without a clear drop pattern indicate reordering or ACK compression; FQ-based qdiscs and clean pacing help. Sudden stalls with missing ACKs often point to path-MTU problems; I enable MTU probing and set MSS clamping at the relevant transitions.
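For the path-MTU symptoms, the two countermeasures mentioned can be sketched like this (the iptables rule belongs on the affected gateway, not on every host):

```bash
# Let TCP probe for a working MSS when ICMP "fragmentation needed" messages are filtered
sysctl -w net.ipv4.tcp_mtu_probing=1

# Clamp the MSS to the path MTU for forwarded connections on the router/gateway
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
```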
Unfair distribution between flows occurs when individual connections have a permanent advantage: CUBIC improves RTT fairness compared to older loss-based algorithms; BBR requires clean buffer discipline, and with small buffers a finer pacing adjustment or a return to CUBIC can restore coexistence. In container environments, "hidden" queues arise at the veth ends: without coordinated qdisc and cgroup limits, congestion shifts to the host, far away from the application.
Operational guidelines: Team and platform decisions
I anchor congestion control in platform standards: uniform qdisc defaults, defined CC profiles per cluster, and playbooks for deviations. For global users, I separate workloads by latency sensitivity: front-end APIs get BBR and strict AQM, bulk transfers get CUBIC. Telemetry is mandatory: RTT distribution, throughput, retransmits, and ECN rates as time series. The team rolls out changes via percentage experiments and compares P95/P99, not just averages. This makes CC decisions repeatable and traceable, and ensures they are not just gut feelings.
Decision checklist
I first check RTT spans and error rates because they dominate behavior. Then I decide whether latency or throughput takes priority and test specifically against that metric. In the next step, I measure fairness with parallel flows to avoid surprises in operation. After that, I check buffer sizes, AQM methods, and pacing settings on hosts and gateways. Finally, I validate under load whether the choice holds up with real users and real routes.
Short balance sheet
Reno and NewReno deliver clear reference behavior, but feel sluggish on long paths. CUBIC is the default in almost every Linux hosting environment because it makes good use of long RTTs and behaves fairly. BIC performs well in networks with noticeable drops when maximum utilization matters more than good neighborliness. BBR enables low latencies and consistent response times, but demands attention to buffers and coexistence. Those who carefully balance targets, path characteristics, and host queues can use congestion control as a real lever for user experience and cost.


