CPU overcommitment slows down virtual servers because more vCPUs are allocated than the host has physical cores, which results in wait times. I walk through the causes, real measurements such as CPU Ready Time, and the specific adjustments I use to keep VPS performance stable.
Key points
Before I go deeper, I sort the most important aspects and clear up typical misunderstandings. Many operators confuse high utilization with efficiency, even though it is the queues that shape response times. Scheduler details determine whether applications run smoothly or stall. I summarize the core topics that I build on in the following chapters. The list serves as a compact reference for quick decisions.
- Scheduler and time slicing determine how vCPUs are allocated.
- CPU Ready shows wait time and gives early warning of bottlenecks.
- SMP guests (multiple vCPUs) increase overhead and latency.
- Rightsizing and monitoring keep peak loads manageable.
- Choosing a provider that does not overbook keeps performance consistent.
What does CPU overcommitment mean technically?
Overcommitment means that I allocate more virtual cores than the host physically has and rely on the hypervisor scheduler. KVM or VMware distribute compute time via time slicing, which goes unnoticed under low load and appears to enable high density. Under parallel load, however, wait times grow because several vCPUs need compute time at the same moment and the scheduler has to serve them one after the other. Red Hat warns that SMP VMs with many vCPUs in particular lose substantial performance as soon as the total number of vCPUs significantly exceeds the physical cores [1]. VMware experts quantify this via CPU Ready Time: 1,000 ms of wait time per vCPU corresponds to roughly 5% performance loss, cumulative per core [3].
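For orientation, here is a minimal sketch of how that conversion works: vSphere reports CPU Ready as a summation value in milliseconds per sample interval, and dividing by the interval length yields the percentage. The 20-second interval is the default of the real-time charts; if your collection interval differs, adjust the divisor.

```python
# CPU Ready summation (ms) -> percentage of the sample interval.
# Assumes the 20-second real-time chart interval; other intervals change the divisor.

def cpu_ready_percent(ready_ms: float, interval_ms: float = 20_000.0) -> float:
    """Share of the interval a vCPU spent ready to run but waiting for a core."""
    return ready_ms / interval_ms * 100.0

if __name__ == "__main__":
    print(f"{cpu_ready_percent(1000):.1f} %")  # 1,000 ms in 20 s ≈ 5.0 %
    print(f"{cpu_ready_percent(2000):.1f} %")  # 2,000 ms ≈ 10.0 % (the 5:1 example below)
```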
Why virtual servers are slowed down
Queues are the main reason why virtual servers slow down when overbooked, even though CPU usage looks high. A vCPU may only run when a physical core is free; until then, CPU Ready climbs and the application waits. With several VMs peaking in parallel, the effect is amplified because they are all "ready to run" and the scheduler can only hand out time slices. Latency-critical workloads such as databases, APIs or store backends react especially sensitively, because every additional context switch and every delay triggers knock-on effects. I then observe timeouts, uneven response times and a growing variance that users notice.
Key metrics: CPU Ready, Steal & Co.
Indicators such as CPU Ready, Co-Stop and Steal Time show me early on whether overcommitment is affecting my VM. CPU Ready in the hypervisor metrics should stay well below 5% on average; if the value climbs into double-digit percentages, throughput drops noticeably [3]. Co-Stop signals that SMP VMs cannot be scheduled simultaneously, which slows down multi-threading. In Linux guests, I read Steal Time, which shows how much time the host is taking away from my VM; I have explained the background and tuning in an accessible way here: CPU steal time. Combined, these three signals reveal bottlenecks early and keep latency problems from working their way up to the application.
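Inside a Linux guest, %steal can be estimated directly from /proc/stat without extra tooling. The following is a small sketch based on the proc(5) field layout (user, nice, system, idle, iowait, irq, softirq, steal); the 5-second sample window is an arbitrary choice.

```python
# Estimate %steal in a Linux guest by sampling the aggregate "cpu" line of
# /proc/stat twice and comparing the deltas. Steal is the 8th field per proc(5).
import time

def read_cpu_fields() -> list[int]:
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

def steal_percent(interval_s: float = 5.0) -> float:
    before = read_cpu_fields()
    time.sleep(interval_s)
    after = read_cpu_fields()
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    steal = deltas[7] if len(deltas) > 7 else 0
    return 100.0 * steal / total if total else 0.0

if __name__ == "__main__":
    print(f"steal: {steal_percent():.1f} %")
```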
Real example: When 5:1 breaks the limit
Practice beats theory as soon as real workloads mix: a host with 4 physical cores and 5 VMs with 4 vCPUs each appears unproblematic when idle, but shows massive wait times under load. If one VM starts image processing or backups, the scheduler prioritizes it, but the remaining VMs accumulate CPU Ready values of over 2,000 ms, which translates into roughly 10% performance loss per core [3]. In a documented SQL Server test, throughput dropped from 25,200 transactions per minute to less than half when a background operation was activated [3]. I/O slows down indirectly as well, because vCPUs are preempted during block device accesses and the pipelines stall. The result is a mixture of high peaks, long tails and unexpected jitter in response times.
Special risks for SMP guests
SMP VMs with many vCPUs require coordinated slots on several physical cores, which increases scheduling effort and wait times. The more vCPUs a single VM has, the more often it waits for all the required time slices to come together. Red Hat therefore advises favoring several smaller VMs with few vCPUs over individual "wide" guests [1]. An overcommit ratio of 10:1 is considered a rough maximum; in production environments I consider significantly less to be sensible, especially for heavily loaded services [1]. If latency is the top priority, limit the vCPUs per VM and optimize threads so that they get by with a smaller core footprint.
Web hosting practice: effects on websites
Websites on overbooked hosts respond with longer loading times, unstable time to first byte and poor Core Web Vitals. Search engines downgrade slow pages, visitors bounce faster, and conversion chains break on inconspicuous micro-delays [2]. In shared environments, many people know the "noisy neighbor"; on VPSs with overcommitment this happens more subtly because nominally more vCPUs are allocated. During traffic peaks I therefore first check whether Ready and Steal values are elevated instead of blindly tweaking the web server. If you want to cut costs, be aware of the risks of cheap web hosting and demand clear limits against overbooking [2].
Overcommitment vs. bare metal
The comparison is clear: bare metal delivers predictable latencies and linear throughput, while overbooked virtualization becomes choppy under load. For latency-sensitive workloads such as databases, queues, observability stacks and real-time APIs, dedicated resources pay off quickly. I prefer dedicated cores or bare metal as soon as CPU Ready becomes noticeable or SMP guests stall. If you need flexibility, reserved CPU instances or host groups without overcommit can bridge the gap. For a structured view of the options, see the Bare Metal Hosting comparison, which briefly weighs strengths and trade-offs.
Right dimensioning: How many vCPUs make sense?
Rightsizing starts with real demand: I measure CPU, run queue, disk and network I/O as well as lock-wait patterns across several daily profiles. If the measurements show peak parallelism of four threads, I initially allocate two to four vCPUs and only increase them if Ready and Co-Stop stay inconspicuous. The rule of thumb "maximum 10 vCPUs per physical core" is a cap, not a target; for production I plan more conservatively [1]. Large VMs with many vCPUs look attractive, but they increase coordination effort and latency fluctuations. I scale small, cleanly cut VMs horizontally and thus keep queues short and efficient.
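As a rough illustration, here is a hedged sizing heuristic along those lines: start from measured peak parallelism rather than the nominal thread pool, and treat 10 vCPUs per core as a ceiling rather than a goal. The 4:1 planning ratio below is my own conservative assumption, not a value from the cited sources.

```python
# Sizing sketch: cover measured peak parallelism with a small vCPU allocation,
# and keep the total vCPUs placed on a host well below the 10:1 maximum.

def suggest_vcpus(peak_runnable_threads: int, minimum: int = 2, maximum: int = 8) -> int:
    """Initial allocation; grow only if Ready and Co-Stop stay inconspicuous."""
    return max(minimum, min(peak_runnable_threads, maximum))

def host_vcpu_budget(physical_cores: int, planning_ratio: float = 4.0) -> int:
    """Total vCPUs I am willing to schedule on one host for production workloads."""
    return int(physical_cores * planning_ratio)

if __name__ == "__main__":
    print(suggest_vcpus(peak_runnable_threads=4))  # -> 4
    print(host_vcpu_budget(physical_cores=16))     # -> 64, not the 160 a 10:1 cap would allow
```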
Monitoring and alerts: what I set
Monitoring makes overcommitment visible before users notice it, which is why I set clear limits. CPU Ready in the 1-minute average should ideally stay below 5% per vCPU, Co-Stop should trend toward zero permanently, and Steal Time should only show up briefly [3]. If these limits are exceeded, I scale horizontally, move background jobs away from productive VMs or migrate guests to hosts with headroom. I separate alerts by severity: an immediate alert for sharp increases, a ticket for recurring moderate spikes. This way I avoid alert fatigue and intervene specifically when latency becomes genuinely business-critical.
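A minimal sketch of that two-tier logic, assuming the 5% and 10% marks from above as warning and critical thresholds; the spike count that turns a warning into a ticket is an arbitrary choice.

```python
# Two-tier alerting over a window of 1-minute CPU Ready averages (percent):
# page immediately on a sharp spike, open a ticket for recurring moderate ones.

def classify_ready_window(ready_pct_per_min: list[float],
                          warn: float = 5.0, critical: float = 10.0,
                          ticket_after: int = 3) -> str:
    if any(v >= critical for v in ready_pct_per_min):
        return "page"                    # sharp increase: alert instantly
    moderate_spikes = sum(1 for v in ready_pct_per_min if v >= warn)
    if moderate_spikes >= ticket_after:
        return "ticket"                  # recurring moderate spikes: open a ticket
    return "ok"

if __name__ == "__main__":
    print(classify_ready_window([2.0, 6.1, 5.4, 7.0, 3.2]))  # -> ticket
    print(classify_ready_window([2.0, 12.5, 4.0]))           # -> page
```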
Provider selection: What I look out for
The choice of VPS provider determines consistency under load, so I scrutinize offers for overbooking. Transparent information on vCPU-to-core ratios, clear commitments to dedicated cores and consistent benchmarks are mandatory. In a 2025 comparison, offerings with NVMe storage, a modern CPU generation and no CPU overbooking performed best, with stable uptime and consistent latency [6]. Buying on price alone often leads to hidden overselling, which ends up costing more than honest resources in production scenarios. The following table shows core parameters that I compare side by side to avoid bottlenecks.
| Provider | vCPUs | RAM | Storage | Uptime | Price/month | Test winner |
|---|---|---|---|---|---|---|
| webhoster.de | 4-32 | 8-128 GB | NVMe | 99.99% | from €1 | Yes |
| Hetzner | 2-16 | 4-64 GB | SSD | 99.9% | from €3 | No |
| Contabo | 4-24 | 8-96 GB | SSD | 99.8% | from €5 | No |
Capacity planning: before load peaks hit
I start planning with workload profiles: peak times, burst duration, parallelism and latency budgets. When the base load grows, I first scale vertically as long as Ready time stays stable; if the curve tips over, I split services across several smaller VMs. I consistently separate background jobs from the frontend so that order processing or checkout does not compete with reports. Auto-scaling helps, but without upper limits and clear metrics it produces expensive misfires. A step-by-step logic works better: define thresholds, test measures, measure the results and then fine-tune the thresholds.
What a vCPU really is: SMT and frequency effects
A vCPU usually means a hardware thread (SMT/hyper-threading), not necessarily a full physical core. Two vCPUs can sit on one core and share decoders, caches and execution units. Under pure integer or memory load, SMT brings noticeable gains, but once pipelines saturate, threads compete directly for resources. This explains why hosts with "many vCPUs" do not scale linearly under load: the scheduler can distribute time slots, but it cannot create more physical execution units. Power and turbo policies have an effect as well: if many threads run in parallel, turbo frequencies drop and single-thread performance falls. For latency-critical classes I therefore consider dedicated cores, disabling SMT, or CPU pinning to give threads predictable performance windows.
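To see which logical CPUs share a core, Linux exposes the thread siblings in sysfs. A small sketch follows; note that inside a guest this only reflects the virtual topology the hypervisor presents, not the physical placement on the host.

```python
# List groups of logical CPUs that share a physical core, via sysfs thread siblings.
# On bare metal this reveals real SMT pairs; in a VM it shows the virtual topology.
from pathlib import Path

def thread_sibling_groups() -> set[str]:
    groups = set()
    base = Path("/sys/devices/system/cpu")
    for path in base.glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(path.read_text().strip())   # e.g. "0,8" = two threads on one core
    return groups

if __name__ == "__main__":
    for group in sorted(thread_sibling_groups()):
        print(group)
```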
NUMA awareness: memory locality decides
NUMA splits modern multi-socket hosts into nodes, each with its own memory attachment. Large SMP VMs that stretch across multiple NUMA nodes pay for it with higher memory latency, remote accesses and additional coordination. I align vCPU and RAM allocation so that a VM preferably fits within a single node. In practical terms this means fewer vCPUs per VM, but horizontal scaling. In the guest, I avoid oversized, globally synchronized thread pools and rely on sharding per instance. Anyone virtualizing databases benefits twice: a better cache hit rate and less cross-node traffic. NUMA misplacement often disguises itself as a "CPU problem", but it shows up as rising memory latency and read misses, while CPU Ready increases only moderately.
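A hedged planning check for that placement question: does a VM's vCPU and RAM request fit into a single host node? The node sizes in the example are assumptions; on a real host they can be read from /sys/devices/system/node/.

```python
# Placement check: a VM that exceeds one NUMA node's cores or RAM will be
# spread across nodes and pay with remote memory accesses.

def fits_single_node(vm_vcpus: int, vm_ram_gb: int,
                     node_cores: int, node_ram_gb: int) -> bool:
    return vm_vcpus <= node_cores and vm_ram_gb <= node_ram_gb

if __name__ == "__main__":
    # Assumed dual-socket host with 16 cores and 128 GB per node.
    print(fits_single_node(vm_vcpus=12, vm_ram_gb=96, node_cores=16, node_ram_gb=128))   # True
    print(fits_single_node(vm_vcpus=24, vm_ram_gb=160, node_cores=16, node_ram_gb=128))  # False: spans nodes
```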
Burst and credit models: hidden limits
Burst instances with CPU credits deliver good numbers when idle, but throttle once the credits run out, even though CPU Ready remains inconspicuous. This is tricky for operators because latency spikes appear "out of nowhere". I therefore check whether a plan uses credits or "fair share" rules and whether a minimum performance is guaranteed. Workloads with periodic peaks (cron, ETL, invoice batches) burn credits quickly and then hit a hard throttle. The solution: either switch to reserved cores or decouple the bursts - for example with a separate batch profile in its own time window so that productive APIs do not run into the throttle. Overcommitment plus credit throttling is the most unfavorable combination for predictable response times.
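To make the mechanism tangible, here is a toy model of such a credit budget: a baseline share accrues credits, consumption above baseline burns them, and an empty balance clamps the instance to baseline. The baseline, starting balance and cap are illustrative assumptions, not any provider's published figures.

```python
# Toy burst-credit model: credits are measured in CPU-minutes, accrue at the
# baseline rate and are burned by usage above baseline; an empty balance
# throttles the instance back to its baseline share.

def simulate_credits(demand_pct, baseline_pct=20.0, start_credits=30.0, max_credits=288.0):
    balance = start_credits
    for want in demand_pct:
        balance = min(max_credits, balance + baseline_pct / 100.0)  # accrue per minute
        burn = max(0.0, want - baseline_pct) / 100.0                # cost above baseline
        if burn <= balance:
            delivered, balance = want, balance - burn               # burst allowed
        else:
            delivered = baseline_pct                                # throttled
        yield delivered, balance

if __name__ == "__main__":
    # A batch job demanding 100 % CPU drains the balance and then gets clamped.
    for minute, (got, bal) in enumerate(simulate_credits([100.0] * 80)):
        if minute % 20 == 0:
            print(f"minute {minute:3d}: delivered {got:5.1f} %, credits {bal:6.1f}")
```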
Containers on the VPS: avoid double scheduling
Container orchestration inside an already overbooked VM easily leads to "double overcommit". The host scheduler prioritizes VMs, the guest scheduler prioritizes containers - both without knowledge of the real core availability. I therefore set clear CPU quotas and use cpusets to bind critical containers to specific vCPUs. At the same time, I keep the sum of container threads below the VM's realistically available budget, not below the nominal vCPU count. Build or batch containers get lower shares so that frontend services keep priority. Important: irqbalance and the network stack must not flood critical vCPUs with interrupts; when in doubt, I isolate one or two vCPUs for network and storage interrupts to dampen latency spikes.
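A sketch of the quota arithmetic behind those limits: split the VM's realistically usable CPU budget across containers by weight and express it as CFS quota microseconds (roughly the value behind Docker's --cpus option on cgroup v1). The 0.75 headroom factor and the weights are illustrative assumptions.

```python
# Distribute a VM's usable CPU budget across containers as CFS quotas.
# quota_us / period_us equals the number of CPUs' worth of time a container may use.

CFS_PERIOD_US = 100_000  # default CFS period of 100 ms

def cfs_quotas(vm_vcpus: int, weights: dict[str, float], headroom: float = 0.75) -> dict[str, int]:
    budget_cpus = vm_vcpus * headroom            # leave room for kernel, IRQs, agents
    total_weight = sum(weights.values())
    return {name: int(budget_cpus * w / total_weight * CFS_PERIOD_US)
            for name, w in weights.items()}

if __name__ == "__main__":
    quotas = cfs_quotas(vm_vcpus=4, weights={"frontend": 3, "worker": 2, "batch": 1})
    for name, quota_us in quotas.items():
        print(f"{name}: cpu.cfs_quota_us={quota_us} (~{quota_us / CFS_PERIOD_US:.2f} CPUs)")
```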
Measurement practice: How to read the right numbers
In the hypervisor I check CPU Ready (total and per vCPU), Co-Stop and the run queue length per host. On KVM, I correlate the VMs' domstats output with host load and IRQ load. In the guest I watch %steal, %iowait, the run queue (r) and context switches. A recurring pattern: high run queue + rising %steal + fluctuating latency = overcommitment. If %steal stays low but L3 misses and syscalls increase, I lean toward lock contention or NUMA problems. I also count active request threads and compare them with the vCPU count: if web or worker pools sit permanently above the core budget, I am creating queues myself. It is better to limit incoming requests and reject the excess than to process everything with delay - that improves perceived performance and stabilizes the system.
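That signature can be checked mechanically. Below is a sketch that flags the pattern from periodic guest samples; the thresholds are illustrative starting points, not universal limits, and the Sample fields stand in for whatever your collector actually delivers.

```python
# Flag the overcommit signature "high run queue + rising %steal + jittery latency"
# from a window of guest-side samples.
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Sample:
    runq: int          # runnable threads (vmstat column r)
    steal_pct: float   # %steal in the same window
    latency_ms: float  # application p95 latency

def looks_like_overcommit(samples: list[Sample], vcpus: int) -> bool:
    runq_high = mean(s.runq for s in samples) > vcpus                  # more runnable than vCPUs
    steal_rising = samples[-1].steal_pct - samples[0].steal_pct > 2.0  # %steal trending up
    lat = [s.latency_ms for s in samples]
    latency_jittery = pstdev(lat) > 0.3 * mean(lat)                    # high relative variance
    return runq_high and steal_rising and latency_jittery

if __name__ == "__main__":
    window = [Sample(6, 1.0, 80), Sample(9, 3.5, 190), Sample(10, 6.2, 95), Sample(8, 7.8, 260)]
    print(looks_like_overcommit(window, vcpus=4))  # -> True
```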
Concrete tuning levers in the guest and host
I achieve quick wins with a few precise steps: in the BIOS I set performance profiles, disable deep C-states and keep frequency scaling consistent. On the host, I set the CPU governor to "performance" and reduce noise from background services. In the VM, I lower the vCPU count to what is actually needed, pin critical processes (e.g. database I/O threads) to fixed vCPUs and limit application thread pools. For web servers and runtimes: worker_processes (Nginx), pm.max_children (PHP-FPM) or JVM executor pools should not exceed the available core budget minus system overhead. Huge pages and consistent timer sources reduce scheduling overhead; at the same time, I avoid aggressive RAM overcommit so that swap latencies do not enter the pipeline as well.
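As a rough illustration of that budget rule, a hedged sizing sketch: derive pool sizes from the usable core budget rather than the nominal vCPU count. The half-core system reserve and the per-request CPU share are assumptions to adapt to your workload.

```python
# Derive worker pool sizes from the usable core budget of a VM.
import math

def core_budget(vcpus: int, system_reserve: float = 0.5) -> float:
    """Cores realistically available to the application after system overhead."""
    return max(1.0, vcpus - system_reserve)

def nginx_worker_processes(vcpus: int) -> int:
    return max(1, math.floor(core_budget(vcpus)))

def fpm_max_children(vcpus: int, cpu_share_per_request: float = 0.25) -> int:
    """Mostly I/O-bound requests may exceed the core count, but children times
    their average CPU share should stay inside the budget."""
    return max(2, int(core_budget(vcpus) / cpu_share_per_request))

if __name__ == "__main__":
    print(nginx_worker_processes(4))  # -> 3
    print(fpm_max_children(4))        # -> 14
```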
Application design: Backpressure instead of overcrowding
Backpressure is the clean answer to scarce cores. Instead of buffering request floods in huge queues, I limit concurrently processed requests to the core count plus a reserve. At peak load, services signal "busy" or deliver degraded but fast responses (e.g. shorter caches, fewer details). Databases get shorter lock timeouts and leaner transactions; search and analytics queries run deferred. In microservice landscapes, I brake at the edge, not deep inside: API gateways and ingress limits prevent internal dependencies from collapsing. The result is orderly queues with short tails - exactly what saves the user experience under overcommitment.
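A minimal, framework-agnostic sketch of that idea: cap in-flight requests with a semaphore roughly at the core budget plus a small reserve and answer "busy" immediately instead of queueing. The limit of vCPUs plus two is an assumption.

```python
# Edge backpressure: admit at most MAX_IN_FLIGHT requests, shed the rest fast.
import asyncio

MAX_IN_FLIGHT = 4 + 2              # e.g. 4 vCPUs plus a small reserve
_slots = asyncio.Semaphore(MAX_IN_FLIGHT)

async def handle(request_id: int) -> str:
    if _slots.locked():            # every slot taken: reject instead of queueing
        return f"{request_id}: 503 busy"
    async with _slots:
        await asyncio.sleep(0.05)  # stand-in for real work
        return f"{request_id}: 200 ok"

async def main():
    results = await asyncio.gather(*(handle(i) for i in range(12)))
    print("\n".join(results))      # 6 requests served, 6 shed immediately

if __name__ == "__main__":
    asyncio.run(main())
```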
Live migration and background load: hidden stumbling blocks
vMotion/live migration or host maintenance windows cause elevated latencies for a short time, even if overcommitment is moderate. While memory is copied and CPU state is synchronized, time slices and I/O paths shift. If the event coincides with batch windows, delays pile up. I schedule migration windows outside peak times and defer jobs that would otherwise run in parallel. I also strictly separate backup, antivirus and indexing from latency-critical paths - ideally on separate, low-priority VMs. This prevents "well-intentioned" maintenance processes from distorting performance measurements or hitting user flows.
Checklist: A clear diagnosis in 15 minutes
- Select a time period and reproduce the load (load test or peak window).
- Hypervisor: check CPU Ready per vCPU, Co-Stop and the host run queue.
- Guest: measure %steal, %iowait, run queue, context switches and IRQ load.
- Align the application's thread and worker pools with the vCPU count.
- Identify background jobs and cron runs and move them away.
- Test: halve or pin the vCPU count, then measure Ready/Steal again.
- If values fall and latency smooths out: split horizontally, fix the limits.
- If there is no improvement: change host or plan, consider dedicated cores.
Typical misconceptions that cost performance
I see these errors regularly: more vCPUs do not automatically mean more speed if the host is already running at a reduced clock rate. A high CPU value inside the VM does not prove genuine full-core usage as long as Ready is high and Steal is rising. Large SMP VMs do not necessarily deliver better parallelism if synchronization and locks dominate. Hypervisor prioritization features do not remove physical limits; they only shift the delays. And database or PHP tuning only conceals bottlenecks briefly if the scheduler remains the real constraint.
Concrete steps: From symptoms to cause
I proceed reproducibly: first define the load scenario, then record CPU Ready, Co-Stop, Steal and I/O wait times in the same time window. If the metrics show typical overcommit signatures, I reduce the number of vCPUs per VM, spread SMP workloads and move background processes. If the values stay high, I move the VM to a host with a lower overcommit ratio or reserved cores. Once latency responds, I save the new profile as a baseline and anchor alerts to percentage and millisecond values. This way I don't treat symptoms; I address the cause in the scheduling.
Briefly summarized
CPU overcommitment sounds efficient, but under load it creates queues that slow down virtual servers. Metrics such as CPU Ready Time, Co-Stop and Steal Time point to the problem clearly and provide objective thresholds. Red Hat recommends conservative ratios and smaller VMs with few vCPUs, while real-world data from VMware environments shows the impact on throughput and response times [1][3]. For websites and APIs, fluctuating latency risks ranking losses, bounces and error-prone processes [2]. I therefore rely on rightsizing, clean monitoring, clear thresholds and - if necessary - dedicated cores or bare metal to keep VPS performance reliable.


