I will show you how to control swap usage on servers in a targeted manner so that hosting workloads do not stall under load and swap does not trigger performance issues. I explain the causes, key metrics, swappiness settings, sizing recommendations and practical tuning steps for memory swapping in hosting.
Key points
- Reduce swappiness: avoid aggressive paging out
- Check sizing: align swap with RAM and workload
- Protect IO: SSD placement, deliberate use of Zswap/ZRAM
- Establish monitoring: page faults, kswapd, latency
- Adapt workloads: balance caches and DB buffers
What swap really does - and when it slows you down
Swap extends physical RAM by moving rarely used pages to SSD or HDD and protects processes from the OOM killer, which gives me a buffer in emergencies. Linux offloads pages opportunistically to give active pages more space and to preserve the page cache, but too much swap activity increases the IO load. As soon as the system switches frequently between RAM and swap, there is a risk of thrashing and therefore noticeable latency. Especially on busy web hosting with PHP, a database and Node.js, the page cache, PHP workers and DB buffers compete for memory. I therefore keep swap available as a safety net, but minimize its use in normal operation.
Recognize symptoms of high swap usage reliably
I first check free -h and vmstat, because high swap-in/swap-out rates indicate bottlenecks. If the rates remain low and RAM is free, the system usually works normally and only uses swap opportunistically. However, if page fault rates and the IO queue grow, application latency increases and requests become slower. In the logs I see evidence of busy workers and slow queries that coincide with swap peaks. For more basics on virtual memory, I refer you to this compact introduction to virtual memory, which helps me with the classification.
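A minimal diagnostic sequence, as a sketch for a standard Linux host:

```bash
# Current memory and swap usage at a glance
free -h

# One-second samples: watch the si/so columns (swap-in/out, KiB/s)
# and the b column (processes blocked on IO)
vmstat 1 5

# Which devices or files back the swap, and how full are they?
swapon --show
```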
Advantages and risks of memory swapping hosting
I use swap to cushion RAM peaks and keep critical services running, which avoids short-term outages. This means that smaller VPS instances can manage with less RAM, which can reduce costs in euros as long as the IO load remains within limits. However, if too much is swapped out, even SSD/NVMe clearly falls behind RAM and requests come to a standstill. In addition, compression (ZRAM) costs CPU time that applications would rather spend on real work. For me, swap is therefore not a replacement for RAM, but a safety net that I actively control.
Swappiness: the most important adjusting screw
The kernel variable vm.swappiness (0-100, default usually 60) controls how early the system pages out, and I reduce it to 10 for hosting workloads. Temporarily I test with sysctl vm.swappiness=10; permanently I write vm.swappiness=10 into /etc/sysctl.conf. On SSD hosts this results in less swapping and more space for the page cache. I then monitor IO, latencies and working sets to confirm the effect. If the metrics remain stable, I keep the setting and document the change for later audits.
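The corresponding commands as a sketch; the value 10 is my starting point for hosting workloads, not a universal constant:

```bash
# Read the current value
sysctl vm.swappiness

# Apply temporarily (lost on reboot)
sysctl vm.swappiness=10

# Persist across reboots, then reload all sysctl settings
echo 'vm.swappiness=10' >> /etc/sysctl.conf
sysctl -p
```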
Optimum swap size for common servers
I adjust the swap size to the RAM, the workload and any hibernation requirements, since files that are too large waste disk space and files that are too small reduce the buffer. For typical hosting servers without hibernation, I plan moderate values and prioritize more RAM over huge swap volumes. For tight VPS instances, 1.5-2x RAM can be useful until a real upgrade is possible. If you have plenty of RAM, you often benefit from smaller but available swap areas to avoid crashes. I use the following table as a starting point and adjust it according to measured values:
| RAM size | Swap without hibernation | Swap with hibernation |
|---|---|---|
| ≤ 2 GB | 2x RAM | 3x RAM |
| 2-8 GB | = RAM | 2x RAM |
| 8-64 GB | 4-8 GB | 1.5x RAM |
| > 64 GB | 4 GB | Not recommended |
Swap placement and advanced techniques
I prefer swap files over partitions because I can adjust sizes dynamically and changes go live faster. If the swap area lives on separate SSD storage, it competes less with the OS for IO. For very small VMs, I trial Zswap or ZRAM to reduce IO, but keep a close eye on CPU utilization. I limit overcommit cleanly and set limits for services so that no single process drives the machine into thrashing. In the end, what counts is a measurable effect: lower latency, quieter IO and consistent response times.
Monitoring: which key figures really count
I measure RAM usage, page cache, swap in/out, the activity of kswapd and IO queues, because these values give me early signals. If swap movement increases, I correlate it with application latency and query times. I also check minor/major page faults to identify expensive memory accesses. For understanding buffer strategies, this guide to buffer and cache usage helps me. Only when the metrics and logs show consistent pressure do I intervene and change settings.
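A sketch of the metrics I sample regularly, assuming the sysstat package is installed for sar:

```bash
# Paging statistics: pgpgin/pgpgout plus fault/s and majflt/s
sar -B 1 5

# Swap activity: pswpin/s and pswpout/s
sar -W 1 5

# How much CPU time has kswapd consumed so far?
ps -o pid,comm,time -C kswapd0
```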
How the kernel selects pages: a deeper look into Reclaim
To tune in a targeted way, I need to understand the kernel's internal lists: Linux distinguishes between anonymous pages (heap/stack) and file-backed pages (page cache). Both hang on LRU lists (active/inactive). When memory comes under pressure, the kernel first tries to discard inactive file-backed pages (cheap, since they can be reloaded from disk). If too many anonymous pages are active, it has to move them to swap, which is more expensive. A high vm.vfs_cache_pressure speeds up the discarding of dentries/inodes, which frees up space but can lead to more file accesses on web servers. I usually keep it around 50-100 and watch how the cache hit rate and latency change.
I influence write paths via vm.dirty_background_bytes/vm.dirty_bytes (or the ratio variants). Dirty limits that are too high only postpone the problem and later generate large writebacks that slow down swap reclaim. I prefer byte-based limits because they work more precisely on systems with large RAM. Another safety valve is vm.min_free_kbytes: if this value is set too low, reclaim starts hectic cycles; too high, and it wastes RAM. I usually leave this value at the distribution default unless I consistently see "low free watermark" messages in dmesg.
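A hedged sysctl sketch for a host in the 32 GB class; the byte values are assumptions to illustrate the order of magnitude, not drop-in recommendations:

```ini
# /etc/sysctl.d/90-writeback.conf (example values, tune against measurements)
# Start background writeback early, at ~256 MB of dirty pages ...
vm.dirty_background_bytes = 268435456
# ... and block writers at ~1 GB so reclaim never meets a huge backlog
vm.dirty_bytes = 1073741824
# Keep dentry/inode reclaim near the default balance
vm.vfs_cache_pressure = 100
```

I load this with sysctl --system and re-measure before locking it in.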
PSI and kswapd: interpreting leading indicators correctly
In addition to classic metrics, I use Pressure Stall Information under /proc/pressure/memory. High some or full values over several seconds show me that tasks are waiting for memory. This is often the first sign before users notice latency. At the same time, I look at the CPU time of kswapd: if it permanently rises above a few percent, reclaim is running hot. With vmstat 1 I pay attention to si/so (swap-in/out) and r/b (run/block queue). Consistently high so values together with a growing b queue indicate thrashing, and then I intervene decisively.
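Reading PSI directly, as a sketch; it assumes a kernel with PSI enabled, which is standard on current distributions:

```bash
# avg10/avg60/avg300 are the percentage of wall time in which at least
# one task (some) or all tasks (full) stalled waiting for memory
cat /proc/pressure/memory

# Print a warning when the 60-second "some" average exceeds 10%
awk '/^some/ { split($3, a, "="); if (a[2] + 0 > 10) print "memory pressure high:", a[2] "%" }' /proc/pressure/memory
```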
Cgroups v2 and systemd: Deliberately limit swap
In multi-tenant or container environments, I prevent a single service from eating up all reserves. With cgroups v2 I set memory.max (hard limit), memory.high (soft throttle) and memory.swap.max (swap limit). Under systemd I use MemoryMax=, MemoryHigh= and MemorySwapMax= per service in unit overrides. This way PHP-FPM cannot drive the entire system into swap, while databases remain responsive. For bursts, I prefer a tight memory.high plus a moderate MemorySwapMax= instead of risking hard OOMs. I document these limits for each service and keep them up to date in the review process.
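A sketch of a systemd drop-in for a PHP-FPM unit; the unit name varies by distribution, and the limits are assumptions for a small host that must fit your measured working set:

```ini
# /etc/systemd/system/php-fpm.service.d/memory.conf (example values)
[Service]
# Soft ceiling: above this the kernel throttles the cgroup and reclaims aggressively
MemoryHigh=1G
# Hard ceiling: beyond this the cgroup gets OOM-killed
MemoryMax=1536M
# Cap how much of this service may spill into swap
MemorySwapMax=256M
```

After editing I run systemctl daemon-reload and restart the service, then verify with systemctl show php-fpm -p MemoryHigh.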
Create, enlarge and prioritize swap files cleanly
In practice, I need quick, reproducible steps:
- Create a swap file: `fallocate -l 8G /swapfile && chmod 600 /swapfile && mkswap /swapfile`
- Activate: `swapon /swapfile`; make it permanent via `/etc/fstab` with `/swapfile none swap sw,pri=5 0 0`
- Adjust the size: `swapoff /swapfile`, `fallocate -l 12G /swapfile`, `mkswap /swapfile`, `swapon /swapfile`
- Multiple swaps: prioritize faster NVMe with a higher `pri`; verify with `swapon --show --output=NAME,PRIO,SIZE,USED` (a consolidated resize sketch follows this list)
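The resize steps as one consolidated sketch; it assumes /swapfile already exists and that enough free RAM plus remaining swap is available to absorb the pages during swapoff:

```bash
#!/usr/bin/env bash
# Resize an existing swap file, e.g.: ./resize-swap.sh 12G
set -euo pipefail
SIZE="${1:?usage: resize-swap.sh <size, e.g. 12G>}"
SWAPFILE=/swapfile

swapoff "$SWAPFILE"               # pages migrate back to RAM or other swap areas
fallocate -l "$SIZE" "$SWAPFILE"  # resize in place (use dd on filesystems without fallocate support)
chmod 600 "$SWAPFILE"
mkswap "$SWAPFILE"                # rewrite the swap signature for the new size
swapon "$SWAPFILE"
swapon --show --output=NAME,PRIO,SIZE,USED
```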
On very IO-weak systems, I prefer to reduce the swap size or place swap on faster disks rather than allow the system to slowly "swap itself to death".
Zswap and ZRAM: when compression really helps
Zswap compresses pages on their way to swap in a RAM-based pool and thus reduces physical IO. This protects SSDs but costs CPU time. For VMs with few cores, I first test lz4 (fast, weaker compression) and observe whether CPU peaks increase. On very small instances I selectively replace classic swap with ZRAM in order to stay almost IO-free, and plan extra CPU for it. I deliberately keep the compressed pool small (e.g. 25-50% of RAM for ZRAM) to avoid creating new bottlenecks. As soon as CPU-bound workloads start to stumble, I revise this optimization.
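A minimal ZRAM sketch, assuming util-linux's zramctl and the zram kernel module are available; the 1 GB size stands in for roughly 25% of a 4 GB instance:

```bash
modprobe zram
# Claim a free zram device with fast lz4 compression
DEV=$(zramctl --find --size 1G --algorithm lz4)
mkswap "$DEV"
# Higher priority than disk-backed swap, so the kernel prefers the RAM-backed device
swapon --priority 100 "$DEV"
swapon --show
```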
THP and fragmentation: hidden latency brakes
Transparent Huge Pages (THP) can help with JVMs or databases, but in mixed hosting environments they can also burden reclaim and swap. I set THP to madvise so that only workloads that explicitly request it benefit. In the event of noticeable memory fragmentation, I schedule rolling restarts of memory-intensive services in order to clean up fragmented heaps. For MySQL/MariaDB, I also check whether the InnoDB buffer pool is sized sensibly relative to total memory so that the Linux page cache does not starve; duplicate caches cost RAM and drive up swap unnecessarily.
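Checking and switching the THP mode; the sysfs path is standard, the echo needs root:

```bash
# The bracketed entry is the active mode, e.g. [always] madvise never
cat /sys/kernel/mm/transparent_hugepage/enabled

# Only hand out huge pages to processes that madvise(MADV_HUGEPAGE) for them
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
```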
NUMA and multi-socket hosts
On larger bare-metal hosts, NUMA plays a role. Unbalanced memory access increases latencies and accelerates reclaim. I distribute workloads with numactl --interleave=all or pin specific services to a node. Critical services that trigger many page cache accesses (e.g. Nginx) stay close to the data paths; memory-hungry batch jobs get encapsulated and, if necessary, tighter cgroup limits so that NUMA spillover does not push the entire system into swap.
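Two numactl patterns as a sketch; batch-job.sh is a hypothetical placeholder, and the numactl package is assumed to be installed:

```bash
# Per-node memory usage: watch for one node filling up while others stay free
numastat -m

# Spread allocations evenly across all nodes (good for large shared caches)
numactl --interleave=all ./batch-job.sh

# Pin a job's CPUs and memory to node 0 to keep accesses local
numactl --cpunodebind=0 --membind=0 ./batch-job.sh
```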
Process diagnostics: who really swaps?
When the system metrics sound the alarm, I identify the culprits at process level: smem -knr shows me PSS/USS (realistic memory shares), pmap -x the segment distribution. In /proc/<pid>/status I check VmRSS, VmSwap and oom_score_adj. High VmSwap values combined with LRU-unfriendly patterns (many anonymous, rarely used pages) make a process a candidate for limits or code optimization. I also use pidstat -r 1 to see fault rates per process and compare them with application latencies.
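A sketch that needs no extra tooling: rank processes by VmSwap straight from /proc:

```bash
# Top 10 swap consumers; VmSwap is reported in kB
for f in /proc/[0-9]*/status; do
  awk '/^Name:/ { name = $2 } /^VmSwap:/ { print $2, name }' "$f" 2>/dev/null
done | sort -rn | head -10
```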
Runbooks, SLOs and escalation levels
I define clear limit values per host class, e.g. kswapd CPU < 5% in the 5-minute average, major faults < 50/s per core in normal operation, PSI memory some < 10% over 60 s. If two thresholds are breached at the same time, I intervene in this order: check swappiness, temporarily throttle workers/buffers, adjust swap placement and priorities, enable or disable compression, and increase RAM if necessary. These runbooks are part of my incident response so that teams can act reproducibly and latency remains under control.
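A hedged runbook gate as a sketch; the thresholds follow the host-class defaults above, the kswapd CPU share uses the lifetime average from ps as a rough proxy for the 5-minute window:

```bash
#!/usr/bin/env bash
# Count breached memory thresholds; escalate when two or more trip at once.
set -u
over() { awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 >= t) }'; }
breaches=0

# 1) PSI memory "some" avg60 should stay below 10%
psi=$(awk '/^some/ { split($3, a, "="); print a[2] }' /proc/pressure/memory)
over "$psi" 10 && breaches=$((breaches + 1))

# 2) kswapd CPU share below 5%
kswapd=$(ps -o %cpu= -p "$(pgrep -o kswapd0)")
over "$kswapd" 5 && breaches=$((breaches + 1))

# 3) Major faults below 50/s per core (5-second host-wide sample)
m1=$(awk '/^pgmajfault/ { print $2 }' /proc/vmstat); sleep 5
m2=$(awk '/^pgmajfault/ { print $2 }' /proc/vmstat)
over "$(( (m2 - m1) / 5 / $(nproc) ))" 50 && breaches=$((breaches + 1))

if [ "$breaches" -ge 2 ]; then
  echo "runbook: escalate, $breaches memory thresholds breached" >&2
  exit 1
fi
```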
Troubleshooting: typical causes and quick solutions
If the swap rates increase, I first check memory-hungry services and limit them with cgroups or service settings. I then check whether there are too many PHP workers, DB buffers that are too large or a page cache that is too small. I reduce swappiness, clean up temporary caches and move log rotations away from peak times. If the IO queue remains permanently high, I relocate or shrink swap to ease IO contention. If that is not enough, I increase RAM and measure again until latency remains stable at a low level.
Tuning for PHP, databases and Node.js
With PHP, I maximize full-page or OPcache hits so that less RAM goes to repeated compilation and response times drop. In MySQL/MariaDB, I balance the buffer pool and query cache against the page cache to avoid double caching. For Node.js, I set heap limits and monitor garbage collection so that the event loop does not stall. I also counter memory fragmentation with rollouts that regularly restart services and detect leaks. A brief deep dive into memory fragmentation helps to catch such creeping problems more quickly.
Containers and hosting stacks: practical examples
In container environments, I set a hard memory limit per pod or service and only allow a moderate amount of swap. For PHP-FPM, I calculate memory per worker (RSS) plus headroom for the page cache. Example: 512 MB RAM and 30 MB real consumption per worker make 8-10 workers realistic, not 20. For Node.js I deliberately set --max-old-space-size below the physical limit so that GC does not come under pressure and the kernel does not aggressively swap anonymous memory. For databases, I plan fixed budgets, separate them from the web tier where possible and give the OS enough space for file caches.
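The worker arithmetic as a sketch; the 200 MB headroom for kernel, page cache and other services is my assumption for this example:

```bash
# How many PHP-FPM workers fit? (example numbers from the text)
TOTAL_MB=512        # instance RAM
HEADROOM_MB=200     # reserved for OS and page cache
PER_WORKER_MB=30    # measured RSS per worker
echo $(( (TOTAL_MB - HEADROOM_MB) / PER_WORKER_MB ))   # -> 10 workers, not 20
```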
Costs, hardware and when to upgrade RAM
I calculate equivalents in euros: if swap pressure creates permanent latency, additional RAM quickly justifies its price and delivers real performance. NVMe reduces IO latency, but does not replace volatile memory. Before I expand hardware, I optimize swappiness, buffers and worker counts to increase efficiency. If the workload remains high, I plan a RAM upgrade in sensible stages instead of just increasing swap. This sequence prevents bad investments and gives me clear measuring points for later comparisons.
Check: swap usage on servers in 15 minutes
I start with free -h and vmstat 1 and check swap movement, page faults and IO queues. Then I set vm.swappiness=10, reload sysctl and observe the metrics for five minutes. If it fits, I keep the setting and document the current status. In the next step, I correct worker counts and DB buffers that displace the page cache. Finally, I create alerts that warn me of outliers before users notice them.
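The 15-minute check compressed into one sketch; I run the steps interactively and compare against the baseline rather than executing them blindly:

```bash
# 1) Baseline
free -h && swapon --show
vmstat 1 10                      # watch si/so and b

# 2) Tune and persist
sysctl vm.swappiness=10
echo 'vm.swappiness=10' >> /etc/sysctl.conf

# 3) Observe for five minutes, then compare against the baseline
vmstat 60 5
cat /proc/pressure/memory
```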
Briefly summarized
I use swap as a safety harness, but keep its usage low so that latency does not explode and no performance issues occur. The biggest lever remains a sensible swappiness, combined with a swap size that matches RAM and workload. I monitor kswapd, page faults and the IO queue, compare the values with application logs and act early. For smaller VPSs, memory swapping in hosting relieves pressure in the short term, while real relief comes with more RAM. Following this sequence keeps servers responsive, reduces downtime and protects budgets.