I'll show you how to keep page cache eviction and memory pressure in Linux under control so that your server responds reliably and quickly. I explain the core mechanisms in the kernel, typical pitfalls in everyday hosting, and concrete steps for monitoring, tuning and caching strategies, with a focus on practical relevance.
Key points
- Linux page cache: Transparent caching of file blocks in RAM reduces I/O accesses.
- Memory pressure: Scarce RAM forces evictions and swapping and can trigger the OOM killer.
- Eviction strategies: Variants of LRU prioritize frequently used pages.
- Multi-tier caches: Kernel, storage and app caches influence each other.
- Tuning & monitoring: Read key figures, test parameters, avoid thrashing.
How the Linux Page Cache works
The Linux kernel keeps frequently read file blocks as pages in RAM so that read accesses are served directly from memory and not from block devices [9]. This mechanism is transparent: applications do not need to be adapted because the kernel decides what stays in the cache and what is evicted, which raises the cache hit rate. Free RAM does not sit unused; it serves opportunistically as a cache and thus increases the responsiveness of running services [9], which I specifically plan for on web servers and APIs. When the same files are accessed again, I save waiting time because the kernel delivers the data from RAM and avoids expensive device accesses, which pushes latency down. For a deeper introduction to the mechanics and opportunities, I like to use a clear guide to the Linux page cache as a companion.
Understanding memory pressure and recognizing it early
Tight RAM generates memory pressure: the kernel registers the shortage and shrinks the cache, writes back modified pages and falls back on swap if necessary [9]. I keep a close eye on when evictions start to increase, because overly aggressive evictions raise the I/O load and make response times fluctuate, which can cloud the user experience. Heavy pressure increases the risk of OOM killer events that terminate processes and interrupt services, which is why I plan reserves and warning thresholds before bottlenecks escalate [9]. If telemetry shows persistently high swap in/out rates and I/O wait, I increase RAM capacity or reduce application caches to give the kernel breathing room for the page cache, which improves resilience. This prevents spontaneous load peaks from turning into endless writeback and swap cycles that hinder productive workloads [9].
Eviction mechanisms in the kernel: LRU and friends
For eviction, Linux uses strategies that resemble variants of LRU: frequently used pages stay, rarely used ones go first [9]. Unmodified pages can be discarded immediately, while modified (dirty) pages must first be written to the storage medium before the kernel releases them, which affects write latency. Pages move between lists depending on how often processes read or modify them, and under pressure the kernel accelerates this cycle so that running tasks receive memory [9]. It becomes critical when freshly loaded data is immediately evicted again: this thrashing costs performance and leads to repeated device accesses, which eat up time and generate jitter. I counteract this by limiting memory-hungry processes, fine-tuning dirty writeback parameters and keeping warm data sets in memory so that hot data stays resident longer and the I/O curve is smoother.
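The difference between clean and dirty evictions, and the thrashing pattern, can be illustrated with a toy model. The sketch below is a deliberate simplification that assumes a single strict LRU list; the real kernel works with several lists and asynchronous writeback. The class name `ToyPageCache` and its counters are hypothetical and exist only for illustration.

```python
from collections import OrderedDict


class ToyPageCache:
    """Single-list LRU model of a page cache (illustrative, not the kernel algorithm)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages: "OrderedDict[str, bool]" = OrderedDict()  # page -> dirty flag, oldest first
        self.misses = 0      # accesses that had to go to the device
        self.writebacks = 0  # dirty pages that had to be flushed before eviction

    def access(self, page: str, write: bool = False) -> None:
        if page in self.pages:
            # Hit: keep the dirty state and move the page to the "recent" end.
            dirty = self.pages.pop(page) or write
        else:
            self.misses += 1
            dirty = write
            if len(self.pages) >= self.capacity:
                # Evict the least recently used page; a dirty victim costs a writeback.
                _, victim_dirty = self.pages.popitem(last=False)
                if victim_dirty:
                    self.writebacks += 1
        self.pages[page] = dirty


if __name__ == "__main__":
    cache = ToyPageCache(capacity=3)
    # Cyclic scan over four pages with room for only three: the working set
    # never fits, so every access misses. This is thrashing in miniature.
    for _ in range(3):
        for page in ("p0", "p1", "p2", "p3"):
            cache.access(page)
    print("misses:", cache.misses, "of 12 accesses")
```

In this model a working set that exceeds the capacity by a single page already turns every access into a miss, which is why a little extra RAM can have a disproportionate effect.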
Interaction of kernel cache, storage caches and app caches
Several caching layers work together: the kernel holds file blocks in RAM, RAID controllers or SAN systems buffer underneath, and object caches or buffer pools operate at the application level [9]. I measure the effect of each level separately, because an app cache that is too large starves the kernel of memory and thus weakens the file cache, which can increase overall latency. Conversely, overly fast eviction in the page cache forces the storage system into frequent accesses, although hot data could well remain in memory with a little more RAM, which would reduce the I/O load. The goal is a balance: application caches large enough for clear effects, but not so large that the kernel has to fight for every megabyte. Especially with data-intensive workloads, I rely on measurements per layer, because assumptions about how caches are distributed and used are often misleading and lead to turning the wrong knob.
File system and mount options: Influence on caching and latency
File systems and mount parameters determine how quickly the kernel caches metadata and writes back pages. relatime is now the default and significantly reduces atime updates; for intensive scan jobs, I specifically use noatime to save unnecessary metadata writes. lazytime delays the writing of inode timestamps, which smooths peaks without breaking semantics. On ext4 I stay with the default data=ordered, because it provides clean consistency with reasonable latency; I avoid risky options such as disabled barriers (nobarrier) when the underlying storage does not have a battery-backed write cache. XFS and ext4 behave slightly differently with metadata caching; with many small files I feel the effect directly in the dentry and inode caches, and this is where vm.vfs_cache_pressure comes into play. On SSDs I prefer asynchronous discard or periodic fstrim jobs so that I don't introduce latency with every delete. With NFS, I pay attention to attribute caching parameters so that I don't oscillate between staleness and unnecessary I/O; metadata caches in the VFS keep directory and lookup operations noticeably fast [9].
Everyday life of a web server: warmup, load peaks, backups
After a deploy, the page cache is cold: many initial accesses hit the devices, and only then do hot paths build up. As soon as enough requests have loaded the frequently used files, the cache takes effect and response times normalize noticeably, as long as enough RAM remains available to hold hot data. Load peaks caused by campaigns, cron jobs or reports put pressure on memory and trigger evictions, while parallel backups with sequential reads load cold data and displace hot data, which I take into account when planning. Warmup routines that specifically touch assets and frequent endpoints help, so that the cache is populated before peak times, which visibly reduces latency peaks. On shared hosts, I schedule memory-intensive tasks at separate times in order to spread the pressure and reduce mutual interference from thrashing services.
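A warmup routine for static files can be as simple as reading them once. The following is a minimal sketch under that assumption; the function name `warm_files` is hypothetical, and whether the pages actually stay cached depends on available RAM and competing load.

```python
from pathlib import Path
from typing import Iterable


def warm_files(paths: Iterable[Path], chunk_size: int = 1 << 20) -> int:
    """Read each file once so that its blocks land in the page cache.

    Returns the number of bytes read. Missing or unreadable entries are skipped.
    """
    total = 0
    for path in paths:
        try:
            with open(path, "rb") as handle:
                # Read in chunks so large files do not inflate process memory.
                while chunk := handle.read(chunk_size):
                    total += len(chunk)
        except OSError:
            continue  # a warmup should never fail the deploy
    return total
```

A call such as `warm_files(Path("/var/www/example/static").rglob("*.css"))` (the path is a placeholder) could run as a post-deploy step; warming dynamic endpoints would additionally require HTTP requests, which this sketch does not cover.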
Read-ahead, direct I/O and avoiding cache pollution
Sequential readers benefit from read-ahead, while random access patterns suffer from it. For each device I check the read_ahead_kb value and set it higher for clearly sequential jobs and lower for random-heavy workloads. For full backups and large scans, I avoid cache pollution: tools with O_DIRECT support or posix_fadvise with POSIX_FADV_DONTNEED prevent gigabytes of cold data from pushing hot data out of the cache. If the application cannot use direct I/O, I at least lower its priority (ionice, nice) or use cgroups to regulate I/O throughput so that web traffic continues to benefit. I only empty the cache manually via drop_caches in maintenance windows and only after a sync, because uncoordinated flushes generate exactly the latency peaks that I want to avoid. For database exports, it has proven useful to stream reads and to announce the access pattern with POSIX_FADV_SEQUENTIAL, so that the kernel adapts its read-ahead strategy accordingly [9].
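How a backup-style reader could apply both hints is sketched below using Python's `os.posix_fadvise`. The function names `stream_once` and `_advise` are hypothetical. The hints are advisory, and POSIX_FADV_DONTNEED drops the file's clean cached pages regardless of who loaded them, so this sketch assumes the file is cold data that nothing else needs in memory.

```python
import os


def _advise(fd: int, advice_name: str) -> None:
    """Pass a hint to the kernel; hints are optional, so failures are ignored."""
    advice = getattr(os, advice_name, None)
    if advice is None or not hasattr(os, "posix_fadvise"):
        return  # platform without posix_fadvise support
    try:
        os.posix_fadvise(fd, 0, 0, advice)  # offset 0, length 0 = whole file
    except OSError:
        pass


def stream_once(path: str, chunk_size: int = 1 << 20) -> int:
    """Read a file sequentially once and ask the kernel to drop its pages afterwards."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        _advise(fd, "POSIX_FADV_SEQUENTIAL")  # announce the access pattern
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)  # a real job would ship or process the chunk here
        _advise(fd, "POSIX_FADV_DONTNEED")  # cold data: no need to keep it cached
    finally:
        os.close(fd)
    return total
```

For very large files, issuing the DONTNEED hint per processed range instead of once at the end would keep the footprint smaller; that refinement is left out here for brevity.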
Monitoring: key figures that I always keep an eye on
With clean monitoring I recognize memory pressure early: I check used RAM, available memory, the share of the page cache and its relation to application caches. I also monitor swap usage, swap in/out rates, I/O wait, physical read/write accesses and the error rate of requests in order to clearly separate cause and effect before I make any adjustments. Time series show me whether bottlenecks only occur at peak times or are permanent, and whether configuration changes are actually taking effect, which informs the decision between tuning and capacity. I correlate deploy times, backup windows and traffic peaks with eviction and I/O peaks to make patterns visible and validate planning. Without this view, optimization is flying blind, so I invest in alerts with meaningful thresholds rather than frantic ad hoc responses.
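Two of these figures, available memory and the page cache share, can be derived from `/proc/meminfo`. The sketch below assumes the usual `Key: value kB` layout of that file; the function names and the choice of summary fields are illustrative, not a fixed convention.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])  # most fields are in kB
    return values


def cache_summary(meminfo: dict) -> dict:
    """Derive a few ratios that are useful as monitoring time series."""
    total = meminfo["MemTotal"]
    file_cache = meminfo.get("Active(file)", 0) + meminfo.get("Inactive(file)", 0)
    return {
        "available_pct": 100.0 * meminfo["MemAvailable"] / total,
        "file_cache_pct": 100.0 * file_cache / total,
        "dirty_kb": meminfo.get("Dirty", 0),
    }
```

On a Linux host, `cache_summary(parse_meminfo(open("/proc/meminfo").read()))` yields one sample; collected at regular intervals, these samples give the time series described above.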
Tools and diagnostic paths for emergencies
When latencies increase, I first open /proc/meminfo and check MemAvailable, Cached, Buffers, Active(file), Inactive(file), Dirty and Writeback. Then /proc/vmstat and vmstat 1 provide the dynamics: pgfault/pgmajfault, pgscan/pgsteal, kswapd activity and workingset_refault show me whether hot data is falling out of the cache. With iostat -x 1 I recognize device saturation and queue depths, and pidstat -r -d reveals which processes are consuming memory and generating I/O. slabtop helps to detect oversized slabs (dentries/inodes) when vm.vfs_cache_pressure is set too low. Particularly valuable is /proc/pressure/memory (PSI): persistently high some and full values correlate directly with noticeable system sluggishness, which makes them ideal for sharpening alarms and configuring systemd-oomd sensibly.
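The PSI file is easy to evaluate programmatically. The sketch below parses the `some` and `full` lines and applies an alert rule; the function names and the threshold values (10 percent for `some`, 1 percent for `full`, both on `avg10`) are hypothetical starting points that would need calibration against real workloads.

```python
def parse_psi(text: str) -> dict:
    """Parse PSI text such as the content of /proc/pressure/memory.

    Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value) for key, value in (f.split("=", 1) for f in fields)
        }
    return result


def memory_pressure_alert(psi: dict, some_limit: float = 10.0, full_limit: float = 1.0) -> bool:
    """Return True if the 10-second averages exceed the (assumed) thresholds."""
    some = psi.get("some", {}).get("avg10", 0.0)
    full = psi.get("full", {}).get("avg10", 0.0)
    return some > some_limit or full > full_limit
```

The `avg10` window reacts quickly and suits paging alerts; for trend dashboards, `avg60` or `avg300` would probably be the calmer choice.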
Kernel tuning: swappiness, vfs_cache_pressure and dirty writeback
The Linux parameters give me flexible levers to control evictions and writeback, but I test changes carefully and in small steps. vm.swappiness determines how readily the kernel pushes anonymous pages into swap: low values make the kernel reclaim page cache first and keep process memory resident, while high values swap earlier and leave more room for the page cache at the expense of possible swap latency, a trade-off I weigh per workload. vm.vfs_cache_pressure controls how aggressively the inode and dentry caches are reclaimed; these caches keep file system metadata quickly available and accelerate directory accesses. dirty_background_ratio and dirty_ratio define the thresholds for asynchronous and forced writeback so that modified pages reach the medium in good time and memory peaks do not tip over into forced flushes. The following table gives a compact overview that bundles effects and notes:
| Parameters | Low value | High value | Practical note |
|---|---|---|---|
| vm.swappiness | Swap is used late | Earlier swapping | Often set rather low for IO-sensitive web servers; measure load |
| vm.vfs_cache_pressure | Metadata stays longer | Faster evacuation | Keep lower if many small files need to be quickly accessible |
| dirty_background_ratio | Earlier asynchronous writing | More dirty pages accumulate | Values that are too high cause flush peaks; choose a moderate value |
| dirty_ratio | Forced flushes start earlier but stay smaller | Forced flushes are rarer but larger | Aim for the middle to keep writeback curves even |
For a deeper understanding of how paging and swapping shape real-world performance, it is worth taking a closer look at memory paging, so that I can sensibly weigh I/O costs against cache capacity. I validate every change with load tests and a rollback option, because workloads react differently and the balance between memory, I/O and latency remains sensitive. Without structured measurements, I risk side effects that immediately cancel out the supposed gains and create new bottlenecks.
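A rollback option starts with knowing the values that were active before a change. The sketch below reads the four parameters discussed above from `/proc/sys/vm` and renders matching `sysctl -w` commands as text; it does not execute anything. The function names are hypothetical, and the `root` parameter exists only so the logic can be tried against a test directory.

```python
from pathlib import Path

VM_KEYS = ("swappiness", "vfs_cache_pressure", "dirty_background_ratio", "dirty_ratio")


def snapshot_vm_settings(root: Path = Path("/proc/sys/vm")) -> dict:
    """Return the current integer values of the selected vm.* sysctls found under root."""
    snapshot = {}
    for key in VM_KEYS:
        path = root / key
        if path.is_file():
            snapshot[f"vm.{key}"] = int(path.read_text().strip())
    return snapshot


def rollback_commands(snapshot: dict) -> list:
    """Render commands that would restore the snapshot (returned as text, not executed)."""
    return [f"sysctl -w {name}={value}" for name, value in sorted(snapshot.items())]
```

Storing the output of `rollback_commands(snapshot_vm_settings())` next to the load test results keeps each experiment reversible and documents the baseline.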
Swap strategies: Zswap, ZRAM and fast NVMe
Swap is not an enemy, but a tool, provided it is used in the right amount. Zswap places a compressed cache in RAM in front of the swap device and thus reduces I/O, which helps noticeably with short-lived cold pages. ZRAM provides a compressed swap device in RAM; this is useful on small instances to dampen OOM spikes without hitting the disk. Note the CPU overhead: on heavily utilized cores, aggressive compression can shift latency. If real swap is on NVMe, I set vm.swappiness more moderately because the penalty is smaller. Nevertheless, permanent swap-in/out waves are a symptom of insufficient RAM or excessive app caches [9]. For writeback, I prefer the byte variants (dirty_bytes, dirty_background_bytes) when RAM sizes vary greatly; this way I prevent percentage values from leading to huge flushes on machines with large amounts of memory.
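The argument for the byte variants becomes clear with a little arithmetic. The sketch below assumes, as a simplification, that the ratio is applied to total RAM; the kernel actually applies it to dirtyable memory (roughly free plus reclaimable pages), so the figures are upper bounds. The 20 percent ratio and the 1 GiB cap are example values only.

```python
GIB = 1024 ** 3


def ratio_threshold_bytes(dirtyable_bytes: int, ratio_percent: int) -> int:
    """Bytes of dirty data allowed before the ratio-based threshold is reached."""
    return dirtyable_bytes * ratio_percent // 100


if __name__ == "__main__":
    fixed_cap = 1 * GIB  # example value for a byte-based limit such as vm.dirty_bytes
    for ram_gib in (8, 64, 512):
        by_ratio = ratio_threshold_bytes(ram_gib * GIB, 20)
        print(
            f"{ram_gib:>4} GiB RAM: ratio 20% allows up to {by_ratio / GIB:6.1f} GiB dirty, "
            f"byte limit stays at {fixed_cap / GIB:.1f} GiB"
        )
```

With a percentage, the amount of data that may need flushing grows linearly with memory size, while a byte value keeps the worst-case flush independent of the instance type.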
Application-related caches: size, benefits, side effects
HTTP page caches, object caches such as Redis/Memcached and database buffer pools accelerate applications noticeably if I size them correctly [9]. Caches that are too large displace the kernel page cache, increase memory pressure and force the kernel into frequent evictions, which slows down the entire I/O pipeline and inflates response times. I start conservatively, measure hit rates, latencies and RAM pressure, and only then expand, to ensure real gains instead of just consuming memory, which improves efficiency. In CMS and web apps, a well-configured page cache significantly reduces the number of dynamic generations per request, which relieves CPU and I/O and indirectly reduces memory pressure [2][9]. In the end, it's the sum that counts: only when the kernel cache and app caches fit together does a smooth flow emerge that avoids peaks and delivers constant response times.
Practical guidelines for hosting setups
I plan sufficient RAM not only for process memory, but deliberately with a reserve for kernel and application caches so that hot data can remain in memory. I optimize caches in a coordinated manner instead of pushing each to the maximum: database buffer pools, object caches and the kernel page cache each get enough space to work together without slowing each other down. For me, good monitoring is part of operations: I continuously track memory pressure, swap activity, I/O wait and error rates in order to quickly detect creeping deterioration and initiate countermeasures. I know the load profiles from logs and APM data so that I can time backups, batch jobs and traffic peaks, which means that hard overlaps occur less frequently and availability increases. If a project grows, I scale horizontally or vertically before the pressure remains permanently high and optimization at the limit only shifts symptoms.
Containers and Cgroups: Memory limits and protection against global OOMs
In containers, the cgroup v2 configuration counts twice: file-backed pages are charged to the cgroup of the process that reads them, so I set sensible limits and thresholds. With memory.max I prevent runaways, memory.high throttles early and gives the system time to clean up, and memory.swap.max limits swap usage so that a single pod does not flood the disk. I protect critical services with memory.low or memory.min so that their cache shares are not immediately reclaimed when neighbors push. Combined with PSI-based mechanisms (e.g. systemd-oomd), containers can be terminated selectively before the host has to thrash, and the overall platform remains stable. In Kubernetes, it pays to choose requests/limits realistically and plan node reserves so that the kernel always has room for the page cache.
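How close a cgroup is to its thresholds can be read from its control files. The sketch below assumes a cgroup v2 directory that contains `memory.current`, `memory.high` and `memory.max`; the function names and the report layout are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_limit(raw: str) -> Optional[int]:
    """cgroup v2 limit files contain either a byte count or the literal 'max'."""
    raw = raw.strip()
    return None if raw == "max" else int(raw)


def memory_report(cgroup_dir: Path) -> dict:
    """Summarize current usage relative to memory.high and memory.max."""
    current = int((cgroup_dir / "memory.current").read_text())
    high = parse_limit((cgroup_dir / "memory.high").read_text())
    hard = parse_limit((cgroup_dir / "memory.max").read_text())
    return {
        "current": current,
        "high": high,
        "max": hard,
        # None means "no limit configured" (or a limit of zero).
        "pct_of_high": None if not high else 100.0 * current / high,
        "pct_of_max": None if not hard else 100.0 * current / hard,
    }
```

A call such as `memory_report(Path("/sys/fs/cgroup/system.slice/example.service"))` (the unit name is a placeholder) shows at a glance whether a service is already running into throttling at memory.high.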
When eviction becomes a real problem
Eviction is part of normal operation, but signals such as frequent reloading of identical files, persistent I/O peaks and fluctuating response times indicate thrashing and insufficient cache protection. I first check the relationship between RAM, app cache sizes and the actual working set, because oversized Redis instances, JVM heaps or DB pools starve the kernel and accelerate eviction. If backups or full scans read large amounts of data sequentially, this pushes hot data out of the cache; I then reschedule these jobs, use I/O throttling or isolate them so that productive traffic does not suffer and the hit rate stays up. If the telemetry indicates recurring patterns, I test kernel parameters in small steps to smooth writeback and adjust how long metadata stays cached. If that's not enough, I increase RAM or split workloads, because constant pressure ends up costing more than a clear capacity decision.
Summary and next steps
The most important levers for me are: understand, measure, adjust. I get to know the access patterns of my workloads, measure cache hit rates, I/O wait and swap movements, and then adjust cache sizes and kernel parameters until eviction and writeback run smoothly. In virtualized environments, I keep an eye on mechanisms like memory ballooning, because dynamic RAM allocation influences how much page cache is available and can therefore depress performance. I then verify successes with load tests before rolling out changes widely, so that there are no surprises and latency remains consistent. Maintaining this cycle regularly keeps memory pressure manageable, protects the page cache from thrashing and delivers reliable response times, which is exactly what users expect and what makes projects predictable.


