The Linux page cache determines how quickly hosting workloads read and write files because it keeps frequently used data in RAM and thereby avoids costly device accesses. In this article I show how file system caching works in Linux hosting, which metrics matter, and how I can manage the cache in everyday operation without increasing server load.
Key points
- The page cache keeps file blocks in RAM and reduces latency.
- Dirty pages buffer write accesses, which are flushed back in batches.
- LRU strategies evict old entries to make room for new data.
- Monitoring with free, /proc/meminfo, vmstat, iostat provides clarity.
- Optimization through RAM, logrotate, Opcache, and sensible limits.
What is the Linux page cache?
The Linux page cache stores frequently read file blocks in memory, thereby speeding up every subsequent access to those files. I benefit immediately because RAM accesses complete in nanoseconds, while even fast SSDs need tens to hundreds of microseconds and are therefore orders of magnitude slower. When an application opens a file, the kernel stores the read blocks in the cache and serves future requests directly from working memory. This is transparent for programs; I don't have to adjust or reconfigure anything. Hosting workloads such as web servers, PHP-FPM, image delivery, or log reading processes constantly hit the cache and save I/O.
How the cache works during reading
When a file is read for the first time, the system loads its blocks into the cache and keeps them hot so that they remain available for repeated access and the second request is served almost instantly. If I read a 100 MB file twice in a row, the second pass comes almost entirely from RAM. The kernel uses strategies such as LRU (Least Recently Used) and prioritizes recently used entries so that current web content stays in the cache longer and cold data is evicted. This logic fits hosting patterns well, as many visitors repeatedly request identical images, CSS, and JavaScript files that stay cached across requests. The hit rate increases with the cache size, i.e., with the available RAM.
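A minimal sketch of this effect, assuming a suitably large test file (the path below is a placeholder): dropping the cache before the first read makes the cold/warm difference visible.

```bash
# Drop the page cache once, for testing only (needs root).
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null

# First pass: blocks come from the device.
time cat /var/www/example/big.bin > /dev/null

# Second pass: the same blocks are served from the page cache.
time cat /var/www/example/big.bin > /dev/null
```

On the second pass, the real time typically collapses to close to the user/sys time because no device I/O is involved.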
Writing and dirty pages explained
When writing, data first lands in the cache as dirty pages, i.e., modified blocks that the kernel has not yet written back to disk and that are flushed later via writeback. I can easily observe this behavior live: if I create a 10 MB file with dd, the dirty values rise until the kernel writes the pages to the SSD in one go. A manual sync forces the system to make the cache consistent and drops the dirty metric back toward zero. This batching conserves I/O because it combines many small operations into larger transfers, reducing the cost of each write operation. The modern per-device writeback approach keeps parallel disks independently busy and reduces waiting times.
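A small sketch of that experiment, assuming the target directory sits on a disk-backed file system (on many distributions /tmp is tmpfs, in which case another path should be used):

```bash
grep -E '^(Dirty|Writeback):' /proc/meminfo      # baseline

# Write 10 MB through the page cache; the pages stay dirty until writeback.
dd if=/dev/zero of=/tmp/testfile bs=1M count=10

grep -E '^(Dirty|Writeback):' /proc/meminfo      # Dirty has increased

# Force writeback; Dirty drops back toward zero.
sync
grep -E '^(Dirty|Writeback):' /proc/meminfo
```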
Cache architecture: Dentry/inode vs. page cache
To complete the picture, it should be noted that Linux does not only cache file data. In addition to the actual page cache for content, there are dentry and inode caches that keep directory structures, file names, and metadata in RAM. They save expensive path resolutions and inode lookups. In free -m these shares appear within the buff/cache value, while buffers refers more to block-device-related buffers. In /proc/meminfo I can see a more detailed breakdown (e.g., Active(file), Inactive(file), and SReclaimable, which covers the dentry and inode slabs). For hosting workloads with lots of small files, these metadata caches are essential because they further reduce the number of actual device accesses per HTTP request.
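A quick way to look at these shares, sketched under the assumption that the slabtop tool from procps is available:

```bash
# File-backed cache and reclaimable slab (includes dentry/inode objects).
grep -E '^(Active\(file\)|Inactive\(file\)|SReclaimable):' /proc/meminfo

# Largest slab caches, one-shot output (needs root).
sudo slabtop -o | head -n 15

# Rough dentry counters (number in use vs. unused).
cat /proc/sys/fs/dentry-state
```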
Reading key figures correctly
I first check free -m and note the buff/cache column as well as the Mem and Swap lines in order to reliably evaluate the effect of the cache and determine the actual memory use. For more detail, I read values such as Cached, Dirty, Writeback, and Buffers from /proc/meminfo, which together give a good picture of the memory state. vmstat 1 continuously shows whether the system is waiting on I/O, and iostat adds per-device details. Crucially, Linux uses free RAM as cache and reports it as used, even though applications can reclaim it immediately if needed. So I always evaluate the overall situation, including the workload, and not just a single number.
| Metric | Source/Command | Meaning | Typical signal |
|---|---|---|---|
| Cached | free -m, /proc/meminfo | RAM share holding file data | High value with frequent file access |
| Dirty | /proc/meminfo | Pages not yet written back | Rises during intensive writes, falls after sync |
| Writeback | /proc/meminfo | Active write-back operations | Non-zero values during flush phases |
| bi/bo | vmstat 1 | Block I/O in/out | Peaks indicate cache misses or flushes |
| r/s, w/s | iostat -xz 1 | Read/write operations per second | Jumps during misses; steady background noise is OK |
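The commands behind the table, as a compact sketch (iostat requires the sysstat package):

```bash
free -m                                            # buff/cache column, Mem/Swap lines
grep -E '^(Cached|Buffers|Dirty|Writeback):' /proc/meminfo
vmstat 1 5                                         # bi/bo and wa columns over 5 seconds
iostat -xz 1 3                                     # per-device r/s, w/s, await, %util
```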
Advantages in everyday hosting
A well-filled page cache significantly reduces I/O wait times and shifts data access from the disk to RAM, which greatly reduces the latency of individual requests and improves the response time of websites. Frequently used images, CSS, and HTML files remain in the cache so that the web server can serve them without going through the SSD. With heavy traffic, the hit rate counts: the more repeat visitors, the greater the benefit. In scenarios with high parallelism, the cache relieves the storage level and smooths out load peaks. For a deeper understanding of the relationships between memory, web, and proxy caches, it is worth taking a look at caching hierarchies so that I can use each level effectively and avoid wasting resources.
Intelligently influence cache size
I influence the cache effect in two ways: more RAM and fewer useless file accesses, so that space remains free for hot data and the kernel can keep the right blocks in the cache. Logrotate with gzip cleans up large log files, reduces the amount of file data held in memory, and prevents logs from displacing important web assets. I treat large one-time transfers such as backups or SQL dumps as less relevant whenever possible by running them outside of peak times. I only use echo 3 > /proc/sys/vm/drop_caches to manually empty the kernel cache in tests, because it destroys the productive cache mix and temporarily increases latency. Ultimately, the working set is the deciding factor: the better it fits into RAM, the more consistent the performance.
Direct I/O, fsync, and consistency
Not every access goes through the page cache. Some workloads open files with O_DIRECT or O_SYNC, deliberately bypassing caching or forcing immediate persistence. This is useful when you want to avoid double buffering (database buffer pool plus page cache) or when consistency matters more than latency. For web and media workloads, I usually stick with normal, buffered I/O because the hit rate wins most of the time. It is also important to understand fsync: applications that frequently call fsync on log files drive writeback cycles and can generate I/O spikes. I batch such calls where possible or set application flush intervals appropriately to keep throughput high.
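A quick sketch with dd that illustrates the three write paths (the target path is a placeholder):

```bash
# Buffered write: lands in the page cache first, flushed later.
dd if=/dev/zero of=/mnt/data/buffered.bin bs=1M count=100

# Direct I/O: bypasses the page cache entirely (O_DIRECT).
dd if=/dev/zero of=/mnt/data/direct.bin bs=1M count=100 oflag=direct

# Buffered write with an fsync before dd exits: data is persisted on return.
dd if=/dev/zero of=/mnt/data/fsynced.bin bs=1M count=100 conv=fsync
```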
Mount options: relatime, noatime, and others.
Every file access can update the atime (access time) and thus trigger additional writes. With relatime (now the default), atimes are only updated when necessary, which significantly reduces I/O. In pure web workloads where no atime-based logic is used, I often set noatime to trigger even fewer writes. Also relevant in practice: appropriate block sizes, barrier defaults, and, if necessary, compression at the file system level, provided the access pattern and CPU headroom allow it. These mount options contribute directly to a higher cache hit rate because fewer unnecessary metadata updates burden the memory and write paths.
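A sketch for switching an existing web data mount to noatime; device, path, and file system are examples:

```bash
# Apply immediately on the running system.
sudo mount -o remount,noatime /var/www

# Persist the option in /etc/fstab (adjust UUID and file system type):
# UUID=xxxx-xxxx  /var/www  ext4  defaults,noatime  0  2

# Verify the effective mount options.
findmnt -no OPTIONS /var/www
```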
Containers and cgroups: Page cache in multi-tenant operation
In container hosting, multiple workloads share the global page cache. Memory limits via cgroups define how much anonymous memory (heap/stack) is allowed per container, but the file cache is managed by the host kernel. If a container runs hot and reads many new files, it may displace cache pages from other containers. I therefore use memory and I/O controls (memory.high, memory.max, io.max) to smooth out load peaks and increase fairness. OverlayFS, which is often used in containers, adds additional metadata layers. This can affect path resolution and copy-on-write paths. I specifically measure whether overlay layers noticeably increase latency and consider bind mounts without additional layers for static assets.
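A minimal cgroup v2 sketch for the limits mentioned above; the group name, sizes, and device major:minor are examples, and the memory/io controllers must be enabled in the parent's cgroup.subtree_control:

```bash
CG=/sys/fs/cgroup/tenant-a
sudo mkdir -p "$CG"

# memory.high throttles reclaim early, memory.max is the hard limit.
echo 2G | sudo tee "$CG/memory.high"
echo 3G | sudo tee "$CG/memory.max"

# Cap read/write bandwidth on the backing device (major:minor from lsblk).
echo "259:0 rbps=104857600 wbps=52428800" | sudo tee "$CG/io.max"

# Move the tenant's main process into the group ($PID set beforehand).
echo "$PID" | sudo tee "$CG/cgroup.procs"
```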
Preheating and protecting the cache
After a reboot or major deployments, the cache is cold. I can preheat hot sets in a targeted way by reading high-demand assets sequentially once. This significantly reduces cold-start latency in the first few minutes. Conversely, I avoid cache pollution: I run tools for backups, malware scans, or large sequential copy jobs with low priority (nice/ionice) and, if possible, mark their pages as disposable with fadvise (POSIX_FADV_DONTNEED) so that they disappear from the cache again after the run. This keeps the cache focused on the data that is really hot for web traffic.
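A sketch of both sides, preheating and protection; the asset path is a placeholder, vmtouch is optional, and the nocache flag is specific to GNU dd:

```bash
ASSETS=/var/www/example/static

# Preheat: read the hot set once so later requests hit RAM.
find "$ASSETS" -type f -exec cat {} + > /dev/null

# Optional: vmtouch (if installed) shows or locks cache residency.
# vmtouch -v "$ASSETS"

# Run a backup with low CPU and I/O priority so it does not evict hot pages.
nice -n 19 ionice -c 3 tar czf /backup/site.tar.gz "$ASSETS"

# GNU dd can drop its own pages after a bulk read (posix_fadvise DONTNEED).
dd if=/backup/site.tar.gz of=/dev/null bs=1M iflag=nocache
```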
NUMA and large hosts
Memory locality plays a role on NUMA systems. The page cache physically resides on specific NUMA nodes, and remote accesses increase latency. I ensure consistent CPU and memory binding for services with heavy file access and check whether zone_reclaim_mode is useful. In practice, it often helps to bundle central web and PHP processes per NUMA node so that the hottest part of the cache remains local. At the same time, I monitor whether large Java or database processes displace the page cache due to their own memory requirements; if so, I scale RAM or separate workloads.
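A sketch with numactl; node 0 and php-fpm stand in as examples for a file-heavy service:

```bash
numactl --hardware                        # list nodes and memory per node

# Bind CPU and memory allocation of a service to node 0.
numactl --cpunodebind=0 --membind=0 php-fpm --nodaemonize

# Check per-node memory usage and the reclaim setting.
numastat -m | head -n 20
cat /proc/sys/vm/zone_reclaim_mode        # 0 = off, the common default
```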
NFS and shared storage
Caching is trickier in cluster setups with NFS or similar network file systems. The page cache acts locally on the consuming host, while changes on another node must be invalidated via the protocol. I therefore calibrate attribute caches and invalidation intervals so that consistency is maintained without generating too much I/O. For static web assets on shared storage, it is worth limiting revalidations and making deployments atomic (e.g., directory replacement) so that the cache is not cleared unnecessarily. Where possible, I replicate hot sets to the individual web nodes to maximize hit rates.
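A sketch for a read-mostly NFS mount with relaxed attribute revalidation; server, export, and the 60-second timeout are examples and must be tuned to the actual consistency requirements:

```bash
sudo mount -t nfs -o ro,noatime,actimeo=60 nfs01:/export/assets /var/www/shared

# Verify the effective options.
findmnt -no OPTIONS /var/www/shared
```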
Tmpfs and Ephemeral Data
For temporary, frequently read data such as session files, build artifacts, or short upload queues, I use tmpfs. This allows me to eliminate device accesses entirely and effectively make the page cache the primary storage level. However, I size tmpfs carefully: it uses RAM (and swap, if necessary), and tmpfs mounts that are too large can take space away from other caches. A regular cleanup process (e.g., systemd-tmpfiles) prevents data from accumulating and exhausting the working memory.
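A sketch for a size-capped session tmpfs; path, size, and the 12-hour age are examples:

```bash
sudo mount -t tmpfs -o size=512M,mode=1777 tmpfs /var/lib/php/sessions

# Or persist it in /etc/fstab:
# tmpfs  /var/lib/php/sessions  tmpfs  size=512M,mode=1777  0  0

# Cleanup rule for systemd-tmpfiles, e.g. in /etc/tmpfiles.d/php-sessions.conf:
# d /var/lib/php/sessions 1777 root root 12h
```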
Workload patterns: Small vs. large, sequential vs. random
The ideal cache behavior depends heavily on the pattern. Many small, frequently recurring files benefit most from LRU and a high Active(file) share. Large files that are read only once (backups, media transcodes), on the other hand, should not dominate the cache. I set read_ahead_kb moderately so that sequential readers get faster without inflating random accesses. On web servers with many static files, I activate zero-copy paths (sendfile, splice) to avoid copies in user space; the page cache then delivers directly to the socket, which saves CPU and smooths latency.
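A sketch for adjusting readahead per block device; the device name and value are examples and should be validated with measurements:

```bash
# Current readahead in KB (often 128 by default).
cat /sys/block/nvme0n1/queue/read_ahead_kb

# Moderate increase for sequential-heavy workloads; measure before and after.
echo 256 | sudo tee /sys/block/nvme0n1/queue/read_ahead_kb
```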
Extended observation and symptoms
In addition to vmstat and iostat, I look at reclaim statistics (e.g., Active/Inactive in /proc/meminfo and the pgscan/pgsteal counters in /proc/vmstat) to see whether the system is aggressively reclaiming pages. Frequent major page faults, rising I/O wait values, and persistently high writeback times indicate that the cache is under pressure. In such phases, I first check whether I can reduce the workload or increase the RAM. If misses remain high, I segment the data (e.g., separating rarely used archives from frequently used web assets) so that the LRU mechanism favors the right blocks.
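A sketch of where to read these counters; sar is part of the sysstat package:

```bash
# Reclaim and major-fault counters.
grep -E '^(pgscan|pgsteal|pgmajfault)' /proc/vmstat
grep -E '^(Active|Inactive)' /proc/meminfo

# Paging and reclaim rates per second, if sysstat is installed.
sar -B 1 5
```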
Practical rules of thumb
- I plan RAM so that the hot set (static web assets plus the active parts of databases) fits with 1–2x headroom. This greatly increases the chance of cache hits during traffic peaks.
- I consistently avoid swapping: as soon as anonymous pages are swapped out, the pager competes with the page cache for I/O and latencies start to slip. I keep swappiness moderate.
- I rotate log files more frequently, compress older generations, and ensure that chatty logs do not compete with web assets for space in the cache.
- I group deployments that change many files into a few atomic steps. This way, I invalidate fewer cache entries at once and keep the hit rate high.
File systems and cache accesses
The file system affects how efficiently the kernel caches and writes back data, which is why I am familiar with the properties of Ext4, XFS, and ZFS and adapt my choice to my workloads so that the cache works best. Ext4 delivers solid all-round performance, XFS excels at parallel write loads, and ZFS brings its own caching levels such as the ARC. Depending on the pattern (many small files versus large media objects), the metadata and write paths behave differently. I measure real workloads before deciding on a platform. For a compact overview, I use the article Ext4, XFS, and ZFS compared and align settings such as mount options so that the kernel does not produce unnecessary misses.
Databases, Opcache, and Page Cache
In MySQL and MariaDB, the InnoDB buffer pool handles the majority of data pages and indexes, while the page cache additionally accelerates file system blocks, which lowers overall I/O and reduces query latencies. I size the buffer pool large enough to hold the hot set; otherwise the engine produces unnecessary disk accesses. For PHP applications, I combine Opcache for bytecode and APCu for application-level data, which reduces the pressure on the page cache. Static assets remain candidates for the file system cache and load at lightning speed. This layering avoids duplicate work and keeps the CPU free for dynamic components.
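A few quick checks, sketched with example invocations (credentials and exact settings depend on the setup):

```bash
# InnoDB buffer pool size currently in effect.
mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"

# Opcache settings as seen by the CLI configuration.
php -i | grep -E 'opcache.enable|opcache.memory_consumption'

# Is APCu loaded?
php -m | grep -i apcu
```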
Monitoring and diagnosis
I monitor vmstat 1 for memory and I/O indicators in real time, check iostat -xz 1 per device, and look at Dirty, Cached, and Writeback in /proc/meminfo so that I can quickly narrow down causes and act in a targeted way. A consistently high I/O wait value indicates bottlenecks, which I first mitigate with caching and RAM. I then check whether the file system, RAID, or SSD firmware is slowing things down. If the I/O wait remains critical, I analyze application accesses and cache hit rates. To get started with diagnostic paths, I find the article Understanding IO Wait helpful for separating symptoms from causes and deriving targeted steps.
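The corresponding monitoring loop as a compact sketch:

```bash
vmstat 1                  # watch the wa (I/O wait) and bi/bo columns
iostat -xz 1              # per-device r/s, w/s, await, %util
grep -E '^(Dirty|Cached|Writeback):' /proc/meminfo
```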
Tuning parameters without risk
I only adjust a few kernel parameters and test changes in a controlled manner, because the defaults are good and small corrections are often enough to improve efficiency. vm.dirty_background_bytes determines the threshold at which the system begins asynchronous writeback, while vm.dirty_bytes sets the hard upper limit for dirty pages. Setting these values in bytes instead of percentages provides a stable basis regardless of how much RAM is installed. In addition, read_ahead_kb influences the preloading of data per block device, which speeds up sequential reading and stays neutral for random accesses. I document every step and quickly revert to the original values if side effects appear.
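A sketch with example values; setting the byte variants back to 0 returns to the percentage-based defaults:

```bash
# Start asynchronous writeback at 256 MB, hard limit at 1 GB of dirty pages.
sudo sysctl -w vm.dirty_background_bytes=268435456
sudo sysctl -w vm.dirty_bytes=1073741824

# Persist only after testing, e.g. in /etc/sysctl.d/90-writeback.conf:
# vm.dirty_background_bytes = 268435456
# vm.dirty_bytes = 1073741824

# Revert to the ratio-based defaults if side effects appear.
sudo sysctl -w vm.dirty_background_bytes=0 vm.dirty_bytes=0
```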
Modern features explained briefly
Transparent Huge Pages (THP) can bundle file-backed pages into larger units, which reduces the management overhead per page and benefits the TLB when workloads access large, contiguous ranges. In hosting environments with highly random access, I check the effect carefully, as benefits are not guaranteed. Persistent memory, on the other hand, promises very low latencies and opens up new data paths that partially bypass the classic page cache flow. I watch benchmarks here and weigh up whether the application actually benefits from the new memory classes. I run early experiments separately from live traffic.
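A quick check of the current THP mode; madvise is a common conservative choice, but whether it helps depends on the workload:

```bash
cat /sys/kernel/mm/transparent_hugepage/enabled
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```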
Summary: What I take away with me
The Linux page cache speeds up hosting workloads by moving frequent file operations to RAM, which reduces latency, lowers I/O load, and improves scaling. I measure meaningful values, avoid misreading free -m, and use /proc/meminfo, vmstat, and iostat to get a complete picture. With logrotate, sufficient RAM, sensible kernel limits, and PHP Opcache, I increase performance without risky interventions. I select file systems with access profiles in mind and monitor I/O wait to alleviate bottlenecks in time. This allows me to keep recurring web accesses in the cache, relieve the storage level, and deliver pages quickly.

