Introduction: A cache that only apparently “doesn't work”
The request for advice arrived the way it often does in our industry: a discussion between systems engineers, prompted by a problem whose diagnosis already seemed settled.
A WooCommerce store, growing traffic, fluctuating performance, and a now deeply rooted belief: Varnish does not cache.
The infrastructure manager had already performed several superficial checks. The HTTP headers showed frequent cache MISSes, the Age header remained consistently at zero, and the general impression was that the reverse proxy wasn't providing any real benefit. Hence the idea that the problem was the tool itself.
Our intervention was not intended to "disprove" anyone, but to do what we always believe is essential: observe the system as a whole, starting from the data and not from sensations.
The Real Stack: Right Architecture, Wrong Expectations
One of the first things we clarified was which stack was actually in use, because there was some confusion on this point too.
The architecture was as follows:
- NGINX as HTTPS terminator, publicly exposed
- NGINX reverse-proxying to Varnish
- Varnish as the HTTP cache
- Varnish reverse-proxying to the backend NGINX
- NGINX connected to PHP-FPM
- MariaDB as the database
This is an absolutely classic stack, which we also use in many production contexts. Nothing experimental, nothing intrinsically wrong.
Precisely for this reason, the idea that “Varnish doesn’t work” in such a scenario deserved a more in-depth analysis.
The application context: WooCommerce and heavy customizations
The e-commerce site at the center of the consultancy had a catalogue of approximately 10,000 products: substantial, but not exceptional. It must be said that the technological basis was correct: Varnish is the gold standard, an undisputed favorite for any WooCommerce architecture requiring high performance and stability under load. There is currently no more effective HTTP accelerator for handling the dynamic nature of WordPress at high volumes.
The theme, however, was heavily customized and had undergone numerous changes over time, especially on the front end, which had ended up eroding the benefits of this stack. Even before discussing caching, it was clear that the load wasn't so much due to the number of products but rather to the way the pages were generated. In particular, the HTML structure was unusually heavy.
As always, we decided to start with an objective measurement.
First tests: looking at HTML without filters
We made direct requests to product pages via curl, deliberately avoiding any compression negotiation (such as gzip or Brotli) or other transport optimizations. The goal was not to simulate the end-user experience, but to isolate and quantify the "dead weight" of the HTML generated by the application.
The result was immediate and unequivocal: approximately 1.3 MB of raw HTML per single product page.
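As a sketch of that measurement (the URL below is a placeholder, not the client's actual domain), a request with compression explicitly disabled, piped through a byte counter, is enough:

```shell
# Fetch the raw page body with no compression negotiation and count the bytes.
# 'https://shop.example/product/sample-slug' is a hypothetical URL.
curl -s -H 'Accept-Encoding: identity' \
     'https://shop.example/product/sample-slug' | wc -c
```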
This value, in itself, is already an anomaly. But it becomes a critical issue when analyzed in the architecture of a high-performance cache such as Varnish.
Why 1.3 MB is lethal for Varnish
Unlike key-value storage systems like Redis — which can be configured to optimize data density or handle compressed structures — Varnish operates with a completely different architectural philosophy, based on pure I/O speed.
- No RAM compression: Varnish does not natively compress objects held in malloc (RAM) storage to save space. Its goal is to retrieve content and return it to the network in as few CPU cycles as possible. Storing a 1.3 MB page means occupying exactly 1.3 MB of RAM for every single page variant.
- Latency and memory access: Varnish is designed to handle hundreds of thousands of concurrent connections. In this scenario, direct access to memory (RAM) must be effectively instantaneous.
- Moving 1.3 MB blocks of memory creates enormous pressure on the memory bus.
- To keep the promise of a sub-millisecond Time-to-First-Byte (TTFB), objects must be slim.
Bottom line: feeding Varnish 1.3 MB pages is like turning a Ferrari into a loaded bus; the engine is powerful, but the mass it has to move puts the performance it was designed for out of reach.
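For reference, the RAM allocation discussed here is set at startup through the malloc storage backend. A sketch of a typical invocation, where the listen address, backend address, and size are illustrative rather than taken from the client's setup:

```shell
# Illustrative varnishd startup: 4 GB of malloc (RAM) storage.
# Addresses and size are placeholders, not the actual production values.
varnishd -a :6081 \
         -b 127.0.0.1:8080 \
         -s malloc,4G
```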
A key clarification: images were already excluded from the cache
During the discussion it was important to immediately clarify one point: the images weren't the problem.
As per good best practices, images were not cached by Varnish, but served directly as static assets by NGINX, possibly benefiting from browser caching or external systems.
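A typical way to keep images out of Varnish in a stack like this is to have the front NGINX serve them straight from disk. A minimal sketch, where the document root and extension list are assumptions:

```nginx
# Serve image files directly from disk, bypassing Varnish entirely.
# /var/www/shop is a hypothetical document root.
location ~* \.(jpe?g|png|gif|webp|svg)$ {
    root /var/www/shop;
    expires 30d;
    add_header Cache-Control "public";
}
```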
This means that the observed weight was not due to JPEG, PNG, or WebP, but exclusively to the HTML markup.
This is far from a secondary detail, as we often tend to blame the images for a problem that actually arises from completely different structural choices.
Inline CSS and Repetitive Components: When HTML Explodes
Analyzing the source of the pages, it emerged that a significant part of the weight was due to:
- large blocks of inline CSS
- complex menus and navigation structures injected directly into the HTML
- identical repetitive components on every product page
All content that, architecturally, should be externalized and reused, not duplicated thousands of times in markup.
This choice has a direct impact on any caching system: each URL produces a huge, unique object that takes up precious memory.
Do the math: cache is not magic
At this point in the consultancy we reached a simple yet often overlooked step: stop and do some arithmetic. No complex benchmarks, no sophisticated tools, just a basic calculation that immediately frames the problem properly.
If a single product page weighs about 1.3 MB of HTML alone and the catalog contains approximately 10,000 products, the theoretical volume of potentially cacheable HTML exceeds 13 GB. This is purely theoretical, of course, but it's extremely useful for understanding the scale of the system you're dealing with.
This doesn't mean that all pages should or can be cached simultaneously. Real-world traffic patterns are increasingly complex: some pages are requested very frequently, others rarely, and still others almost never. However, the number provides a crucial benchmark for understanding whether the resources made available are at least compatible with the potential load.
Given this scenario, Varnish's configuration allocated approximately 4 GB of RAM to the cache. Taken out of context, that amount might even seem adequate. Put back into context, it clearly cannot accommodate a significant portion of the cacheable dataset, especially with objects this large.
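The same back-of-the-envelope arithmetic can be written out explicitly. The figures (1.3 MB pages, 10,000 products, 4 GB of cache) are the ones from the consultancy; decimal megabytes are used for simplicity:

```python
PAGE_SIZE_MB = 1.3   # measured raw HTML per product page
PRODUCTS = 10_000    # catalogue size
CACHE_GB = 4         # RAM allocated to Varnish

# Theoretical volume of cacheable HTML (decimal units: 1 GB = 1000 MB).
total_gb = PAGE_SIZE_MB * PRODUCTS / 1000
# How many whole pages the allocated cache can hold at once.
pages_in_cache = int(CACHE_GB * 1000 / PAGE_SIZE_MB)
# Fraction of the catalogue that can be resident at any instant.
coverage = pages_in_cache / PRODUCTS

print(f"total cacheable HTML: {total_gb:.1f} GB")            # 13.0 GB
print(f"pages that fit in the cache: {pages_in_cache}")      # 3076
print(f"catalogue coverage at any instant: {coverage:.0%}")  # ≈31%
```

Less than a third of the catalogue can be resident at the same time, before even counting category pages, the home page, and other cacheable URLs.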
This is where one of the most widespread myths falls: Cache is not magic. It doesn't automatically address application structural issues, doesn't automatically "optimize" excessively large content, and can't work around physical limitations of available resources. It works within well-defined boundaries, and when those boundaries are exceeded, its behavior remains correct, but the benefits are drastically reduced.
From this point on, the observed behavior—frequent cache MISSES, rarely increasing Age header, marginal benefits—becomes perfectly consistent. It's not the symptom of a misconfiguration or malfunction, but the direct consequence of an unbalanced relationship between object size, amount of content, and allocated memory.
And it is precisely at this moment that the problem stops being “Varnish that doesn’t work” and becomes what it really is: a problem of proportions.
What happens when Varnish runs out of memory?
When the Varnish cache reaches the limit of available memory, the process doesn't fail or "hang." Varnish is designed to operate under constant pressure and to autonomously manage cache saturation. What comes into play in these cases is its object eviction mechanism.
Internally, Varnish manages cached objects through metadata that includes size, creation time, TTL, grace period, and access rate. When the memory allocated to the cache is full, Varnish must free up space to accept new objects. It does this by applying its eviction policies, which prioritize removing the least useful objects according to established criteria.
In practice, older or less frequently used objects are evicted to make room for new ones. This behavior is correct and expected, but it takes on a completely different meaning when the average size of objects is very large.
In a scenario like the one we've analyzed, each individual HTML object occupies a significant amount of memory. Consequently, even a few new requests are enough to trigger a continuous eviction cycle. A new object enters the cache, but to do so, it must "push out" one or more existing objects. The result is an extremely unstable cache, in which the content constantly changes.
From an operational point of view, this leads to a well-defined behavior:
- objects are initially cached
- pressure on memory grows rapidly
- newly cached objects are evicted after a few requests
- new requests find the cache empty or partially empty
This cycle repeats constantly, resulting in a constant stream of MISSES, not because Varnish isn't caching, but because it can't hold onto the items long enough to make them useful.
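The thrashing cycle described above can be sketched with a toy LRU cache. The numbers are illustrative (a round-robin access pattern, which is the worst case for LRU), not a model of Varnish's actual internals:

```python
from collections import OrderedDict

def simulate(requests, object_size_mb, cache_mb):
    """Toy LRU cache: return the hit ratio for a stream of URL requests."""
    capacity = int(cache_mb // object_size_mb)  # objects that fit at once
    cache = OrderedDict()
    hits = 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)          # refresh LRU position
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used
    return hits / len(requests)

# 10,000 product URLs, each requested twice in a round-robin pattern.
stream = [f"/product/{i}" for i in range(10_000)] * 2

heavy = simulate(stream, object_size_mb=1.3, cache_mb=4_000)   # 1.3 MB pages
light = simulate(stream, object_size_mb=0.05, cache_mb=4_000)  # 50 KB pages

print(f"hit ratio with 1.3 MB pages: {heavy:.0%}")  # 0%: every page evicted first
print(f"hit ratio with 50 KB pages:  {light:.0%}")  # 50%: the whole catalogue fits
```

Real traffic is skewed rather than round-robin, but the mechanism is the same: when only a fraction of the working set fits, each insertion evicts an object that would otherwise have produced hits.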
An important thing to understand is that, under these conditions, Varnish works harder than necessary. Each object is still evaluated, stored temporarily, indexed, and then removed. This generates additional overhead, but it doesn't produce a tangible benefit in terms of hit ratio.
From the point of view of someone observing the system from the outside, perhaps limiting themselves to the HTTP headers, the result is misleading. You see MISS responses, an Age header that rarely exceeds a few seconds, and the perception that the cache “never catches up.” In reality, the cache is working properly, but under conditions that nullify its effectiveness.
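This is also why looking only at HTTP headers is misleading: Varnish's own counters tell the story directly. A sketch of the check, using counters from the standard MAIN namespace:

```shell
# One-shot dump of the hit/miss counters and the LRU eviction counter.
# A steadily growing n_lru_nuked alongside a low hit count is the signature
# of a cache evicting objects faster than it can reuse them.
varnishstat -1 -f MAIN.cache_hit -f MAIN.cache_miss -f MAIN.n_lru_nuked
```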
This is where a key point emerges: a cache isn't useful just because it exists, but because it can stabilize the load. When memory is insufficient compared to the size and quantity of cacheable objects, this stability is compromised. The cache becomes a temporary staging system, rather than a true acceleration layer.
Understanding this internal behavior is essential to avoid misdiagnosis. The problem isn't Varnish's algorithm, nor is it configured "wrongly" in any absolute sense, but the context in which it was implemented. Without consistent scaling, even the best caching algorithm is bound to produce disappointing results.
The problem was not the stack, but the sizing
One of the most interesting aspects to emerge during the consultancy was the criterion used to size the VPS.
The machine had been chosen mainly on the basis of disk space, probably to host media and databases, while RAM was treated as a secondary resource.
This approach may work on traditional hosting, but becomes critical when introducing an HTTP cache in RAM.
With Varnish on the stack, memory becomes a central resource, not an ancillary one.
In this case, the allocated RAM was not consistent with either the size of the pages or the number of URLs involved.
Possible ways: application or infrastructure
Once the causes were clarified, the consultation moved on to possible solutions.
There were essentially two options.
Intervene on the application
The most correct solution from an architectural point of view included:
- drastic reduction of inline CSS
- externalization of common styles
- simplification of menus and repeating components
- lightening the overall HTML
With lighter pages, the same amount of RAM could hold many more objects, making the cache stable and effective.
Adapt the infrastructure
The alternative was to intervene on the infrastructure:
- significantly increase the RAM dedicated to Varnish
- or set up a disk cache to extend overall capacity
This last option doesn't offer the same performance as RAM, but it allows Varnish to continue running consistently even with very large datasets.
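In practice, extending capacity to disk means swapping the malloc backend for Varnish's file storage. A sketch, with the path and size as assumptions:

```shell
# Illustrative: back the cache with a 40 GB file on disk instead of RAM only.
# The path is a placeholder; fast local SSD storage is assumed.
varnishd -a :6081 \
         -b 127.0.0.1:8080 \
         -s file,/var/lib/varnish/cache.bin,40G
```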
Disk Cache: Not a Shortcut, but a Compromise
During the consultancy, we spent time clarifying a point that's often misunderstood, especially in contexts where infrastructure resources are limited: using disk storage for caching isn't a "defeat" or a poorly designed fallback solution. Rather, it's a conscious choice born from an analysis of the project's real-world constraints.
RAM cache remains, by definition, the ideal solution in terms of latency and performance. However, it's not always possible—or economically sensible—to allocate sufficient memory to hold the entire cacheable dataset, especially when the HTML pages are very large or when the number of URLs grows rapidly. In these scenarios, extending the cache to disk allows Varnish to continue to perform its role consistently, avoiding a constant cycle of insertions and evictions that renders the cache ineffective.
Of course, this is a tradeoff: persistent storage can't match the performance of RAM and introduces higher latency. But it's a tradeoff that needs to be considered in the overall context, considering the type of workload, the frequency of page accesses, and realistic performance expectations. The mistake is not using the disk for caching, but doing so without being aware of its implications.
The key is to align expectations with technical reality. When using persistent storage, you can't expect the same performance as volatile memory. But you can achieve stability, predictability, and a tangible improvement over a cache that, despite being in RAM, is constantly under pressure.
The real value of technical consultancy
The most significant moment of this experience was not so much the identification of the problem, but the change in perspective that occurred during the discussion.
Initially, the discussion revolved around a clear judgment: a tool that "doesn't work." As the analysis progressed, however, the focus gradually shifted from symptoms to causes.
Once the hard numbers—the actual page sizes, memory consumption, and Varnish's inner workings—were laid out on the table, the conversation changed tone. It finally became a technical discussion, based on objective data, not on perceptions or unmet expectations.
It was no longer a question of “Varnish not working”, but of concrete and measurable elements:
- the weight of the HTML generated by the application
- the impact of that weight on the RAM cache
- the development choices that had led to that structure
- the relationship between application, reverse proxy and available resources
And it's precisely at this stage that consulting brings real value. Not by imposing solutions, but by providing tools to understand the problem. Not by replacing those who manage the system, but by helping them interpret its behavior.
When the conversation shifts from judgments to indicators, from opinions to numbers, decisions become more robust. And it's then that we can make an informed choice whether to intervene on the application, the infrastructure, or both, knowing exactly what tradeoffs we're making.
Conclusion: first the numbers, then the decisions
Changing technology is often the easiest shortcut. It's an understandable reaction: when something doesn't work as expected, the temptation to replace it with a "more modern," "simpler," or simply different alternative is strong. However, in the vast majority of cases, this choice doesn't solve the root problem. It merely shifts it.
In the systems and infrastructure space, tools don't operate in a vacuum. Any software, especially those designed for high-performance like Varnish, delivers its value only when placed within a coherent context, designed with knowledge and sized based on real data. Without this foundation, even the best technology is destined to seem ineffective.
Before replacing a device, it's essential to understand its internal workings, its structural limitations, and the conditions under which it was designed to operate. In this specific case, the observed behavior wasn't a symptom of a malfunction, but rather the direct consequence of a series of application and infrastructure choices that, combined, had made the cache extremely unstable.
During this consultation, we didn't "defend" Varnish out of prejudice, nor did we claim it was the only possible solution. We simply did what we believe to be essential in any technical analysis: starting with the numbers. Actual page sizes, number of cacheable objects, available memory, and system behavior under load. All elements are measurable, verifiable, and objective.
And it's precisely when numbers enter the discussion that many beliefs begin to lose force. A cache that doesn't seem to work is often simply operating under conditions for which it wasn't designed. A reverse proxy that returns many MISSES is not necessarily misconfigured: It might simply be overloaded with objects that are too large for the available resources.
This experience reminded us, once again, how dangerous it is to think in technological slogans. "This stack is better," "this software is faster," "this system caches better." Without context, without numbers, and without a comprehensive view, these are statements devoid of any real technical value.
Optimizations that truly work, those that last over time and grow with a project, always stem from a deep understanding of the system. From the balance between application, infrastructure, and real-world load. Not from yet another stack change, but from the ability to read what the system is already saying.
And it's exactly in this space—between numbers and decisions—that technical consultancy can make the difference.