10 May 2026

Protect yourself from AI bots, unwanted crawlers, and automated hacker attacks through browser fingerprinting.

How browser fingerprinting helps distinguish real users from automated bots, reducing aggressive scraping, unwanted crawling, and application attacks on websites.

Over the past twelve months, as a Linux hosting and systems engineering company specializing in the web performance of major CMSs, we have witnessed an increasingly evident phenomenon: automated traffic to websites has increased significantly, often suddenly, and in some cases with very serious impacts on customer infrastructures.

It's no longer just the classic search engine bots, SEO crawlers, or automated scanners that have been scouring the internet for known vulnerabilities for years. Today, the landscape is much more complex. Traditional crawling has been joined by a new wave of bots and scrapers on a global scale, largely motivated by the massive collection of data to train, update, or enrich new Large Language Models.

The explosion of generative artificial intelligence has made public web content an invaluable resource. Blogs, forums, technical documentation, e-commerce sites, product descriptions, fact sheets, knowledge bases, reviews, editorial portals, and company websites have become attractive targets for anyone looking to collect large amounts of text, structured data, or industry-specific information.

In some cases, this crawling is done transparently. Some well-known operators, such as OpenAI or Anthropic, use declared and documented user agents. A crawler may present itself, for example, with a user agent such as GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot or similar, allowing the site manager to recognize it and apply rules via robots.txt.

This is the correct approach: the bot announces itself, declares its identity, and allows the webmaster to decide whether to allow or deny access to certain sections of the site. The Allow and Disallow directives, together with any rate-limiting hints, where supported or respected, at least establish a clear relationship between those who publish content and those who acquire it.
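As a rough illustration of how a well-behaved, declared crawler is expected to interpret such rules, the sketch below feeds a robots.txt to Python's standard urllib.robotparser module. The rules, the paths, and the example.com URLs are invented for the example; only the user-agent tokens come from the paragraph above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths and policy choices are assumptions.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /shop/
Allow: /blog/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/blog/post"),
        ("ClaudeBot", "https://example.com/shop/item"),
        ("ClaudeBot", "https://example.com/blog/post"),
    ]:
        print(agent, url, allowed(agent, url))
```

These rules only work for crawlers that read and honor them; nothing here is enforced by the server.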

Unfortunately, not everyone behaves this way.

We're increasingly seeing crawlers that don't even disclose their automated nature. Instead, they masquerade as real users using seemingly legitimate user agents copied from the most common browsers: Chrome, Safari, Firefox, Microsoft Edge, Opera, DuckDuckGo Browser, mobile browsers, WebView Android, WebView iOS, and so on.

In many cases, these bots originate from cloud infrastructures, foreign providers, proxy networks, residential IPs, or network classes that are difficult to accurately attribute. Sometimes the traffic comes from Asian countries, sometimes from European or American datacenters, sometimes from seemingly domestic networks. The goal is always the same: to blend in with legitimate traffic and make it difficult to block using simple rules.

The underlying problem is that the user agent, by itself, no longer has any reliable value. It's a string that can be freely set by the client. A bot written in Python, Go, Rust, Node.js, or any other language can easily declare itself as a normal Chrome on Windows or Safari on macOS. But that doesn't mean that behind that request there's actually a user sitting in front of a monitor, with a real browser, a real GPU, a real viewport, a real graphics environment, and real human behavior.
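To make the point concrete, here is a minimal sketch showing that the user agent is just a header the client chooses. It uses Python's standard urllib.request and only builds the request without sending it. The user agent string, the URL, and the naive_is_browser helper are assumptions made up for the example.

```python
from urllib.request import Request

# Example value only: any string can be placed in this header.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def build_request(url, user_agent):
    """Build (but do not send) an HTTP request with an arbitrary User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def naive_is_browser(user_agent):
    """A hypothetical server-side check that trusts the header alone."""
    return "Chrome/" in user_agent or "Safari/" in user_agent


request = build_request("https://example.com/", CHROME_UA)
# urllib stores header names capitalized as "User-agent".
declared = request.get_header("User-agent")

if __name__ == "__main__":
    print(declared)
    print(naive_is_browser(declared))
```

The script is not a browser, yet a check based only on the header would classify it as one.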

When crawling becomes a performance issue

On a well-configured site, such as a blog with efficient page caching, even mass crawling can be relatively load-neutral. If requests are served by Varnish, Nginx FastCGI Cache, an application cache, or a CDN, the impact on PHP, CPU, RAM, and databases can be minimal.

A bot that downloads already cached pages generates traffic, consumes bandwidth, fills up logs, and can alter statistics and metrics, but it rarely causes the application backend to crash. In these cases, the problem exists, but it's often manageable or even negligible because the backend is barely touched.

The situation changes radically when the site isn't adequately cacheable or when we're dealing with a complex e-commerce site. WooCommerce, Magento, PrestaShop, and similar platforms are much more delicate than a static or semi-static blog. A product catalog can generate thousands, tens of thousands, or even millions of different URLs, especially when faceted navigation is involved.

Faceted navigation allows users to filter products by color, size, price, brand, availability, material, category, attribute, sorting, pagination, and other criteria. From a user experience perspective, this is often an essential feature. From a security and performance perspective, however, it can become a huge attack surface.

An unauthorized crawler can begin following endless combinations of filters and parameters. It can request ever-changing URLs, often not cached, forcing the backend to dynamically generate each individual page. The result is predictable: PHP-FPM begins to saturate, MySQL or MariaDB receives increasingly heavy queries, CPU usage climbs, RAM is consumed, the load average rises, and the site's response time deteriorates dramatically.
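A quick back-of-the-envelope calculation shows how fast the URL space grows. The sketch below assumes a simplified model in which each facet is either unset or set to exactly one value; the facet names and counts are hypothetical.

```python
from math import prod

# Hypothetical catalog: number of selectable values per facet.
FACETS = {
    "color": 12,
    "size": 8,
    "brand": 40,
    "price_range": 6,
    "sort": 4,
    "page": 25,
}


def count_filter_urls(facets):
    """Each facet contributes (values + 1) choices: one per value, plus 'unset'."""
    return prod(values + 1 for values in facets.values())


if __name__ == "__main__":
    # 13 * 9 * 41 * 7 * 5 * 26 = 4,365,270 distinct filter combinations
    print(count_filter_urls(FACETS))
```

Six modest facets already yield more than four million distinct URLs, most of which will never be in cache. Multi-select filters would make the number larger still.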

In these scenarios, the passage of a bot is no longer a simple crawling activity: it becomes application flooding. We're not always dealing with a volumetric DDoS in the traditional sense, but the end effect is very similar: the site slows down, checkout becomes unstable, real users are unable to navigate, conversions plummet, and customers perceive the service as unavailable.

The paradox is that malicious or unwanted traffic often doesn't immediately appear as such. In the logs, it may look like browser traffic. The user agent declares Chrome. The IP isn't necessarily blacklisted. The requests are formally valid HTTP requests. There are no obviously malicious payloads. There's no recognizable exploit. There's simply an abnormal number of requests, spread across many URLs and capable of generating real load.

The traditional approach: more hardware, more optimization, more blocks

When a site begins to suffer from these phenomena, the first reaction is often to increase hardware resources. This involves switching to more CPUs, more RAM, faster disks, a separate database, a larger server, more expensive cloud instances, or more complex architectures.

In some cases, this is necessary. If a site has truly grown, if legitimate traffic has increased, if sales volume justifies a more robust infrastructure, scaling is the right thing to do. But when the root cause of the load is unwanted automated traffic, increasing resources risks simply becoming a more expensive way to fuel the problem.

After hardware upgrades, the optimization phase usually comes. We analyze slow queries, improve database indexes, reduce unnecessary calls, and address cumbersome plugins, inefficient modules, slow templates, incorrect configurations, and missing or poorly implemented caches. This, too, is a matter of course and is part of our daily work.

However, optimizing an application to better serve real users is one thing. Optimizing it to handle hundreds of requests per second generated by unwanted bots is another. In the latter case, the issue isn't just performance, but traffic control.

At this point, we often turn to firewalling, IP blocking, rate limiting, and geoblocking systems. Entire network classes, suspicious ASNs, countries from which customers shouldn't be arriving, or geographic areas deemed irrelevant to the business are blocked. This approach can work when the site only sells to certain markets and the malicious traffic clearly comes from countries not served.
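As an illustration of the rate-limiting part, here is a minimal per-IP sliding-window limiter. It is a sketch, not how any specific firewall implements it: the class name, the thresholds, and the idea of passing the current time explicitly (to keep the example deterministic) are all assumptions.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted hits

    def allow(self, client_ip, now):
        hits = self._hits[client_ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
    for t in (0.0, 1.0, 2.0, 3.0, 10.0):
        print(t, limiter.allow("198.51.100.7", now=t))
```

Because the counter is keyed on the IP address, a crawler that spreads its requests over thousands of addresses stays under the threshold on each one, which is exactly the limitation discussed next.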

But it's not always like this.

If an e-commerce site sells in Italy and the malicious traffic comes from abroad, geoblocking may be an acceptable solution. But if both legitimate and malicious traffic come from the same country, blocking the entire nation becomes impossible. If bots use residential IPs, distributed proxies, or infrastructure located in the same markets as real customers, geoblocking becomes too crude a tool.

In these cases, a more surgical filter is needed. It's not enough to ask where the request is coming from. You need to start asking yourself what the client generating it actually is.

AI bots and hacker scanners: different purposes, similar methods

The same problem, in various forms, has existed in the field of application security for years. Whenever a critical vulnerability is published in a WordPress plugin, WooCommerce module, PrestaShop extension, Magento component, or popular CMS, large-scale automated scans begin.

Attackers know that many sites aren't updated promptly. They know that time can pass between the publication of a flaw and the application of a patch. They know that there are thousands of exposed installations, often poorly managed, with abandoned plugins, outdated themes, or unmaintained modules. That's why they develop automatic scanners capable of searching for vulnerable versions and, when possible, immediately exploiting the flaw.

The purpose of these scanners is obviously different from that of AI crawlers. AI crawlers are after data collection, scraping, or mass content acquisition. Scanners are after compromise, exploitation, webshell uploads, data theft, spam, defacement, or unauthorized access. The severity of the offenses varies.

But from a technical point of view, there is a fundamental common element: in most cases, these systems do not use real browsers.

They use automated HTTP clients. These can be scripts based on curl, Python modules like requests or aiohttp, or agents written in Go, Rust, Node.js, or Java. They can rewrite the user agent, spoof headers, simulate cookies, follow redirects, and even handle some rudimentary responses. But in most cases, they aren't real browsers.

They don't have a full graphics engine. They don't have a real GPU. They don't have plugins. They don't have a history. They don't have a user moving the mouse, clicking, scrolling, switching tabs, interacting with the page. Often they don't even have a JavaScript interpreter. They simply send GET and POST requests in a cadenced, systematic, and massive manner.

Even when we look at more advanced solutions based on headless browsers like Chromium, controlled by Puppeteer, Playwright, or Selenium, there's still a significant difference compared to a real browser used by a person. A headless browser can run JavaScript, but it often exhibits distinctive features. It may have anomalous properties, uniform fingerprints, artificial viewports, software GPUs, the absence of plugins, predictable behavior, or inconsistencies between the declared and real environments.

And this is where browser fingerprinting comes in.

What is browser fingerprinting?

Browser fingerprinting is a set of techniques that collect signals from the browser and its execution environment to build a technical profile of the client. This profile can be used for many purposes: fraud analysis, security, bot detection, scraping prevention, abuse protection, anomaly identification, or consistency checking between what the client claims and what it actually is.

In the context of anti-bot protection, fingerprinting isn't necessarily used to identify a person. Rather, it helps determine whether the client making the request resembles a real browser or an automated environment.

The question is not: “Who is this user?”

The correct question is: “Does this client behave and present itself as a real browser or as an automated process pretending to be one?”

This distinction is fundamental. A modern protection system doesn't necessarily have to block everything it doesn't recognize. It must assign a risk level. It must observe signals. It must combine them. It must decide whether to accept the request, block it, limit it, challenge it, or monitor it more closely.

A single test can produce false positives. A properly weighted set of tests can, however, provide a much more robust indication.

The most useful techniques for distinguishing real browsers from bots

A modern WAF, especially one designed to distinguish real users from automated bots, can use numerous client-side signals. Some are simple, others more sophisticated. No single technique is perfect, but combining multiple techniques can create a highly effective profile.

One of the best-known checks is the one on navigator.webdriver. This property is often set when a browser is controlled by automation tools like Selenium, Puppeteer, or Playwright. If the flag is set, this is a very strong automation signal.

Another check concerns the browser languages, available via navigator.languages. In real browsers, it's common to find an array consistent with the user's settings, such as Italian, Italian-Italy, English, or other combinations. In headless browsers or artificial environments, this value may be absent, empty, or reduced to a single generic value.

The number of installed plugins, readable via navigator.plugins.length, can also provide useful information. Modern real-world browsers often expose some built-in plugins or components, while many headless environments do not. This isn't a definitive indicator, as browser evolution has greatly reduced the role of traditional plugins, but it remains useful as a correlating signal.

The color depth, obtainable from screen.colorDepth, can reveal anomalies. Values that are too low, zero, or inconsistent often indicate unrealistic or poorly configured rendering environments. Similarly, screen resolution, viewport-to-screen ratio, device pixel ratio, and other graphics parameters can help distinguish a real device from a simulated environment.
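The checks described so far could be evaluated server-side along these lines. This is a sketch under assumptions: it presumes a small client-side script has already read the browser properties and posted them as a dictionary, and the field names, anomaly labels, and the color depth threshold of 16 are all hypothetical.

```python
# Assumed minimum plausible color depth; real deployments would tune this.
MIN_COLOR_DEPTH = 16


def basic_signal_anomalies(signals):
    """Return the list of anomaly labels found in the reported signals."""
    anomalies = []
    if signals.get("webdriver") is True:          # navigator.webdriver
        anomalies.append("webdriver")
    if not signals.get("languages"):              # navigator.languages
        anomalies.append("languages_missing")
    if signals.get("plugins_length", 0) == 0:     # navigator.plugins.length
        anomalies.append("no_plugins")
    color_depth = signals.get("color_depth")      # screen.colorDepth
    if not isinstance(color_depth, int) or color_depth < MIN_COLOR_DEPTH:
        anomalies.append("color_depth")
    return anomalies


if __name__ == "__main__":
    desktop = {"webdriver": False, "languages": ["it-IT", "it", "en"],
               "plugins_length": 5, "color_depth": 24}
    headless = {"webdriver": True, "languages": [],
                "plugins_length": 0, "color_depth": 0}
    print(basic_signal_anomalies(desktop))
    print(basic_signal_anomalies(headless))
```

The function only labels what it sees; deciding how much each label matters is a separate step.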

The time zone, which can be obtained via Intl.DateTimeFormat().resolvedOptions().timeZone, is another valuable piece of data. If an IP address appears to be geolocated in Italy but the browser reports an incompatible time zone, this doesn't automatically mean we're dealing with a bot, but it's certainly a sign worth considering. VPNs, travel, and unusual configurations do exist, but with large volumes, the inconsistencies become statistically significant.
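A minimal sketch of this consistency check follows. It assumes the country has already been derived from the IP address by some geolocation step that is not shown, and the country-to-time-zone table is a tiny hypothetical excerpt rather than a complete mapping.

```python
# Hypothetical excerpt: IANA time zones considered consistent with each country.
EXPECTED_TIMEZONES = {
    "IT": {"Europe/Rome"},
    "DE": {"Europe/Berlin", "Europe/Busingen"},
    "GB": {"Europe/London"},
}


def timezone_mismatch(ip_country, browser_timezone):
    """True if the zone contradicts the country, False if it fits, None if unknown."""
    expected = EXPECTED_TIMEZONES.get(ip_country)
    if expected is None or not browser_timezone:
        return None  # no evidence either way
    return browser_timezone not in expected


if __name__ == "__main__":
    print(timezone_mismatch("IT", "Europe/Rome"))
    print(timezone_mismatch("IT", "Asia/Shanghai"))
    print(timezone_mismatch("US", "America/New_York"))
```

Returning None for unknown cases keeps the check from penalizing clients simply because the lookup table is incomplete.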

navigator.maxTouchPoints can also be useful. A real mobile device often has multiple touch points, while an emulated environment may declare values that are absent or inconsistent with the user agent. If a client presents itself as an iPhone but doesn't expose features compatible with a touch device, something isn't right.

navigator.hardwareConcurrency, which indicates the number of available logical cores, can highlight suspicious values. A value of zero, absent, or excessively high may indicate an artificial environment. The same applies to navigator.deviceMemory, when available, which provides an estimate of the device's memory. Many headless browsers or automated environments report missing, generic, or implausible values.
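The touch, core count, and memory checks can be sketched as a single plausibility function. The user agent tokens, the upper bound of 128 logical cores, and the anomaly labels are assumptions chosen for the example. A missing deviceMemory value is not treated as an anomaly here, since the text notes it is only sometimes available.

```python
# Assumed ceiling for a believable logical core count on a client device.
MAX_PLAUSIBLE_CORES = 128
TOUCH_DEVICE_TOKENS = ("iPhone", "iPad", "Android")


def device_anomalies(user_agent, max_touch_points, hardware_concurrency,
                     device_memory=None):
    """Compare reported hardware values with what the user agent claims."""
    anomalies = []
    claims_touch_device = any(token in user_agent for token in TOUCH_DEVICE_TOKENS)
    if claims_touch_device and not max_touch_points:
        anomalies.append("touch_mismatch")
    if not hardware_concurrency or hardware_concurrency > MAX_PLAUSIBLE_CORES:
        anomalies.append("hardware_concurrency")
    if device_memory is not None and device_memory <= 0:
        anomalies.append("device_memory")
    return anomalies


if __name__ == "__main__":
    iphone_ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X)"
    print(device_anomalies(iphone_ua, max_touch_points=0, hardware_concurrency=6))
```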

Among the best-known techniques is canvas fingerprinting. An image is generated on a 2D canvas element by drawing text, shapes, colors, and transformations. The result is then converted into a measurable representation. Each combination of operating system, graphics engine, fonts, GPU, and drivers can produce small differences. Headless browsers, especially if standardized and replicated at scale, tend to generate more consistent and predictable output.

A similar argument applies to WebGL. By querying certain renderer properties, when available, you can obtain information about the GPU or software renderer being used. Headless browsers or virtualized environments may expose renderers like SwiftShader, Mesa, or other software backends, rather than a hardware GPU consistent with the declared device.
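Both ideas can be sketched server-side, assuming the client script reports a hash of its canvas output and the WebGL renderer string. The marker list and the 20% share threshold are assumptions. The sketch deliberately does not match a bare "Mesa", because Mesa also provides hardware drivers on Linux; it matches Mesa's software rasterizers (llvmpipe, softpipe) instead.

```python
from collections import Counter

# Substrings assumed to indicate a software renderer.
SOFTWARE_RENDERER_MARKERS = ("swiftshader", "llvmpipe", "softpipe")


def is_software_renderer(webgl_renderer):
    """True if the reported WebGL renderer looks like a software backend."""
    renderer = (webgl_renderer or "").lower()
    return any(marker in renderer for marker in SOFTWARE_RENDERER_MARKERS)


def overrepresented_canvas_hashes(canvas_hashes, max_share=0.2):
    """Return hashes whose share of all observed clients exceeds max_share."""
    total = len(canvas_hashes)
    counts = Counter(canvas_hashes)
    return {h for h, count in counts.items() if total and count / total > max_share}


if __name__ == "__main__":
    observed = ["a1"] * 6 + ["b2", "c3", "d4", "e5"]
    print(overrepresented_canvas_hashes(observed))
    print(is_software_renderer("Google SwiftShader"))
```

An over-represented hash is only a hint: popular phone models legitimately share identical canvas output, so this signal makes sense only in combination with others.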

Then there are specific checks for older automation frameworks. PhantomJS, for example, can leave global objects such as window.callPhantom or window._phantom. Nightmare.js can expose window.__nightmare. Legacy versions of Selenium may leave traces such as document.__selenium_unwrapped, window.__selenium_evaluate, or document.__webdriver_evaluate.

Another set of checks concerns the presence of Node.js or JSDOM environments. In a real browser, global objects like process or Buffer, which are typical of server-side or simulated environments, should not exist. If these elements are available in the context of the page, we're likely dealing with a non-browser environment.
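One way to handle these traces on the server is a simple lookup, sketched below. It assumes the client-side script reports the names of any suspicious globals it finds as a list of strings; that reporting format and the function name are hypothetical, while the marker names are the ones listed above.

```python
# Marker names taken from the text, mapped to the framework they hint at.
LEGACY_AUTOMATION_MARKERS = {
    "window.callPhantom": "PhantomJS",
    "window._phantom": "PhantomJS",
    "window.__nightmare": "Nightmare.js",
    "document.__selenium_unwrapped": "Selenium",
    "window.__selenium_evaluate": "Selenium",
    "document.__webdriver_evaluate": "Selenium",
}
# Globals that belong to Node.js-style environments, not to browsers.
NON_BROWSER_GLOBALS = {"process", "Buffer"}


def automation_traces(found_globals):
    """Return (frameworks detected, server-side globals detected), both sorted."""
    found = set(found_globals)
    frameworks = sorted(
        {LEGACY_AUTOMATION_MARKERS[name] for name in found
         if name in LEGACY_AUTOMATION_MARKERS}
    )
    server_side = sorted(found & NON_BROWSER_GLOBALS)
    return frameworks, server_side


if __name__ == "__main__":
    print(automation_traces(["window._phantom", "process", "Buffer"]))
```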

Other signals can come from the Permissions API, the status of notifications, the presence of window.chrome, audio API behavior, font handling, WebRTC support, media codecs, available sensors, consistency between user-agent and client hints, cookie handling, JavaScript execution time, and client behavior after the first visit.

The value is not in the single test, but in the overall score

The important point is that none of these tests should be applied blindly or in isolation. Blocking a user simply because a single parameter appears abnormal can generate false positives. Legitimate exceptions include special browsers, privacy extensions, corporate configurations, embedded devices, WebView, in-app browsers, restrictive modes, anti-tracking systems, and real users with non-standard environments.

For this reason, the most effective approach is a scoring one.

Each test contributes to a risk score. Some signals are very serious, others only weakly suspicious. For example, an active navigator.webdriver flag can be quite significant. An inconsistent time zone can be less significant. The absence of plugins can be a minor red flag. A software WebGL renderer can be more significant when combined with other elements. The presence of legacy PhantomJS or Selenium objects can be considered very serious.

At that point the WAF can make progressive decisions.

A client with a very low score, for example, less than 20, can be considered legitimate. In this case, it can be automatically accepted and, if appropriate, issued a signed cookie that allows it to be recognized on subsequent requests without having to continually re-verify.

A client with a very high score, say close to 100, can be blocked immediately. If it fails virtually all tests and shows clear signs of automation, there's no point in wasting backend resources.

The intermediate cases are the most interesting. A score of 50 or 60 may indicate a questionable client: not clean enough to be automatically accepted, but not serious enough to be blocked without appeal. In these cases, a more in-depth challenge can be applied: a JavaScript interstitial, a behavioral check, an interaction request, a human checkbox, or a method similar to those popularized by “Under Attack” systems.
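The progressive model described above can be sketched as a weighted sum followed by a three-way decision. The weights, the anomaly labels, and the block threshold of 90 are assumptions picked to match the examples in the text (below 20 accepted, around 50 or 60 challenged, close to 100 blocked); they are not the values of any real product.

```python
# Hypothetical weights: higher means stronger evidence of automation.
WEIGHTS = {
    "legacy_automation": 80,
    "webdriver": 60,
    "software_webgl": 20,
    "languages_missing": 15,
    "timezone_mismatch": 10,
    "no_plugins": 5,
}


def risk_score(anomalies):
    """Sum the weights of the distinct anomalies, capped at 100."""
    return min(100, sum(WEIGHTS.get(name, 0) for name in set(anomalies)))


def decide(score, accept_below=20, block_from=90):
    """Map a score to one of: 'accept', 'challenge', 'block'."""
    if score < accept_below:
        return "accept"
    if score >= block_from:
        return "block"
    return "challenge"


if __name__ == "__main__":
    for anomalies in (
        ["timezone_mismatch", "no_plugins"],
        ["webdriver"],
        ["webdriver", "legacy_automation"],
    ):
        score = risk_score(anomalies)
        print(anomalies, score, decide(score))
```

Keeping the weights in a plain table makes it easy to tune them per site when a particular signal proves noisy for a given audience.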

This progressive model is much more effective than the binary block. It doesn't divide the world into "good" and "bad" based on a single header, but builds a dynamic and adaptive evaluation.

Why block before cache and before backend

An often overlooked aspect is the point in the infrastructure where the filtering is performed. If the bot is blocked after reaching PHP or the database, the damage has already been partially done. Even a 403 response generated by the application consumes resources. Even a WordPress security plugin, if run within WordPress, requires CMS bootstrapping, PHP loading, database queries, and memory usage.

For this reason, the most effective protections must act as far upstream as possible.

Ideally, the filter should intervene at the reverse proxy, WAF, Nginx, OpenResty, Varnish level, or in any case before the request reaches the application backend. Blocking an unwanted request in less than a millisecond, without involving PHP-FPM or querying MySQL, is vastly more efficient than leaving the CMS to decide.

This principle is especially important on e-commerce sites. If a scraper is generating thousands of requests to uncached product pages, each request intercepted before the backend represents CPU savings, avoided queries, and preserved response time for real users.

A well-designed anti-bot WAF must not become a bottleneck itself. It must be fast, lightweight, deterministic for the most frequent operations, and capable of making decisions based on signed cookies, blacklists, whitelists, scores, and behavior rules.

Our experience with WAF 4 NGINX

Building on this recent experience, we have internally developed a proprietary WAF, AntiBot, and AntiCrawler system called WAF 4 NGINX, not to be confused with the commercial application of the same name historically associated with F5 Networks and now known by another commercial name, “App Protect”.

The goal was clear: to have a proprietary, on-premise, controllable solution that integrates with our infrastructure and is conceptually similar to some of the features popularized by Cloudflare, but without forcing the customer to change nameservers, delegate traffic to third parties, or adopt an external service that is not always compatible with organizational, bureaucratic, or GDPR constraints.

Many customers can use Cloudflare without any issues. Others, due to internal policies, contractual requirements, data sensitivity, compliance constraints, or infrastructure choices, prefer or need to maintain full control over the request path. In these cases, local protection, integrated directly into the Nginx reverse proxy, represents a significant advantage.

WAF 4 NGINX is designed with multiple operating modes. It includes whitelisting and blacklisting systems, temporary bans, custom filtering rules, request behavior controls, JavaScript challenges, transparent modes, interstitials, and more invasive verification such as the human checkbox.

Transparent mode is particularly interesting because it allows you to carry out checks without displaying banners, logos, intermediate pages, or communications visible to the user. The real visitor continues browsing normally, while the system collects signals, evaluates the fingerprint, assigns a score, and decides whether to issue a signed HMAC cookie certifying the successful verification.
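The general idea of a signed validation cookie can be sketched with Python's standard hmac module. This is a generic illustration and not the WAF 4 NGINX implementation: the cookie layout, the secret, the use of HMAC-SHA256, and binding the cookie to a client identifier are all assumptions. The client identifier is assumed not to contain the "|" separator.

```python
import hashlib
import hmac

# Hypothetical secret; in practice a long random value kept on the server.
SECRET_KEY = b"replace-with-a-random-server-side-secret"


def _sign(payload, key):
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def issue_cookie(client_id, expires_at, key=SECRET_KEY):
    """Return 'client_id|expires_at|signature' (expires_at: integer epoch seconds)."""
    payload = f"{client_id}|{expires_at}"
    return f"{payload}|{_sign(payload, key)}"


def verify_cookie(cookie, client_id, now, key=SECRET_KEY):
    """Check signature, client binding, and expiry without any stored state."""
    parts = cookie.split("|")
    if len(parts) != 3:
        return False
    cookie_client, expires_text, signature = parts
    expected = _sign(f"{cookie_client}|{expires_text}", key)
    # Constant-time comparison to avoid leaking signature bytes via timing.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    if cookie_client != client_id:
        return False
    return expires_text.isdigit() and int(expires_text) > now


if __name__ == "__main__":
    cookie = issue_cookie("fp123", expires_at=2000)
    print(verify_cookie(cookie, "fp123", now=1000))
    print(verify_cookie(cookie, "fp123", now=3000))
```

Verification needs only the secret and the cookie itself, which is what makes it cheap enough to run on every request before the cache and the backend.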

Interstitial mode, on the other hand, is useful in the most critical cases, when the site is under attack, undergoing aggressive crawling, or when anomalous traffic has already begun to compromise backend performance. In this scenario, an explicit barrier is introduced before actual access to the site: the visitor is temporarily stopped on an intermediate page while the system runs client-side checks, verifies the ability to execute JavaScript, evaluates the browser's fingerprint, and decides whether to allow continued browsing.

This mode is more visible than transparent control, but it has the advantage of immediately reducing the burden on the infrastructure, preventing suspicious clients from directly accessing dynamic pages, catalogs, filters, search endpoints, or particularly computationally expensive areas. It is particularly useful during sudden spikes, massive scraping campaigns, large-scale automated scans, or application flooding attempts.

The human checkbox represents the most invasive level of verification, but also one of the most effective when it is necessary to distinguish with greater certainty between human traffic and automation. By requiring an explicit form of interaction, it allows you to block many bots that, while capable of running JavaScript or partially simulating a browser, are not designed to pass manual verification. Precisely because of its greater invasiveness, it should be used wisely, preferably in high-risk cases or as an escalation for clients with an intermediate or high suspicion score.

Results achieved in production

The results achieved in production were very significant. In cases where the problem was intensive crawling, aggressive scraping, or L7 DDoS attacks, the introduction of the WAF allowed us to drastically reduce the backend load and, in all cases, eliminate the manual emergency interventions that were previously necessary to mitigate sudden saturations.

Specifically, we observed a 50-70% reduction in backend load on heavily crawled sites, especially when the block occurred before the Varnish cache and the application backend. This translates into fewer PHP requests, fewer database queries, less CPU saturation, and greater perceived stability for real users.

The WAF response time remained consistently below 1 millisecond per request in 99.9% of cases, measured in production using ngx.now() on real loads (during development we even went as far as thinking about nanosecond optimizations). This is crucial: anti-bot protection must protect, not burden. If the defense system introduces significant latency on every request, it risks becoming part of the problem.
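For readers who want to check a similar claim on their own measurements, a "below X in 99.9% of cases" statement corresponds to a 99.9th percentile. The sketch below uses the nearest-rank method on a list of per-request latencies in milliseconds; the sample data is invented, and exact fractions are used to avoid floating-point rounding when computing the rank.

```python
import math
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Fraction(str(pct)) keeps 99.9 exact instead of a binary approximation.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Invented data: 999 fast requests and one slow outlier, in milliseconds.
    latencies_ms = [0.2] * 999 + [5.0]
    print(percentile(latencies_ms, 99.9))
    print(percentile(latencies_ms, 100))
```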

Another important result was the elimination of false positives detected on real users in transparent modes, thanks to the combination of challenge, fingerprinting, and signed HMAC cookies. Once the client is verified, the system can efficiently recognize it without continuously repeating costly or invasive checks.

Compatibility has been tested with major modern browsers, including mobile WebView and in-app browsers like those used by Facebook, Instagram, DuckDuckGo, and other integrated environments. This is crucial because many real users no longer browse exclusively from desktop Chrome or classic Safari. They come from apps, embedded browsers, mobile devices, privacy-oriented environments, and configurations with restrictive Content Security Policies.

A modern anti-bot system must take this diversity into account. It must be effective against automated systems, but it must not penalize real users simply because they're browsing from a less traditional context.

Why an on-premise solution may be preferable

Cloudflare has had the great merit of making advanced protection, caching, DDoS mitigation, traffic optimization, and anti-bot challenge tools accessible to a very wide audience. It has also helped pave a technical and conceptual path that, inevitably, has also inspired the development of rare proprietary solutions like ours.

It would be hypocritical to argue that a solution like WAF 4 NGINX could have been born in the same way without the existence of platforms like Cloudflare. Cloudflare popularized a modern approach to application security: filtering traffic before it reaches the backend, evaluating client behavior, applying progressive challenges, distinguishing legitimate users from automated attacks, and protecting the site not only from volumetric attacks but also from application-layer abuse.

For many sites, services like this offer a valid, convenient, and quick-to-implement solution. In just a few steps, you can activate effective protection, benefit from a globally distributed network, reduce the load on the origin, mitigate DDoS attacks, improve cache management, and introduce anti-bot mechanisms without having to develop complex components in-house. For blogs, corporate websites, editorial portals, and many e-commerce sites, this model can be perfectly suited.

However, this is not always the ideal answer.

Delegating the management of HTTP and HTTPS traffic to a third party means passing requests, headers, cookies, IP addresses, visited URLs and, in some cases, even sensitive content or data through an external infrastructure. For many companies, this isn't a problem, especially when the operational benefits far outweigh the architectural implications. For other companies, however, this choice requires much more careful consideration.

Some organizations, due to internal policy, do not want web traffic to pass through an external intermediary. Others cannot change nameservers because DNS is managed by separate departments, third-party providers, corporate policies, or very strict bureaucratic procedures. In other cases, the domain is part of a larger infrastructure, with records, delegations, mail configurations, legacy services, and operational constraints that make changing DNS management complex or risky.

Then there are contexts in which legal, contractual or compliance constraints take on a decisive weight. Companies that process sensitive data, public administrations, healthcare portals, B2B platforms, e-commerce sites with specific data processing obligations, or entities subject to very restrictive GDPR policies may prefer to keep the entire request management cycle under their own infrastructure control.

An on-premise solution allows you to decide exactly where requests are processed, what data is collected, how long it is stored, how cookies are signed, what logs are written, and what rules are applied. This level of control is particularly important for companies that process sensitive data, high-volume e-commerce sites, B2B portals, public infrastructures, or customers with stringent compliance requirements. The WAF 4 NGINX solution we developed and implemented addresses both the technical problem and the governance and privacy requirements by running on-premise, directly on the customer's server.

Furthermore, a local solution can be precisely adapted to the individual stack. A WordPress site with WooCommerce doesn't have the same needs as a Magento site. A PrestaShop with faceted navigation doesn't behave like an editorial blog. A headless portal doesn't have the same patterns as a legacy site. A WAF integrated into the reverse proxy can be customized for the specific case, instead of applying generic rules that apply to everyone.

Fingerprinting does not mean blocking privacy-oriented users

A legitimate objection concerns privacy. Browser fingerprinting is often associated with advertising tracking and invasive profiling. It's therefore important to clarify the difference between fingerprinting used for advertising and fingerprinting used for security.

In the first case, the goal is to identify or track a user across different sites, often without explicit consent, for commercial or advertising purposes. In the second case, the goal is to assess whether a client is real or automated, protect the infrastructure, prevent abuse, prevent scraping, and mitigate application attacks.

A properly designed system doesn't need to build invasive persistent profiles. It can simply calculate a technical score, sign a validation cookie, apply a reasonable expiration date, and retain only the information necessary for security. The principle should be one of minimization: collect just enough to distinguish a bot from a real browser, not as much as possible to profile the user.

It's also important to avoid a punitive approach toward privacy-conscious users. Browsers with anti-tracking protections, privacy extensions, restrictive configurations, or partially masked fingerprints shouldn't be automatically treated as malicious. They should simply receive a consistent score and, if necessary, be subjected to a non-invasive challenge.

Anti-bot fingerprinting should be a security tool, not a surveillance tool.

The future of filtering will be increasingly behavioral

Simple IP blocking is destined to become increasingly less effective. Modern bots can change IP addresses, use residential proxies, distribute requests, rotate user agents, simulate realistic headers, and mimic certain browser features.

This is why the future of protection will increasingly rely on a combination of signals. Fingerprinting, behavioral analysis, IP reputation, geographic consistency, request frequency, browsing patterns, JavaScript validation, signed cookies, progressive challenges, and traffic observation will all need to work together.

A real user doesn't navigate like a scraper. They don't systematically open thousands of filter combinations. They don't request hundreds of URLs per minute at perfectly regular intervals. They don't completely ignore assets, images, CSS, and JavaScript. They don't jump from one category to another following alphabetic or parametric patterns. They don't hit known vulnerable endpoints sequentially on dozens of different sites.
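Two of these behavioral traits, regular timing and ignored assets, can be measured from request logs. The sketch below is a simplified heuristic under assumptions: the thresholds, the asset suffix list, and the function names are hypothetical, and the inputs are assumed to be one client's request timestamps (in seconds) and requested paths.

```python
from statistics import mean, pstdev

ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".webp", ".svg", ".woff2")


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between requests; None if too few."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


def asset_ratio(paths):
    """Share of requests that target static assets (query strings ignored)."""
    if not paths:
        return 0.0
    assets = sum(1 for p in paths if p.split("?")[0].lower().endswith(ASSET_SUFFIXES))
    return assets / len(paths)


def looks_like_scraper(timestamps, paths, max_cv=0.1, min_asset_ratio=0.05):
    """Flag clients with near-constant cadence that fetch almost no assets."""
    cv = interval_regularity(timestamps)
    return cv is not None and cv < max_cv and asset_ratio(paths) < min_asset_ratio


if __name__ == "__main__":
    bot_times = [0.0, 0.5, 1.0, 1.5, 2.0]
    bot_paths = [f"/shop?color=red&page={n}" for n in range(5)]
    print(looks_like_scraper(bot_times, bot_paths))
```

A low asset ratio on its own is weak evidence, since assets may be served by a CDN or from the browser cache and never appear in the origin logs; it is the combination with machine-like timing that makes the pattern stand out.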

Behavior reveals a lot. Fingerprints reveal a lot. Consistency between declared and actual data reveals a lot. The strength of a modern WAF lies in bringing these elements together.

Conclusion

Automated traffic has become one of the main operational challenges for modern website owners. The rise of AI-related crawlers, combined with the persistence of malicious scanners looking for vulnerabilities in CMSs, has made one thing clear: we can no longer trust the user agent, the IP address, or overly simple static rules.

When unwanted traffic hits poorly cacheable sites, complex e-commerce sites, or platforms with faceted navigation, the results can be devastating: skyrocketing CPU usage, strained databases, slow checkouts, penalized real users, and needlessly increased infrastructure costs.

Adding hardware can help, but it doesn't solve the root problem. Database optimization is essential, but it's not enough if the load is generated by bots. Blocking entire countries can work in some cases, but it becomes impractical when legitimate and malicious traffic originate from the same geographic areas.

A more intelligent, surgical and adaptive approach is needed.

Browser fingerprinting, when used correctly and with respect for privacy, allows for a more accurate distinction between a real browser and an automated client. Combined with JavaScript challenges, signed cookies, progressive scoring, temporary bans, whitelists, blacklists, and reverse proxy filtering, it becomes an extremely effective tool for protecting websites and application infrastructure.

Our experience with WAF 4 NGINX shows that you can achieve a level of protection comparable to, and in some scenarios even better than, external solutions like Cloudflare, while maintaining full control over your infrastructure, data privacy, and regulatory compliance.

The point isn't to block everything. The point is to distinguish.

Distinguish the real customer from the bot.
Distinguish the declared crawler from the masked scraper.
Distinguish the real browser from the HTTP client pretending to be one.
Distinguish useful traffic from traffic that consumes resources, reduces performance, and puts service availability at risk.

In a web increasingly populated by automated agents, this ability to distinguish is no longer a technical luxury. It's an operational necessity.

If your site has started to suffer from unexplained load and you can't contain bots and crawlers cleanly, contact us: we have the skills and the right solution for you.


