Web Server Logs Compliant for Judicial Authority and GDPR
November 24, 2024

Web Server Logs in accordance with the law for judicial authorities and the GDPR

Legally sound, GDPR-compliant log management when using CDN and reverse proxy technologies such as Cloudflare and similar services.

Introduction

Over the past few decades, the Internet has become an essential part of the daily lives of billions of people. Surfing the web is a common activity for both personal and professional purposes, involving individuals of all ages and backgrounds. However, this growing dependence on the Internet has also led to an increase in illegal behavior, ranging from simple insults on social media to more serious crimes such as computer fraud, online scams, and child pornography crimes.

Every digital action leaves traces, often recorded in web server logs, that is, files that collect information about what users do while browsing. This data can reveal who visited a particular page, uploaded a file or left a comment.

For this reason, webmasters, hosting providers and systems engineers sometimes find themselves having to respond to requests from judicial authorities, who need to access web server logs to identify those responsible for illegal acts. However, the management of this information must be in accordance with the law, in particular respecting Italian legislation and the General Data Protection Regulation (GDPR).

This article looks at how to keep web server logs in a way that satisfies both the requirements of the authorities and those of personal data protection, offering practical guidance to IT professionals.

What is GDPR?

The General Data Protection Regulation (GDPR), introduced by the European Union and applicable since 25 May 2018, is one of the most important regulatory frameworks on the protection of personal data. Its main objective is to ensure that the data collected and processed by organizations are managed in a transparent and secure manner that respects the rights of individuals.

The GDPR applies to any entity that processes personal data of individuals in the EU, whether the entity operates within the Union or outside it. Personal data means any information that can identify a natural person, directly or indirectly, such as a name, IP address, email address or location information.

What is a web server log?

A web server is software designed to receive and respond to HTTP and HTTPS requests from clients, typically browsers or other applications. In other words, it is the component that makes websites and web applications viewable on the Internet. The most widely used web servers include:

  • Apache HTTP Server: One of the longest-running and most popular web servers, very versatile and supported by a large open source community.
  • Nginx: Famous for its efficiency in handling simultaneous connections, often used as both a web server and a reverse proxy.
  • LiteSpeed: Valued for its high performance, especially in shared hosting environments and with CMS like WordPress.

Every web server generates files called logs, which record the activities performed on the server itself. Web server logs contain a detailed history of what happens when the server handles requests from users. Typically recorded information includes:

  • User's IP address: Identifies the device from which the request comes.
  • Date and time of the request: Indicates when the user interacted with the server.
  • URLs requested: Specifies which resources were requested (pages, files, images, etc.).
  • HTTP method used: Reports the type of operation performed, such as GET, POST, or DELETE.
  • Server response: Reports the status code returned by the server, such as 200 (success), 404 (page not found), or 500 (server error).
  • User agent: Contains information about the browser, operating system and device used by the user.
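
As a rough illustration of the fields listed above, the following Python sketch parses one line in the widely used "combined" access log format. The sample line, the regular expression and the function name parse_line are illustrative assumptions; a server with a custom log format would need a different pattern.

```python
import re
from datetime import datetime

# Pattern for the common "combined" access log format (an assumption:
# adjust it if your server uses a custom log format).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return the logged fields as a dict, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # Timestamps look like 24/Nov/2024:10:15:32 +0100
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

# Hypothetical sample line (203.0.113.7 is a documentation-only address).
sample = ('203.0.113.7 - - [24/Nov/2024:10:15:32 +0100] '
          '"GET /index.html HTTP/1.1" 200 5120 '
          '"-" "Mozilla/5.0 (X11; Linux x86_64)"')
entry = parse_line(sample)
print(entry["ip"], entry["method"], entry["url"], entry["status"])
# prints: 203.0.113.7 GET /index.html 200
```

Every field shown here, the IP address in particular, may count as personal data, so a parser like this should run only where access to the logs is already authorized.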

This data is essential for the operation, security and optimization of websites, allowing you to diagnose problems, monitor traffic and detect suspicious activity. However, much of this information, such as IP addresses, can be considered personal data under the GDPR, requiring careful and regulatory-compliant management.

Italian law and log retention

In Italy, the management and storage of web server logs are regulated by a complex set of regulations that are intertwined with the General Data Protection Regulation (GDPR). These rules often add specific details and obligations to the European framework, especially in relation to security needs, judicial investigations and prevention of crime.

The Electronic Communications Code

A central reference is the Electronic Communications Code (Legislative Decree 259/2003), which establishes precise obligations for providers of telecommunications and internet access services. In particular, the Code provides that traffic data, collected for the purposes of ascertaining and repressing crimes, must be retained for a minimum period of 12 months. This data includes:

  • Source and destination of the communication (IP addresses and port numbers).
  • Date, time and duration of the connection.
  • Protocol used (for example, HTTP, HTTPS, or FTP).

The requirement to retain for at least one year is designed to ensure that information needed for an investigation can be retrieved by the competent authorities.

Obligations for service providers

Internet service, hosting and telecommunications providers are subject to more stringent rules than other operators. In Italy, these entities must retain logs not only for technical or administrative purposes, but also to respond to any requests from judicial authorities. Failure to comply with these obligations may result in administrative or criminal sanctions, depending on the severity of the violation.

Hosting providers, while not expressly equated with telecommunications providers, have indirect obligations to cooperate with the authorities. For example, web server logs could be requested to identify who has uploaded illegal content to a hosted platform or to track down the perpetrator of a fraud.

Prolonged storage in specific cases

In special situations, such as investigations into serious crimes (e.g. terrorism or child pornography), the retention of logs may be extended beyond the standard terms. In such cases, upon request of judicial authorities, data may be retained for a longer period, provided that the request is justified and occurs in compliance with the legislation on the protection of personal data.

Differences between GDPR and Italian law

Unlike the GDPR, which leaves room for more general interpretations regarding the duration of retention, the Italian regulation is often more detailed and prescriptive. This creates a complex framework, where IT professionals must balance:

  1. Minimum duration: In Italy, the retention of traffic data for at least 12 months is mandatory for public safety purposes and for the detection of illegal activities.
  2. Maximum duration: The GDPR imposes the principle of “storage limitation”, whereby data must not be retained longer than is necessary for the stated purposes.
  3. Compatibility between regulations: Italian law can expand the requirements of the GDPR, but it cannot contradict them. For example, it cannot require the retention of logs for an indefinite period without a solid legal basis.
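
The balance between a minimum and a maximum retention period can be sketched as a simple age check. In this Python sketch the 365-day minimum reflects the 12-month figure cited above, while the 400-day maximum, the three labels and the function name retention_action are purely hypothetical; the actual upper bound depends on the purposes you have documented and on legal advice.

```python
from datetime import date, timedelta

MIN_RETENTION = timedelta(days=365)   # the 12-month minimum cited above
MAX_RETENTION = timedelta(days=400)   # hypothetical upper bound; set per your own policy

def retention_action(log_date, today):
    """Classify a log file by age: 'keep', 'review' or 'delete'."""
    age = today - log_date
    if age < MIN_RETENTION:
        return "keep"      # still inside the mandatory retention window
    if age <= MAX_RETENTION:
        return "review"    # minimum satisfied; keep only if a purpose remains
    return "delete"        # past the documented maximum (storage limitation)

today = date(2024, 11, 24)
print(retention_action(date(2024, 6, 1), today))   # prints: keep
print(retention_action(date(2023, 11, 1), today))  # prints: review
print(retention_action(date(2023, 1, 1), today))   # prints: delete
```

A log subject to a pending request from a judicial authority would need to be excluded from the "delete" branch, which this sketch does not model.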

Other regulatory references

Other legislative instruments that influence log retention in Italy include:

  • Personal data protection code (Legislative Decree 196/2003): Integrated with the GDPR, it specifies the criteria for the conservation of personal data at national level.
  • Anti-terrorism law (Law 155/2005): Introduces specific cooperation obligations for the monitoring and retention of data related to potential terrorist threats.
  • AGCOM Regulation: For digital content providers, it imposes indirect responsibilities in the management of data relating to users and online activities.

Mistakes to avoid in managing logs and IPs when using CDN and Reverse Proxy

When you use a CDN (Content Delivery Network) or a reverse proxy such as Varnish, Nginx, Cloudflare or Caddy, you introduce an intermediate layer between the user and the origin server. If this layer is not configured correctly, it can alter the data collected in the logs. A common mistake is to record the IP address of the reverse proxy in the log files instead of the real IP of the user who made the request. This happens because, in the default configuration, the origin server sees as the source address that of the proxy or CDN node that forwarded the request, not that of the final client.

This issue can have significant implications. Recording the proxy IP address compromises the validity of the logs for several purposes. From a security perspective, it makes it difficult to identify the actual origin of attacks, such as intrusion attempts or suspicious activities. In terms of legal compliance, it can impair the ability to comply with requests from law enforcement authorities, who often need logs to trace the identity of the user responsible for an illegal online activity. Furthermore, in traffic analysis, failure to record users' real IPs leads to inaccurate data, reducing the reliability of key metrics for website optimization and management.

This challenge becomes even more relevant in complex scenarios, where traffic is distributed across multiple CDN nodes or passes through multiple proxy levels. To avoid these errors, it is essential to configure the system to preserve the user's original IP, using the specific headers provided by the adopted technology, such as X-Forwarded-For or CF-Connecting-IP. Without such a configuration, the logs not only become unusable for investigative purposes, but can also expose the service provider to potential regulatory violations, especially under frameworks such as the GDPR or Italian legislation.

Why does this error occur?

Reverse proxy technologies and CDNs act as intermediaries between the user and the originating server. When a request passes through a proxy like Varnish or a CDN like Cloudflare, the originating web server only sees the IP address of the proxy and not the actual client. This is because the connection between the proxy and the originating server does not preserve the user's IP unless it is configured correctly.
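
To make the mechanism concrete, here is a minimal Python sketch of how an origin server could recover the client address from an X-Forwarded-For header when one or more trusted proxies sit in front of it. The function name client_ip_from_xff, the trusted range 10.0.0.0/8 and the sample addresses are assumptions for illustration, and the sketch assumes every header entry is a valid IP address.

```python
import ipaddress

def client_ip_from_xff(peer_ip, xff_header, trusted_proxies):
    """Walk X-Forwarded-For right to left and return the first untrusted address.

    peer_ip is the address of the direct TCP connection; trusted_proxies is a
    list of CIDR strings for proxies you operate or explicitly trust.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_proxies]

    def is_trusted(address):
        ip = ipaddress.ip_address(address)
        return any(ip in net for net in networks)

    # If the direct peer is not a trusted proxy, the header may be forged.
    if not is_trusted(peer_ip):
        return peer_ip
    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    # Every hop is trusted: fall back to the leftmost entry, or the peer.
    return hops[0] if hops else peer_ip

trusted = ["10.0.0.0/8"]  # hypothetical internal proxy range
print(client_ip_from_xff("10.0.0.5", "198.51.100.23, 10.0.0.9", trusted))
# prints: 198.51.100.23
print(client_ip_from_xff("192.0.2.44", "198.51.100.23", trusted))
# prints: 192.0.2.44
```

Reading the header from right to left matters: entries on the left are supplied by the client and can be forged, whereas the rightmost entries were appended by proxies you trust.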

How to avoid logging proxy IP

With Varnish

Varnish, being a caching reverse proxy, does not directly handle HTTPS traffic, but relies on a server such as Nginx or Caddy for SSL termination. To ensure that the user's real IP is logged, you need to capture it during HTTPS termination. For example:

  • Configure Nginx or Caddy to add the X-Forwarded-For header to the incoming request. This header includes the client's original IP address.
  • On Varnish, configure logging to read and record the IP from the X-Forwarded-For header, rather than the IP address of the direct connection.

Example configuration in Varnish VCL:

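sub vcl_recv {
    # client.ip is the TLS terminator here, so take the first address
    # from the X-Forwarded-For header it set instead.
    set req.http.X-Real-IP = regsub(req.http.X-Forwarded-For, "[, ].*$", "");
}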

With Cloudflare and other CDNs

Cloudflare and similar services (Akamai, Imperva's Incapsula, Sucuri, Fastly) provide specific guides to restore the user's real IP. You need to:

  1. Configure the origin web server (e.g. NGINX or Apache) to read the custom headers added by the CDN, such as CF-Connecting-IP (Cloudflare) or True-Client-IP (Akamai).
  2. Enable the Real IP Module in NGINX or the equivalent module in Apache, specifying the CDN IP ranges as trusted proxies.

Example configuration for NGINX with Cloudflare:

# Cloudflare IP ranges
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
real_ip_header CF-Connecting-IP;

With other CDNs

This approach also applies to other CDNs that operate as reverse proxies. For example:

  • Fastly: Use the header Fastly-Client-IP.
  • Sucuri: Use the header X-Forwarded-For.
  • Incapsula (Imperva): Use the header Incap-Client-IP.
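
Choosing the right header per provider can be expressed as a small lookup. The Python sketch below covers only three of the headers named in this article; the dictionary CLIENT_IP_HEADERS, the function name real_ip and the sample values are hypothetical, and the header should be honored only for requests that actually arrive from the provider's own address ranges.

```python
# Client-IP headers named in this article (a subset; check your provider's docs).
CLIENT_IP_HEADERS = {
    "cloudflare": "CF-Connecting-IP",
    "akamai": "True-Client-IP",
    "fastly": "Fastly-Client-IP",
}

def real_ip(provider, headers, remote_addr):
    """Return the client IP from the provider's header, else the connection address."""
    wanted = CLIENT_IP_HEADERS.get(provider.lower())
    if wanted is None:
        return remote_addr
    # HTTP header names are case-insensitive, so normalise before comparing.
    normalised = {name.lower(): value for name, value in headers.items()}
    value = normalised.get(wanted.lower(), "").strip()
    return value or remote_addr

print(real_ip("Cloudflare", {"cf-connecting-ip": "203.0.113.7"}, "173.245.48.10"))
# prints: 203.0.113.7
print(real_ip("fastly", {}, "192.0.2.1"))
# prints: 192.0.2.1
```

Falling back to the connection address keeps the log entry populated even when the header is missing, which also makes a misconfiguration easier to spot.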

Implications and risks

Failure to properly configure real IP address logging can have significant consequences:

  • GDPR Compliance and Italian Law: If the logs only contain proxy IPs, they may not be considered valid for reconstructing a user's activities in case of legal requests.
  • Security: Intrusion detection or anomaly analysis tools can be ineffective if they don't work with users' real IPs.
  • Debugging and traffic analysis: Visibility into real user behavior is compromised.
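
One way to detect this problem in existing logs is to measure how many logged addresses belong to the CDN itself. The following Python sketch is an assumption-laden illustration: it uses only the two Cloudflare ranges from the NGINX example in this article (the published list is longer and changes over time), and the function name proxy_share and the sample addresses are hypothetical.

```python
import ipaddress

# The two Cloudflare ranges from the NGINX example in this article.
PROXY_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("173.245.48.0/20", "103.21.244.0/22")
]

def proxy_share(logged_ips):
    """Return the fraction of logged addresses that belong to a proxy range."""
    if not logged_ips:
        return 0.0
    hits = sum(
        1 for raw in logged_ips
        if any(ipaddress.ip_address(raw) in net for net in PROXY_RANGES)
    )
    return hits / len(logged_ips)

ips = ["173.245.48.10", "103.21.244.7", "203.0.113.7", "173.245.63.200"]
print(proxy_share(ips))
# prints: 0.75
```

A share close to 1.0 suggests the server is logging the CDN's addresses rather than those of the real visitors.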

To avoid errors in log management with the use of reverse proxies and CDNs, it is essential to configure the servers to record the real IPs of users, using the specific headers provided by each technology. Whether you use Varnish, Nginx, Cloudflare, Akamai, Incapsula, or other solutions, following IP recovery best practices is essential to ensure reliable and compliant logs.

Conclusion

Web server log management is crucial to ensuring legal compliance, security and transparency. In Italy, regulations require careful log storage that respects both the stringent rules of the Electronic Communications Code and the general principles of the GDPR, such as data minimization and storage limitation. This balance must be carefully maintained, especially when using advanced technologies such as reverse proxies and CDNs.

The use of services such as Cloudflare, Akamai, Fastly or Varnish introduces complexity in managing real IP addresses, which can be replaced by proxy IPs if the system is not configured correctly. This error can invalidate logs for legal or security purposes and reduce their usefulness for technical analysis. It is essential to implement the necessary configurations to correctly log real IPs, for example by using the specific headers of reverse proxy technologies.

Additionally, it is essential to establish clear communication with customers, especially when they manage their own domains. Hosting providers should inform customers about the potential impacts of CDNs or reverse proxies on logging in the preliminary stages of the contract. If a customer decides to use technologies such as Cloudflare, it is important to notify the provider to ensure the system is configured correctly.
