November 15, 2023

Google Adds New Documentation for a Mystery Crawler

Google has updated its list of official crawlers with documentation for a previously undocumented crawler.

Google Safety User Agent

Google has updated its list of official crawlers by adding the name and information for a relatively unknown crawler that publishers have seen from time to time, but for which no documentation existed to date.

Although Google has added official documentation for this crawler, the information provided is brief and leaves room for further clarification.

Special Crawlers

Google has several types of crawlers (also known as bots and spiders).

The different forms of crawlers include:

  1. Common Crawlers: These bots are mainly used for indexing different types of content. However, some common crawlers are also used for search testing tools, for internal use by Google's product team, and for AI-related crawling.
  2. Fetchers Triggered by Users: These bots are activated by a user action, such as retrieving a feed or verifying a site.
  3. Special Crawlers: These are for special cases like checking the quality of mobile ad pages or for push notification messages via Google APIs. These bots do not respect the global user-agent rules in the robots.txt file, which are the rules marked with an asterisk (*).
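The difference between a crawler that falls back to the global (*) group and one that skips it can be sketched with Python's standard `urllib.robotparser`. This is only an illustrative model, not Google's implementation: the robots.txt content, the bot names `ExampleCommonBot` and `ExampleSpecialBot`, and the helper functions are hypothetical, and the wildcard filter assumes one `User-agent` line per group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a global group plus one named group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: AdsBot-Google
Disallow: /checkout/
"""


def without_wildcard_group(robots_txt):
    """Return the robots.txt lines with the `User-agent: *` group removed.

    Simplification: assumes each group starts with a single User-agent line.
    """
    kept, skipping = [], False
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            # A new group starts here; skip it only if it is the wildcard group.
            skipping = value.strip() == "*"
        if not skipping:
            kept.append(line)
    return kept


def may_fetch(agent, url, robots_txt, honors_wildcard=True):
    """Model whether `agent` would fetch `url` under the given robots.txt."""
    if honors_wildcard:
        lines = robots_txt.splitlines()
    else:
        lines = without_wildcard_group(robots_txt)
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, url)


PRIVATE_URL = "https://example.com/private/report.html"
CHECKOUT_URL = "https://example.com/checkout/cart"

# A crawler without its own group falls back to the (*) rules.
print(may_fetch("ExampleCommonBot", PRIVATE_URL, ROBOTS_TXT))  # False

# A crawler modeled as skipping the (*) group is not blocked by them.
print(may_fetch("ExampleSpecialBot", PRIVATE_URL, ROBOTS_TXT, honors_wildcard=False))  # True

# Rules in a group that names the crawler still apply in this model.
print(may_fetch("AdsBot-Google", CHECKOUT_URL, ROBOTS_TXT, honors_wildcard=False))  # False
```

In this model, a rule meant for a special crawler has to sit in a group that names that crawler explicitly, because the (*) group is never consulted for it.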

The new documentation concerns the Google-Safety user agent. The crawler isn't new, but the documentation is.

Google-Safety Crawler

The Google-Safety crawler, now documented in the Special Crawlers category, is used by Google processes to detect malware.

Uniquely among the Special Crawlers, the Google-Safety crawler completely ignores all directives in the robots.txt file.

Here's what the new documentation for the Google-Safety Crawler says:

The Google-Safety user agent handles crawling specifically for reporting abuse, such as malware discovery for publicly available links on Google properties.
This user agent ignores the rules of the robots.txt file.
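Because robots.txt has no effect on this crawler, one practical way for a publisher to see when it visits is to look for it in the server access logs. The following is a minimal sketch under stated assumptions: it assumes the logs use the common "combined" format and that the token `Google-Safety` appears in the User-Agent header. The function name and the sample log lines are hypothetical, and since a User-Agent header can be spoofed, a match is only a hint and not proof that the request came from Google.

```python
import re

# Combined log format; the last quoted field is the User-Agent header.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def google_safety_hits(lines, token="Google-Safety"):
    """Return (ip, time, request) for log lines whose User-Agent contains `token`.

    The token value is an assumption about how the crawler identifies itself.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and token.lower() in match.group("agent").lower():
            hits.append((match.group("ip"), match.group("time"), match.group("request")))
    return hits


# Hypothetical sample lines, using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.7 - - [15/Nov/2023:10:00:00 +0000] '
    '"GET /downloads/tool.zip HTTP/1.1" 200 5120 "-" "Google-Safety"',
    '198.51.100.4 - - [15/Nov/2023:10:00:05 +0000] '
    '"GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for ip, time, request in google_safety_hits(SAMPLE_LOG):
    print(ip, time, request)
```

Run against the sample above, only the first line is reported. In practice `SAMPLE_LOG` would be replaced by the lines of a real log file.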


