Table of contents of the article:
One of the little known cases to the end user of a website is that inherent to problems of site availability that occur when the server performs the MySQL database backup.
Although maintenance and backup operations are scheduled at night time where low traffic flow and a lower server load are expected, what happens when Google's crawlers pass by to scan the new content and index them right at that moment?
If the backup is designed correctly, with the right technologies and on a dataset of adequate size, everything will simply be fine and Google will be able to scan our site, the new contents and then index them later, otherwise will be hit by a series of 500 type errors and once the crawling budget for crawling activities on your site has been spent it will go away without recovering the new content and indeed detecting numerous errors of type 500 that in the medium / long term it could also compromise the ranking of your site given the very high error ratio.
But why does this happen?
To understand the reasons why these problems occur, which are not frequent but not too rare, we must understand how normally a hosting provider takes care of carrying out the backup activities at the server level and more specifically when it will back up the database of ours site. What happens, for example, if while it is copying a request arrives to our site that through PHP deals with executing multiple SQL queries to the database?
In a context of perfect idiocy we will dirty the backup by generating an inconsistent situation since maybe in that millisecond we would have updated a new record in the hypothetical personal data table without writing the relative record in the tax code table.
In a standard context, the system administrator or the control panel will execute mysqldump with the locking parameters which will block the tables for both reading and writing, will guarantee an intact and consistent dump but will inevitably generate locking problems, i.e. we would have blocked resources and therefore the query will not be successful within the time established by the webserver, triggering a timeout at best, at worst "hanging" the crawler for several tens of seconds and generating timeout on the crawler side.
Avoiding these situations is possible by using the right precautions and knowing how to evaluate the scenario based on the technology, the size of the dataset, the type of tables and essentially the customer's budget in order to be able to plan with serenity the best solution for their pockets.
MySQL backup via mysqldump
It is probably the simplest and most common backup method as well as the most incorrect, partly because it is part of the MySQL client programs, so it will be installed by default. It makes logical backups and can back up all databases, one database and all tables or one or more tables in the same database. The dump is printed on the standard output, so that the output can be sent to other programs for interoperability. For example, you can link the output to gzip, since mysqldump doesn't compress the output. The biggest downside is that it is painfully slow on large databases, not only because it performs logical backups, but also because both the backup and the restore process are single-threaded. The dump contains SQL statements to create the tables, populate them with data, or both.
As we mentioned above, this is the most incorrect way you can use to make a backup of a MySQL, Percona Server or MariaDB database, i.e. use the mysqldump system utility. If you haven't noticed yet, we are in the year 2019 and no longer in 2005 and perhaps it is time for you and your hosting provider to have a good wake up taking note of how many problems this utility creates especially on large databases of several gigabytes.
If it doesn't, feel free to switch suppliers if your project is extremely ambitious and important to you.
Mysqldump can extract and copy the contents of tables row by row, or it can extract it entirely from the table and put it in a memory buffer before copying it. Using buffering can be a problem when copying large tables.
Essentially the purpose of mysqldump is to generate a .sql file in which there are instructions to recreate the database called dump as we said before.
As we said before to ensure a consistent database backup we use parameters for the lock of the tables:
- For tables InnoDB is used
--single-transaction
as an option. - For MyISAM tables the option is used instead
--lock-tables
If you have a database of a few megabytes or ten megabytes you can "safely" use this approach which takes a few seconds or so to produce a .sql dump, but what if we are faced with a database of several gigabytes? It could take tens of minutes or even hours.
At that time the database will be in a locked state and no one will be able to write or read. Not even Google returning this type of error to you.
Percona XtraBackup. How to make a MySQL backup like a pro.
While the method listed above performs logical backups, with grueling slowness, such an approach becomes unusable as a database begins to grow a few gigabytes. Xtrabackup is the most popular MySQL physical backup software. It can create a hot backup (which means there is no need to stop the database server), is much faster, and supports fully non-blocking backups, as long as all tables are InnoDB or XtraDB. It can create local backup files or a standard output stream, so it's very versatile. For example, using tools like gof3r you can stream backups to Amazon S3 and upload the backup on the fly (without storing it locally). The size of the backups is almost as large as the entire database, but xtrabackup supports compressed backups using qpress, greatly reducing the final size. Another great feature is that it can encrypt backups or streaming with AES128, AES192 and AES256.
In addition to having many other advantages, first of all a practically immediate restore of the entire DB and many other features that would go far beyond our scope focused on systemic problems that result in SEO problems, the main advantage is that of not locking the database and therefore continue to respond to any queries from night visitors, spiders and crawlers.
Simply stated, while mysqldump has to lock the tables, back them up and then unlock them, Percona XtraBackup uses a native function of MySQL and derived DBMS software such as MariaDB or Percona Server called binary log where all the writing and modification instructions are saved. come to the DB.
Contact XtraBackup in short, it takes care of copying the data in a rough way, continuing to log the modification instructions in these binary logs and then read these log files and apply the instructions contained within them (i.e. database modifications) to the copy of the database of the backup, thus generating a “photograph” that is fully intact and compliant with the initial situation.
Performance MySQLdump VS Percona XtraBackup
If we want to touch real numbers or measure to decide, we can briefly see a benchmark between the two tools to understand what we are talking about and why mysqldump must be considered as an inappropriate toy for systems in production, going to create slow mysql backups.
As we can see while working on the same 73GB dataset a backup made with mysqlsump is 50 times slower than one made with Percona XtraBackup.
It must also be said that Percona XtraBackup is absolutely free without any limitations and currently the best existing backup solution for systems based on MySQL and derivatives. So good that it remains even better than MySQL Enterprise Backup made by the parent company of MySQL itself and offered at the modest sum (obviously sarcastic tone) of $ 5000 per year per server.
In short, there is no reason not to use Percona XtraBackup other than superficiality and overt ignorance.
Do you use panels like cPanel or Plesk? Attention, they also use mysqldump
Unfortunately, amateurism and carelessness does not only concern Sunday systems engineers and improvised handymen but also companies of a certain depth that produce commercial control panels such as cPanel or Plesk.
In fact, reading on their website we can read in January 2019 regarding the use of Percona:
Furthermore, even compared to other types of backups, here is a simple summary that shows why you should use this way of working instead of improvising with antiquated and highly non-performing utilities.
Percona XtraBackup is the only solution able to satisfy all the requirements, it can be used in an absolutely profitable way even together with a MASTER / SLAVE replica but this is another matter.
Conclusion
If you are used to finding backups of individual databases and the various .sql in your daily or weekly backup space, start to worry. If you notice that MySQL backup is very slow, ask yourself the right questions. If you see Google spider errors, start worrying, if you become aware that your hosting provider uses mysqldump, start worrying, if your web project gives you a living and you are sure it does not use Percona XtraBackup, change provider immediately . We don't like to discredit anyone, but an amateur in trouble is a danger to himself and your business.