Corporate blogs always tend toward that institutional tone: saying everything and nothing in cold, uncompelling words. Unlike the big American corporations, and the small companies that imitate them, we want to share real professional experiences with some empathy, so that you can live real situations second-hand and learn from other people's mistakes. Because in a world made up of companies, customers, suppliers and mere VAT numbers, there are stories of people that deserve to be told.
Today I want to talk to you about backup and data security, and tell you what I have learned about an all too underestimated problem.
If something can go wrong, it will go wrong.
This pseudo-scientific axiom, better known as Murphy's law, has been the cornerstone of our organizational processes for security and backup management.
With this awareness, over the last six years we evaluated and implemented robust, proven backup solutions to ensure the integrity of customer data.
Operating under the old Dreamsnet.it brand since 2005, with sysadmin experience going back to 2000, we had until then always used an incremental snapshot backup solution that allowed selective restores of both individual files and entire images. At the database level, for example, we used (and still use) tools that take snapshots quickly while respecting the logic of hot backups and integrity: backing up the DB without shutting it down, keeping the service available even at 4 a.m. (the users may be asleep, but the search engines are indexing).
Not exactly the last of the class, in short, considering that even today well-known Italian hosting providers still rely on the ancient, ancestral mysqldump. Really unbelievable.
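Our production tooling is different (and MySQL-class), but the concept of a hot backup, snapshotting a live database without ever stopping it, can be illustrated with SQLite's online backup API, which Python ships in its standard library. A minimal sketch:

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()

# A "live" database: the connection stays open and writable
# for the whole duration of the backup, no downtime.
live = sqlite3.connect(os.path.join(workdir, "live.db"))
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
live.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(9.90,), (19.90,), (4.50,)])
live.commit()

# Online backup: copies a consistent snapshot page by page
# while the source connection remains open for reads and writes.
snapshot = sqlite3.connect(os.path.join(workdir, "backup.db"))
live.backup(snapshot)

# The snapshot is a complete, independently queryable copy.
copied = snapshot.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(copied)  # 3
```

The same principle (a consistent snapshot taken against a running service) is what tools like `mariadb-backup` or filesystem snapshots provide for production MySQL workloads.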
In thirteen long years, and after several hundred restored backups, no customer had ever lost a single file of their projects.
Then, in the summer of 2018, something new, unexpected and upsetting happened, something that would make us question everything after that ugly misadventure.
It was a sunny day, warm but not muggy: short-sleeved T-shirt, shorts, notebook over my shoulder. I had just finished lunch at the river park next to my house in Arad, Romania, and I remember walking back to the office at the Arad Business Center along the city's beautiful bike path, very popular at that hour with everyone who, like me, was on a lunch break.
The view of the river inspired peace and tranquility, the sight of people relaxing on the benches conveyed calm and positivity. I was literally immersed in that moment of bliss when my mobile phone started ringing, picking up a call forwarded directly from the main office.
Back to planet Earth, the one made of other people's problems to solve (after all, that's generally what work is for, right?), and I answer warmly:
"Hello, I'm Marco from Managed Server, how can I help you?"
A male voice answers: a guy about my age, between 30 and 35 I would have guessed by ear. We quickly dropped the formalities of the Italian language and put each other at ease by switching to the informal "tu" (by the way, Netiquette says that on the net and on social networks we address each other informally, did you know?).
He starts, somewhat agitated, talking to me about backups, restores, failed restores, lost data and data recovery. Too many abstract and confused concepts, too many fragmented inputs, and a lot of confusion in my head. I do not understand.
Who is calling me? Is he a customer of ours?
What happened?
He talks about backups, but does he need to restore one? Did he lose a backup? Does he have no backup at all?
Could it be one of those kids who can't tell a washing machine from a VCR?
I immediately stop this flood of random terms and phrases and ask him to calmly explain everything that happened from the beginning, and to tell me what he actually needed.
We start over much more calmly, and finally the conversation turns into sentences with complete meaning and, above all, a logical thread.
In simple terms:
This guy, hosted by a well-known French hosting company, told me that the night before he had made a mistake with some production files over FTP and, frustrated while trying to fix the problem, had decided to make a clean sweep: delete everything and restore the previous day's backup.
So he connects with his FTP client, deletes all the folders of his site and, once finished, launches the backup restore procedure from the convenient web interface in his customer area.
In short, four clicks and away!
After a few minutes of task progress, a message announced that the restore operation was complete.
Oh joy, joy and jubilation! What better news?
Houston, we have a problem
What happened is easy to understand and, as you have already guessed, the backup was damaged, or rather EMPTY. The restore wizard had restored exactly ZERO FILES, leaving the destination directory completely empty instead of filling it with the previous day's files.
He doesn't get discouraged, doesn't give up, and tries again. Maybe something just went wrong.
Same procedure, same archive, same restore message. Then he checks with the FTP client and… EMPTY. Again no files. Nothing, nada, zero, nisba.
He goes back to the interface and selects the backup from the day before, same restore procedure. Same successful-completion message, same result. EMPTY.
He selects the backup from two days before. Restore. EMPTY.
And so on with the backups from three days before, four days before, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 days before.
Same result: EMPTY.
There were no more archives to restore, and the conclusion was only one: not a single one of the 30 backups, going back 30 days, could be restored. This person had no way to recover his site, the site that fed him.
The problem had become something really big, as it apparently had no solution.
At that moment I felt a mix of emotions: sadness mixed with joy.
You know when you experience some bad event, directly or indirectly, and you feel sadness for what happened but also, deep down, a pinch of joy in knowing that the bad thing did not happen to you? Exactly that feeling: a mix of opposite emotions colliding with each other.
I was happy that he was not a customer of ours; I would not have known how to face a situation in which the backup service exists but none of the archives can be restored. What could I have said to a PERSON who lives off his site and feeds his family with it, and who was on the phone with me at that very moment?
I immediately thought that this ethical and moral problem, that sense of responsibility, would not touch the huge hosting company this person was a customer of in the slightest.
Ultimately it was clear how the story would go: this person would talk to telephone support, some guy paid a few pennies a month, who would pass the problem to the Italian technical department, which would pass it to the French one and to one of its technicians. The technician would probably find that something was wrong and send the ticket back to the Italian support, who would probably acknowledge the missing data and suggest asking for a refund, or opening a legal dispute.
A legal dispute. In Italy. That alone would be funny, if the situation were not so tragic. It makes you cry, instead, when you think back to the contract and the SLA (written in fonts so tiny they should be illegal), in which the supplier indemnifies itself against any damage of this kind and any claim for compensation.
It would have been easier to climb Everest bare-handed. (Okay, the one in the picture is not Everest, but it gives the idea.)
The CEO of the company would never even hear about this desperate family man who had lost 5 years of work. One among millions of customers, a drop in the ocean. What is a problem like this for a company that makes hundreds and hundreds of millions in revenue a year with millions and millions of customers? Absolutely insignificant. Unfortunately.
What could we do?
Nothing, really. The situation was complex and absolutely outside our room for maneuver: the supplier was someone else, we did not know what backup system they used or what their storage looked like, we had no access to the media, and we had no idea why this misadventure had happened. What could we do?
The most sensible thing to do was obviously to push for a discussion with their technical support, to understand whether there was any possibility of recovery, and in the meantime to check whether, by any chance, he had a local backup on his PC, maybe from a few days or months before.
The first piece of advice produced nothing: support limited itself to saying that the backup had been performed correctly, it simply contained no files. This was confirmed about two weeks later by their top-level French technical support, who dismissed the case with a "there is nothing". An absolutely grotesque and unpleasant situation.
A little better on the local-backup front: with a lot of dedication and plenty of rummaging through the files on his PC, he managed to find and restore a backup from a few months earlier. It was far from the optimal way to repair such damage, but it allowed this person to get back on track and avoid the bankruptcy that would otherwise have been inevitable.
All's well that ends well!
The lesson we have learned
Throughout this bad story, our role was that of a mere spectator: absolutely irrelevant to the outcome, which fortunately was resolved, if not in the best possible way, at least not in the worst.
All this, however, gave us the opportunity to reflect on some important points we had underestimated until then. Essentially, we asked ourselves the following questions:
1. Why couldn't what happened to them happen to us?
Why should a failing backup system only strike other hosting providers and never us?
It would be hypocritical to think so, right? It's like saying we can skip the seat belt because accidents only happen to other people. The more honest answer is that, until that memorable summer day, the only reason it had not happened to us was chance: pure and simple luck. Our snapshot backup system, however advanced, had simply never generated a corrupt backup. A remote, unlikely eventuality, certainly, but not an impossible one, as we saw that very day.
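The failure in this story, a backup job that reports success while the archive is actually empty, is exactly what a periodic test restore catches. Here is a minimal sketch of the idea (file names and helper functions are illustrative, not any vendor's real API): restore the archive into a scratch directory and refuse to trust the backup unless the restored tree is non-empty and matches the original checksums.

```python
import hashlib
import os
import tarfile
import tempfile

def sha256_tree(root):
    """Map each file's relative path to its SHA-256, for comparing trees."""
    digests = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digests[rel] = hashlib.sha256(open(path, "rb").read()).hexdigest()
    return digests

def verify_restore(archive, original_root):
    """Restore into a scratch directory; the backup is good only if the
    restored tree is non-empty and byte-identical to the original."""
    scratch = tempfile.mkdtemp()
    with tarfile.open(archive) as tar:
        tar.extractall(scratch)
    restored = sha256_tree(scratch)
    if not restored:
        return False  # the "successful" but EMPTY restore from the story
    return restored == sha256_tree(original_root)

# Demo: build a tiny site tree, archive it, then test-restore it.
site = tempfile.mkdtemp()
with open(os.path.join(site, "index.html"), "w") as f:
    f.write("<h1>hello</h1>")
archive = os.path.join(tempfile.mkdtemp(), "site.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(site, arcname=".")
ok = verify_restore(archive, site)
print(ok)
```

Run on a schedule against real archives, a check like this turns "the backup job exited 0" into "the backup can actually be restored", which is the only guarantee that matters.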
2. How much does data loss weigh on people's lives?
Losing a website or its data can spell the end of a business. It means inflicting economic damage on people's lives, possibly leaving them unable to afford basic necessities. This cannot and must not happen, at least not through our fault, neither as a cause nor as a contributing cause.
3. How much does data loss affect our company?
The loss of a customer's data most likely means a legal dispute. It hardly matters who is right, or what the contracts say about indemnities and SLAs: there are clear legal obligations, such as those imposed by the new European GDPR, under which we could be accused of various omissions and condemned to fines and compensation. Better to invest in data security, allocating 10% of turnover to monitoring systems, RAID, redundant storage and multiple backups, than to risk courts, litigation, compensation and fines.
What did we do then?
Having understood the three points above loud and clear, we "simply" added a second backup system alongside the one already working, in turn made redundant on a C14 data storage service with military-grade certification in France.
In short: where before our RAID1 systems fed a single backup flowing into a RAID5 storage area, today we have three backups, made with two different technologies, converging on three different RAID5 storage systems, one of which is in turn mirrored to the military-grade, nuclear-proof C14 storage service in France.
With this modus operandi, an unfortunate customer who needs a restore can rely on the remote backup; if that does not work (a very remote hypothesis), there is the second backup system; and if that also fails (an even more remote hypothesis), there is the rsync mirror on yet another storage system.
And even if the datacenter suffered a major attack (let's imagine a nuclear explosion), we would still have a weekly mirror on the anti-atomic C14 storage in France.
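The layered scheme above can be sketched in a few lines. This is only an illustration of the principle, not our actual tooling (the real replication uses rsync and dedicated storage, and the tier names below are made up): push the same archive to every tier and verify each copy against a reference checksum, so that a silently broken tier is caught immediately rather than discovered during an emergency restore.

```python
import hashlib
import os
import shutil
import tempfile

def checksum(path):
    """SHA-256 of a file, used as the reference for every replica."""
    return hashlib.sha256(open(path, "rb").read()).hexdigest()

def replicate(backup_file, destinations):
    """Copy one backup archive to every destination and verify each
    copy byte-for-byte against the source."""
    reference = checksum(backup_file)
    verified = []
    for dest in destinations:
        os.makedirs(dest, exist_ok=True)
        copy = os.path.join(dest, os.path.basename(backup_file))
        shutil.copy2(backup_file, copy)
        verified.append(checksum(copy) == reference)
    return verified

# Demo: one archive, three independent "storage tiers"
# (stand-ins for the two local RAID5 pools and the offsite C14 mirror).
work = tempfile.mkdtemp()
backup = os.path.join(work, "site-2018-07-01.tar.gz")
with open(backup, "wb") as f:
    f.write(b"fake archive bytes")
tiers = [os.path.join(work, t) for t in ("raid5-a", "raid5-b", "offsite-c14")]
results = replicate(backup, tiers)
print(results)  # [True, True, True]
```

The point of the design is that each tier fails independently: a corrupt archive on one storage system no longer means a corrupt archive everywhere.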
In short, the possibility of losing customer data has been reduced to a limit tending towards the impossible. As far as we know, precautions like these currently do not exist in Italy, nor at an international level, given that even very large companies with turnovers of hundreds of millions of euros still rely on one single backup as the definitive solution to protect themselves and their customers.
As for internal policies, we continue (as we always have) to offer the backup service included in the offer. We refuse to treat it as an extra value-added service with a separate price tag: backup is included, and it is our job to protect customers in the best possible way, regardless of their optimism that nothing will ever happen to their data.
In short, atomic bomb proof!