Table of contents of the article:
If there is one thing that has remained in my mind in the staging I did in fifth grade (23 years ago) at Elettromeccanica Cognigni in Civitanova Marche, it was these few but significant words "The right tools are already half the job".
This sentence was uttered by the grandfather of a dear friend of mine, who used to come to visit the owner of the company and browse or talk about this or that, as the elderly who have time during the day to cultivate their friendships usually do. and make visits here and there.
I was engaged in emptying an armature of an electric motor (an electric winding) by beating the chisel with a remedied pliers, when I heard these high-sounding words and was subsequently invited to use a hammer.
More precise, faster and more comfortable work. In a few weeks' time, the school internship would have finished and I would have finished this experience in contact with accumulators, windings, electric motors, and the like, leaving behind many of the notions learned in a "disposable" way, however that precious advice and that solemn sentence pronounced to effect is still today one of the cornerstones of my profession, and more widely of my way of life.
We could eat the soup with a fork, but surely a spoon is better.
The problem with applications using MySQL
There is a problem that affects almost everyone, mainly involving all those situations in which you have to deal with software written by others that uses a MySQL database or related forks such as MariaDB, Percona Server, which are in any case based and derived from MySQL itself.
When working with CMSs such as WordPress, WooCommerce, Magento, Prestashop and the like, in the end we always end up dealing with an RDBMS such as MySQL or a derivative, and it seems that it is normal, right and proper to have an RDBMS that looks also extremely fast and performing.
The problem isn't a big problem when the database doesn't become the bottleneck of our application, and of the performance that inevitably affects user experience, SEO, turnover and corporate profit.
Although it may have significantly improved the speed and performance of MySQL over the years, for example by switching from the old engine MyISAM to InnoDB you always have to ask yourself if MySQL (or related forks) is the best solution on the market in terms of features, performance and speed.
MySQL is extremely slow compared to PostgreSQL
For example, if we were looking for an RDBMS compatible with SQL Standard that is open source, free, well supported and documented, multi-architecture, portable, extremely performant, we will surely exclude commercial and proprietary closed source solutions such as Oracle DB, SQL Server or DB2 from IBM.
However, we could and should consider PostgreSQL or just Postgres.
PostgreSQL is a powerful open source object-relational database system with over 35 years of active development that has earned it a solid reputation for reliability, feature robustness, and performance.
Also known as Postgres , is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally called POSTGRES, referring to its origins as a successor to the Ingres database developed at the University of California, Berkeley. In 1996, the project was renamed to PostgreSQL to reflect its support for SQL . After a revision in 2007, the development team decided to keep the PostgreSQL name and the Postgres alias.
PostgreSQL presents transactions with Atomicity, Consistency, Isolation, Durability properties (ACID), automatically updatable views, showed materialized , triggers , foreign keys and stored procedures . It is designed to handle a wide variety of workloads, from single machines to data warehouses or web services with many concurrent users. It is the default database for macOS Server and is also available for Windows , Linux ,FreeBSD and OpenBSD .
But how slow is MySQL compared to PostgreSQL?
Or rather, how much faster is PostgreSQL compared to MySQL ?
How did I benchmark the databases?
Benchmarking a database can be a tricky business. Just look around the Internet. There are different benchmarks, which measure different things, sometimes metrics, which don't tell you anything significant.
From my point of view, I see the database as something that exactly matches its name. A basis for a fillet . Nothing more. So no application logic in the database. The database should hold the data and perform two main operations as quickly as possible: read e to write .
I see read as something that doesn't change the database and write as something that changes the database. To me, delete and update are both subsets of writing.
Also, when reading from the database, I tend to render my own selects
as simple as possible. I don't use join on join on join on join. I prefer to read multiple different tables as fast as possible and then process the data outside the database. But I use max
, avg
e min
very sum
. I use simple things a lot and complicated things as little as possible.
In my view, my ideal database should be humming quietly all the time, serving reads and writes as quickly as possible. Nothing more.
When I decided to make my point of reference I was looking for three things:
- Which database has the fastest writes
- Which database has the fastest reads
- Which database has the least memory and CPU usage
Preparation
As I did my benchmark two years ago in 2019 (which led to the switch to PostgreSQL), it will be wise to repeat it now, in 2021.
At first, we need to run all databases, we want to benchmark. We can run all major databases in Docker with the commands (and a little more info) from this article, but find all those commands below.
I've added some popular forks of all three majors engine families.
Postgres engine family: PostgreSQL, TimescaleDB
MySQL engine family: MySQL, MariaDB, Percona
Microsoft SQL Server engine family: SQL Server
Here are the Docker commands to run all the databases we need to test.
PostgreSQL
docker run --name postgres -e POSTGRES_PASSWORD=password -p 5433:5432 -v postgres_data:/var/lib/postgresql/data -d postgres:alpine
Timescale DB
docker run --name timescale -e POSTGRES_PASSWORD=password -p 5434:5432 -v timescale_data:/var/lib/postgresql/data -d timescale/timescaledb:latest-pg12
MySQL
docker run --name mysql -e MYSQL_ROOT_PASSWORD=password -p 3306:3306 -v mysql_data:/var/lib/mysql -d mysql:latest
MariaDB
docker run --name mariadb -e MYSQL_ROOT_PASSWORD=password -p 3307:3306 -v mariadb_data:/var/lib/mysql -d mariadb:latest
Tap Server
docker run --name percona -e MYSQL_ROOT_PASSWORD=password -p 3308:3306 -v percona_data:/var/lib/mysql -d percona:ps-8
Unfortunately, thanks to Microsoft's EULA for SQL Server 2019, it's not possible to present benchmarks for SQL Server, but I can tell you that's a shame. You have to do your own benchmarks.
First comparison
At this point, you should have all six databases running. All six in their default states. No setup.
We can make our first comparison. We can compare database image size, initial memory usage in Docker and initial CPU usage.
From these results, it appears that PostgreSQL is the winner and SQL Server the loser. But we haven't done any read and/or write benchmarks yet.
Writing benchmarks.
To write benchmarks, I developed a simple Go program (github repo at the end of the article). This program creates a table called benchmark_data
with 6 columns. You need to create the database benchmark
manually using create database benchmark
.
This program inserts 10.000 rows into this table, one by one. No batch inserts, just simple inserts.
The benchmark will be run on Macbook Pro 2019, with Docker desktop running, all unnecessary applications are closed.
This first part of the benchmark will measure how long those inserts will take to complete. From my point of view, time with respect to some operation is the only reasonable measure that can be done. In the end, you always want to know How long does it take o what has been done in a specified time.
In this benchmark, I ran 5 consecutive insert batches, each of 10.000 rows. Clearly, PostgreSQL is the winner and Percona is the loser.
PostgreSQL is about twice as fast as Percona, when it comes to inserts. What strikes me here is the difference between MariaDB and MySQL, because they both belong to the same engine family. It seems that the people behind MariaDB have done some magic.
A simple conclusion: the Postgres engine family is about twice as fast in the MySQL family of engines, except MariaDB.
Reading benchmarks.
This reading benchmark was done like this: 2 000 cycles with two readings in each cycle: average
e sum
di data
column. Again, I did it all 5 times.
Here are the results. PostgreSQL is the winner, Percona is the loser. MariaDB is second worst.
The Postgres engine family is about twice as fast as the MySQL engine family.
Why is MySQL more used and popular than PostgreSQL ?
MySQL is one of the most popular database management systems in the world, although its performance is not comparable to other databases such as PostgreSQL. There are several reasons why MySQL is more popular than PostgreSQL, despite the latter's superior performance.
First of all, MySQL has a longer history than PostgreSQL and therefore had more time to spread and become popular. MySQL was first released in 1995, while PostgreSQL was released in 1996. This means that MySQL had one more year to spread and become known by programmers and industry professionals.
Additionally, MySQL is often included as a default component in many operating systems and development stacks, making it easily accessible to anyone who needs a database. Additionally, MySQL has very extensive documentation and a strong online presence, making it easy to learn and use for novice programmers.
Finally, MySQL is often chosen by companies due to its simplicity and its ability to handle large amounts of data. MySQL does not require advanced system administration knowledge to use and can easily handle large amounts of data, making it ideal for companies that need a scalable and easy-to-manage database.
Environmental impact of PostgreSQL compared to MySQL.
Using PostgreSQL instead of MySQL could have a positive impact on the environment. PostgreSQL is known for its efficiency and speed, which means it can handle large amounts of data with less resource consumption than MySQL. This can lead to a reduction in CO2 emissions, as servers will use less energy to perform the same tasks.
The use of PostgreSQL rather than MySQL could mean, for example, reducing the number of servers within an organization, avoiding the use of MySQL Clusters to compensate for performance problems, as well as avoiding a replacement or an upgrade of the hardware both for vertical scaling (increase of resources on a single machine), and for horizontal scaling, i.e. increasing nodes and machines.
Conclusions.
We have seen how, based on the benchmarks mentioned above, PostgreSQL is decidedly better and more recommended for a series of advantages both in terms of performance and speed and in the more efficient use of memory, without having to give up open source or a system of free license that allows it to be adopted without restrictions on free software and applications.
One therefore wonders why and why many realities, including the most used CMSs such as WordPress, Prestashop, Magento, Joomla, have decided to continue using a DBMS such as MySQL and derivatives which in fact does not have that caliber and those requirements such as make it palatable compared to the PostgreSQL counterpart.
Many considerations regarding the efficiency of the software as well as the impact on consumption and on the environment would come up if we consider this reasoning applied on a large scale, allowing for example to reduce the number of nodes in a cluster or to avoid in some cases horizontal or vertical scaling to address database inefficiency.
However, except for some rather immature sporadic projects to be used in production with the necessary guarantees, there still seems to be no future for PostgreSQL to be used as a backend for the most popular CMSs mentioned above.
A real shame when you consider that the degree of maturity of PostgreSQL is so high that it has virtually no rival (except perhaps Oracle DB).