The Turing Award winner, whose research and startups have shaped the industry for five decades, tells The Reg he still has more in store.
What if we built the operating system on top of the database instead of the other way around?
It sounds like a college student's idea after a microdose too many, but it's not. It's a serious idea from someone who has already revolutionized the computing industry and whose influence has spread into well-known products from Microsoft and Oracle.
Celebrating his 80th birthday this year, Michael Stonebraker continues his work in database research, but his impact on the industry was cemented with PostgreSQL, the open-source relational database system that, for the first time, has become the most popular choice among developers this year, according to the Stack Overflow 2023 survey. In addition to being a popular open-source DBMS, vendors including the cloud giants, CockroachDB, and YugabyteDB offer database services with a PostgreSQL-compatible interface.
Stonebraker's early influential work began with Ingres, one of the first relational database systems, which arose as his research topic after he was appointed assistant professor at UC Berkeley in 1971.
Speaking to The Register, he says:
My doctoral thesis was on an aspect of Markov chains, and I realized that it had no practical value. I went to Berkeley, and I had five years to make a contribution and get tenure. I knew it wouldn't be my thesis topic. Then Eugene Wong, another faculty member at Berkeley, said, 'Why don't we look at databases?'
The two read a then-recent proposal on relational databases by Edgar Codd, an IBM researcher, titled "A Relational Model of Data for Large Shared Data Banks."
Stonebraker and Wong found Codd's idea elegant and simple.
The obvious question was to try to build a relational database system. Both Eugene and I had no experience building systems software but, as academics, we thought, let's give it a try and see what happens. So, based on no experience, we started building Ingres. And that got me my professorship.
Ingres had competitors. IBM's System R was the first to demonstrate that the relational approach could deliver operational transaction performance and was the first to implement the now ubiquitous SQL. Oracle started its relational system in the late 70s. Ingres also faced a platform problem.
We had many visitors to Berkeley asking us who the biggest user of Ingres was. Then Arizona State University wanted to use it for a database of 35,000 student records, but they couldn't get past the fact that they had to get an unsupported operating system from these guys at Bell Labs, which was Unix.
Ingres' targeting of mid-range systems, in which Unix had just emerged, also meant that it did not support COBOL, the dominant language for enterprise computing at the time. "The only solution was to start a company," says Stonebraker.
He founded Relational Technology to commercialize Ingres. It was later renamed Ingres Corporation and then purchased by ASK Corporation in 1990, which in turn was purchased by Computer Associates in 1994. Another member of the Berkeley Ingres team, Robert Epstein, founded Sybase, which for a decade was second to Oracle in the relational database market. In 1992, its product line was licensed to Microsoft, which used it for early versions of SQL Server.
But Stonebraker recognized that Ingres' commercial code was far ahead of the open source research project (other researchers could obtain the code for a nominal fee that covered the tape required to store it and postage costs), so his team decided to push the code over a cliff and start over. What comes after Ingres? Obviously Postgres.
A new era
In 1986, a 28-page document [PDF], co-written with Larry Rowe, announced the design of Postgres, as it was then known, setting out six guiding ambitions. Among these, two have proven important to the longevity of the database system. One was to provide better support for complex objects. The second was to provide user extensibility for data types, operators and access methods.
Stonebraker tells us he knew from conversations with Ingres customers that being extensible would be important for a successful database in the future. “A client once called me and said, 'You implemented time all wrong,'” he said.
The Berkeley professor was perplexed because his team had gone to great lengths to ensure they correctly implemented the Julian calendar, including leap years. But some financial obligations are calculated on 12 equal months in a 360-day year, something you can't implement in Ingres but can in PostgreSQL, he says.
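To make the customer's complaint concrete, here is a minimal Python sketch of day counting on a 360-day year of twelve 30-day months. It illustrates only the calendar arithmetic, not how such a type would be registered in PostgreSQL. The function name `days_30_360` is hypothetical, and the end-of-month handling follows a simplified version of the common 30/360 convention, which is an assumption rather than anything Stonebraker described.

```python
from datetime import date


def days_30_360(start: date, end: date) -> int:
    """Count days between two dates, treating every month as 30 days.

    Simplified 30/360 rule (an assumption for illustration):
    - a start day of 31 is treated as 30
    - an end day of 31 is treated as 30 when the start day is 30 or 31
    """
    d1 = min(start.day, 30)
    d2 = end.day
    if d2 == 31 and d1 == 30:
        d2 = 30
    return (
        (end.year - start.year) * 360
        + (end.month - start.month) * 30
        + (d2 - d1)
    )


if __name__ == "__main__":
    jan, feb = date(2023, 1, 15), date(2023, 2, 15)
    print("calendar days:", (feb - jan).days)    # ordinary date subtraction
    print("30/360 days:", days_30_360(jan, feb))  # every month counts as 30
```

With ordinary date subtraction the two answers differ for any month that is not 30 days long, which is the mismatch the customer was pointing at.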
The motivation to make the database extensible also came from wanting to support new data types. An early project with Ingres sought to use it as a geographic information system, away from its home turf of corporate data. It was “arbitrarily slow and unsolvable,” Stonebraker says.
The vision has paid off in the last decade. Ten years ago, PostgreSQL added support for JSON documents, the file format around which the NoSQL databases MongoDB and Couchbase are built.
Stonebraker has been critical of the NoSQL movement in the past. He tells The Register that it is converging with relational databases, because NoSQL systems have adopted SQL or SQL-like languages and accepted the need for consistency.
The biggest good idea of NoSQL was the out of box experience, because with SQL databases, you have to build the database, and then you have to define the cursor. They are difficult to use. This is one of the very valid criticisms made against SQL databases: the out-of-the-box experience sucks. You should be able to turn it on and say, 'Here's some data.'
The various services available to provide PostgreSQL databases and PostgreSQL-compatible databases go some way to addressing this, but the emergence of the DBMS as a popular open source system was a happy accident, and one that Stonebraker had little to do with.
Although the research code for the database was, and remains, open source, building a database company around it was, at the time, impossible, as Stonebraker discovered when founding Illustra in 1992. “When we got venture capital funding both for Ingres and for Postgres, the VCs wanted nothing to do with open source. That was a later phenomenon,” he says.
In 2005, Stonebraker founded Vertica based on a shared-nothing column-oriented DBMS for data warehousing, which he now says “would have benefited immensely from being open source but the vitality of the open source code and VC community is a relatively recent phenomenon.”
'Closed source databases are not the wave of the future'
Illustra was successful for a time. It was eventually sold to Informix for about $400 million in 1996, with Stonebraker's stake valued at $6.5 million, Forbes wrote in 1997. Stonebraker became CTO of the parent company for four years.
That's a comfortable sum, but chicken feed compared to Larry Ellison's estimated net worth of $145 billion. It goes without saying that Stonebraker is dismissive of Oracle, another early adopter of the relational model. “Ingres has always been technically better and Postgres is practically better. It's more flexible, and it's open source. And today, PostgreSQL is generally comparable in terms of performance. In general, closed source databases are not the wave of the future and I think Oracle is very expensive and not very flexible,” says Stonebraker.
However, it was Oracle that made a decision that provided an impetus to open source PostgreSQL. It acquired open source MySQL, which part of the community did not trust in the hands of the proprietary software giant. At the same time as Illustra and other companies commercialized Postgres, Berkeley released the code for POSTGRES under the MIT license, allowing other developers to work on it.
In 1994, Andrew Yu and Jolly Chen, both Berkeley graduates, replaced the POSTQUEL query language with SQL. The resulting Postgres95 was made freely available and modifiable under a more permissive license and was later renamed PostgreSQL.
"What ended up happening was Illustra was gaining traction, but the big coup was when this group of totally unrelated people who I didn't even know picked up the open source Postgres code, which was still around, and ran with it, totally without my knowledge. It was a wonderful accident. When MySQL was purchased by Oracle, developers became suspicious in droves and switched to PostgreSQL. It was another happy accident. Its commercial success is wonderful, but it was largely serendipitous," adds Stonebraker.
Meanwhile, database services have grown around PostgreSQL. It has become the dominant front end for compatible, or nearly compatible, systems available from Google (AlloyDB and Cloud SQL), Microsoft (Azure Database for PostgreSQL), AWS (Aurora and RDS), CockroachDB, YugabyteDB, EDB, and Aiven.
The whole world is moving to the cloud, and Google, Amazon, and Microsoft are all betting the ranch on PostgreSQL compatibility. I think it's a great idea. CockroachDB is compatible with PostgreSQL. You can take a PostgreSQL application and drop it on CockroachDB. PostgreSQL does not have any distributed database capabilities but both YugabyteDB and CockroachDB do.
Stonebraker's influence also reaches rival Oracle's portfolio. His federated Mariposa database became the basis for Cohera, a database company purchased by PeopleSoft in 2001, before becoming part of Oracle in 2004. In 2014, Stonebraker was recognized for the influence of his work on Ingres and Postgres with the Turing Award, earning $1 million from Google in the process.
Despite many of his ideas being so widely used in the database industry, which Gartner said was worth $91 billion in 2022, Stonebraker is relaxed about other people using his ideas.
I did well financially. I knew Ted Codd, who was very magnanimous in saying that you all should run with [ideas]. You want to change the world; each particular person is only a part of it. I've always made open source code and shared code with anyone who wanted it. In the process, I did well financially so yes, I have no regrets at all.
But that doesn't mean he's ready to retire. In his latest project, Stonebraker is ready to change the world again.
The idea for DBOS, a Database Oriented Operating System, was born out of a conversation with Matei Zaharia, author of Apache Spark, co-founder of the analytics and machine learning company Databricks, and associate professor at Berkeley.
"Databricks manages Spark instances in the cloud. Zaharia explained that, at any given time, Databricks often handles around a million Spark subtasks for various users. It wasn't possible to do this using traditional operating system programming techniques; something that could scale was needed. The obvious solution was to put all the planning information into a database. And that's exactly what the guys at Databricks did: they put everything into a PostgreSQL database, and then complained about Postgres' performance," says Stonebraker.
Never one to shy away from a challenge, Stonebraker thought, “Well, I can do better than this.”
The new project replaces Linux and Kubernetes with a new operating system stack that has a database system at the bottom. In the prototype, that database is the multi-node, multi-core, transactional, highly available VoltDB, which Stonebraker started.
"At its core, the operating system is a database application, rather than the other way around" he claims.
An article co-authored by Stonebraker and Zaharia among others explains:
All operating system state should be uniformly represented as database tables, and operations on this state should be performed via queries from otherwise stateless tasks. This design makes it easy to scale and evolve the OS without whole-system refactoring, inspect and debug system state, upgrade components without downtime, manage decisions using machine learning, and implement sophisticated security features.
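As a rough illustration of that principle, and not a depiction of how DBOS itself is built, the following Python sketch keeps a toy scheduler's entire state in one table and performs every operation as a query from functions that hold no state of their own. SQLite stands in for the underlying DBMS, and the `tasks` table, its columns, and all function names are hypothetical.

```python
import sqlite3


def make_state_db() -> sqlite3.Connection:
    """Create an in-memory database holding all (toy) scheduler state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE tasks (
               id     INTEGER PRIMARY KEY,
               owner  TEXT NOT NULL,
               state  TEXT NOT NULL DEFAULT 'ready',
               worker TEXT
           )"""
    )
    return conn


def submit(conn: sqlite3.Connection, owner: str) -> int:
    """Record a new task; the table is the only place it exists."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO tasks (owner) VALUES (?)", (owner,))
    return cur.lastrowid


def schedule_next(conn: sqlite3.Connection, worker: str):
    """Stateless scheduling step: find the oldest ready task and mark it running."""
    with conn:
        row = conn.execute(
            "SELECT id FROM tasks WHERE state = 'ready' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE tasks SET state = 'running', worker = ? WHERE id = ?",
            (worker, row[0]),
        )
        return row[0]


def inspect(conn: sqlite3.Connection) -> dict:
    """Inspecting system state is just another query."""
    rows = conn.execute("SELECT state, COUNT(*) FROM tasks GROUP BY state")
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = make_state_db()
    submit(db, "alice")
    submit(db, "bob")
    schedule_next(db, "worker-1")
    print(inspect(db))
```

Because the scheduler functions keep nothing in memory between calls, any of them could be replaced or restarted without losing track of tasks, which is the property the paper's authors emphasize.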
Whether successful or not, the OS-as-database application idea is unlikely to be Stonebraker's last. After turning 80 in October, he tells The Register he has no plans to slow down.