The Evolution of Databases

One of the beauties of a long career is that you get to see how things in your sector evolve over time. Being in IT is no different, other than the phenomenal rate of change.

Nowhere is this more evident than in databases. When I started out in the late eighties, I was a COBOL programmer working on hierarchical databases, specifically IDMSX on ICL’s VME platform. These databases were built on linked lists, and you had to “walk” through a whole chain to get to the record you wanted.
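To make the idea of walking a chain concrete, here is a minimal Python sketch. It is an illustration of the general technique only, not of how IDMSX stored its data; the `Record` class and the `build_chain` and `walk_chain` helpers are names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    key: str
    payload: str
    next: Optional["Record"] = None  # pointer to the next record in the chain


def build_chain(items):
    """Link (key, payload) pairs into a chain, preserving their order."""
    head = None
    for key, payload in reversed(items):
        head = Record(key, payload, head)
    return head


def walk_chain(head, wanted_key):
    """Follow the next pointers from the head until the key is found.

    Returns the matching record (or None) and the number of records visited.
    """
    visited = 0
    current = head
    while current is not None:
        visited += 1
        if current.key == wanted_key:
            return current, visited
        current = current.next
    return None, visited


chain = build_chain([("a", "first"), ("b", "second"), ("c", "third")])
record, visited = walk_chain(chain, "c")
print(record.payload, visited)
```

The point to notice is that reaching the last record means visiting every record before it, which is why long chains were slow to search.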

Hierarchical databases were straightforward to learn, and I fondly remember the ICL training manual, known as “cows and bulls” after the examples it used to describe the concept.

Over time, hierarchical databases got a reputation for being poorly performing and inflexible, and so were superseded by relational databases. I moved on to being a SQL Server DBA.

Relational databases brought a new way of thinking about how data was represented and stored. Gone were the linked lists and in their place came tables of data much like in a spreadsheet. Multiple tables could be “joined” together to give a subset of the data often visualised as a Venn diagram.
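A join is easier to see than to describe, so here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` and `orders` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'mouse'), (12, 2, 'monitor');
""")

# An inner join keeps only the rows that match in both tables,
# which is the overlapping part of the Venn diagram.
joined = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

for name, item in joined:
    print(name, item)
```

Carol has no orders, so she does not appear in the result: she sits outside the overlap.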

During my time I have worked with Microsoft’s SQL Server, Oracle, Firebird and latterly MySQL. All work in broadly similar ways and have their strengths and weaknesses but have proven to be a highly flexible way of storing and accessing data.

Now I am about to embark on a new side project, one that, should it become successful, would require massive amounts of data to be held. Logically, in the relational world, this would be in a single table.

Thinking about how a single table containing hundreds of millions of rows might perform led me to the latest iteration of databases – NoSQL.

There is an initial problem with NoSQL in that there is no accepted definition of what constitutes a NoSQL database, so I am using the definition set forth by Martin Fowler in that they:

  • don’t use the relational data model, and thus don’t use the SQL language
  • tend to be designed to run on a cluster
  • tend to be Open Source
  • don’t have a fixed schema, allowing you to store any data in any record

It is this fluidity, compared to the very fixed structure of a relational database, that is the appeal for many who use NoSQL, myself included. That, coupled with the ability to scale to really large sets of data, makes it an invaluable tool.
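As a rough illustration of what "no fixed schema" means in practice, the sketch below keeps records of different shapes side by side in plain Python. It does not use any real NoSQL product; the `collection` list and the `find` helper are stand-ins invented for this example.

```python
import json

# A "collection" here is just a list of documents; no two need the same fields.
collection = [
    {"type": "user", "name": "Alice", "email": "alice@example.com"},
    {"type": "user", "name": "Bob", "tags": ["admin", "beta"]},
    {"type": "list", "title": "Holiday packing", "items": ["passport", "charger"]},
]


def find(docs, **criteria):
    """Return the documents whose fields match every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


users = find(collection, type="user")
for doc in users:
    print(json.dumps(doc, sort_keys=True))
```

In a relational table, Bob's `tags` and the missing `email` would each need a schema decision up front; here each record simply carries whatever fields it has.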

However, I don’t believe that this is the end of the relational database, as there are still too many use cases that lend themselves to that structure and design for it to simply disappear. So that leaves the question as to whether there is still a place for hierarchical databases today. I think that with a very particular use case there very well may be.

A few years ago I toyed with a project which was to be called the Golden Thread. The basic idea was that you create lists that can be shared and are very social. It quickly became clear that the best way to implement this would be as a linked list, something that is possible in a relational database but isn’t easy.
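For what it is worth, here is one way a linked list could be held in a relational table, sketched with Python's built-in `sqlite3` module. This is not the Golden Thread design; the `list_items` table, its `next_id` column and the `read_list` helper are assumptions made purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list_items (
        id INTEGER PRIMARY KEY,
        label TEXT NOT NULL,
        next_id INTEGER REFERENCES list_items(id)  -- NULL marks the end of the list
    );
    -- The ids are deliberately out of order: only next_id defines the sequence.
    INSERT INTO list_items VALUES (7, 'first', 2);
    INSERT INTO list_items VALUES (2, 'second', 5);
    INSERT INTO list_items VALUES (5, 'third', NULL);
""")


def read_list(connection, head_id):
    """Walk the chain from head_id using a recursive common table expression."""
    rows = connection.execute("""
        WITH RECURSIVE chain(id, label, next_id, position) AS (
            SELECT id, label, next_id, 1 FROM list_items WHERE id = ?
            UNION ALL
            SELECT li.id, li.label, li.next_id, chain.position + 1
            FROM list_items AS li
            JOIN chain ON li.id = chain.next_id
        )
        SELECT label FROM chain ORDER BY position
    """, (head_id,)).fetchall()
    return [label for (label,) in rows]


print(read_list(conn, 7))
```

It works, but the recursive query is a fair amount of ceremony just to read items in order, and inserting or moving an item means updating the pointers on its neighbours by hand.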

There doesn’t seem to be a modern hierarchical database to match the ease of, say, MySQL or MongoDB, but I am pretty certain that if there were I would be able to dust off my copy of “cows and bulls” and my life in IT would finally have come full circle.