
The Evolution of Data Management Systems

Adapting to Changing Needs

Over time, data has become the centre of almost everything we experience and consume.

Data is often called the new currency, the new oxygen, or the new oil of our modern, consumer-driven world.

In our data-centric world, the very fabric of data storage, processing, and retrieval has undergone a profound transformation. These changes have not only birthed new types of users and use cases but have also spurred the development of new technologies.

Today, we find ourselves in a world where data is king. Let’s delve into how data systems evolved.

Don’t worry; I will not go back to the 19th century. Rather, I will limit myself to the 1960s onwards.

Hierarchical Database & Model (1960s – 1970s)

 

[Image: a diagram of a family tree, illustrating the hierarchical model]

 

Banking systems of the 1960s–1970s primarily used hierarchical data systems, typically running on mainframes under IBM's IMS (Information Management System).

Inventory management systems also used this data model to manage product categories, subcategories, SKUs, etc.

This kind of data model organizes data in a tree (hierarchical) structure, establishing parent-child relationships (especially one-to-many relationships).

The model works with structured data, and a top-down approach is adopted for information retrieval.

This kind of model is well suited to data with inherent hierarchical relationships, such as family structures, organizational structures, and product categories, and it makes such information easy to maintain and retrieve.

During the pivotal years of the 1960s–1970s, IBM's IMS (Information Management System) stood as the primary player in this field, shaping the landscape of data management.

Even today, the hierarchical data model underpins information standards like XML and HTML, which are used to structure web data and exchange data across systems.
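To make the tree structure concrete, here is a minimal sketch using Python's standard xml.etree module; the bank/branch/account hierarchy is illustrative, not taken from any specific system above.

```python
import xml.etree.ElementTree as ET

# A hierarchical record: each parent owns its children (one-to-many).
doc = ET.fromstring("""
<bank>
  <branch id="B01">
    <account id="A100"><owner>Alice</owner></account>
    <account id="A101"><owner>Bob</owner></account>
  </branch>
</bank>
""")

# Retrieval is top-down: start at the root and walk down the tree.
for branch in doc.findall("branch"):
    for account in branch.findall("account"):
        print(branch.get("id"), account.get("id"), account.findtext("owner"))
```

Note that every account lives under exactly one branch; a record with two parents cannot be expressed in a pure tree, which is exactly the limitation the network model addressed.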

Network Database & Model

 

[Image: a diagram of product categories]

Around the same time, in the 1960s–1970s, the network database came into the picture as an extension of the hierarchical model discussed above. Charles Bachman developed the navigational/network data model while working at General Electric (the division later became Honeywell Information Systems). GE popularized the network database as IDS (Integrated Data Store).

The thought process behind the network data model was similar to that of the hierarchical model, but its philosophy differed: this model focused on relationships among entities.

Network models could deal with more complex data structures.

Unlike the hierarchical model, where the structure converges to a single root, the network model allows multiple parents for a single object, thereby representing relationships among different entities in a business process.

It supported many-to-many relationships: pointers and links were used to indicate relationships, making it better suited to complex data relationships and scientific applications.
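Here is a minimal conceptual sketch in Python of records linked by pointers; the Record type and the supplier/part entities are hypothetical stand-ins for IDS-style linked records.

```python
from dataclasses import dataclass, field

# In a network model, a record may have many parents and many children;
# relationships are explicit pointers (links) between records.
@dataclass
class Record:
    name: str
    links: list = field(default_factory=list)  # pointers to related records

supplier_a = Record("Supplier A")
supplier_b = Record("Supplier B")
part = Record("Part X")

# The same part is linked from two suppliers: a many-to-many relationship
# that a strict tree (hierarchical model) cannot express.
supplier_a.links.append(part)
supplier_b.links.append(part)
part.links.extend([supplier_a, supplier_b])

for owner in (supplier_a, supplier_b):
    print(owner.name, "->", [r.name for r in owner.links])
```

Navigation is still done by following links from record to record, which is why these systems are also called navigational databases.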

Airline management systems and telecommunications were the primary adopters of this data system during the '70s.

Though it was blessed by the CODASYL group (Conference on Data Systems Languages), IBM did not accept IDS as a replacement for its own product, IMS, nor did it adopt the network data model to extend IMS's capabilities, as the concepts differed, even though IDS was IBM mainframe compatible.

Eventually, in the late 1970s and early 1980s, with the advent of relational databases, the network database system lost its charm.

Relational Database & Model

British computer scientist Edgar F. Codd, who joined IBM in the 1960s, worked heavily with hierarchical and network data systems.

He was always concerned about both of the data models mentioned above, as they offered little to no data-searching capability and were not query-friendly. He started thinking of a different method to store and process data reliably, with an easy-to-learn language instead of a training-heavy navigational one.

In his 1970 paper, “A Relational Model of Data for Large Shared Data Banks,” he expressed his idea of how a relational model can be established across different entities in a business process (it follows entity relationships, follows the ACID approach for reliability, deals with structured data, and has a query language associated with it).

The working philosophy differed from its predecessors (Hierarchical & Network databases).

Conceptually, the relational model's basic features make it a practical and efficient way to organize data (the following list is not exhaustive).

Rows and columns organize data in tabular format about a business process (say, an order transaction).

Columns represent the attributes/features of a business process (for an order: Order ID, Order Type, Order Line Item, Product ID, etc.).

Rows represent records with some uniqueness at some level of granularity (Order ID, line-item ID, Product ID, an order-system key, or a combination of these defines a unique row; uniqueness depends on the context of the reporting, which I will not go deep into here).

A key identifies the uniqueness and level of a single record, along with its associated values and attributes.

A particular data subject can connect to another data object/table through a foreign key.

While not exhaustive, this explanation conveys the essentials of how the relational model works.
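Here is a minimal sketch of these ideas using Python's built-in sqlite3 module; the tables and columns are illustrative, not drawn from any specific product mentioned above.

```python
import sqlite3

# In-memory database: rows, columns, a primary key, and a foreign key.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

con.execute("""CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,   -- key: uniquely identifies each row
    name       TEXT NOT NULL)""")

con.execute("""CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(product_id),  -- foreign key
    quantity   INTEGER)""")

con.execute("INSERT INTO products VALUES (1, 'Widget')")
con.execute("INSERT INTO orders VALUES (100, 1, 3)")

# A declarative query joins the two entities through the foreign key.
for row in con.execute("""SELECT o.order_id, p.name, o.quantity
                          FROM orders o JOIN products p USING (product_id)"""):
    print(row)  # (100, 'Widget', 3)
```

The point of Codd's model is visible even at this scale: the query states what is wanted, and the engine, not the programmer, navigates the data.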

This model is still relevant and actively offered as a database by multiple players for structured information management.

IBM's eventual acceptance of the relational database model, despite initial reluctance due to concerns about its existing product IMS, was a significant turning point. IBM launched its first relational database (SQL/DS) in 1981, after Oracle, a pioneer in the field, had already launched Oracle DB in 1979, based on Codd's publication.

They say to value your employees’ opinions, because if you don’t, someone else will 😊.

RDBMS (Relational Database Management Systems) remain a flexible solution for every organization, adapting to use-case requirements for speed and data type/format.

Based on our data storage and retrieval needs, we still use Oracle DB, IBM DB2, MySQL (Open source), Microsoft SQL Server, PostgreSQL, etc.

From 1980 onwards, RDBMS displaced the vast majority of hierarchical and network database usage.

With Peter Chen proposing the ER (entity-relationship) model in 1976, RDBMS became more robust and easier to design.

Around 1990, data warehouses started picking up pace for improved business reporting and slicing and dicing. Database companies started packaging data warehouse solutions, and multiple players entered the market for business intelligence tools and data integration services. Teradata took control of the market post-2000, implementing concepts like shared-nothing architecture (SNA) and massively parallel processing (MPP).

The industry's focus on data warehousing and ETL increased around the mid-'90s, when Ralph Kimball and Bill Inmon concentrated on the same area. This gave RDBMS players more motivation to improve their architectures.

ETL tools included Ab Initio, Informatica, IBM DataStage, etc., while BI tools such as SAP BusinessObjects and Oracle Discoverer were popularized during the 2000s.

Visualization tools like Tableau came around the late 2000s.

The industry focused on business-process-based data modelling (dimensional modelling) for OLAP systems instead of a transaction-based approach (OLTP, Online Transaction Processing).

With this focus on OLAP (Online Analytical Processing), multidimensional databases (cube-based reporting) not only became popular but have also proven their enduring value in legacy divisions of many organizations.
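Here is a minimal sketch of the dimensional (star-schema) idea behind such reporting, again using Python's sqlite3; the fact/dimension tables and their names are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Star schema: a central fact table of measurements surrounded by
# descriptive dimension tables (only one dimension here, for brevity).
con.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT)")
con.execute("""CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    sale_date  TEXT,
    amount     REAL)""")

con.executemany("INSERT INTO dim_product VALUES (?, ?)",
                [(1, "Hardware"), (2, "Software")])
con.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                [(1, "2024-01-05", 120.0), (2, "2024-01-06", 300.0),
                 (1, "2024-02-10", 80.0)])

# 'Slicing and dicing' = aggregating facts along dimension attributes.
for row in con.execute("""SELECT d.category, SUM(f.amount)
                          FROM fact_sales f JOIN dim_product d USING (product_id)
                          GROUP BY d.category"""):
    print(row)  # ('Hardware', 200.0), ('Software', 300.0)
```

OLAP cubes generalize this pattern to many dimensions, often with pre-aggregated results for fast reporting.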

During this time, the industry focused on supporting business leadership with accurate decision-making rather than on the data consumerism and data explosion that came later.

Object-Oriented Database

Object-oriented programming gave rise to Object-oriented databases.

It is not as popular for driving business processes and reporting, and it is not as easy to work with as an RDBMS. Instead of tables, it works with objects, instances, and classes.

While it's not widely used, it serves some use cases, such as engineering design (CAD), molecular biology, spatial engineering, etc.

It works well with programming languages such as Java, JavaScript, Delphi, Python, C, C#, C++, Visual Basic (VB), .NET, etc.
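Here is a minimal conceptual sketch in Python: the CADPart class is hypothetical, and pickle merely stands in for the object store; real object databases such as ObjectDB or db4o persist such instances transparently.

```python
import pickle

# In an object-oriented database, the stored unit is the object itself,
# not a row decomposed into columns.
class CADPart:
    def __init__(self, name, vertices):
        self.name = name
        self.vertices = vertices  # nested structure stored as-is

part = CADPart("bracket", [(0, 0), (10, 0), (10, 5)])

# Stand-in for the object store: serialize the instance whole...
blob = pickle.dumps(part)

# ...and get the same class instance back, with no object-relational
# mapping between tables and objects required.
restored = pickle.loads(blob)
print(restored.name, restored.vertices)
```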

It's still used in niche projects, and it's not new as a concept either. In fact, research on the object-oriented model started in the 1970s in universities, labs (such as Bell Labs), and companies like HP and the Microelectronics and Computer Technology Corporation (closed since 2004), which worked on this data system.

Some examples are ObjectDB, db4o, IBM Informix, etc.

NoSQL Database

With the advent of the Internet, social media, and photo-sharing apps from around 2000 onwards, the data world started shifting towards a new focus.

Initially, until the 1980s, data and computing were in the hands of mathematicians and scientists (there was no consumer as such). From the 1980s to the 2000s, data and computing were matters between technology companies, mostly B2B technology sellers and buyers, with no direct consumerism involved.

From 2000 onwards, with the new millennium, we saw a rise in data consumers. The public was generating and using data, and new technology-based toys, such as smartphones and many other smart devices, started coming onto the market.

Hardware challenges started diminishing in areas such as performance, cost & scale.

Companies started realizing the value of collecting and processing different kinds of data (not just structured reporting data).

Data consumers also began to demand speed in their data consumption, further highlighting the need for efficient data management systems.

All these factors converged, highlighting the need for a different kind of data management system than the traditional RDBMS.

This new system had to be capable of handling unstructured and semi-structured information such as photographs, messages, and emails, and of supporting files in CSV, TXT, JSON, and similar formats.

Enter the big data ecosystem, with its NoSQL (Not Only SQL) databases, designed to handle this data deluge efficiently with their distinct features.

These systems work differently from RDBMS; most relax the strict ACID guarantees in favour of availability and scale.

Broadly, NoSQL data systems come in different flavours and are chosen based on the specific use case; a brief sketch of the key-value and document styles follows the list.

Key-value: Amazon DynamoDB, Redis, Couchbase, Memcached, etc.

Column-oriented: Cassandra, HBase, etc.

Document-oriented: Amazon DocumentDB, Azure Cosmos DB, MongoDB, etc.

Graph-based: Amazon Neptune, Neo4j, ArangoDB, Memgraph, etc.

Time series: Druid, CrateDB, InfluxDB, Prometheus, etc.
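To make the key-value and document flavours concrete, here is a minimal sketch using a plain Python dict as a stand-in for a store such as Redis or DynamoDB (whose actual client APIs differ); the keys and fields are illustrative.

```python
import json

# Key-value flavour: the store is an opaque map from key to value;
# lookup is by key only, with no joins and no fixed table schema.
kv_store = {}
kv_store["user:42:session"] = "abc123"
print(kv_store["user:42:session"])  # constant-time lookup by key

# Document flavour: the value is a self-describing JSON document,
# so each record can carry its own (possibly different) structure.
kv_store["user:42:profile"] = json.dumps(
    {"user_id": 42, "name": "Alice",
     "orders": [{"id": 100, "amount": 3}]})

profile = json.loads(kv_store["user:42:profile"])
print(profile["orders"][0]["amount"])  # nested access, no joins
```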

While I could delve deeper into this topic, a concentrated discussion of each type of NoSQL database deserves its own piece to fully appreciate the merits of these technological advancements.

My time is well spent if this helps readers walk down memory lane and see how far we have come.