AWS on How to Leverage Multiple Database Types - Brightwork Research & Analysis

Table of Contents: Select a Link to be Taken to That Section

Executive Summary

AWS and Google Cloud are offering great opportunities to leverage multiple database types.
Learn why multi-bases are taking over from the single-base approach.

Introduction

While SAP has proposed using a single database type (the one database type SAP wants customers to focus on, which is HANA) and finds its most willing customers in those that know the least about databases, AWS offers their customers the choice of leveraging many different database types.

An excellent article by Werner Vogels, who is the CTO of AWS, explains this.

“A common question that I get is why do we offer so many database products? The answer for me is simple: Developers want their applications to be well architected and scale effectively. To do this, they need to be able to use multiple databases and data models within the same application.”

Notice the last part of this paragraph, where Werner Vogels describes using..

“multiple databases and data models within the same application.”

Wait, let us back up. What was that?

We all know that applications have a single database, right? How does a single application use multiple databases?

What is Werner talking about?

It turns out Werner is describing software development that is different from the monolithic environment, for which most of those reading this book are most likely most familiar.

Werner goes on to say..

“Developers are now building highly distributed applications using a multitude of purpose-built databases.”

That is the application that we think of is one way of developing, but this gives way to distributed applications that can access multiple databases.

It is an unusual way of thinking about applications. Particularly for those of us who came up under the monolithic model (most of the people who work in IT). It means rethinking much of what has been part of the standard model in IT for many years.

The Limitations of the Relational Database

Werner goes on to describe the limitations of the relational database.

“For dec,ades, because the only database choice was a relational database, no matter the shape or function of the data in the application, the data was modeled as relational. Is a relational database purpose-built for a denormalized schema and to enforce referential integrity in the database? Absolutely, but the key point here is that not all application data models or use cases match the relational model.”

We have seen in the rapid growth of databases like MongoDB and Hadoop that specialize in either unstructured data or data with lower normalization levels. However, this brings up the question of why these different database types have begun to proliferate.

The proliferation is in part covered by Werner when he describes how Amazon ran into the limitations of using the relational database.

“We found that about 70 percent of our operations were key-value lookups, where only a primary key was used, and a single row would be returned. With no need for referential integrity and transactions, we realized these access patterns could be better served by a different type of database (emphasis added). This ultimately led to DynamoDB, a nonrelational database service built to scale out beyond the limits of relational databases.”

The Speed of Database Growth

Let us consider that AWS has a very fast-growing relational database service in RDS. However, they also have fast-growing non-relational databases like DynamoDB, as well as many other database types. AWS is offering a menu of databases, each targeted towards specific applications. So how many database types are there? Werner has an answer for this as well.

The Different Database Types According to Werner
Below we have provided a synopsis of the different database types, their intended usage, and the database that Werner reflects them.

Relational: Web and Mobile Applications, Enterprise Applications, Online Gaming (e.g., MySQL)
Key-Value: Gaming, Ad Tech, IoT (DynamoDB)
Document: When data is to be presented as a JSON document (MongoDB)
Graph: For applications that work with highly connected datasets (Amazon Neptune)
In Memory: Financial Services, Ecommerce, Web, Mobile Applications (Elasticache)
Search: Real-time visualizations and analytics generated by indexing, aggregating, and searching semi-structured logs and metrics. (Elastisearch Service)

Werner sees applications leveraging multiple database types. But SAP and Oracle do not see it that way. And as pointed out by AWS’s documentation, databases are often selected based upon factors that are not part of a thorough examination of the requirements.

“Although a workload’s database approach (RDBMS, NoSQL) has significant impact on performance efficiency, it is often an area that is chosen according to organizational defaults rather than through a data-driven approach. As with storage, it is critical to consider the access patterns of your workload, and also to consider if other non-database solutions could solve the problem more efficiently (such as a using a search engine or data warehouse).”

This means that IT departments will need to spend more time matching the database processing type to the database.

The Multi-Application Nature of Solutions Distributed by AWS

The multi-application nature of solutions is explained as follows by Werner.

“Though to a customer, the Expedia website looks like a single application, behind the scenes Expedia.com is composed of many components, each with a specific function. By breaking an application such as Expedia.com into multiple components that have specific jobs (such as microservices, containers, and AWS Lambda functions), developers can be more productive by increasing scale and performance, reducing operations, increasing deployment agility, and enabling different components to evolve independently. When building applications, developers can pair each use case with the database that best suits the need.”

However, what are packaged solutions offering? Monolithic applications are the exact opposite of this.

SAP and Monolithic Applications and Database

SAP is a perfect example of a monolithic application provider. SAP wants customers to use a single database. Furthermore, they want customers to use “their” single database as in HANA, which according to SAP, can do all the processing and all the different database types described by Werner above.

When SAP talks to their customers, they propose that they have the best of everything. The best applications. The best coding language (ABAP), the best middleware (PO/PI/SAP Cloud Platform Integration). Of course, the world’s greatest database in HANA can outperform any database in any database processing.

It’s a great story.

The only problem with the story is that HANA can’t perform anywhere near where SAP says it can. And not only is HANA not a universal database that performs all types of processing the best, but no database also works the way described by SAP.

The AWS Customers Using Multibase Offerings

AWS states the following customers are using these various database types in conjunction with one another to support one overall application.

Airbnb: DynamoDB, ElastiCache, MySQL
Capital One: RDS, Redshift, DynamoDB
Expedia: Aurora, Redshift, ElastiCache, Aurora MySQL
Zynga: DynamoDB, ElastiCache, Aurora
Johnson and Johnson: RDS, DynamoDB, Redshift

With monolithic applications supported by (usually RDBSM) a single database, the traditional design is called the monolithic database approach or architecture, which is a particular approach to software development.

Microservices

A microservice design breaks up the codebase into microservices, each with its own database. This is how multi-base development is working in many companies.

Werner clearly sees this as a multi-base strategy with a strong future. He goes on to say.

“Purpose-built databases for key-value, document, graph, in-memory, and search uses cases can help you optimize for functionality, performance, and scale and—more importantly—your customers’ experience. Build on.”

Now the question becomes,

“Where do SAP and Oracle stand on leveraging the multibase approach?”

It turns out there is a problem for SAP and Oracle in this regard.

The Problem that SAP and Oracle Cloud Face Leveraging a Multibase Approach

Something conveniently left out of SAP, and Oracle’s cloud descriptions are primarily focused on SAP and Oracle products. With SAP and Oracle (and this is a critical distinction), the cloud is viewed as merely a pathway to lead to SAP and Oracle’s products. SAP allows you to bring up services on AWS, Google Cloud, or Azure, but they support SAP products. Both SAP and Oracle dabble in connecting to non-SAP and non-Oracle, but only to co-opt an area so they can access markets.

AWS and Google Cloud are quite different. Notice the variety of databases available at Google Cloud.

There are over 94 databases out at Google Cloud and far more out at AWS. These databases can be brought up and tested very quickly. Selecting one of the databases brings up the configuration screen. It is a fantastic thing. The number of database and database type services is continually increasing with AWS and Google Cloud. AWS and Google Cloud do not demonstrate a pattern of redirecting customers to use their internally developed products instead of open source products.

After this is launched, one can bring up a different database type (say NoSQL or Graphic) and immediately begin testing. Under the on-premises model, this would not be possible. Instead of testing, the company would go through a sales process, and a commitment would be made, often from inaccurate information provided by sales reps. The customer would be stuck with (and feel the need to defend) whatever purchase had been made.

Leveraging the Multibase Design and The Pendulum Swinging from Packaged Software

We have entered a period of multi-base capabilities, and AWS and Google Cloud are the leaders in offering these options. This is transforming how databases are utilized. Moreover, the more open source databases are accessed, the worse commercial databases look by contrast. Furthermore, the use of multi-base databases dovetails with a rise of custom development that leverages different databases and different languages, all to support one application. While it may look like one application to the user, it is multiple applications or microservices.

Packaged solutions ruled the day for decades. Each of these packaged solutions was built around a single database, normally the RDBMS, which meant getting everything from one database type.
After the 1980s, custom-coded solutions were for “losers.” (This is no joke, those that did not purchase packaged solutions were subjected to critiques.) Many companies bought ERP systems because financially biased parties told them that anyone with a brain needed an ERP system.
According to SAP, all systems that were not SAP were classified as “legacy,” as we covered in the article How SAP Misused the Term Legacy. And all legacy systems were to be replaced by “fantastic ERP” systems that would make your dreams come true.

And who agreed to all of this?

Migrating from RDBMS to NoSQL with DynamoDB

Earlier in the book, we covered the migration of RDBMS to RDBMS. However, it is now understood that the RDBMS was selected in many cases when it was not the best database type.

This is explained in the following quotation.

“Even though RDBMS have provided database users with the best mix of simplicity, robustness, flexibility, performance, scalability, and compatibility, their performance in each of these areas is not necessarily better than that of an alternate solution pursuing one of these benefits in isolation.(emphasis added)

For an increasing number of applications, one of these benefits is becoming more and more critical; and while still considered a niche, it is rapidly becoming mainstream, so much so that for an increasing number of database users this requirement is beginning to eclipse others in importance. That benefit is scalability. As more and more applications are launched in environments that have massive workloads, such as web services, their scalability requirements can, first of all, change very quickly and, secondly, grow very large. The first scenario can be difficult to manage if you have a relational database sitting on a single in-house server. For example, if your load triples overnight, how quickly can you upgrade your hardware? The second scenario can be too difficult to manage with a relational database in general.”

Which Databases are Growing

This means that some of the migrations from RDBMS should be to other database types, and in many cases, to NoSQL and others. This could be considered a re-platforming of the database.

Six of the top ten databases are RDBMSs (Oracle, MySQL, SQL Server, PostgreSQL, DB2, Access). They average a year-on-year change in popularity as measured by DB Engines of -43%. And that is propped up by PostgreSQL with a +46% year over year change. And the total negative year over year change is far more negative than this.
If you take the top 20 RDBMSs, taken as a total group, they are significantly negative year over year growth.
Four of the top ten databases are non-RDBMSs (MongoDB, Redis, Elastic search, Cassandra). They have a year over year change of +19%.

Changes are afoot in the database market. The only RDBMSs with any significant growth is open source RDBMSs like PostgreSQL and MariaDB. And non-RDBMSs, which are primarily open-source, is growing faster than the RDBMS category. This means that increasingly RDBMSs are being broken apart and re-platformed, with some of that data moving to non-RDBMSs.

NoSQL/Key-Value Databases

A NoSQL/key-value store database like DynamoDB does not set up its tables the same was as an RDBMS. A key-value database stores data with a simple key value.

Key-value databases function by assigning a unique ID or key to each combination. Notice in the graphic above, each ID number applies to a different combination of values to the right.

Key-value databases look very odd to people familiar with the relational model. Interestingly, there is a simple allegory to help explain the benefits of the key-value database.

Like a lot of people, I use Evernote to organize articles and notes. The entries are organized for the articles, and it is relational in that a tag can be applied to each article. This particular article belongs to both the Safety Stock tag and the Statistics tag.

However, one of the most effective note-taking methods is described in the book How to Take Smart Notes, which has associated software that supports the method. Notice that the software looks remarkably similar to a key-value database.

In this software called Archive, each note is only designated by a number, which is in a sequence. However, a key-value data management system like this can be integrated with a relational note system by introducing hyperlinks. Notice I have a hyperlink out to an Evernote shareable link. This demonstrates in a simple way how a relational system (although not a relational database) can be used in conjunction with a serialized system.

A key-value database allows one to store complex data in a single file.
Two perfect use cases for the key-value database are to retrieve the comments by user for a certain post, or the browsing history for a specific cookie for a specific site which are examples of serialization data, or to store the state of application..

However, the disadvantage of the key-value store is that the schema is undefined. This means that SQL can’t be used to query the data, and you need to write code to do that. This is a problem on every project that uses key-value databases because reports require big efforts. However, for reports, one can use an additional database with SQL capabilities.

“Because DynamoDB does not support a standard query language like SQL, and because there is no concept of a table join, constructing ad-hoc queries is not as efficient as it is with RDBMS. Running such queries with DynamoDB is possible, but requires the use of Amazon EMR and Hive.”

DynamoDB offers flexible data modeling, higher performance, and scalability, along with operational simplicity.

NoSQL differs from SQL in advantages and disadvantages. As explained by Rick Houlihan, “all databases contain relationships. It just depends upon the type of relationships.” Furthermore, NoSQL is a misleading term, as NoSQL databases often use SQL. The preferred description of NoSQL is now the elongated “Not Only SQL.”

DynamoDB

DynamoDB is a “serverless” database that does not use a static schema or a static connection between the tables with related fields. DynamoDB scales better than an RDBMS; it scales far better horizontally (so to multiple nodes in different availability zones) and is very good at handling a large number of concurrent requests, with mobile applications being the perfect use case.

The major disadvantage being the joins of columns from two separate tables.

DynamoDB is perfect for low response times, proposed by AWS as being single-digit millisecond responses.

DynamoDB is shown alongside Lambda (in the middle) as DynamoDB is “serverless.”

AWS considers the migration from RDBMS to DynamoDB to have the following steps.

Migrate tables being used by workloads involving non-relational data make excellent choices for migration to DynamoDB.
Setup the backup process for both the RDBMS and DynamoDB (potentially running the databases in parallel for a time).
Understand the composition of the source data.
Understand the cost and performance of running a workload on DynamoDB.
Denormalization of the RDBMS data.
Testing (Functional, Non-Functional, User Acceptance)

Notice how much the schema changes from being stored in the RDBMS to DynamoDB.

So many options are becoming available. The re-platforming of databases brings up all manner of discussion and testing points. For example, SAP and Oracle have proposed using in-memory database designs where the RDBMS contains a column store. However, there are so many options on AWS that can beat this design in performance and cost, and flexibility. One is the use of Elasticache. Another is the use of Redis (an in-memory store with a column-oriented design). Elasticache can be used with anything (database or not database) to speed performance, and Elasticache can be used along with Redis (in fact, Elasticache can speed just about anything).

The options for accessing different components and then combining them in beneficial ways currently are like never before. And we owe this to the cloud service providers as well as to the open-source communities.

Who knows when these developments would have taken place if they had been left to the on-premises vendors?

Migration to MongoDB

And the benefit of migrating SQL to NoSQL applies to other popular NoSQL databases like MongoDB is a document-oriented database.

“When I joined MongoDB, about 5 per cent of all projects were relational migrations – now it’s 30 per cent as companies look to transform. Cost can be a factor, but more often it’s development speed and running at scale. It’s not unusual to see developer productivity up 3 to 5x after switching [from an RDBMS], coupling MongoDB with a shift to cloud, microservices, and agile/devops.” – Mat Keep, Director of Product Marketing