- The SAP HANA database is often described as a columnar database, but what is this?
- We fact-check whether SAP statements the SAP HANA database being lower maintenance than what SAP HANA replaces.
SAP has given a strange storyline to its HANA architecture and database, and as a consequence, there are a great many misunderstood topics on the SAP HANA database. One of the greatest is that SAP HANA database is somehow not a relational database.
See our references for this article and related articles at this link.
To explain why this is not the case, we will get into the HANA architecture. That topic is what this article will focus on.
HANA Architecture Changes in Hardware, Changes in Query Capability and Changes in Table Structures with the SAP HANA Database
The changes to the data storage structure are that data can be stored in columns rather than as “rows.” This is a primary component to the HANA architecture. This is often stated in the literature on the topic, but it is not normally clear why this is the case. In fact, at first glance, it appears that it should be the opposite. A columnar database still has rows, but each row is only one field. It is perhaps more accurate to say that the data can be stored in columns, that would look a bit like this.
Tom, Jason, Julie, Stan, Jack, Jack, Blake, Fred, Rebecca, etc..
The Name Table
There are several flaws in the conventional explanation of columnar databases, which explain why columnar databases are still considered mysterious to many people that consume the messaging on this topic. It is also why often, the hype appears to overshadow the understanding of this subject.
Several of these flaws have been described up to this point. But a third flaw in the standard explanation is that columnar databases are not the opposite of or set to replace relationally (or should I say row based) databases. A columnar database is as relational as a row based database. Because a columnar database still has tables pointing to other tables.
Therefore it is not like a Big Data database like NoSQL.
Database Maintenance Issues
Relational databases can take a lot of maintenance, which is part of the role of a database administrator. Tables often have custom columns added to them, which makes each table less efficient for a variety of tasks.
Using Specialized Structures Build for Analytics
The disadvantage of standard relational databases for analysis has been known since analytical applications have been in use, and have been addressed at various points in the history of database design with data cubes.
Data cubes pre-build relationships. And which are run from specialized or not standard relational databases, as well as star schemas, which allow queries to be performed on a dedicated data structure where a predefined combination of tables is used within a standard relational database.
The Use of Data Cubes
Data cubes have been utilized for a long time to speed the ability to retrieve data.
Data cubes do have more overhead, and HANA architecture or a column based on memory database generally, can have many advantages over cubes, in addition to requiring much less IT overhead. Reports development is a major bottleneck in most companies. But the HANA architecture has had problems making the database dual mode, which is as good at other types of database processing other than analytics processing for which column oriented designs are optimized.
Once queries can be changed, this means how that how data is stored needs to change to leverage this shift.
HANA Architecture and Reducing Unruly Table Growth
Another benefit of columnar databases is that they do not lead to the unruly growth of tables. As was previously noted, many analytical tables have 100 or even 200 columns. This is because new attributes are added to existing tables over time. As each column is added, queries slow. The more columns that are added, the more time it takes the database to find any one field as it must scan the entire row. Therefore table growth does directly lead to slower database queries. This is why the columnar database scales so much better than the row based design. Each new column is added to its table.
However, it should be understood that this does increase the number of tables in the database and the relationships that have to be created. Although because of less redundancy, the total amount of data that is stored is lower.
For instance, if we take the example of color, if a color is not recorded at one point. And then recorded and added to the database, rows may often increase in number as what was once aggregated to a universal color is now broken into multiple colors that apply to that record.
This much depends upon the details, but several scenarios are possible.
Other Areas of Maintenance
As tables grow in columns, they require more maintenance. Row-based or what is known as relational databases do require more indexes and do have other overhead. Some of this overhead is reduced with column based databases. However, different types of maintenance increase with column based databases.
Secondly, the column-based database has far less history on which to base support claims. This is because there are so few column based databases in use. But maintenance is not necessarily lower simply because of the HANA architecture. There are other factors at play.
- Column-based databases also have far fewer people trained in how to work with them, so the similar metaphors would apply to purchase a car that very few people own.
- Few mechanics who work on that car will mean higher cost and lower availability of the required skills.
- Overall, much more data is required before anything substantial can be said about the maintenance costs of column-based databases. And this is not the only question.
What is the Accuracy of SAP’s Communicated HANA Architecture?
When HANA was introduced and for several years after SAP gave a distinct impression that the HANA architecture was 100% column-based, however, after several years, it was communicated that many of HANA’s tables were row-oriented.
Columnar databases are relational databases, as are row-based databases. A better explanation would be to call databases that store mixed data in single tables that are not organized as non-relational, but even that can be a confusing topic.
Relationships still exist with a columnar database, and in fact, there are more of them because there are now more “tables” that are each table being a column.
The SAP HANA is most definitely a relational database design.
SAP’s statements regarding lower maintenance on the SAP HANA database are most undoubtedly unproven, and while SAP can point to some technical support improvements, it is mostly guesswork. The SAP HANA architecture, as with any columnar databases, has some areas that are lower in maintenance than traditional databases. But other areas are higher in maintenance.
Secondly, SAP HANA is quite new and has much less history behind it than other databases likes Oracle 12C or other alternatives. Therefore it would be very premature to conclude that the SAP HANA database is lower in maintenance than alternatives.