Why SAP HANA Is a Fast Database For Analytics

Executive Summary

  • We explain what the HANA database is at the most basic level.
  • We then explain the primary beneficial usage of HANA and then its limitations.

Introduction

SAP has made many exaggerated claims about HANA’s performance. In many discussions, it has become apparent that most people who discuss HANA do not understand the logic of the performance claims, or whether and to what degree they are true. SAP also asserts that the technology in HANA is entirely proprietary and unique, which is another claim that must be checked for accuracy. Furthermore, it is rare to find people who cover HANA or publish material on it who do not have some pro-SAP financial bias. You will learn the truth about HANA’s capabilities from a research perspective rather than a marketing or sales perspective.

Our References for This Article

If you want to see our references for this article and related Brightwork articles, see this link.

Notice of Lack of Financial Bias: We have no financial ties to SAP or any other entity mentioned in this article.

  • This is published by a research entity, not some lowbrow entity that is part of the SAP ecosystem. 
  • Second, no one paid for this article to be written, and it is not pretending to inform you while being rigged to sell you software or consulting services. Unlike nearly every other article you will find from Google on this topic, it has had no input from any company's marketing or sales department. As you are reading this article, consider how rare this is. The vast majority of information on the Internet on SAP is provided by SAP, which is filled with false claims and sleazy consulting companies and SAP consultants who will tell any lie for personal benefit. Furthermore, SAP pays off all IT analysts -- who have the same concern for accuracy as SAP. Not one of these entities will disclose their pro-SAP financial bias to their readers. 

Understanding SAP’s Marketing Around HANA

In the previous article titled Has SAP’s Relentless HANA Push Paid Off?, I brought up the fact that SAP has redirected its marketing efforts to focus to a very significant degree on SAP HANA. However, this leaves some basic questions:

  • What is SAP HANA?
  • What are the actual technology underpinnings of SAP HANA?
  • What is it that makes SAP HANA so fast?

In this article, you will learn what SAP HANA is. This will be explained in a way that should be accessible to anyone, from executives to users, who is interested in understanding the parts of HANA that are real.

What is SAP HANA Database at the Most Basic Level (What is SAP HANA)

SAP HANA is the name SAP uses to describe its offering that combines software and hardware to increase speed. It is principally based upon leveraging hardware performance improvements and cost reductions in random access memory and solid-state memory, and then changing the data management software that interacts with this hardware.

SAP HANA must be simplified for decision-makers to move beyond the marketing hype and simplistic platitudes in their determinations of how and when to use HANA.

The first thing to understand is that SAP HANA is simply the branding of some technologies. These are not technologies on which SAP has a monopoly. The two key technologies are the following:

  1. Storing a Database in Memory (in RAM or SSD) Versus on Disk
  2. The Columnar Database

I should also point out that SAP is a relative newcomer to databases. It has had some small database projects like SAPDB/MaxDB, and it made a quite large acquisition of Sybase back in 2010. However, for almost all of SAP’s history, its software has resided on other software vendors’ databases. This is a long way of saying that there is not a lot about databases that SAP knows that other software vendors do not also know. SAP is simply the most important and aggressive marketer of this approach.

SAP HANA Database Overview: Improvements to Storage Hardware and the Corresponding Changes to HANA Database Queries

As is explained quite nicely by Professor Sam Madden at MIT, data queries change when a database is moved from spinning hard disks to either random access memory or solid-state memory.

How Data Access Works

  • When a spinning disk is used, even if a query requires only three fields within a 10-field table, the software must read all ten fields.
  • If the table is 1,000,000 records and 100 megabytes, then the entire table must be read to complete the query.

And in fact, this is not even the worst part, because queries often involve multiple tables.

A query that pulls three fields distributed across three tables must read every one of those tables to completion.

This is a function of how the data is accessed on a spinning disk.

However, this is not the case when data is stored by column in random access memory or solid-state memory. Here, a query that uses only three fields needs to read only those three fields, regardless of the number of fields in the table.
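The 10-field table and three-field query described above can be sketched in a few lines of Python. This is an illustrative toy, not how HANA or any real database engine is implemented: the table contents, the names `row_store` and `column_store`, and the "values touched" counters are all assumptions made for the example.

```python
# Toy model: count how many field values a query must touch
# under a row-oriented layout versus a column-oriented layout.
# All names and data here are hypothetical, for illustration only.

NUM_ROWS = 1_000
FIELDS = [f"f{i}" for i in range(10)]   # a 10-field table
WANTED = ["f0", "f3", "f7"]             # the query needs 3 fields

# Row layout: each record keeps all ten fields together.
row_store = [{name: (r, name) for name in FIELDS} for r in range(NUM_ROWS)]

# Column layout: one list per field.
column_store = {name: [(r, name) for r in range(NUM_ROWS)] for name in FIELDS}


def scan_rows(store, wanted):
    """Read whole records, then keep only the wanted fields."""
    touched = 0
    result = []
    for record in store:
        touched += len(record)          # the full record is read
        result.append(tuple(record[name] for name in wanted))
    return result, touched


def scan_columns(store, wanted):
    """Read only the columns named in the query."""
    touched = 0
    columns = []
    for name in wanted:
        column = store[name]
        touched += len(column)          # only this column is read
        columns.append(column)
    return list(zip(*columns)), touched


row_result, row_touched = scan_rows(row_store, WANTED)
col_result, col_touched = scan_columns(column_store, WANTED)

print("row layout touched:", row_touched)
print("column layout touched:", col_touched)
```

Both scans return the same answer; the difference is only in how many values had to be read to produce it, which is the effect the paragraph above describes.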

The Elimination of Swapping

Traditional database systems stored on a disk spend lots of time in an activity known as swapping. Swapping is where data is read into memory from the disk. It is processed, written back to disk, purged, and a new cycle is begun.

In-memory databases remove this swapping because all of the data being manipulated is already loaded into memory. Disks within a HANA implementation are not used for primary processing but for offline data backup.

SAP HANA is often justified by performance. It is essential to consider that performance does not correlate directly with business benefits. SAP wants companies to make this error as it puts them in the driver’s seat.

SAP is not interested in the sticky questions related to SAP HANA’s actual business benefits because then the story begins to be much less impressive.

Why is Compression So Effective in HANA Database?

Column-based databases like SAP HANA have a significant advantage when it comes to data compression. Once data is placed into columns, it can be compressed very easily, because there are often so many duplicates in any particular column.

An excellent example of this is a column that contains the color of a product.

  • If the column has 1000 records…
  • And five colors are possible…
  • Then, of course, most of the fields are duplicates…
  • so 200 white, 300 blue, 250 red, etc.…

This means less redundant data is stored to begin with, in addition to the compression that comes from having so few unique values.
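The color example can be made concrete with dictionary encoding, one common way columnar stores exploit duplicates. The sketch below is a simplified illustration and not a description of HANA's actual compression: the counts for "green" and "black" are invented to fill out the 1,000 records (the example above only gives white, blue, and red), and the size estimate assumes one byte per character and one byte per code.

```python
# Toy dictionary encoding of a low-cardinality column.
# The counts for "green" and "black" are made up to reach 1,000 records.

counts = {"white": 200, "blue": 300, "red": 250, "green": 150, "black": 100}
column = [color for color, n in counts.items() for _ in range(n)]


def dictionary_encode(values):
    """Replace each value with a small integer code plus a lookup table."""
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    codes = [code_of[value] for value in values]
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Rebuild the original column from the codes."""
    return [dictionary[code] for code in codes]


dictionary, codes = dictionary_encode(column)

# Rough size estimate: raw text bytes versus one byte per code
# plus the dictionary itself. Real engines pack codes into fewer bits.
raw_bytes = sum(len(value) for value in column)
encoded_bytes = len(codes) * 1 + sum(len(value) for value in dictionary)

print("records:", len(column), "distinct values:", len(dictionary))
print("raw bytes:", raw_bytes, "encoded bytes:", encoded_bytes)
```

With only five distinct values, each color string is stored once and every record shrinks to a small code, which is why columns with many duplicates compress so well.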

The Importance of the Commonality of the Data Type

This compression is possible because all of the data in a column is the same data type. And in many cases, the compression is very significant. It is common to be able to compress columnar databases by 80% or more. This process of moving from a standard “row-oriented” relational database to a columnar one (in which every table is one column of data) is called vertical partitioning.

This is because the regular relational table is broken into columns or partitioned.

SAP’s Proposed Reasons for Its Position

SAP has presented HANA as so unique that customers should be prepared to run the new ERP system, S/4HANA, only on HANA, and has stated that S/4HANA will never be ported to any other database platform. As the vast majority of SAP ERP systems currently run on Oracle, that is quite a bold stance.

SAP proposes that the reason for taking this position is the following:

  • No other database vendor can match SAP’s performance.
  • SAP has moved code — called stored procedures — into the HANA database, and by implication, the application is now tied to the database.
  • SAP has been more obscure about other applications outside of ERP. But SAP’s desired state is to replace other databases with HANA so that its applications and databases are joined at the hip.

What is HANA, and Is the Technology Unique to SAP?

What is HANA? Well, HANA is a column-oriented database. But what SAP does not explain is that all the other major database vendors have a similar design available. Columnar databases have been in existence since the 1970s. What changed was that the price and capacity of random access memory and solid-state memory improved so significantly that the database design can now be better leveraged. Interestingly, a competing database, Oracle 12c, can not only match HANA in analytics but can also switch between row-based and column-based storage, even for the same table.

Now that SAP has a product that competes with Oracle’s database, the issue of SAP not issuing certification to Oracle has arisen, something that is explained in the article Which is Faster, HANA, or Oracle 12C?

Columnar databases have their advantages. However, they are not universal benefits. SAP HANA is presented by SAP as a universally advantageous combination of database design combined with faster memory/storage. The less one knows about databases, the more credible this seems.

What Improves with the Columnar Database?

It is true that a columnar database is radically different from a row-based database, and the differences do not stop at the configuration of the tables: a columnar database also uses fewer indexes. This is a positive development for database cost because building indexes for analytical purposes is a source of overhead and rework for companies that use standard relational databases for analytics.

This means that by breaking the table into one file per column, the database stored in random access memory or solid-state memory can access only the fields that are part of the query.

Now queries that pull only two fields perform much faster than queries that pull four fields. This example has assumed a table that contains ten fields or columns. However, tables much larger than this are very common.

It is not at all uncommon to find tables with 100 fields/columns. On average, a query will have between 3 and 5 fields/columns, so in such a table a columnar layout reads only about 3% to 5% of the columns.

An Important Lesson on SAP HANA Benefits

SAP HANA does not benefit every business function equally because not all activity functions are a bottleneck due to speed limitations, and SAP HANA does not work the same way regarding its speed for every application.

  • To understand the benefits of SAP HANA and its limitations, it is necessary to get into the computer science of what makes up SAP HANA.
  • A solution like SAP HANA allows businesses to do things that they could not do before. So some forecasting or projection is required to fully leverage SAP HANA, because merely doing the same things one is currently doing, only faster, is not where the opportunities with SAP HANA lie.

The Limitations of HANA’s Performance Even Within Analytics

We have been saying that HANA’s only beneficial area of performance is for analytics, which is called a read operation in database speak.

However, there is a level below this in detail. HANA’s primary beneficial area is for short SQL queries. An excellent example of a short SQL query would be a query for BW.

Long Versus Short SQL Queries

HANA’s performance degrades for longer queries.

An excellent example of a longer query is within ECC or S/4HANA. This is where the data is less prepared.

However, in its marketing material, SAP proposes that HANA is excellent for reporting within ERP systems. There is no evidence of this up to this point. The evidence points in the opposite direction quite strongly, as we cover in the article Why HANA is a Mismatch for S/4HANA and ERP.

Ironically, we have had many people tell us that once reports can be run from the ERP system, there will be no reason to have a centralized BI system. But the performance of HANA does not support this vision.

What Happened to SAP’s Row Store Performance?

The following quotation can be found in Oracle’s “Analysis of HANA HA” document.

“The SAP HANA database consists of two database engines:

The column-based store, storing relational data in columns, optimized for holding data mart tables with large amounts of data, which are aggregated and used in analytical operations.

The row-based store, storing relational data in rows.

This row store is optimized for write operations and  has a lower compression rate, and its query performance is much lower compared to the column-based store”
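The trade-off in the quotation, a row store favoring writes and a column store favoring aggregated reads, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not a model of HANA's engines: the class names `RowStore` and `ColumnStore`, the field names, and the `writes` counter are all hypothetical.

```python
# Toy comparison of the write path in a row store and a column store.
# Class names, field names, and the "writes" counter are hypothetical.

class RowStore:
    def __init__(self, fields):
        self.fields = fields
        self.rows = []
        self.writes = 0        # number of separate append operations

    def insert(self, record):
        self.rows.append(tuple(record[f] for f in self.fields))
        self.writes += 1       # one append stores the whole record

    def total(self, field):
        index = self.fields.index(field)
        return sum(row[index] for row in self.rows)   # walks every record


class ColumnStore:
    def __init__(self, fields):
        self.fields = fields
        self.columns = {f: [] for f in fields}
        self.writes = 0

    def insert(self, record):
        for f in self.fields:
            self.columns[f].append(record[f])
            self.writes += 1   # one append per column

    def total(self, field):
        return sum(self.columns[field])   # reads a single column


fields = ["order_id", "customer_id", "quantity", "price"]
rows, cols = RowStore(fields), ColumnStore(fields)
for i in range(100):
    record = {"order_id": i, "customer_id": i % 7, "quantity": 2, "price": 10}
    rows.insert(record)
    cols.insert(record)

print("appends:", rows.writes, "vs", cols.writes)
print("total quantity:", rows.total("quantity"), cols.total("quantity"))
```

In this sketch, inserting a record costs the column store one append per field, while the aggregate only has to read one column. That is the general shape of the write-versus-read trade-off, although real engines use buffering and other techniques that this toy ignores.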

Not all row-based stores are created equal, as HANA’s ECC performance is worse for transactions than ECC on Oracle or on IBM’s pre-column-store databases. This explains the performance differences between Oracle DB, DB2, SQL Server, MaxDB, and Sybase ASE, even though all are row based by default.

One thing to remember is that HANA is still a relatively new database. When I discussed this with a very experienced database resource, they made the following observation.

“That’s is no way a brand new row based DB can beat all these databases above which are optimised over so many years especially Oracle DB.”

Stupid Uses for SAP HANA

SAP is feverishly pitching HANA for all types of inappropriate and wasteful uses, which have nothing to do with what it does well. I am getting information about SAP CRM being placed on HANA. For what reason?

Let us stop to think. Who would put CRM on HANA?

HANA For CRM?

This is just about one of the silliest uses of HANA. CRM systems have very low requirements. Honestly, one can barely justify even using a CRM system versus a spreadsheet. Why would one need such fast read access for a CRM system?

HANA for Big Data?

So SAP has HANA to offer in the Big Data space, but HANA can’t possibly be a good fit for Big Data: it is not designed for unstructured data, it is quite expensive, and it is priced per GB or TB. This topic is covered in the article The Secret to Not Talking About HANA Pricing. SAP argues that HANA can be connected to Hadoop using Vora, but why would Hadoop need HANA? And is it a Big Data database if it just connects to a real Big Data database that does all the heavy lifting?

On G2Crowd, SAP is not even listed on Big Data processing ratings, in which Hadoop, Cloudera, HortonWorks, and DataBricks are the top rated.

HANA for the Enterprise Data Warehouse?

None of this background has stopped SAP from proposing companies use HANA as the EDW. And when SAP suggests using HANA for the EDW, they also mean using an SAP data application to sit on top of HANA.

  • Expense: The first problem is that HANA is still so expensive that it is not financially feasible in most cases to push so much data inside of it. HANA’s high TCO is not merely related to its license cost but to its hardware requirements, its high implementation and maintenance overhead, difficulty finding experienced skills, and its immaturity.
  • 100% In Memory for an EDW?: Much of the data in an EDW does not require the performance capabilities of an in-memory database. In-memory databases only make sense for supporting databases with a high query volume, not data stores used to feed the queried database.
  • Migrating All Data to Column-Oriented Tables for Storage?: Most data in companies is not stored in the column-oriented tables that are used in HANA. For this reason, moving data into a HANA “powered” Enterprise Data Warehouse would mean changing the data from its original storage structure into the HANA structures. This is an unnecessary and overcomplicated amount of work and overhead to maintain, before one even gets to the question of which SAP product should be used. SAP BW is too high in cost to be a useful EDW tool. And Business Objects has been so starved of development by SAP that it is no longer an application that companies should consider for investment, due to competitiveness and support issues. And once one eliminates these two data applications, there is nothing else that SAP offers that could fill this role to interoperate with HANA, even if HANA were the right choice for being the EDW database.

The Problems with Finding Use Cases for SAP HANA Technology

It is also well known that it is difficult to come up with use cases for HANA, which is a problem for closing a HANA sale. So once we get past the presentations about HANA’s capabilities, there are real customer interest and adoption issues.

This should be acknowledged when discussing the continuation of the HANA marketing strategy.

Who is Measuring the Benefits of SAP HANA Technology?

HANA is slated to be the infrastructure of all SAP applications eventually. Let me first address one of the most common implementations of HANA.

This is porting SAP’s BI/BW onto HANA, which is considered one of the most straightforward implementations of HANA that can be performed. However, at companies where I have seen this accomplished, while the reports do run faster, the major bottleneck, which is the backlog of reports that have yet to be created, does not change.

Noting this is the difference between merely observing a performance improvement versus understanding the overall benefits of implementing technology.

SAP Applications on Top of SAP HANA

HANA is proposed in other cases, such as Simple Finance (where FI/CO is ported to HANA with a new user interface called Fiori). Being able to process financial transactions quickly has not been a constraint in ERP systems for many years. In this case, there is an extra complexity, because the front end of Simple Finance is different from that of ECC. It does operate more efficiently than the SAPGUI. But that is not HANA; that is, the infrastructure change-out is not the primary differentiating factor.

This is another common problem with HANA: the descriptions of what it improves often morph into discussions of other new SAP products that are not, in fact, HANA.

Therefore a discussion that starts with SAP HANA technology ends up with a discussion on some other technology.

SAP HANA Warning

It is about as easy to get incorrect information on SAP HANA as to get it on Big Data. 

This excessive hyperbole on SAP HANA is a serious problem for understanding what it can do and how it should be used. This article was developed to help cut through the hyperbole on HANA and provide a basis for analyzing statements about SAP HANA.

However, this is, of course, only one dimension of understanding SAP HANA. None of the consulting companies will touch this issue, and they have served primarily as sales arms of SAP since, well, since they started partnering with SAP. The vast majority of analysts either have a conflict of interest in bringing this up or don’t understand databases well enough to know what part of the SAP HANA story is real and what part is smoke. One perfect example of the inaccuracy that flows through the HANA explanations is the following:

“Relational databases typically use row-based data storage. However Column-based storage is more suitable for many business applications. SAP HANA supports both row-based and column-based storage, and is particularly optimized for column-based storage.” – SAP HANA Tutorial

SAP and its consulting network continue to present all other databases as “traditional.” However, Oracle, DB2, and SQL Server all have column stores. And because each of these companies is better at databases than SAP (a newbie to DBs), the evidence indicates that Oracle, DB2, and SQL Server are all better than HANA even at column/analytics processing. However, SAP is not updating the information it first began distributing back when these other vendors were further back than SAP on column-oriented processing. SAP wanted to freeze all of its competing database vendors back in 2011. Here is another quote that continues the inaccuracy that only SAP has columnar storage.

“Can we just increase the memory of the traditional database (like Oracle) to 1 TB and get similar performance?

NO. You might have performance gains due to more memory available for your current Oracle/Microsoft/Teradata database, but HANA is not just a database with bigger RAM. It is a combination of a lot of hardware and software technologies. The way data is stored and processed by the In-Memory data base is the true differentiator. Having that data available in RAM is just the icing on the cake.” – SAP HANA Tutorial

But in fact, HANA does memory optimize. The curious thing is that SAP does not seem to have the same capabilities to optimize memory as the other vendors, so it has to brute-force the solution with extensive hardware specs. Benchmarks that a vendor shared with me illustrate that the hardware HANA runs on is not addressed correctly, so a lot of the hardware ends up being wasted.

HANA as a Major Marketing Tentpole

SAP HANA has been a marketing tentpole of SAP for over 4.5 years. Still, the knowledge of SAP HANA is thin. Secondly, few people have implemented a HANA system. Shockingly few have implemented any SAP HANA system once one gets past the most common implementation, porting SAP BW to SAP HANA. SAP has seen little return on its SAP HANA investment, but SAP HANA is still rising as a topic of interest, perhaps not among those that work in close collaboration with SAP, but overall. This was verified by web metrics and was a surprise.

There are many interesting storylines to cover on SAP HANA, and we will cover as much as we have the time and the information and understanding to include.

Conclusion

This article was designed as an SAP HANA technology overview. Columnar databases have speed advantages. However, they are not universal benefits.

A columnar database like SAP HANA is not at all new. Most of what I have written above was true decades ago. However, the difference is that columnar databases have a particular advantage when used with memory versus disks. This is due to how the media is read, and it turns out that a database with “narrow” tables (the narrowest being one column), combined with that database being stored in memory, makes for speedy retrieval. However, this serves an analytics purpose, and not all computational functions are analytical.

While all data actions improve with faster hardware, specific actions fare better or worse depending on their type. Therefore, it should be specified that the column-based database is optimized only for data retrieval.

HANA’s underlying technologies — in-memory databases and the column-based database (as opposed to the row based) were not invented by SAP, and the techniques are not unique to SAP.

  1. It is incorrect to propose columnar databases as a newer design since columnar and row-based databases have been around for about the same amount of time — it is merely that columnar databases were never accessible or attractive.
  2. With the ability to use more significant amounts of memory, columnar databases or emulated columnar databases have begun to receive more attention. But outside of SAP, this attention has been almost entirely for use in analytics. SAP is the only software vendor that proposes that columnar databases are better for all applications.

Compression is an advantage of column-based databases, and this has been true since the column-based database was invented. However, column-based databases represent only a small fraction of the overall database market. Why? Well, there is much more to database design than compression. SAP will most often bring up a positive aspect of HANA, or of column-based databases, but leave out the negatives.

HANA is presented by SAP as a universally advantageous combination of database design combined with faster memory/storage. The less one knows about databases, the more credible this seems. The article Mismatch Between HANA and S/4HANA and ECC explains why SAP HANA’s speed benefits do not hold for this application type.