- Hasso Plattner, Steve Lucas, Rob Enslin, Vishal Sikka, and John Appleby have provided oodles of false information about HANA’s performance.
- We cover the impact of SAP marketing on HANA performance benchmarking.
I covered the topic of the actual performance of HANA versus competitive databases in the article Which is Faster, HANA or Oracle 12C. In this article, I will cover the various database benchmarks on HANA and its competitors in more detail.
Who Performs Benchmark Testing in Databases?
The first thing to establish is that there is no independent body, such as a Consumer Reports, for database benchmarking. This means that the vendors themselves performed the benchmarks that I reviewed.
This is a major issue.
Let us enumerate the problems with having no independent source for benchmarking as it relates to databases.
A vendor would never release a benchmark that showed it losing to a competing vendor across the board. The result of the benchmark would have to be positive for the vendor in some dimension, and more positive than negative, for the results to be released.
This parallels an issue in pharmaceutical drug testing, where negative studies tend to go unpublished.
“…studies about antidepressants made the drugs appear to work much better than they did. Of 74 antidepressant studies registered with the FDA, 37 studies that showed positive results ended up being published. By contrast, studies that showed iffy or negative results mostly ended up going unpublished or had their data distorted to appear positive, Turner found. The missing or skewed studies helped create the impression that 94 percent of antidepressant trials had produced positive results, according to Turner’s analysis, published in the New England Journal of Medicine. In reality, all the studies together showed 51-percent positive results.”
“For instance, a past analysis of clinical trials supporting new drugs approved by the FDA showed that just 43 percent of more than 900 trials on 90 new drugs ended up being published. In other words, about 60 percent of the related studies remained unpublished even five years after the FDA had approved the drugs for market. That meant physicians were prescribing the drugs and patients were taking them without full knowledge of how well the treatments worked.” – LifeScience
We will address this topic directly as it appears that SAP is doing the same thing with its OLTP benchmarks for HANA.
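The selective-release effect can be sketched in a few lines. This is a minimal illustration using invented outcomes (the `runs` list and the `published_view` helper are hypothetical and are not taken from any vendor's data): a vendor runs several benchmarks, releases only the ones it wins, and a reader who sees only the released set concludes that the win rate is far higher than it is.

```python
# Hypothetical outcomes of ten benchmark runs: True = the vendor's product won.
# These values are invented for illustration only.
runs = [True, False, True, False, False, True, False, True, True, False]


def win_rate(results):
    """Share of runs the vendor won, as a fraction between 0 and 1."""
    return sum(results) / len(results)


def published_view(results):
    """What a reader sees if the vendor releases only the runs it won."""
    return [r for r in results if r]


actual = win_rate(runs)
apparent = win_rate(published_view(runs))

print(f"actual win rate:   {actual:.0%}")
print(f"apparent win rate: {apparent:.0%}")
```

With these made-up inputs, half of the runs are wins, yet every released run is a win, which is the same distortion the antidepressant quotation describes.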
Skill Familiarity Bias
A vendor will always have more skill in its own solution than in a competitor’s solution. Because databases can be “tuned up,” and because of differences in the hardware selected as well as other variations, even a vendor that was 100% above board would still tend to observe better performance from its own solution than from a competing solution.
The vendors spare no expense on hardware for these tests. Customers will often purchase equipment with a lower specification than that used by the vendor. We cover this topic in the article How Much is Hardware Responsible for HANA Benefits?
The Laboratory Environment Bias
The hardware and database are run in a “lab” environment. No other batch jobs are pulling on its resources, which is, of course, unrealistic. Therefore, the performance seen in the benchmark would typically not be attainable in a production setting. I see benchmark results as more comparable with one another than with what a production environment will deliver.
Every benchmark paper I looked at had one clear purpose. That was to improve the sales of the product benchmarked by the vendor that wrote the paper.
The benchmarks that are released are then viewed through the prism of bias, that is, by people who have an incentive to prefer a particular software vendor. One entity that has published inaccurate information about benchmarks, information that has been right in line with its financial bias, is the consulting firm Bluefin, which is overall one of the least reliable providers of information on HANA.
The Benchmark Tests
I reviewed the following benchmarks, which were performed for these databases.
- SAP OLTP Benchmark: This is a benchmark for transaction processing, that is, the things that ERP systems tend to do the most, like recording journal entries, decrementing inventory when performing a goods issue, etc.
- SAP BW-EML (Business Warehouse Mixed Workload) Benchmark: This is an analytics benchmark. We cover this topic in the article How to Understand the Issues with BW EML Benchmarks. And we cover the SD benchmarks in the article How to Understand the Issues with SD HANA Benchmarks.
SAP’s Missing Benchmarks
For years SAP would release an OLTP benchmark for databases. However, with HANA, SAP stopped publishing this benchmark. Database design would predict that HANA would perform poorly in this benchmark, and this is the most likely reason why SAP never produced this benchmark. However, the consulting firm Bluefin Solutions has the following way of covering this up:
“The SAP HANA platform was designed to be a data platform on which to build the business applications of the future. One of the interesting impacts of this is that the benchmarks of the past (e.g. Sales/Distribution) were not the right metric by which to measure SAP HANA.” – Behind the SAP BW EML Benchmark
This is reinforced by the quotation from the book SAP Nation 2.0.
“It has not helped matters that SAP has been opaque about HANA benchmarks. For two decades, its SD benchmark, which measures SAP customer order lines processed in its Sales and Distribution SD module, has been the gold standard for measuring new hardware and software infrastructure. It has not released those metrics using a HANA database. One of the (unsatisfactory) excuses offered is that the expensive hardware needed to support such a test in a lab is better shipped to paying customers.”
John Appleby’s Lack of Transparency on Financial Bias
At no point in this article does John Appleby declare that he has a quota, or leads a group with a quota, to sell HANA. Beyond this, in the 2013 timeframe, John Appleby was preparing the company of which he was an executive to be sold to Mindtree, which we cover in the article Appleby’s False HANA Statements and the Mindtree Acquisition. Curiously, Appleby’s aggressive promotion of HANA declined considerably after Bluefin Solutions was sold to Mindtree.
John Appleby presents himself as if he is a disinterested third party. So that is problem number one. But the second issue is that Appleby is speaking what amounts to gibberish in this quotation.
- S4 has a Sales module.
- This sales module will be performing the same functions as the current ECC SD module. Will there be analytics involved in the Sales module? Of course. However, there will also be transactions or OLTP performed.
- S4 Sales will record sales orders, update sales orders, etc.
- Therefore it is demonstrably untrue that an OLTP benchmark is now irrelevant because “the platform was designed to be a data platform on which to build business applications of the future.” That sentence is false, and one would have to twist oneself up into a pretzel to try to defend it. The person seems to be preparing to run for political office.
Appleby’s interpretation of the BW-EML benchmark contains other nonsense, such as:
“the configuration used by published results is the stock installation…there are not performance constructs like additional indexes or aggregates in use.”
Appleby’s Trademark Nonsense
The reason this is nonsensical is that column-oriented databases don’t use indexes. They don’t need them. Why Appleby is impressed by this is a head-scratcher. How many times has it been established that the primary reason for the reduction in the size of the database footprint is the removal of indexes? If so, and if this is widely accepted, why is it surprising to Appleby that the BW-EML benchmark for a column-oriented database does not have indexes?
On the topic of aggregates, HANA does use aggregates but does not call them aggregates. So what Appleby is saying is incorrect, although HANA does use fewer aggregates.
It is important to remember that John Appleby is joined at the hip with SAP. Like other SAP consulting companies, John Appleby and his company Bluefin Solutions primarily serve to repeat whatever SAP says. For whatever reason, Bluefin Solutions, and John Appleby in particular, were chosen (or decided) to release information about HANA and S/4HANA that would have had to be approved by SAP. Please see our analysis of the consulting partnership agreement that controls the media output of SAP consulting partners in this article.
John Appleby wins another Golden Pinocchio Award for his statements regarding the BW-EML benchmark.
Hasso Plattner’s Overstatement of the Importance of Removing Aggregates
Hasso Plattner has had an obsession with eliminating aggregates for some time, and he rails against aggregates in his articles and his books, but in many cases, aggregates are beneficial. Unlike what Hasso Plattner states, not everything needs to be continuously recalculated. And not everything needs to be recalculated every time it is accessed.
Many reference tables only occasionally change. Why instantaneously recalculate something that rarely changes? This is just a waste of processing cycles.
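As a rough sketch of that point, the following keeps a precomputed total and recalculates it only when the underlying rows change. The class name `CachedAggregate` and its methods are hypothetical; they illustrate the general idea of a stored aggregate and do not describe how HANA or any other database implements one.

```python
class CachedAggregate:
    """Keeps a precomputed sum and recalculates only after the data changes."""

    def __init__(self, values):
        self._values = list(values)
        self._total = None          # no aggregate computed yet
        self.recalculations = 0     # how many full scans were needed

    def add(self, value):
        self._values.append(value)
        self._total = None          # data changed, so the stored total is stale

    def total(self):
        if self._total is None:     # recompute only when stale
            self._total = sum(self._values)
            self.recalculations += 1
        return self._total


reference_table = CachedAggregate([10, 20, 30])
for _ in range(1000):               # read the aggregate 1,000 times
    reference_table.total()
reference_table.add(40)             # one rare change
reference_table.total()

print(reference_table.recalculations)  # number of full scans performed
```

In this toy model, 1,001 reads and one update cost two full scans. Recalculating on every access would cost 1,001.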
Let us take an example.
Let us say we want to see a report of all the sales orders that a company has processed for the past three months. This report was processed and aggregated, along with different dimensional attributes yesterday. Under Hasso Plattner’s logic, this aggregate is worthless because it is pre-calculated.
Let us look at that statement in detail.
Let us say that the aggregate was calculated yesterday precisely 24 hours prior.
- One day is roughly 1/90th of three months.
- If we look back 90 days in the report, it will show, say, 100,000 sales orders. That is an average of roughly 1,111 sales orders per day (yes, weekends would be lower than workdays, but 1,111 as an average).
- Now let us say that the day that drops off if we run the report anew had 1,500 sales orders created (so a high day). And let us say that the day that is added, which is yesterday plus the hours up until the present hour, has 700 sales orders (a low day).
- So instead of looking at 100,000 sales orders, we are now looking at 99,200 sales orders. Is that a real problem? Is the last 24 hours more representative than the 24 hours from 3 months ago? Probably not. But if it is, how much more should the company be willing to spend to get rid of all aggregates? And are there other investments that might be a better use of that money?
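The arithmetic in this example can be checked with a short script. The variable names are mine, and the figures are the illustrative ones used above, not measurements from any system.

```python
DAYS_IN_WINDOW = 90
orders_in_stale_report = 100_000   # total shown by yesterday's aggregate
orders_dropped = 1_500             # oldest day, which falls out of the window
orders_added = 700                 # newest day, which enters the window

average_per_day = orders_in_stale_report / DAYS_IN_WINDOW
orders_in_fresh_report = orders_in_stale_report - orders_dropped + orders_added
difference = orders_in_stale_report - orders_in_fresh_report
relative_difference = difference / orders_in_fresh_report

print(f"average per day: {average_per_day:.0f}")
print(f"fresh report total: {orders_in_fresh_report:,}")
print(f"stale report is off by {difference:,} orders ({relative_difference:.2%})")
```

Under these numbers, the day-old aggregate differs from the freshly calculated total by 800 orders, which is under one percent.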
SAP’s Overestimation of the Importance of HANA’s Changes as it Relates to SAP HANA Performance
There is an unlimited number of scenarios that could be imagined to determine the importance of the removal of aggregates. For instance, if just two days of sales orders were reviewed, then the company would see a much more significant variation. However, the need for instantly recalculated information is significantly overestimated in vendor marketing documentation, and in analytics vendor documentation in particular.
The articles Monthly Versus Weekly Forecasting Buckets and Quarterly Versus Monthly Forecasting Buckets challenge a long-held belief that forecasting information must be updated frequently with the most recent sales history to obtain the highest forecast accuracy. This was tested with actual client data, from a client with a challenging-to-forecast sales history, and it shows that, as with the tests I performed at previous customers, this is not all that important and contributes little to forecast accuracy.
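How much a day-old aggregate matters depends heavily on the length of the reporting window. The sketch below uses a deliberately simple, hypothetical model (the `stale_share` function is my own construction): it reports what fraction of the days in a rolling window have been replaced since the aggregate was calculated.

```python
def stale_share(window_days, stale_days=1):
    """Fraction of a rolling window that a stale aggregate gets wrong.

    Assumes (hypothetically) that the report covers `window_days` whole days
    and that the aggregate is `stale_days` old, so that many days in the
    window have been replaced since it was calculated.
    """
    return min(stale_days, window_days) / window_days


for window in (2, 7, 30, 90):
    print(f"{window:>2}-day report: {stale_share(window):.1%} of the window has changed")
```

For a two-day report, half of the window has changed, while for a 90-day report, only about one percent has. Freshness matters a great deal in the first case and very little in the second.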
This client was convinced that daily forecasting produced the highest accuracy.
Generalizing From Misrepresented Scenarios
So, while there can be scenarios where getting the most up-to-date information is critical, SAP tends to take these few scenarios and generalize them to be “normal,” when in fact, they tend to be the exceptions. Hasso Plattner has a way of presenting things that are often quite gray as black or white. And of course, all of Hasso Plattner’s examples have the peculiar and consistent outcome of handing over more money to SAP. I don’t make more money by exaggerating the way that Hasso Plattner does, and therefore his proposals tend to come off as sales fluff…at least to me. After years of reading Hasso Plattner’s statements, I do not consider him to be credible or truthful.
The Logic for the Improved Analytical Performance (SAP HANA Performance) of Column-Oriented Databases
I found this quotation from IDC to be an excellent explanation of why column-oriented databases are so efficient for analytics.
“The established approach to setting up a query/reporting database (ODS, data mart, data warehouse) has involved establishing indexes for all columns that might have value lookup operations in the queries. Many organizations now use columnar databases, which have the same relational characteristics as row-oriented databases but store the data in blocks of column rather than row data for speed of retrieval. This obviates the need for indexes and, in some cases, for cubes and materialized views.” – IDC
“If live data is to be queried and updated at the same time, the queries must be very fast in order to avoid consuming resources on the database server and slowing down transactions. A number of vendors have created database technologies that optimize query performance by combining two key elements: query-optimized columnar organization for the data and memory-optimized database operations. In the case discussed here, however, there is an additional challenge, which is to maintain that data in a form that also supports a high-performance transactional database.” – IDC
“Database In-Memory leverages a unique “dual-format” architecture that enables tables to be in memory simultaneously in a traditional row format and a new in-memory column format. The Oracle SQL Optimizer automatically routes analytic queries to the column format and OLTP queries to the row format, transparently delivering best-of-both-worlds performance. Oracle Database 12c automatically maintains full transactional consistency between the row and the column formats, just as it maintains consistency between tables and indexes today.
· Access only the columns that are needed.
· Scan and filter data in a compressed format.
· Prune out any unnecessary data within each column.
· Use SIMD to apply filter predicates.” – Oracle
However, this does not mean, and Oracle is not implying, that a column-oriented database is better for applications outside of analytics. And as far as I can determine from reading the perspective of different database vendors on this topic, SAP is the only database vendor that proposes that a column-oriented design is better for all types of applications.
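The trade-off between the two layouts can be illustrated with a toy model. This is a simplified sketch with invented data, using plain Python lists and dictionaries as stand-ins for storage. Real engines add compression, buffering, and other optimizations that this model ignores, so it shows only the basic access pattern and says nothing about any vendor's actual implementation.

```python
# The same three-row table held in two hypothetical in-memory layouts.
row_store = [
    {"order_id": 1, "customer": "A", "amount": 100},
    {"order_id": 2, "customer": "B", "amount": 250},
    {"order_id": 3, "customer": "A", "amount": 75},
]
column_store = {
    "order_id": [1, 2, 3],
    "customer": ["A", "B", "A"],
    "amount": [100, 250, 75],
}

# Analytic query (sum one column): the column layout reads a single list,
# while the row layout has to visit every record to pull out one field.
total_from_columns = sum(column_store["amount"])
total_from_rows = sum(record["amount"] for record in row_store)

# Transactional write (insert one order): the row layout appends one record,
# while the column layout must touch every column's list.
new_order = {"order_id": 4, "customer": "C", "amount": 40}
row_store.append(new_order)
for name, value in new_order.items():
    column_store[name].append(value)

writes_row_layout = 1
writes_column_layout = len(column_store)
print(total_from_columns, total_from_rows)
print(writes_row_layout, writes_column_layout)
```

Both layouts return the same answer to the analytic query, but the column layout gets there by reading one list. For the insert, the situation reverses: one append in the row layout versus one per column in the column layout.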
Some of the Results On Oracle and SAP HANA Performance
For instance, the Oracle benchmark paper released in 2015 tested on hardware similar to what SAP used in its BW-EML benchmark, but left out the topic of how many customers would use this hardware configuration. I don’t know myself, as I have not recorded the hardware specification of many clients, but the hardware employed by Oracle appeared quite advanced. At one point, SAP’s benchmark used a machine with 1536 GB of RAM.
I have personally never heard of this much RAM being used on a server at any account that I have worked on. It probably exists as there are very advanced companies out there doing scientific computing. But it is a small number.
At one point, Oracle points out that the monster machine used by SAP beat Oracle’s BW-EML benchmark, but needed three times the amount of memory to do so. This brings up the question of whether SAP’s hardware was simply engineered to beat the Oracle benchmark. Did SAP first try the machine with 1000 GB of RAM, then add 200 GB of RAM and test again, then add another 200 GB and test again, and so on, until it finally beat the Oracle score?
In another benchmark, SAP installed 100 IBM servers in an SAP HANA cluster. If no one outside of the NSA, Amazon AWS (which resells portions of its hardware over the cloud), or a scientific computing center will be willing to buy hardware of this size, how relevant are these benchmarks to the majority of HANA customers?
The Impact of Marketing on SAP Benchmarking on SAP HANA Performance
SAP needs to get marketing out of the process of releasing benchmarking information. In the benchmark publication SAP HANA Performance: Efficient Speed and Scale-Out for Real-Time Business Intelligence, the marketing influence is apparent. One should not need to see a cover plastered with stock photograph imagery of a man pulling a “fly” snowboarding maneuver, and then an image of a bunch of people rowing together, along with a marketing-written introduction that uses a word salad of terms like NetWeaver components. How is this related to SAP HANA performance? This should be a scientific paper that is not wordsmithed and couched in deceptive marketing language.
SAP marketing must acknowledge that not every paper produced by SAP needs to have its fingerprints on it. Here is an example of the type of nonsensical writing that I am referring to.
“The drill-down queries (276 to 483 milliseconds) demonstrate SAP HANA’s aggressive support for ad hoc joins and, therefore, to provide unrestricted ability for users to “slice and dice” without having to first involve the technical staff to provide indexes to support it (as would be the case with a conventional database).”
Please do not use the term “slice and dice” in a technical paper, or the word “unrestricted,” or the colorful “HANA’s aggressive support.” This is not scientific terminology. SAP’s benchmarking paper needs to be rewritten entirely, just using the original data. Then at the end, SAP has quotations like the following:
“We have seen massive system speed improvements and increased ability to analyze the most detailed levels of customers and products.” – Colgate Palmolive
This is an anecdote, and it sounds like it was written by Donald Trump (except it uses the word massive instead of tremendous). If your benchmark study could have been written by Donald Trump, then you have a credibility problem with your benchmarking study.
The Specifics of Database Performance
For some time, we have been saying that HANA’s only beneficial area of performance is for analytics, which is called a read operation in database speak.
However, there is a level below this in detail. HANA’s primary beneficial area is for short SQL queries. An excellent example of a short SQL query would be a query for BW.
Long Versus Short SQL Queries
HANA’s performance degrades for longer queries.
An excellent example of a longer query is within ECC or S/4HANA. This is where the data is less prepared.
However, in SAP’s marketing material, they propose that HANA is excellent for reporting on ERP systems. There is no evidence of this up to this point. The evidence points in the opposite direction quite strongly as we cover in the article Why HANA is a Mismatch for S/4HANA and ERP.
Ironically, we have had many people tell us that once reports can be run from the ERP system, there will be no reason to have a centralized BI system. But the performance of HANA does not support this vision.
What Happened to SAP’s Row Store Performance?
The following quotation can be found in Oracle’s “Analysis of HANA HA” document.
“The SAP HANA database consists of two database engines:
The column-based store, storing relational data in columns, optimized for holding data mart tables with large amounts of data, which are aggregated and used in analytical operations.
The row-based store, storing relational data in rows.
This row store is optimized for write operations and has a lower compression rate, and its query performance is much lower compared to the column-based store”
Not all row-based stores are created equal, as HANA’s performance for ECC is worse for transactions than ECC on Oracle or IBM’s pre-column store databases. This also explains the performance differences between Oracle DB, DB2, SQL Server, MaxDB, and Sybase ASE, even though all are row-based by default.
One thing to remember is that HANA is still a relatively new database. When discussing this with a very experienced database resource, they made the following observation.
“That’s is no way a brand new row based DB can beat all these databases above which are optimised over so many years especially Oracle DB.”
When one compares what Bill McDermott, Hasso Plattner, SAP marketing, Bluefin Solutions, Deloitte, and others say about the game-changing aspects of HANA with the technical benchmarks, there is no correspondence.
SAP invests comparatively little in benchmarking, but its marketing spending on HANA is enormous. This is similar to the major pharmaceutical companies, which spend far more on marketing than on research, and whose research is mostly just running clinical trials based on research that is performed by universities and is publicly funded.
Evidence of Oracle Outperforming SAP HANA Performance
Oracle has provided compelling evidence that its 12c database outperforms SAP HANA. I say this while acknowledging the fact that there is no independent body that performs database benchmarking. Oracle invests much more in database benchmarking, and its benchmarking studies are more transparent and make the case far better than SAP’s.
For all of the talk of SAP HANA performance, SAP produces a single benchmark to support its claims of superiority over Oracle 12c and others. While we do not have independent verification, sifting through the results, it seems more likely than not that Oracle 12c is not just a little faster but far faster than HANA. Secondly, while SAP has placed speed as the priority in the design of its database, Oracle’s orientation is far more holistic, putting reliability first. Thirdly, given 12c’s design, it will almost certainly beat SAP HANA performance easily for OLTP processing.
How Many Databases Can Outperform HANA?
This article did not review the benchmarking of other database vendors. However, I find it more likely than not that vendors like IBM, given the database talent that they have, also have a solution that is superior to SAP HANA performance. And the list of database vendors that can beat HANA is likely longer than just Oracle and IBM. The bottom line is that outside of one type of database processing, short SQL queries, SAP HANA performance is poor.
Memory-Optimized Transactions and Analytics in One Platform: Achieving Business Agility with Oracle Database (IDC Sponsored by Oracle)
OracleVoice: Oracle Challenges SAP On In-Memory Database Claims, John Soat, Oracle.
Benchmark Results Reveal the Benefits of Oracle Database In-Memory for SAP Applications, Oracle White Paper, September 2015
Behind the SAP BW EML Benchmark, John Appleby, Bluefin, March 19, 2015.