- SAP has promoted HANA as connected to Hadoop and Big Data. However, this is untrue.
- SAP would like to “piggyback” on Hadoop, but Big Data does not need HANA.
Introduction to the HANA and Big Data Reality
SAP has been proposing HANA for a new purpose, namely to serve as the database for the customer’s SAP Big Data.
See our references for this article and related articles at this link.
Is HANA the right choice for Big Data? That is, is it time for SAP Big Data? In this article, you will learn the issues with using HANA for this purpose, and why SAP wants customers to associate HANA so strongly with Big Data.
Big Data, No Interruptions?
SAP released the following information in a press release titled SAP’s HANA Bet Seems to be Paying Off.
While SAP HANA’s new or improved database features — like high availability, disaster recovery, hot standby, hardened security and unified administration for cloud and on premises deployments — may not seem all that interesting, how’s this?
It’s the end of the year (like it is now), you’re trying to close out your business and the system keeps crashing. The breakdown doesn’t happen at the beginning, or in the middle, but just as it’s about to finish. Over and over again, kind of like an IT version of Groundhog Day.
This actually happened at a company which Goodell didn’t have permission to name. That company wasn’t able to close out the year. There was too much data, not enough horsepower. Needless to say, it moved to SAP HANA to solve the problem (Goodell wouldn’t have told us the story if that wasn’t the case).
What’s important to note is that SAP HANA’s latest release includes a new hot standby feature to prevent problems like the one that company encountered. It promises to help customers regain faster access to data by switching over to a standby database that is continuously updated with log replays from the primary data source. When problems occur, an enhanced SAP HANA cockpit can be used for offline administration and diagnosis.
Why wouldn’t Goodell have told this story if it wasn’t the case?
Heads of marketing tell lies all the time. Goodell began the previous paragraph by stating that HANA was growing faster than R/3 did back in the late 1980s, so Goodell’s credibility is not high.
SAP has proposed that S/4HANA processes the end-of-year close much faster than ECC on Oracle, but Brightwork has analyzed this claim and found it to be false. This is covered in the article Analysis of SAP Provided Information on S/4HANA.
The hot standby feature is unlikely to work as advertised. So far, we have seen no innovation in HANA that stands out from other databases. Contrary to SAP’s claims, HANA is more problematic than more established databases because of its maturity issues.
Data Points to Dangerous Places?
SAP goes on to lay out the following case study for HANA.
The more data, the better the insights. We’ve all heard this one before. But it’s not always easy to blend vast quantities of different types of data — structured, unstructured, semi structured, IoT, geospatial — to understand relationships between them and to discover new insights and/or anomalies.
That’s what Prescient Traveler uses SAP HANA to do in order to help keep travelers who go to sensitive, and sometimes dangerous places, safe.
Prescient leverages its own proprietary data from high-stakes intelligent operations, irregular warfare and threat analysis systems, blends it with sentiment analysis gleaned from high volumes of social media and news articles, and then parses it with geospatial data to identify and immediately alert its subscribers who are in proximity of a danger.
SAP’s newest HANA release provides the enhanced text analytic capabilities needed for the job. More specifically, the ability to identify relationships among the elements of a sentence and improved language support for text mining algorithms to produce insights from unstructured content. The release also includes new spatial features, such as clustering and the ability to partition spatial data, to help accelerate analysis and enrich location intelligence. Reverse coding capability has also been added to help pinpoint a location’s latitude and longitude and display specific addresses within a given radius which can help assess the impact of disaster or health care risk.
What does this look like in the real world? Prescient users can receive alerts before they walk into the epicenter of danger. As we’re increasingly learning, insights gleaned from places like Twitter and Facebook might lead to saving lives.
SAP’s 1.1 release for the SAP HANA platform has been certified by the Open Geospatial Consortium (OGC), which helps make it easier to exchange data between third party spatial solutions. SAP will also announce its intention to deeply integrate advanced Esri ArcGIS geospatial capabilities and content across the entire SAP application portfolio.
If Prescient Traveler does this, it has made a poor decision. HANA is not a leader in any of the things mentioned in this article. Just because a customer is using SAP for something does not mean that it is getting value out of it. One has to analyze the implementation as an independent party to say for sure.
SAP’s Inconsistent Explanation for Hadoop Integration
Keeping all of your data in memory isn’t practical or feasible. That’s where SAP’s Hadoop integration comes in. It allows for large data volumes to be managed transparently with policy-based data movement from memory to disk and Hadoop using the data lifecycle management capability. This allows organizations to fully optimize performance-price considerations based on business needs.
Since HANA was first introduced, SAP has been saying that all data must be loaded into memory to obtain “zero latency.”
So Not 100% In-Memory Computing?
If putting all data into memory isn’t feasible, then why is SAP pushing so hard on “in-memory computing” and telling people that everything should be placed into memory?
SAP has been pushing hard to get companies to use HANA with Hadoop, but if we look at the leaders in Big Data, they don’t use HANA. Hortonworks, for example, does not use HANA. Neither does Cloudera. AWS’s Big Data offering also does not offer HANA (although AWS offers HANA for other things). The only companies that use HANA with Hadoop are those that know little about Hadoop and have most likely been tricked into using it by one of the large consulting companies.
What is HANA?
HANA is two things.
- A column-oriented database design.
- An in-memory database that uses SSD and RAM to store the database. With HANA, the entire database must be loaded into memory, and spinning disks can only be used for archival.
HANA is optimized for analytics. Its expense can be justified in only a few scenarios, all of which involve heavy query or read access to the database, although HANA is not the best database on the market even for this limited type of processing.
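To make the column-oriented point concrete, here is a minimal sketch in plain Python of why a column layout suits read-heavy analytics. It is not HANA code and says nothing about HANA’s internals. The sample orders and the function names (`to_columns`, `sum_from_rows`, `sum_from_columns`) are our own hypothetical illustrations.

```python
# Hypothetical sample data; these names are illustrative, not from HANA.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 30.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_from_rows(records, field):
    """Row layout: every full record is visited to read a single field."""
    return sum(record[field] for record in records)


def sum_from_columns(columns, field):
    """Column layout: only the one column needed by the query is read."""
    return sum(columns[field])


columns = to_columns(rows)
total_from_rows = sum_from_rows(rows, "amount")
total_from_columns = sum_from_columns(columns, "amount")
```

Both functions return the same total. The difference is that the column version reads one contiguous list, while the row version walks through every record, including fields the query never uses. That advantage applies to aggregate reads, which is why the design is a fit for analytics and not automatically for everything else.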
Furthermore, Big Data is about managing massive amounts of unstructured data. That is not entirely inconsistent with SAP’s messaging, because SAP’s solution design is to use HANA alongside Hadoop, which does specialize in unstructured data. Hadoop is an open source framework for distributed data storage and processing (often loosely called a database) and is the best-known name in Big Data. However, this design is strange. It means extracting unstructured data from Hadoop and placing it into column-oriented tables. It is not clear why this is a desirable design, but SAP frequently skips these details, preferring to focus simply on HANA’s speed benefits.
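The awkwardness of moving unstructured data into column-oriented tables can be shown with a small Python sketch. It is an illustration under our own assumptions, not SAP’s or Hadoop’s actual tooling. The sample lines, the `SCHEMA` tuple, and the `parse_line` and `load_into_columns` functions are all hypothetical.

```python
import re

# Hypothetical raw lines, standing in for loosely structured files kept in Hadoop.
raw_lines = [
    "2017-03-01 user=alice action=login",
    "free-form comment with no fields at all",
    "2017-03-02 user=bob action=purchase amount=19.99",
]

# A column-oriented table needs a fixed schema decided up front.
SCHEMA = ("date", "user", "action", "amount")

# Only lines that start with an ISO date are treated as parseable.
LINE_PATTERN = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<rest>.*)$")


def parse_line(line):
    """Return a dict matching SCHEMA, or None if the line does not fit."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    pairs = (token.split("=", 1) for token in match.group("rest").split() if "=" in token)
    fields = dict(pairs)
    fields["date"] = match.group("date")
    # Fields missing from the line become None; fields outside SCHEMA are dropped.
    return {name: fields.get(name) for name in SCHEMA}


def load_into_columns(lines):
    """Build one list per schema column and collect the lines that did not fit."""
    columns = {name: [] for name in SCHEMA}
    rejected = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            rejected.append(line)
            continue
        for name in SCHEMA:
            columns[name].append(record[name])
    return columns, rejected


columns, rejected = load_into_columns(raw_lines)
```

Even in this toy case, one of three lines is rejected and another leaves a gap in the `amount` column. Someone has to decide on a schema, write the parsing rules, and accept that whatever does not fit is lost or left behind, which is the detail that tends to be skipped when only query speed is discussed.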
HANA’s High TCO
- HANA is a very expensive database, in terms of both the initial purchase price and the estimated TCO.
- Brightwork has not yet completed a TCO study for HANA, but all of the standard cost inputs to the TCO calculation are high.
- You can see Brightwork’s online TCO calculators at this link.
We have more experience calculating TCO than any other entity. We also analyzed all available TCO research before we created our calculators, and as exploratory research for the book Enterprise Software TCO: Using Total Cost of Ownership for Enterprise Decision Making.
HANA is simply far too expensive and is not designed to store unstructured data. HANA is designed to store data in a column-oriented table, which is structured.
SAP Big Data & SAP Hadoop?
The big name in Big Data isn’t SAP; it is Hadoop. Hadoop is an open source framework for distributed data storage and processing and is not controlled by any company. SAP has tried to promote using HANA with Hadoop through a connector called Vora, but it is unclear what HANA adds to this “mix of ingredients.”
Companies that have been successful with Hadoop have not used HANA, so what SAP is doing is essentially bandwagoning onto Hadoop’s success.
- Hadoop has one of the best value propositions in the database space.
- Hadoop is growing very rapidly.
Even so, “SAP Hadoop” is now a common search term. But really, how much does SAP have to do with Hadoop?
Force Fitting HANA into the Big Data Trend
SAP is force fitting HANA into the Big Data trend, a trend that SAP does not logically have much to do with. SAP Big Data is simply not yet a “thing.” And technically, exporting data from Hadoop to HANA for fast queries does not seem to make any sense.
However, we have observed this pattern of co-option many times.
Co-option is where SAP largely pretends to be involved in something, or pretends to have an offering in an area where it doesn’t. A perfect example is IoT, which is covered in the article Why SAP’s Leonardo Seems to Fake.
This is just another example of SAP attempting to broaden HANA into applications for which it is simply not a good fit. Using HANA where it is not a good fit means accepting both very high costs and a high probability of failure.