
How AWS and Google Cloud Enable Data-Centric Development

Executive Summary

  • There are two primary models for software development. One is application-centric, and the other is data-centric.
  • We cover the implications of moving to AWS and Google Cloud and to the data-centric model.

Introduction to the Application-Centric Software Model

The most commonly used model for software development is the application-centric model, in which everything begins with the application. Whether the application is built or purchased, the data is a consequence of that specific application’s needs. Later on, the company has to deal with the negative externalities of a large number of applications each creating its own, often siloed, data.

Implementing companies most often don’t consider this implication because, for decades, application vendors have successfully steered the focus toward an application-first perspective. The application-centric software model is dominant not because it is right or sustainable, but because it allows for the fastest possible development.

The Data-Centric Model

The opposite of the application-centric model is the data-centric model. If we step back from our assumptions and analyze how companies normally purchase applications, we find that they more often than not have no notion that the data will be used by many other applications or data services internally. This holds even when multiple applications are purchased from one vendor (such as SAP or Oracle, which normally sell multiple applications into their customers). When an application is purchased, no data project is performed to determine how its data will fit with the existing data.

Under a data-centric approach, that fit is determined by an internal data group. The group is included in both custom development and packaged software purchases so that it can determine how the new fields fit with the current fields. This matters because at some point that data is going to be loaded into external storage, as both integration and reporting on it will be necessary.

The Issue with Integration

Integration has long been a primary factor holding companies back: it stifles innovation and drains budgets. Companies mostly buy applications that have proprietary and exclusive access to their own data. For example, CRM data is only accessible and meaningful to the CRM application. The financial data is in a completely different format and can only be accessed by the finance application.

This application-centric approach has created silos, and it seems quite difficult to break with them. Even the data warehouse, which was originally designed to create a common or shared data area, is really just another silo. And all of these silos create the need for integration.

This leads to the following frightening statistics regarding IT projects.

“A survey of 40,000 projects reveals that only one in three succeeds.

The so-called successful projects pay off only 20% of the time.

Data integration typically consumes 35-65% of a company’s IT budget.” – Data Centric Manifesto

The Core Problem with Application-Centric Development

The data centrists propose that these problems are at least in part related to the current application-centric approaches to IT management. They go on to identify what they see as the core problem.

“The problem, at its core, is that we have allowed applications exclusive control over the data they manipulate. At first blush, this seems necessary and desirable. The validation, integrity management, security and even the meaning of most of the data is tied up in the application code. So is the ability to consistently traverse the complex connections between the various relational tables that we euphemistically call ‘structured data.’ This arrangement seems to be necessary, but it isn’t. It is not only not necessary, it is the problem.”

The Data-Centric Approach is Born

The application-centric approach was the only way companies did things until data companies came along and a better, data-centric approach was born. This new approach is having positive impacts. For example, many tech startups today are built around the data-centric approach from day one, and that is the reason they are so agile, can pivot quickly, and can go to market so fast. Even giants like Google, Amazon, and Facebook innovate and pivot quickly, and they have a strong tendency to follow data-centric approaches.

Why can’t big companies do the same? Why do so many projects keep failing? The data centrists propose that this is due to how IT vendors and services firms are configured and incentivized. There is a tremendous amount of money to be made in building, implementing, and integrating applications in organizations. This keeps the application-centric paradigm alive.

This is emphasized by the following quotation:

“The zero-legacy startups have a 100:1 cost and flexibility advantage over their established rivals. Witness the speed and agility of the Pinterests, the Instagrams, the Facebooks and the Googles. What do they have that their more established competitors don’t? It’s more instructive to ask what they don’t have: They don’t have their information fractured into thousands of silos that must be continually “integrated” at great cost.”

In large enterprises, simple changes that any competent developer could make in a week typically take months to implement. Often change requests get relegated to the “shadow backlog” where they are ignored until the requesting department does the one thing that is guaranteed to make the situation worse: launch another application project.

“How often have we seen multi-million (even multi-hundred million) dollar projects justified on the basis of a handful of requirements, that if not for the need to make a wholesale change for the sake of change, would be fairly simple incremental additions? We’ve seen a $50 million HR project justified on the basis of a requirement to support collective bargaining, only to see it not be available in time for the requirement that justified it.” – Data Centric Manifesto

Who Owns the Data?

In a data-centric approach, applications still create the data but they don’t own it. They don’t define it and they don’t have exclusive access to that data.

“Think about the various mobile phone apps that attach to your calendar. They don’t own your calendar data but they know of it and can schedule events and set reminders via a published API. You can add infinite functionality with new apps but you never create silos.

The app-centric model is about writing and shipping software. The data-centric model is about building and managing a platform. Apple is a platform, so is Google, Amazon, and Salesforce. Developers, partners, and customers create apps that run on the platform and extend its functionality but don’t create data silos with every new app.” – Ahmed Azmi
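The calendar example above can be sketched in a few lines. This is only an illustration under stated assumptions: `CalendarPlatform`, `add_event`, `events_on`, and the app names are hypothetical and do not correspond to any real calendar API. The point it shows is that the platform holds the data, while each app reads and writes through the same published interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    title: str
    start: datetime
    created_by: str  # records which app wrote the entry; that app does not own it


@dataclass
class CalendarPlatform:
    """Hypothetical shared store: the platform holds the data, apps use the API."""

    _events: list[Event] = field(default_factory=list)

    def add_event(self, app: str, title: str, start: datetime) -> Event:
        event = Event(title=title, start=start, created_by=app)
        self._events.append(event)
        return event

    def events_on(self, day: datetime) -> list[Event]:
        # Every app sees every event, regardless of which app created it.
        return [e for e in self._events if e.start.date() == day.date()]


calendar = CalendarPlatform()
calendar.add_event("travel_app", "Flight to Berlin", datetime(2024, 5, 2, 9, 30))
calendar.add_event("fitness_app", "Gym session", datetime(2024, 5, 2, 18, 0))

# A third app, added later, reads both records without an integration project.
todays_titles = [e.title for e in calendar.events_on(datetime(2024, 5, 2))]
```

Adding a new app here means writing another caller of `add_event` or `events_on`; no new copy of the calendar data is created.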

The Problem with Data Silos

With silos, even the smallest project can cost millions and frequently fails because the data sources are vague and fragmented and the information landscape is fractured. Over the years, new versions and upgrades wreak havoc with the data layer definitions and create multiple copies that make it nearly impossible to move ahead.

Data, not applications, should be the centerpiece of the business, the data centrists say.

It is a fascinating concept. For a developer’s perspective on this, we turned to Rolf Paulson, a long-time SAP developer.

“I am application-centric myself, too, I see this as a result of basic architectural requirements like modularization, encapsulation, information hiding and even modern approaches like domain-driven design. But maybe this conclusion is wrong or not stringent.

You may get a mess if you lose control about who is accessing the data of your domain. Imagine you have to refactor the way data is stored. I see a lot of challenges. The disadvantages of the common way integration is done are obvious, but there will be many challenges if you switch your thinking and development to data-centric. My first impression was it is a kind of moving the problem how to create a comprehensive and stable contract between provider and consumer from the application layer to the way data is stored.

M. Fowler said something like minimizing the number of irreversible decisions is the most important task of a software architect, and in the data-centric thinking the way how data is stored seems to be a very early decision, that contradicts to agility.

So the data-centric approach makes a lot of sense if the “data” is somewhat immutable. Therefore the terms “event sourcing” and “event streams” came into my mind. These architectural integration approaches stem from microservice architectures.

Event sourcing and a persistent immutable event stream make it possible to integrate even applications that do not yet exist into an existing landscape. This is one of the central requirements of the data centric manifest. A new application can consume existing streams of data. Since events can be considered as immutable data, this architecture may be a use case to approach the “data-centric” thinking. To get more practical you may look at Apache Kafka.”
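The event-sourcing idea described above can be sketched as follows. This is a minimal in-memory illustration, not Kafka’s API: `EventLog`, `read_from`, `revenue_report`, and the order events are all hypothetical names chosen for the example. It shows an append-only log and a consumer written after the events were recorded, which replays the log from the start.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Event:
    # frozen=True prevents reassigning fields; the payload dict is treated
    # as read-only by convention in this sketch.
    kind: str
    payload: dict[str, Any]


class EventLog:
    """Append-only log: events are added, never updated or deleted."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, kind: str, payload: dict[str, Any]) -> None:
        self._events.append(Event(kind, payload))

    def read_from(self, offset: int = 0) -> list[Event]:
        # Each consumer chooses its own offset; reading never alters the log.
        return self._events[offset:]


log = EventLog()
log.append("order_placed", {"order_id": 1, "amount": 120})
log.append("order_placed", {"order_id": 2, "amount": 80})
log.append("order_cancelled", {"order_id": 1})


def revenue_report(events: list[Event]) -> int:
    """A consumer that did not exist when the events were written."""
    open_orders: dict[int, int] = {}
    for e in events:
        if e.kind == "order_placed":
            open_orders[e.payload["order_id"]] = e.payload["amount"]
        elif e.kind == "order_cancelled":
            open_orders.pop(e.payload["order_id"], None)
    return sum(open_orders.values())


total = revenue_report(log.read_from(0))
```

Because the new consumer derives its own view by replaying history, it needs no change to the applications that produced the events.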

The Ingrained Application-Centric Approach

So what are the barriers to the data-centric approach?

“In my experience, the barriers to the data-centric approach are NOT technical at all. First, there’s the application-centric mindset. This mindset’s so ingrained into IT. The idea of treating data as an open source that outlives any given application is NOT how legacy IT naturally think.

Then there’s the business model of application vendors. Application vendors make money from database licenses. The more databases sold (silos) the more money they make. Today, M&S accounts for (roughly) 50% of SAP & Oracle’s revenue.

In fact, application vendors treat the idea that data is an open source to be shared by other applications as something to actively stop. Some have taken their customers to court over it. This sits at the heart of creating more data silos and makes integration more difficult and expensive. Worst of all, IT complexity kills business agility and innovation. A single IT change request often goes at the end of a month-long backlog. By the time it gets attention, either the requirements change or the opportunity is lost.” – Ahmed Azmi

Conclusion

We don’t think this vendor stance on data is correct or makes any sense. Customer data is customer property. Customer data is not owned by vendors. No application or vendor should claim data ownership. Vendors should support customer strategy, not define it.

One has to be careful not to conclude that a new way of doing something will be fantastic: the old way has had time to have its shortcomings exposed, while new things have gone through fewer cycles of testing. However, AWS and Google Cloud at least allow customers to move away from such a vendor-dominated approach to IT management.

SAP and Oracle could argue that a company using AWS or Google Cloud is simply trading one type of domination for another: “vendor domination” for “IaaS/PaaS domination.”

We don’t think so. The reason is that once one uses AWS or Google Cloud, the options are not reduced but in fact greatly expanded. AWS and Google Cloud are not about telling customers what the prescribed pathway is. They look much more like a candy store where you can choose to test and use an amazing array of items.
