How to Understand The Design of Mainframes
Executive Summary
- Mainframes are engineered systems designed to run at high utilization without virtualization.
- Mainframes are designed to meet four distinct design objectives.
Introduction
To understand why AWS and Google Cloud can attain such advantages versus on-premises servers, it is necessary to comprehend mainframes at a basic level. This section’s coverage will be familiar to anyone who works with mainframes, but it is not generally well understood outside the mainframe industry. Most readers of this book will not have a background in mainframes, which is why we include this brief overview.
Let us elaborate on the most critical differences in some detail.
Our References for This Article
If you want to see our references for this article and other related Brightwork articles, see this link.
Redundancy/Stability
For those coming from the server mindset, the stability, redundancy, and longevity of mainframes are mind-boggling. Quotes like the following are simply not found outside of mainframes.
“Any mainframe computer technician will tell you, enthusiastically, how remarkable the technology is despite its age. Those old IBMs have triple- and quadruple hardware failover mechanisms, several different electrical inputs for backup power, end-to-end encryption, and layers of advanced virtualization. They are built to last 40+ years at 100 percent uptime due to those redundancy features.”
If one takes the time to investigate a modern mainframe’s interior, it begins to look like a miniaturized server farm. However, once you get past the high-level similarities, there are many differences in the details. Mainframes use proprietary hardware, and that hardware performs functions that are managed by software in servers. Mainframes contain redundant components and take the opposite approach to component failure from an AWS or Google Cloud data center. Mainframes are far smaller and don’t scale as easily as racks of commodity servers (although multiple mainframes can be connected into a cluster). However, mainframes already have efficiency designed in, which means they do not need to scale out to obtain fantastic utilization levels. Their hardware design makes them less dependent upon software for utilization.
Much of the IT media coverage related to the longevity of mainframes views reliability through a skewed lens. The following quotation, from coverage of a 2016 Government Accountability Office (GAO) report on the state of US Federal Government IT, is a prime example.
“In a soon to be released report, GAO found, “federal legacy IT investments are becoming increasingly obsolete” with outdated software languages and hardware parts that are not supported. In some cases, GAO found agencies are using systems that have components at least 50 years old.”
Indeed, it is not a good thing to underinvest in one’s IT systems, but since when is the fact that an item has been in functional service and adding value for 50 years construed as a bad thing? Could that not instead be construed as good value for taxpayers? Not only are the systems 50 years old, but the GAO found that many of the individual components are 50 years old!
Furthermore, the framing of the related items that are “out of date” is also interesting. One of the out-of-date languages the GAO referred to was COBOL, which is used by the Department of Veterans Affairs, the Social Security Administration, and the Department of Justice. COBOL is undoubtedly old but is still a highly respected language, with an estimated 200+ billion lines of code still in operation globally.
Languages tend to turn over relatively quickly in software; Java, for example, was considered leading edge not that many years ago. This is not to say that languages won’t eventually be phased out, but much of the issue with old languages comes down to training. The US Federal government has little excuse for claiming it cannot find COBOL programmers when it can relatively easily implement COBOL training. It has every incentive to do so, as it has systems written in COBOL to maintain. If COBOL exists as it did previously, and there is an abundance of COBOL training material, what is the problem exactly? Secondly, many of the systems custom-built for the government do not have a reasonable facsimile available from commercial software vendors.
COBOL skills are declining, but they can be trained, and COBOL programs can be maintained quite well with the right trained resources. Instead of maintaining the code, however, the plan was to replace parts of the three agencies’ systems. A second out-of-date language was assembly language, which the Department of the Treasury relies upon, and that does seem to be something worth updating.
Indeed, 50-year-old computer parts and components will often no longer be supported and are coming to the end of their life, but isn’t that something to celebrate (while looking for eventual replacements, of course)?
Mainframes Versus the Typical Server Farm
Far more hardware design thought went into mainframes than into the typical server farm. Server farms replicate nearly the same server many times over, with everything connected, creating, as we said previously, what amounts to one giant computer. In the mainframe, the components are of higher quality and fail less frequently. Their initial cost is far higher than the commodity servers used by AWS and Google Cloud, but the overall TCO, which includes the cost of maintenance, is better than that of on-premises servers. Mainframes require far less maintenance than batches of servers and are considered “turnkey” solutions. It is also considered typical for mainframes to stay up for years without being rebooted, which is not the case with servers. The comparison in reliability is explained in the following quotation.
“One of the less reliable vendors in the mainframe replacement business was quite keen on announcing that it had achieved “5 nines availability, apart from planned downtime” – which turned out to be rather different to what 5 nines availability meant in the mainframe world, where you could change operating systems without bringing the system down, you didn’t need to patch the operating system every week and unplanned downtime of a few minutes a year worried your vendor.”
On-premises servers are purchased not as turnkey systems but more as a kit. They must be assembled with a series of other hardware components before they form a functioning whole. They must have virtualization software and other software to run at even low levels of utilization. Mainframes, on the other hand, can be plugged into power and communications connections and be ready to use. The high quality of mainframe components means that a mainframe naturally stays up without adjustment for long periods of time. This is the opposite of server farms, where lower-quality but far less expensive components are replaced as a matter of routine. The reliability of mainframes is evidenced by the fact that companies put their most essential workloads on them. Standard practice in many companies is to have just two mainframes, each at a different location, with the second serving as the backup.
Overall, because of their design, mainframes have a mean time between failures measured in decades. Yet their low maintenance, high availability, and the fact that there are examples of mainframes working and doing their jobs for 50 years are a bit unstimulating for modern IT sensibilities, which are looking for far sexier items to focus their attention on.
Input/Output (I/O)
Servers use the CPU to manage I/O, which means they apply a general-purpose processor to the task. Mainframes have specific hardware that focuses entirely on I/O (called peripheral processors, channel processors, or I/O subsystems), which removes the I/O load from the CPU. This is a bit like a computer with a dedicated video card versus video processing on the mainboard. Mainframes are thus more specialized and provide more distinct processors for different tasks. This is a significant reason that CPU utilization on mainframes is so much higher than on servers. I/O turns out to be the biggest bottleneck on modern servers, limiting their utilization, something that mainframe designers knew decades ago when they designed their mainframes. IBM’s I/O Control Program is dedicated only to managing the I/O for the mainframe.
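The division of labor is loosely analogous to handing I/O off to dedicated workers so that the component doing the computation never waits on a device. The following minimal Python sketch is an analogy only, not mainframe code; the file name and worker count are invented for illustration.

```python
# Simplified analogy only: this is not how a channel subsystem is programmed.
# It just illustrates handing I/O to dedicated workers so the "CPU" never
# waits on the device itself.
from concurrent.futures import ThreadPoolExecutor

def read_record(path):
    # Plays the role of the I/O subsystem: it owns the slow device access.
    with open(path, "rb") as f:
        return f.read()

def compute(data):
    # Plays the role of the CPU: pure computation, no waiting on devices.
    return sum(data) % 256

if __name__ == "__main__":
    # Create a small sample file so the sketch is self-contained.
    with open("example.dat", "wb") as f:
        f.write(bytes(range(256)))

    # The worker pool stands in for the dedicated I/O processors.
    with ThreadPoolExecutor(max_workers=4) as io_subsystem:
        future = io_subsystem.submit(read_record, "example.dat")
        # The main thread could keep doing other work here while the read completes.
        print(compute(future.result()))
```

The point of the analogy is simply that the processor doing the computation is not the component that waits on the device.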
To emphasize the point about I/O, notice how much of this mainframe is dedicated to I/O drawers. Roughly 45% of the components in this mainframe, judging at least by how the cabinet is filled, are for I/O processing.
The Hardware Utilization Benefits of Mainframes
Because mainframes have so much forethought built into them, they do a much better job of hardware utilization than servers. This is why mainframes can run at 100% utilization for years and maintain peak performance up to 90% load, while servers lose efficiency above 20% load. Mainframes are designed explicitly for particular workloads, but these are not marginal or unusual workloads. Quite the contrary, they are very common workloads. Transaction processing, for example, is a very common workload, and no better hardware has been designed for it than the mainframe. Rather than requiring extensive virtualization, mainframes rely upon logical partitions (LPARs) that run their own operating systems. The system processors and their memory are flexibly assigned to the LPARs, and resource allocation algorithms switch the processors and the memory between the LPARs depending upon the load. This means that while one LPAR runs virtualization for various specialized non-mainframe workloads, the others can skip virtualization altogether when they run mainframe workloads. Within the LPAR that runs virtualization, the VMs are very lightweight, which is why IBM z systems can run so many of them. The IBM z13 can have up to 85 LPARs.
This diagram shows how LPARs are managed on a z Series mainframe. Notice the combination of LPARs with virtual machines (VMs).
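To make the idea of weight-based sharing concrete, here is a minimal sketch of a proportional-share calculation in the spirit of LPAR weights. The LPAR names, weights, and processor count are invented, and this is not IBM’s actual PR/SM algorithm.

```python
# Illustrative only: a proportional-share calculation in the spirit of
# LPAR weights. The LPAR names, weights, and processor count are made up.
def share_processors(total_processors, lpar_weights):
    """Split a pool of shared processors among LPARs in proportion to weight."""
    total_weight = sum(lpar_weights.values())
    return {
        name: total_processors * weight / total_weight
        for name, weight in lpar_weights.items()
    }

if __name__ == "__main__":
    weights = {"PROD": 600, "TEST": 300, "DEV": 100}  # hypothetical LPARs
    for lpar, share in share_processors(10, weights).items():
        print(f"{lpar}: {share:.1f} processors' worth of capacity")
```

In the real system, these shares are re-evaluated continuously as load changes, which is what allows one LPAR to borrow capacity that another is not using.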
Backward Compatible
This is one of the lesser emphasized areas of mainframes, but backward compatibility means that old programs don’t need to be refactored. As we have described, one of the fantastic features of a mainframe is its ability to run programs for so many years with so little maintenance. When a new mainframe is purchased, it comes with that backward compatibility built-in.
Different CPU/Processing Unit Architectures of Mainframes Versus Servers
Mainframes use different CPUs than do servers. IBM makes its own mainframe processors. These processors are customized for the mainframe workloads.
This Dell server contains a single 14-core processor. Mainframes do not rely on a single multi-core processor but rather on many processors designed for specialized tasks. Servers try to meet a variety of needs with software rather than hardware. There is constant price pressure on servers, which results in low prices and in functionality that could be handled in hardware being dumped into software. With such a simple layout and overall design, servers can be made in large numbers rapidly. Therefore, a “unicrop” of servers is applied to the computing problem. This is where software comes in, to overcome the limitations of the hardware design in addressing different workloads.
The IBM z Series contains the following processors. The number of processors differs per configuration, so we have added an (s) to each processor type.
- The CPU(s): Used for the OS and the applications.
- The System Assistance Processor(s) (SAP): Supports the I/O subsystems.
- The Integrated Facility for Linux(s) (IFL): Used only for Linux workloads.
- The System z Application Assist Processor(s) (zAAP): For executing Java code.
- The System z9 Integrated Information Processor(s) (zIIP): For processing database workloads; it helps reduce software costs.
- The Integrated Coupling Facility(s) (ICF): For running the coupling facility’s licensed internal code, which is used to cluster mainframes and is not visible to the operating system.
- Spare(s): Spare processors that can take over for any of the processor types listed above.
While it is unnecessary to understand each of the processors above in detail, the primary takeaway is that mainframes do not simply replicate the same general-purpose CPU; they offer a series of specialized processors, as the sketch below illustrates. This is one reason they are far more capable across many workloads than any other hardware design.
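As a rough illustration of that takeaway (and not of IBM’s actual dispatcher), work of different types can be pictured as routed to different processor pools rather than all landing on one general-purpose CPU:

```python
# Rough illustration only: routing work to specialty processor pools.
# The pool names follow the list above; the dispatch logic is invented.
PROCESSOR_POOLS = {
    "general": [],  # CPs: operating system and application work
    "ifl": [],      # IFLs: Linux workloads
    "zaap": [],     # zAAPs: Java execution
    "ziip": [],     # zIIPs: eligible database work
}

def dispatch(work_type, unit_of_work):
    # Fall back to the general-purpose processors when no specialty pool applies.
    pool = PROCESSOR_POOLS.get(work_type, PROCESSOR_POOLS["general"])
    pool.append(unit_of_work)

dispatch("ziip", "database query offload")
dispatch("general", "COBOL batch step")
print(PROCESSOR_POOLS)
```

The benefit of this arrangement is that the general-purpose processors are left free for the work only they can do, which is part of why utilization stays so high.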
Mainframes as Specialized In Their Design and Configured to Order
Mainframes are thought out to an exceptional level of detail in how the hardware should be designed. A refined process matches workloads to components and to how those components interact within the overall mainframe design. Because so few mainframes are purchased (rather than being purchased in bulk, as servers are), the configuration process is far more involved than with servers.
Mainframes like the IBM zSeries are entirely configured to order. There is no “single price.” Depending upon the configuration, the zSeries can cost from hundreds of thousands to multiple millions of dollars. However, because of this configuration process, the customer gets a mainframe that is tuned to their requirements.
The mainframe manufacturers have templates that meet different workloads. Therefore, the mainframe manufacturer does the design work to match the workload. This is different from on-premises servers, where the responsibility is far more on the IT department.
The larger the mainframe configuration, the more components become involved. This is the DS8886 storage system’s internal setup, which would connect to a z series mainframe for extra storage.
Servers require enormous scale before they can compete with mainframes on total cost of ownership.
The IBM LinuxONE is, as its name implies, a mainframe that runs Linux, the primary operating system of the cloud. Those that do not run Linux directly on the hardware usually run IBM’s virtualization software, z/VM, with Linux as guest virtual machines. A single mainframe can support thousands of virtual machines. However, mainframes do not have to virtualize to obtain high utilization, which is a fundamental difference from servers.
Mainframe Workloads
Mainframes are most often associated with the relational database, as both the relational database and the mainframe are designed for transaction processing. Still, they can run all the different types of databases presently available, although not at the same level of efficiency. Moreover, while many people who work in IT have heard the term transaction processing, it is discussed less than it was in the past and has become primarily a background item, so it is easy to forget precisely what transaction processing is.
The following quotation helps illuminate the topic.
“Making reservations for a trip requires many related bookings and ticket purchases from airlines, hotels, rental car companies, and so on. From the perspective of the customer, the whole trip package is one purchase. From the perspective of the multiple systems involved, many transactions are executed: One per airline reservation (at least), one for each hotel reservation, one for each car rental, one for each ticket to be printed, on for setting up the bill, etc. Along the way, each inquiry that may not have resulted in a reservation is a transaction, too.”
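In code terms, a transaction is a group of related updates that either all succeed or all fail together. The following minimal sketch, using Python’s built-in sqlite3 module and an invented bookings table, shows that all-or-nothing behavior:

```python
import sqlite3

# Minimal sketch of transaction processing: either every booking in the
# trip is recorded, or none of them are. The table and figures are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (item TEXT, amount REAL)")
conn.commit()

def book_trip(conn, bookings):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            for item, amount in bookings:
                conn.execute("INSERT INTO bookings VALUES (?, ?)", (item, amount))
                if amount < 0:
                    raise ValueError(f"invalid amount for {item}")
    except ValueError:
        pass  # the rollback has already undone every insert in this trip

book_trip(conn, [("flight", 450.00), ("hotel", 300.00), ("rental car", -1)])
print(conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0])  # prints 0
```

Because the rental car line is invalid in this toy example, the flight and hotel inserts are rolled back as well; nothing partial is ever recorded. Scale that guarantee up to thousands of bookings per second, and you have the workload the mainframe was built for.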
When IT media covers Big Data, IoT data, and unstructured data from social networks, websites, mobile devices, and other sources, what they often leave out is that the data was accumulated from transactions being processed. These are the new “hot” technologies, right? So why is the term transaction processing not used in the vast majority of articles that cover these hot topics? If transaction processing is so tedious, why is it so foundational to some of the most discussed IT areas?
ERP systems are frequently discussed in terms of their functionality, without emphasizing that most of what they do is transaction processing. ERP systems perform other types of processing as well. For example, they run mathematical procedures like MRP and DRP (a simplified sketch follows below). Some analytics are performed within the ERP system, although in most cases, analytics are performed outside of ERP systems. However, the bulk of what ERP systems do is transaction processing, not running procedures and not processing analytical queries.
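To give a sense of what a procedure like MRP involves, as distinct from transaction processing, here is a toy net-requirements calculation. The item figures are invented, and real MRP also handles lot sizing and lead times.

```python
# Toy MRP net-requirements calculation with invented figures for one item.
# Net requirement per period = max(0, gross requirement - available supply),
# where leftover stock carries forward period by period.
def net_requirements(on_hand, gross_requirements, scheduled_receipts):
    available = on_hand
    plan = []
    for gross, receipt in zip(gross_requirements, scheduled_receipts):
        available += receipt
        plan.append(max(0, gross - available))
        available = max(0, available - gross)
    return plan

# Example: 100 units on hand, demand over four periods, one inbound receipt.
print(net_requirements(100, [40, 50, 60, 30], [0, 0, 20, 0]))
# -> [0, 0, 30, 30]
```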
SAP’s new S/4HANA ERP system is the first SAP ERP to be highly focused on analytics, unlike SAP ECC, which had virtually no analytics.
However, there is nothing in S/4HANA’s analytics screens that requires any particular analytics-specific processing. The reports are canned and precompiled before they are viewed. But this does not stop SAP from claiming that the in-memory HANA database is critical for analytics.
Gartner declares some areas of technology “hot” and others “not,” and Gartner does not have much interest in mainframes and transaction processing. However, systems work as an integrated whole, so this US Magazine/Instagram approach to technical analysis is not particularly helpful to Gartner customers, most of whom have no idea that Gartner does not follow the rules of research and is riddled with multidimensional conflicts of interest, as we cover in the article The Problem with How Money Influences the Gartner Magic Quadrant. This is why we have previously questioned whether Gartner is running a technology advisory firm or a fashion magazine, as we covered in the article How to Understand Gartner’s Similarity to the Devil Wears Prada.
While there are more and better analytics solutions than in the past, the analytics are ultimately based upon aggregated transactions, something that would, of course, be very disappointing for Gartner to learn. And if transaction processing seems dull, it is only because it has become so reliable that it has become boring. When transactions fail to get processed, “excitement” quickly ensues.
Mainframes Versus Server Farms
Server farms are only possible because of the software that ties them into a cohesive whole and allows processing to be virtualized onto generally uniform hardware. Without virtualization, servers have a major problem supporting their assigned workloads. However, while mainframes can virtualize, they rely less on virtualization and instead rely upon specialized purpose-built hardware and the specific configuration of the mainframe to meet the demands of the workload.
The similarities between the mainframe and the server farm are further illuminated in the following quotation from Nikola Zlatanov.
“The mainframe is still massive in the industry. Despite the predictions and rumors, it is really not going away. While some of the smaller customers have dropped off, all of the bigger customers are growing their workloads (emphasis added) on the mainframe.
There are two sides to why the mainframe market is still going strong. One is lock-in—an old business-critical application running on a mainframe, with many years of business logic built into it, costs a fortune to migrate elsewhere. The other is the mainframe’s relevance in a modern hybrid cloud architecture, running VMs on-premise and storing petabytes of data.
The mainframe has evolved a lot over the years. You can run Linux on the mainframe. A lot of what these big providers are doing is really trying to recreate the mainframe by hashing together a lot of servers. A private cloud made from many commodity boxes—one that can provide, say, 5,000 virtual machines—has things in common with a mainframe.
Both contain huge pools of CPU and memory for virtual machines, and massive amounts of storage for objects and images.
Both run an OS that virtualizes workloads to maximize efficiency.
Both can deal with high volumes of transactions. Many of the scaling problems being encountered in the cloud were solved long ago by mainframe technicians.”