In most circles, "five-nines" uptime is the standard for 24/7 datacenter operations, but there are now payment processors targeting 100 percent uptime for their banking and consumer customers -- and they are taking every step to ensure that kind of performance.
Is this realistic? In the mainframe world where most of these payment processors operate, some CIOs tell me they have never seen the mainframe go down for as long as they have held their positions. In one case, the CIO has been in his job for over 20 years. But the challenge of 100 percent uptime goes beyond mainframe resources, extending to data communications and the distributed side of payment processor datacenters.
Moving to 100 percent uptime is not a trivial pursuit. It takes planning, investment, and the patience to see a datacenter project of this magnitude through. Common elements in these projects include:
Control over at least two datacenters and the use of a third for backup. Because of the urgency of the payment processing business, the major players all own and control their own datacenters. The starting point for any 100 percent uptime plan is transforming these datacenters so their respective resources and applications all run in parallel with one another. The third leg of the project is a third-party datacenter for disaster recovery and backup, standing in the wings as extra insurance in the event that corporate datacenters are destroyed or compromised.
Redundant data communications between corporate datacenters and fast datacom inside each datacenter. Payment processors use multiple data communications technologies (fiber optic, Ethernet, etc.) to transport data between their datacenters. They do this for redundancy and load leveling, as well as for speed. Within the datacenter itself, they are likely to use a data flow architecture like InfiniBand to move data quickly between processors and I/O devices.
Parallel processing and data mirroring. The goal is to process data in parallel on two separate processors in two separate facilities simultaneously, so that if you need to fail over from one to the other, there is no disaster-recovery delay. You simply keep going. This is accomplished by clustering machines together into what appears to be a single system image that allows for data sharing and parallel computing. This is run in combination with systems management software capable of managing the end-to-end computing and data resources and performing automated failovers.
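The active-active idea described above can be reduced to a toy sketch: every transaction is written to both sites, and a failure at one site is absorbed transparently by the survivor. The `Site` and `Cluster` classes here are illustrative assumptions only; real implementations (mainframe sysplex clustering, mirrored storage arrays) are vastly more involved.

```python
class Site:
    """A hypothetical datacenter holding a mirrored transaction log."""
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.ledger = []

    def apply(self, txn):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        self.ledger.append(txn)

class Cluster:
    """Writes every transaction to both sites; one site failing is invisible."""
    def __init__(self, *sites):
        self.sites = sites

    def process(self, txn):
        applied = 0
        for site in self.sites:
            try:
                site.apply(txn)
                applied += 1
            except RuntimeError:
                pass  # automated failover: continue on the surviving site
        if applied == 0:
            raise RuntimeError("all sites down -- true outage")

east, west = Site("east"), Site("west")
cluster = Cluster(east, west)
cluster.process("txn-1")
east.healthy = False       # simulate losing a datacenter
cluster.process("txn-2")   # customers never notice
print(west.ledger)         # ['txn-1', 'txn-2']
```

The key property mirrored here is that failover is not an event the customer sees: the surviving copy already has the data, so processing simply continues.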
Centralized staff. Instead of replicating every staff position at both datacenters, the goal is to leverage the well-developed skills of a single staff that is strategically spread between the datacenters. This ability to streamline staff is largely due to the amount of automation that has been introduced into the datacenters.
A patient migration of applications. Understandably, making a move to an environment like this carries heavy risk. Those who are undertaking such projects plan for implementation periods of many months as they test and retest each application before they move it into the second datacenter for parallel processing.
Transparency minus one. The objective of painstakingly testing, retesting, and migrating applications for parallel processing over many months is to make the move to a 100 percent uptime, parallel processing environment without your customers ever finding out about the project -- because everything continues to run flawlessly while you are doing it. However, when the migration of applications is finally complete, you are going to have to perform and verify the ultimate failover test. Because there is always the risk that something could go wrong, you must notify your customers in advance and secure the test window that you need.
Due to the costs and the risks, 100 percent uptime targets aren’t for everyone, but in banking and finance, they are the future. Today’s task for technology companies and payment processors is to develop methodologies that simplify this transition -- which almost certainly will be expected in tomorrow’s financial marketplace.
@Hospice - I've been involved in product and development in the past and have seen how budgets are considered. In an instance like this, the bank will likely evaluate how to pass this cost off onto the customer and promote it as an additional service. I suppose the right institution may not necessarily make it reflect upon the customer overtly, but inevitably the customer will feel the impact somehow.
@Sara - There are certainly benefits to having the 100% uptime, for sure. But I just don't think the cost would ultimately be worth it. Much of this is speculation, of course, because we don't have hard dollar figures. Having worked in banking for the past 10+ years in a variety of functions, you get to know what customers want. Personally I don't think the customers would see the value in having this uptime unless they don't incur any charge.
I spend a lot of time trying to persuade financial institution-haters that banks have to make money too. They're a business just like everyone else. But the biggest backlash I hear is, "why do I have to pay to get my own money?" So if you were to offer the 100% up-time for an additional $2.50 a month per account, I doubt that would go over very well. :)
You can be 24/7 without having 100% uptime. "Five nines" (99.999%) uptime means that a service is only down for 5.26 minutes per year (see "high availability" on Wikipedia), and "six nines" is 31.5 seconds/year. 100% uptime is 0.0 seconds per year, which means that there's neither planned downtime nor unplanned downtime.
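The arithmetic behind those figures is straightforward: each added "nine" cuts allowable downtime by a factor of ten. A quick sketch (using a 365.25-day year, which matches the 5.26 min/yr figure above):

```python
# Annual downtime implied by an availability target of N nines.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

def downtime_minutes_per_year(nines: int) -> float:
    """Downtime per year for e.g. 99.999% availability (nines=5)."""
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR

print(f"five nines: {downtime_minutes_per_year(5):.2f} min/yr")      # ~5.26
print(f"six nines:  {downtime_minutes_per_year(6) * 60:.1f} sec/yr") # ~31.6
```

At 100 percent the formula degenerates to exactly zero, which is why the target changes the engineering problem qualitatively: no maintenance window, planned or otherwise, is ever permitted.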
First, we should make a distinction between internet banking and payment processors.
The typical internet banking site is far from 100% uptime, either in reality or in aspiration. I would say the vast majority of these sites have scheduled downtime at some point, typically at 3am on a weekend or some other lightly-trafficked time. You may get a message, "Temporarily down, come back in an hour." No big deal. Considering the need to do periodic patch updates or other maintenance tasks, there's currently no way around this without spending a great deal of money on parallel infrastructure. Based on adoption rates, consumers have already accepted Internet banking "as-is" with periodic downtime, and I see no evidence of a huge public clamoring for higher quality of service.
The original blog was talking about payment processors, which are the third parties that sit between merchants and their banks. There is more at stake here, because nobody wants a transaction to fail, even at 3am. And I agree that there should be extremely robust disaster recovery and business continuity planning, and that the service should work 24/7.
However, once you set the bar at "100 percent uptime" that justifies a level of investment that may extend well beyond the business benefit to end users. If there's a huge earthquake, flood, etc., and payments go down for an hour or two, well, so what? As long as the service comes back online, that's the important part. Occasionally, we're going to have a power outage, and occasionally we're going to have a financial outage. It's not the end of the world. The industry should pay attention to resiliency instead of uptime.
Yes, the banking industry is part of the nation's critical infrastructure, just like the power grid. But that doesn't mean the payments network needs to have the same always-on availability as the control systems to a nuclear power plant.
Finally, no institution can promise 100 percent uptime. All you can do is set that as your goal and set priorities accordingly. My point is that financial institutions should set more reasonable uptime goals and focus on things that matter more to customers, such as price, resilience, and security. There's only so much money in the operating budget, and 100% uptime is not the most important criterion in financial services.
"What's the true cost of having 100% uptime?" Why do you think that customers would have to bear additional costs if the banks or financial institutions go 100 percent uptime? My bank's online banking services work 24/7 and I can make transfers from one account to another instantly without any disruption. The only problem I've noticed is that some accounts take time to get updated, but so far I'm fine with that. I would probably not be happier if I were asked to pay additional costs for a "slightly" better service.
I don't know Damian, I think that plenty of customers would pay a bit more for 100% uptime. Of course that depends on exactly how much more money we're talking about. If the monthly fee goes up by $1, they might go for it, if it goes up $2 or more, they might not. But when a customer NEEDS to make a transfer from their savings account to their checking account -- because otherwise they'll bounce a check or be late on a payment, which might result in them having their electricity turned off, or their credit score slammed, or their husband in jail overnight, or whatever -- they'll be happy they stayed with a bank that gives them 100% uptime.
@Ivan - I tend to agree with just about every sentiment you mentioned. What's the true cost of having 100% uptime? Is it really worth adding the incredible additional expense, and then passing that on to the customer, just so the customer knows they'll always have 100% uptime? I'll bet if you surveyed most people, you'd find most have never experienced downtime. And if they did, it was only once or twice. Is that really worth it?
Mary, it's necessary that payment gateways be 100% up, 24/7. Otherwise, from the user's point of view, it creates lots of issues. I once faced a similar problem with a payment gateway: while travelling, I tried to book an international flight ticket using my savings account. After making the payment, I got a confirmation saying, "We are not able to process your request because the payment gateway is down" -- and the worst part is that the amount was still deducted from my account. So I think keeping payments up 100 percent of the time is more beneficial and can increase the business too.
Who's pushing for 100 percent uptime? This functionality certainly isn't free. As you pointed out, such a target requires multiple owned-and-operated datacenters, multiple communications links, and higher staffing costs.
Who in the marketplace is saying, "Well, cost control is important in our organization, but what's more important is 100% uptime."
Are there consumers saying, "I don't mind paying extra fees to my bank, or paying a little extra for consumer goods, as long as I know that I have 100 percent uptime."
I suspect that what's really happening is that increased regulation is pushing the financial services industry in the direction of rate-regulated energy utilities. Utilities can't just raise rates whenever they want, as they're constrained by state rate control boards. However, they're allowed to make a reasonable return on assets. That creates the incentives for investor-owned utilities to undertake the construction of huge projects, a.k.a. "featherbedding." If you want to increase revenue, increase the asset base. And now that the payments processors [edit: was "banks"] are being turned into utilities, they're turning to the same playbook.
If the central payment processors in the industry are going to make a move to 100 percent uptime, I'd like to have some assurance that whatever line of business they're in is subject to open and vibrant competition before they increase prices on everyone for marginal benefit.