In most circles, "five-nines" uptime is the standard for 24/7 datacenter operations, but there are now payment processors targeting 100 percent uptime for their banking and consumer customers -- and they are taking every step to ensure that level of performance.
Is this realistic? In the mainframe world where most of these payment processors operate, some CIOs tell me they have never seen the mainframe go down for as long as they have held their positions. In one case, the CIO has been in his job for over 20 years. But the challenge of 100 percent uptime goes beyond mainframe resources and extends to data communications and the distributed side of payment processor datacenters.
Moving to 100 percent uptime is not a trivial pursuit. It takes planning, investment, and the patience to see a datacenter project of this magnitude through. Common elements in these projects include:
Control over at least two datacenters and the use of a third for backup. Because of the urgency of the payment processing business, the major players all own and control their own datacenters. The starting point for any 100 percent uptime plan is transforming these datacenters so that their respective resources and applications all run in parallel with one another. The third leg of the plan is a third-party datacenter for disaster recovery and backup, standing in the wings as extra insurance in the event that the corporate datacenters are destroyed or compromised.
Redundant data communications between corporate datacenters and fast datacom inside each datacenter. Payment processors use multiple data communications technologies (fiber optic, Ethernet, etc.) to transport data between their datacenters. They do this for redundancy and load leveling, as well as for speed. Within the datacenter itself, they are likely to use a high-speed interconnect such as InfiniBand to move data quickly between processors and I/O devices.
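The redundancy idea above can be sketched in a few lines. This is a hypothetical illustration, not any processor's actual software: the link names and the `send_via` stub are invented for the example, which simply tries each redundant link in order and fails only if every one is down.

```python
def send_via(link, payload):
    """Stub transport for illustration: pretend the 'fiber' link is down."""
    if link == "fiber":
        raise ConnectionError(f"{link} unavailable")
    return f"sent {payload} via {link}"

def send_redundant(payload, links=("fiber", "ethernet")):
    """Try each redundant link in order; raise only if all links fail."""
    errors = []
    for link in links:
        try:
            return send_via(link, payload)
        except ConnectionError as exc:
            errors.append(str(exc))
    raise ConnectionError("all links down: " + "; ".join(errors))
```

In a real deployment the same pattern also supports load leveling -- traffic is spread across the links rather than only falling back when one fails.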
Parallel processing and data mirroring. The goal is to process data on two separate processors in two separate facilities simultaneously, so that if you need to fail over from one to the other, there is no disaster recovery or restore-from-backup step. You simply keep going. This is accomplished by clustering machines into what appears to be a single system image that allows for data sharing and parallel computing, run in combination with systems management software capable of managing the end-to-end computing and data resources and performing automated failovers.
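A minimal sketch of that active-active idea, with invented `Site` and `process_everywhere` names purely for illustration: every transaction is applied at both sites, so when one site goes down the survivor already holds the full mirrored state and there is nothing to recover.

```python
class Site:
    """A toy datacenter site holding a mirrored transaction ledger."""
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.ledger = {}  # mirrored transaction log

    def process(self, txn_id, amount):
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        self.ledger[txn_id] = amount
        return amount

def process_everywhere(sites, txn_id, amount):
    """Apply the transaction at every healthy site; succeed if any one does.

    Because each site mirrors the same state, losing a site requires no
    recovery step -- the surviving site simply keeps going.
    """
    results = []
    for site in sites:
        try:
            results.append(site.process(txn_id, amount))
        except RuntimeError:
            continue  # automated failover: skip the dead site
    if not results:
        raise RuntimeError("no healthy site available")
    return results[0]
```

Real systems accomplish this with clustering and systems management software rather than application-level loops, but the failure behavior is the same: the outage at one site is invisible to the transaction stream.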
Centralized staff. Instead of replicating every staff position at both datacenters, the goal is to leverage the well-developed skills of a single staff that is strategically spread between the datacenters. This ability to streamline staff is largely due to the amount of automation that has been introduced into the datacenters.
A patient migration of applications. Understandably, making a move to an environment like this carries heavy risk. Those who are undertaking such projects plan for implementation periods of many months as they test and retest each application before they move it into the second datacenter for parallel processing.
Transparency minus one. The objective of painstakingly testing, retesting, and migrating applications for parallel processing over many months is to make the move to a 100 percent uptime, parallel processing environment without your customers ever finding out about the project -- because everything continues to run flawlessly while you are doing this. However, when the migration of applications is finally complete, you will have to perform and verify the ultimate failover test. Because there is always the risk that something could go wrong, you must notify your customers in advance and secure the test window you need.
Due to the costs and the risks, 100 percent uptime targets aren’t for everyone, but in banking and finance, they are the future. Today’s task for technology companies and payment processors is to develop methodologies that simplify this transition -- which almost certainly will be expected in tomorrow’s financial marketplace.
I thought 5-9's was difficult! I think it's a nice goal to shoot for...and perhaps a few may accomplish it. But when dealing with electronics, I don't know if there's a single person that would bet against a failure somewhere in a 4-5 year lifespan.