As a society, we face growing problems repairing and maintaining the vital infrastructure we once took for granted.
This infrastructure ranges from the roads and bridges we drive every day to airports and railroads, communications networks, and electrical power grids -- the list goes on and on. In some cases we have seen tragic events resulting in the loss of life, such as the I-35W bridge collapse in Minnesota -- not far from where I live. Others represent mere inconveniences, such as local or regional power outages.
Most of these incidents involve aging, worn-out physical infrastructure desperately in need of repair or replacement. But infrastructure doesn't have to be old -- or even physical -- to cause problems when it fails.
The IT systems and applications all around us form a digital infrastructure that most enterprises take for granted -- until it's not there. A single glitch in the system can affect millions of people and companies, whether it's an airline computer that causes flight delays, lost email or text messages, or unplanned downtime for Websites and cloud applications. The problem isn't theoretical: Blackberry, Carbonite Inc., Facebook (Nasdaq: FB), Twitter Inc., Google (Nasdaq: GOOG), and eBay Inc. (Nasdaq: EBAY) -- all of these services, and many others, have suffered recent outages.
This isn't a new phenomenon. In fact, it's all part of a continuing problem that combines rapid growth, cost-cutting, competitive pressures, and short-term thinking. And it all adds up to more "fail whales" than any of us would like to see.
These applications are often designed, built, deployed, and managed with a throwaway mentality: "We'll replace it soon or throw it away and start over -- and if we don't, we can leave the mess for somebody else to clean up."
Take a step back to the dotcom era a decade ago, when you heard the same "build it now, fix it later" mantra. Guess what? For the companies that survived, "fix it later" means fixing it now. These are today's legacy systems, but the people who designed them are long gone to work on the next cutting-edge project.
It's deja vu all over again, folks. Today's IT disruptions are tied to the same old application development, deployment, and scalability challenges we've been dealing with for decades. The question now is whether we're going to repeat the same mistakes again and again, without learning anything from them.
I think that's a real possibility. We live in an era of throwaway electronics and shiny new toys (SNTs). We throw away our still-new PCs for slightly newer ones. Then we throw those away for iPads. A new iPhone means we throw away the last one, and a new platform like Android means we throw that one away, too. It's a mindset that can easily extend to the underlying infrastructure, which suffers when nobody invests in it or keeps it up to date.
There is some good news here. A few digital infrastructure carriers such as AT&T Inc. (NYSE: T) are investing billions to upgrade and optimize their networks and services. [Disclosure: I have used AT&T for about 15 years.] My local carrier, Qwest Communications International Inc. (NYSE: Q), has also been upgrading its networks -- granted, not as fast or as much as I would like, but it's still moving in the right direction.
Enterprises have to be aware that the existing public and private infrastructure they rely on may be more vulnerable than they expect. That also means now may be the time to update and improve things that may not qualify as "SNTs" but that we still rely on every day.
Just as important, enterprises must incorporate the lesson of infrastructure frailty into their own internal infrastructure choices. It's trendy -- and tempting -- to simply replace the old with the new, but it's critical to make sure that the replacement technology you choose will be cost-effective to operate and maintain over the long haul.
How do you do that? Apply the hard lessons we've learned from designing, building, deploying, and managing resilient infrastructure, whether it's transportation systems, communications networks, bridges and roads, or information technology. Focus on finding and removing single points of failure, and put Infrastructure Resource Management (IRM) processes and best practices to work.
This may not sound particularly appetizing in an age of severe competition and cost pressures. But there really isn't much choice. You can either pay up front to get this right, or you can pay even more later with downtime and service disruptions added to the bill. Either way, there will be a price to pay.