Spreadsheets were fine in the 1980s -- and at the time, they represented an innovative solution to a wide range of do-it-yourself computing challenges. You could see what you were doing on one sheet, perform calculations, write scripts, and so on.
But the problem with spreadsheets is that the visual presentation constrains our mental picture of the data within a two-dimensional space. Sure, you can add worksheets and "Save as" with last year's data. But once you start adding dimensions to your rows and columns outside what you can comfortably fit on a single screen, you start to lose track of the plot.
You start cutting-and-pasting, there's data everywhere, your formulas become more and more opaque, and before you know it, you're the only one in the world who can understand what you're doing. Rather than audit your work, your supervisors instead begin to trust you, and that's when you might start taking chances with the firm's money. From there, it's not too long before regulators are trying to figure out how your spreadsheet managed to destroy billions of dollars in a bank's market capitalization due to an unsupervised gamble.
A useful concept in programming involves maintaining separate layers. Keep the data away from the business logic, and keep the business logic away from the presentation. Each layer should stand on its own. If you make a change to the data, you shouldn't have to change the business logic, and if you change the business logic, you shouldn't have to change the presentation logic. (Here's a chart from Microsoft that shows how the separation concept works.)
Excel makes it far too easy to keep the data, business logic, and presentation layers in one place. On a single worksheet, you can have some cells containing data, other calculated cells with your business logic, and an adjacent chart acting as your presentation layer. Although there are many ways to do things in Excel, the blank ledger of a new spreadsheet invites people to use Excel in a very specific way, which is also the easiest and most intuitive way, and that is to pile everything into one big scratchpad.
What's more, when it comes to rigorous statistical analysis, there are known issues involved with using Excel. As the ability to perform analytics on large data sets becomes more important, the short-term benefit of ease-of-use gives way to the long-term benefit of learning an environment purpose-built for analytics such as R, a powerful environment for statistical computing.
Modeling data in Excel is like cooking a meal using whatever happens to be in your refrigerator, and then serving up the completed dish to your friends. They may like your cooking, but if they have any questions about whether you used dairy, gluten, shellfish, or peanuts, they're on their own because you don't even know what's in there.
By contrast, using R is like sending someone not only a cooked meal that's ready to be popped into the microwave, but also a full recipe with step-by-step instructions detailing exactly how you made it, plus a grocery bag containing all of the required ingredients should they wish to modify the recipe to their own tastes. Even if you hacked around for hours looking at a dataset from every which way, the precise steps that you took along the way are visible, traceable and auditable. That's good for risk managers as well as people with food allergies.
Certainly, writing a repeatable recipe is more difficult than hacking around in the kitchen until it looks right. However, it's precisely that discipline that underscores the potential benefit of using R in business settings.
Don't let the complexity scare you off. If you work with data, you owe it to yourself and your career longevity to learn this new set of tools. If you're interested in making the switch from the spreadsheet to an interactive development environment for statistical computing, visit the R-Project site, take a course, try this Excel-to-R toolkit, and let me know how it goes.