A few blogs back, I asserted that many organizations are paying “taxes” that they shouldn’t have to pay. These taxes are in the form of extra costs from overprovisioning servers just so that performance doesn’t become unacceptable during data protection operations. Databases underpin most business-critical systems out there, so they are a great place to show this. I discussed some specific SQL Server examples. This is not just a Microsoft problem; let’s take a look at Oracle.
The standard utility for backup, restore, and recovery of Oracle databases is RMAN (Recovery Manager). Here are some results quantifying RMAN's impact, measured on a two-node Oracle 11g R2 RAC cluster running a TPC-C-style transactional workload.
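For the curious, the kind of full backup exercised in a test like this is typically driven by a short RMAN script along these lines. This is a minimal sketch, not the script actually used in the test; the channel count and the PLUS ARCHIVELOG clause are assumptions:

    RUN {
      # Allocate two disk channels so the backup reads in parallel.
      ALLOCATE CHANNEL d1 DEVICE TYPE DISK;
      ALLOCATE CHANNEL d2 DEVICE TYPE DISK;
      # Full database backup plus the archived redo logs.
      BACKUP DATABASE PLUS ARCHIVELOG;
    }

Every channel you allocate is another server process competing with your transactional workload for CPU, which is exactly where the "tax" comes from.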
There are a couple of key points to note:
Even though this was a lightly loaded system, CPU utilization increased 30 to 40 percent for the entire duration of the backup.
Response time and transactional performance took a sharp negative hit for the first couple of minutes of the backup window.
Conclusion: You have to pay the "tax man" when you run RMAN on your Oracle host.
Thanks for the comment, @singlemud. It is funny... I used to work for JavaSoft selling commercial licenses for the use of Java technology. And yes, Java has been blamed for bogging down many a system.
In this case everything is performing as expected. The problem arises when the CPU cycles demanded by the backup process are added to the CPU cycles needed by the production database process. To run both jobs on a single system with acceptable latency, that system must be sized to run them simultaneously. That means seriously oversizing it compared to a system that only needs to run the production database workload.
The solution is to offload backup processing to a dedicated backup server.
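If a dedicated backup server is not in the budget, RMAN itself can at least spread the tax out. Here is a minimal sketch using RMAN's built-in throttling; the four-hour window and the 40 MB/s cap are assumptions to be tuned for your environment:

    # Spread the backup over a 4-hour window and let RMAN pace itself
    # so the production workload keeps priority.
    BACKUP DURATION 4:00 MINIMIZE LOAD DATABASE;

    # Or cap the read rate per channel so backup I/O, and the CPU it
    # drives, stays bounded.
    CONFIGURE CHANNEL DEVICE TYPE DISK RATE 40M;

Note that this lowers the peak rather than the total: the backup takes longer, and the host still does all the work. Only offloading removes the tax from the production server entirely.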
I work on IBM DB2 on distributed systems (Unix/iSeries/Windows), and I can tell you that the "taxman" always shows up. Even though my backups are done in quiesce mode, offline with no users connected, the CPU sits at 35 to 40 percent for the entire length of the backup.
But is this problem restricted to Oracle and SQL Server? What about the smaller players like MySQL and PostgreSQL, and of course IBM's DB2?
@eethtworkz I don't have data for the other databases you mention. However, I can say this with certainty: the data protection process takes resources. The question becomes where you allocate those resources, in your production servers or in your storage.
"But what happens if De-Duping and De-Compressing are part of the enterprise Policy???[Have to be done whether we like it or not].
Is there any way to avoid such a massive hit on Performance then?"
@eethtworkz: The benefits of block-based auto-tiering will accrue to any data stored. I don't know the type of deduplication and compression technologies in use or their impact on performance in your case, but I'm going to hazard a guess that auto-tiering could mitigate some of that impact.
Data reduction techniques are more often used for archive files and backups. File-based storage (even primary) is another place where this is becoming more common.
Blanket application of a "deduplicate and compress" policy to primary storage for performance-sensitive databases may be counterproductive: the money saved on storage may not cover the cost of mitigating the performance impact over time.
What I meant is that if you've got a workload so performance-sensitive that you're auto-tiering hot data to SSD, you aren't going to want to incur the additional overhead of de-duping and de-compressing that data.