Storage Doesn't Matter for Bioinformatics? Not So Fast

Joe Stanganelli, Founder and Principal, Beacon Hill Law | 4/30/2012 | 28 comments

Last week, I wrote about how keynote speaker Martin Leach made a convincing case to Bio-IT World Conference 2012 attendees here in Boston that the biggest obstacle facing the health and life sciences industry in the age of "big-data" is not storage, but computing.

Accessibility, analysis, and integration are the only true bugaboos, says Leach, which makes storage issues a petty distraction for genomics and other fields that depend on intensive bioinformatics.

Turns out, not everyone here agrees.

Robert Bjornson is director of IT at the Yale Center for Genome Analysis (YCGA). "We spend almost no time thinking about computing. We spend all of our time thinking about storage," Bjornson told a room of a few dozen conference attendees.

In a presentation about IT infrastructure and hardware, Bjornson talked about the technological challenges YCGA and similar organizations face.

"Drives," he aptly observed, "fail."

Even Leach does not dispute this fact of IT life. The Broad Institute of MIT and Harvard, where Leach is CIO, boasts the largest genomic datacenter in the world, with more than 10 petabytes of data on spinning disks -- and every day or day and a half, one of those disks fails.

"When you have 1,000 drives, expect failure," confirms Bjornson, by way of advice. What's more, backing up all of a genomics organization's data -- which can number in the petabytes -- just isn't practical.

Cost is also a factor (Moore's Law notwithstanding) for some customers, says Bjornson -- at least psychologically. Even as storage prices fall, many enterprise and large institutional customers keep a consumer-market mindset. "I can't tell you how many times people have said, 'Why does this cost $1,000 a terabyte?' " says Bjornson, recounting customers who protest that hard drives at Best Buy go for about $65 per terabyte.

Big-data customers may be their own worst enemy in more ways than one. YCGA's customers use YCGA's storage and YCGA's cluster. Cautions Bjornson, however, "It's risky to let customers into the factory." They can crash the login node. They can overload the storage. They can "do any one of a number of things that people do when they get the chance," Bjornson says, and any of those things can interfere with data management and analysis.

To be fair, this is an example of a risk that falls under both the "storage" and the "accessibility" umbrellas -- and there are others.

For instance, Bjornson himself concedes that search is a huge problem in big-data genomics, pointing to a slide that reads, " 'Find' does not work on 2PB on Storage." Genome sequencing is, of course, a data-intensive field -- yet genomics lacks a truly effective data identification solution ("a Google search for data," as Leach called it on Tuesday), says Bjornson. "We don't have it. We really need it."

Nonetheless, "storage, for us, is by far the hard part," maintains Bjornson.

Both sides of the accessibility vs. storage discussion raise very valid points -- and have very real concerns. Alas, the hardware presentation series here at Bio-IT has been somewhat sparsely attended compared to other sessions. Conversely, so many Bio-IT attendees clamored to see the opening keynote speeches on Tuesday that dozens were relegated to an overflow room with a live video feed.

With so many of the attendees here having heard only Leach's advocacy for accessibility, arguments like Bjornson's about the importance of storage seem to have become lost in the din of the conference -- and therefore, ironically, much less accessible.

Comments
Sara Peters | 4/30/2012 4:31:51 PM
other genomic research institute
If you haven't already, check out the On the Case video documentary series we're doing about the Translational Genomics Research Institute. Storage is DEFINITELY a challenge for them -- but processing speed and data sharing are just as important. Check it out: http://www.enterpriseefficiency.com/video.asp?section_id=1467&doc_id=242705

 
Sara Peters | 4/30/2012 4:22:03 PM
gold star
Joe, you get extra credit simply for using the word "bugaboo."

Taimoor Zubair | 4/30/2012 2:51:05 PM
Re: Storage hubris
"What I've learned is it's the one leg in computer processing that is taken for granted until it fails."

@LuFu: I agree with you on this. I recently had a meeting with finance about a project's budget. It took quite a while and considerable effort to convince those non-technical folks why we needed redundant data disks inside a server and why ordinary disks can't replace a RAID controller.

LuFu | 4/30/2012 2:05:09 PM
Storage hubris
I worked in data storage for many years. What I've learned is it's the one leg in computer processing that is taken for granted until it fails. One of the first things I learned about was a disk drive's MTBF, its Mean Time Between Failures. It's expressed in hours and is often read as a hard drive's expected lifetime. So a manufacturer reports an MTBF of 1.5 million hours, and you think the drive will last a lifetime, if not longer. Of course, the drive spec should include an asterisk (*Your drive mileage may vary), since Murphy's Law is usually left out of the MTBF equation. Ergo, I vote for Bjornson and his concern about storage. And when it's big storage, it becomes an even larger issue. As we all know, size does matter.

tekedge | 4/30/2012 1:49:44 PM
"What's more, backing up all of a genomics organization's data -- which can number in the petabytes -- just isn't practical. "

I am a bit confused by that statement. Does the author mean backing up petabytes of data daily? Yes, that would be a difficult task, but surely the entire database can be backed up periodically, with daily incremental backups in between. Those would probably run to terabytes, which should be feasible.
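A rough Python sketch of that trade-off follows. The 2 PB total echoes the figure on Bjornson's slide; the 1% daily change rate and the 1 GB/s sustained backup throughput are purely hypothetical assumptions, not YCGA numbers, and `BackupPlan` is an illustrative name.

```python
from dataclasses import dataclass

TB = 10**12      # decimal terabyte, in bytes
PB = 1_000 * TB  # decimal petabyte, in bytes


@dataclass
class BackupPlan:
    total_bytes: float         # size of the full dataset
    daily_change_ratio: float  # fraction of the data new or modified each day (assumed)
    throughput_bps: float      # sustained backup throughput, bytes per second (assumed)

    def full_backup_days(self) -> float:
        """Wall-clock days needed to copy the entire dataset once."""
        return self.total_bytes / self.throughput_bps / 86_400

    def incremental_bytes(self) -> float:
        """Bytes copied by one daily incremental (changed data only)."""
        return self.total_bytes * self.daily_change_ratio

    def incremental_hours(self) -> float:
        """Wall-clock hours needed for one daily incremental."""
        return self.incremental_bytes() / self.throughput_bps / 3_600


if __name__ == "__main__":
    # Hypothetical inputs: 2 PB of data, 1% daily churn, 1 GB/s throughput.
    plan = BackupPlan(total_bytes=2 * PB, daily_change_ratio=0.01, throughput_bps=1e9)
    print(f"full backup: {plan.full_backup_days():.1f} days")
    print(f"daily incremental: {plan.incremental_bytes() / TB:.0f} TB "
          f"in {plan.incremental_hours():.1f} hours")
```

On these made-up numbers, each daily incremental is about 20 TB and finishes in under six hours, while a single full pass takes roughly 23 days. That is one way to reconcile this comment with the claim that backing up everything "just isn't practical": the incrementals are manageable, but the periodic full copy is the bottleneck.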

 
syedzunair | 4/30/2012 12:21:44 PM
Re: What about Cloud?
"It's at least better than backing up your data to a hundred hard drives, where all the data can perish if a fire breaks out."

It seems much better than storing data on tapes or hard drives. With cloud storage, you get the option of keeping your data in a location geographically separate from your business. That will certainly help with data recovery if the primary location goes down.

David Wagner | 4/30/2012 11:46:08 AM
Re: What about Cloud?
It is tempting to think that perhaps the Broad Institute is just better at running its data center than Yale (or better funded). Sure, drives fail. Sure, backing up petabytes of data is hard and expensive.

But it's not as if it isn't being done.

On the other hand, I suspect what Dr. Bjornson is really feeling is the squeeze of the standard academic budget. Hopefully, Dr. Leach is correct that prices will continue to drop so that academic datacenters can afford to do more.

In the meantime, I doubt there's a datacenter out there that doesn't feel the budget pinch.

nasimson | 4/30/2012 10:02:34 AM
What about Cloud?
We're no longer in the early '80s, when Bill Gates supposedly declared that 640K of memory ought to be enough for anybody.

Storage demands keep increasing, and that goes for every kind of organization.

With the arrival of the cloud, we certainly have another pretty darn good storage alternative. Providers aren't charging much at this point as they work to commercialize the cloud. Perhaps this is the next wise step?

It's at least better than backing up your data to a hundred hard drives, where all the data can perish if a fire breaks out.


The blogs and comments posted on EnterpriseEfficiency.com do not reflect the views of TechWeb, EnterpriseEfficiency.com, or its sponsors. EnterpriseEfficiency.com, TechWeb, and its sponsors do not assume responsibility for any comments, claims, or opinions made by authors and bloggers. They are no substitute for your own research and should not be relied upon for trading or any other purpose.
