Storage Doesn't Matter for Bioinformatics? Not So Fast

Joe Stanganelli, Founder and Principal, Beacon Hill Law | 4/30/2012 | 28 comments

Last week, I wrote about how keynote speaker Martin Leach made a convincing case to attendees of the Bio-IT World Conference 2012 here in Boston that the biggest obstacle facing the health and life sciences industry in the age of "big-data" is not storage, but computing.

Accessibility, analysis, and integration are the sole true bugaboos, says Leach, which makes storage issues a mere distraction for genomics researchers and others who work with intensive bioinformatics.

Turns out, not everyone here agrees.

Robert Bjornson is director of IT at the Yale Center for Genome Analysis (YCGA). "We spend almost no time thinking about computing. We spend all of our time thinking about storage," Bjornson told a room of a few dozen conference attendees.

In a presentation about IT infrastructure and hardware, Bjornson talked about the technological challenges YCGA and similar organizations face.

"Drives," he aptly observed, "fail."

Even Leach does not dispute this fact of IT life. The Broad Institute of MIT and Harvard, where Leach is CIO, boasts the largest genomic datacenter in the world, with over 10 petabytes of data on spinning disks -- and every day to day-and-a-half, one of those disks fails.

"When you have 1,000 drives, expect failure," Bjornson advises. What's more, backing up all of a genomics organization's data -- which can run to petabytes -- just isn't practical.
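Bjornson's rule of thumb becomes simple arithmetic once you assume a failure rate. The sketch below is a back-of-the-envelope model, not a reliability analysis. It assumes that drives fail independently at a constant annualized failure rate (AFR). The 3% AFR is an illustrative assumption, not a figure from YCGA or the Broad Institute, and the function names are hypothetical.

```python
# Back-of-the-envelope model of drive failures in a large fleet.
# ASSUMPTIONS (illustrative only): drives fail independently, at a constant
# annualized failure rate (AFR); the 3% AFR used below is hypothetical.

DAYS_PER_YEAR = 365.0


def expected_failures_per_year(num_drives: int, afr: float) -> float:
    """Expected number of drive failures per year across the whole fleet."""
    return num_drives * afr


def mean_days_between_failures(num_drives: int, afr: float) -> float:
    """Average gap, in days, between failures anywhere in the fleet."""
    return DAYS_PER_YEAR / expected_failures_per_year(num_drives, afr)


def prob_at_least_one_failure(num_drives: int, afr: float, days: float) -> float:
    """Probability that at least one drive fails within a window of `days`.

    A single drive's failure probability over the window is approximated by
    scaling the AFR linearly, which is reasonable only for short windows.
    """
    p_single = afr * days / DAYS_PER_YEAR
    return 1.0 - (1.0 - p_single) ** num_drives


if __name__ == "__main__":
    drives, afr = 1_000, 0.03  # hypothetical fleet size and AFR
    print(f"Expected failures/year: {expected_failures_per_year(drives, afr):.1f}")
    print(f"Mean days between failures: {mean_days_between_failures(drives, afr):.1f}")
    print(f"P(>=1 failure in 30 days): {prob_at_least_one_failure(drives, afr, 30):.1%}")
```

Under those assumptions, 1,000 drives produce about 30 expected failures a year, or roughly one every 12 days. Real failure rates vary with drive model, age, and workload.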

Cost is also a factor for some customers (Moore's Law notwithstanding), says Bjornson -- at least psychologically. Despite falling storage prices, many enterprise and large institutional customers still think like retail consumers. "I can't tell you how many times people have said, 'Why does this cost $1,000 a terabyte?' " says Bjornson, recounting customers who protest that hard drives at Best Buy sell for about $65 per terabyte.
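The gap between the two prices Bjornson quotes is easy to put in numbers. This sketch multiplies both prices out for a hypothetical 2 PB of capacity, using decimal units (1 PB = 1,000 TB). It does not model what goes into the $1,000 figure, which the article does not break down, so treat it as arithmetic rather than a pricing analysis.

```python
# Multiply out the two per-terabyte prices quoted in the article.
# ASSUMPTION: a hypothetical 2 PB of capacity, in decimal units (1 PB = 1,000 TB).

ENTERPRISE_USD_PER_TB = 1_000  # the price customers question, per Bjornson
RETAIL_USD_PER_TB = 65         # the Best Buy consumer-drive price customers cite


def cost_usd(capacity_tb: int, usd_per_tb: int) -> int:
    """Raw capacity cost at a flat per-terabyte price."""
    return capacity_tb * usd_per_tb


if __name__ == "__main__":
    capacity_tb = 2 * 1_000  # 2 PB
    enterprise = cost_usd(capacity_tb, ENTERPRISE_USD_PER_TB)
    retail = cost_usd(capacity_tb, RETAIL_USD_PER_TB)
    print(f"Enterprise: ${enterprise:,}")             # Enterprise: $2,000,000
    print(f"Retail:     ${retail:,}")                 # Retail:     $130,000
    print(f"Ratio:      {enterprise / retail:.1f}x")  # Ratio:      15.4x
```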

Big-data customers may be their own worst enemy in more ways than one. YCGA's customers use YCGA's storage and YCGA's cluster. Cautions Bjornson, however, "It's risky to let customers into the factory." They can crash the login node. They can overload the storage. They can "do any one of a number of things that people do when they get the chance," Bjornson says, and any of those things can interfere with data management and analysis.

To be fair, this is an example of a risk that falls under both the "storage" and the "accessibility" umbrellas -- and there are others.

For instance, Bjornson himself concedes that search is a huge problem in big-data genomics, pointing to a slide that reads, " 'Find' does not work on 2PB on Storage." Genome sequencing, of course, is a data-intensive field -- yet genomics lacks a truly effective way to locate data ("a Google search for data," as Leach called it on Tuesday), says Bjornson. "We don't have it. We really need it."
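Bjornson's slide points at a concrete problem. A tool like `find` walks the directory tree on every search, which becomes impractical on a 2PB store. One common workaround is to crawl the tree once, store file metadata in a catalog, and run searches against the catalog instead. It appears here only as a sketch, not as anything YCGA or the Broad Institute uses. The sketch relies on Python's standard `os.walk` and `sqlite3`. The table layout and the `build_index`/`search` names are hypothetical.

```python
import os
import sqlite3


def build_index(root: str, db_path: str) -> None:
    """Walk `root` once and record each file's path, name, size, and mtime."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS files ("
                "path TEXT PRIMARY KEY, name TEXT, size INTEGER, mtime REAL)"
            )
            for dirpath, _dirnames, filenames in os.walk(root):
                for fname in filenames:
                    full = os.path.join(dirpath, fname)
                    try:
                        st = os.stat(full)
                    except OSError:
                        continue  # vanished or unreadable; skip it
                    conn.execute(
                        "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                        (full, fname, st.st_size, st.st_mtime),
                    )
    finally:
        conn.close()


def search(db_path: str, pattern: str) -> list[str]:
    """Return catalogued paths whose file name matches a SQL LIKE pattern.

    SQLite's LIKE is case-insensitive for ASCII by default. The query touches
    only the catalog, never the filesystem itself.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT path FROM files WHERE name LIKE ? ORDER BY path", (pattern,)
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

For example, `build_index("/data/runs", "catalog.db")` followed by `search("catalog.db", "%.bam")` would list the indexed BAM files (the path is hypothetical). The catalog goes stale as files change, so a real system would need to re-crawl or track changes incrementally.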

Nonetheless, "storage, for us, is by far the hard part," maintains Bjornson.

Both sides of the accessibility vs. storage discussion raise very valid points -- and have very real concerns. Alas, the hardware presentation series here at Bio-IT has been somewhat sparsely attended compared to other sessions. Conversely, so many Bio-IT attendees clamored to see the opening keynote speeches on Tuesday that dozens were relegated to an overflow room with a live video feed.

With so many of the attendees here having heard only Leach's advocacy for accessibility, arguments like Bjornson's about the importance of storage seem to have become lost in the din of the conference -- and therefore, ironically, much less accessible.

Skr2011 | 4/30/2012 11:32:35 PM
Re: Storage hubris
@David Yep! It always works that way, doesn't it? We think we have enough space, and then BAM, something new comes along that requires just that much more...
David Wagner | 4/30/2012 11:27:46 PM
Re: Storage hubris
@Skr2011- I think we do need to find a way to reduce the number of spindles as storage grows. But you are right, there are graceful ways to mitigate disk failure. I think eventually we're going to ditch the spindle for some very clever new storage device. When we do, we'll see yet another explosion of data.
David Wagner | 4/30/2012 11:25:15 PM
Re: Storage for Bioinformatics
@Zerox203- Very nice point. Just because someone isn't focusing on an aspect of the problem doesn't mean it didn't require work. It just means they're talking about something else.
David Wagner | 4/30/2012 11:21:33 PM
Re: Storage hubris
@Lufu- You're right. Storage is the field goal kicker of IT. It is a shame, too, because data gains more value when it is stored well and made more easily accessible.
Taimoor Zubair | 4/30/2012 10:17:09 PM
Re: What about Cloud?
@Zerox203: I agree that bioinformatics data might be very different from the data organizations normally store. However, at the end of the day, it's all just bits and bytes. The volume of bioinformatics data may be considerable, but compared with the volume of data at companies like Facebook, eBay, or Amazon, it may be comparable, if not smaller. I certainly feel YCGA can look to the solutions other companies are using to manage Big Data.
Skr2011 | 4/30/2012 9:40:46 PM
Re: Storage hubris
@LuFu The number one tenet of storage at scale is "things fail." When you scale up, you will find that 2-5% of your disks are going to fail (as I am sure you are aware), and when you have a lot of spindles, that's a pretty large number. You have to manage against such failures, so it isn't just about buying TBs of disk. Systems should be designed to fail gracefully. Depending on your applications and goals, you might need to make any storage solution highly available, which means you need redundancy, and to scale reads you will almost certainly need to partition your data.

I recommend checking out some of the presentations by Chris Dagdigian, e.g.
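The commenter's advice above about redundancy and partitioning can be made concrete with a small sketch. It assumes simple hash partitioning with replicas placed on consecutive nodes. This placement scheme is hypothetical, chosen for illustration rather than taken from any particular storage product, and the node names, parameters, and function names are invented. What it shows is that with two or more replicas on distinct nodes, losing any single node leaves every partition readable.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per process
    for strings and so would not give stable placement across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replica_nodes(partition: int, nodes: list[str], replicas: int) -> list[str]:
    """Place `replicas` copies of a partition on consecutive, distinct nodes."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    return [nodes[(partition + i) % len(nodes)] for i in range(replicas)]


def readable_after_failures(key: str, nodes: list[str], replicas: int,
                            num_partitions: int, failed: set[str]) -> bool:
    """True if at least one replica of the key's partition is on a live node."""
    partition = partition_for(key, num_partitions)
    return any(n not in failed for n in replica_nodes(partition, nodes, replicas))


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
    keys = [f"sample-{i}.bam" for i in range(10)]     # hypothetical file keys
    for dead in nodes:
        ok = all(readable_after_failures(k, nodes, 2, 16, {dead}) for k in keys)
        print(f"{dead} down -> all keys readable: {ok}")
```

Real systems also handle rebalancing, re-replication after a failure, and failure domains such as racks and power feeds. This sketch does none of that.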
zerox203 | 4/30/2012 8:23:34 PM
Re: What about Cloud?
Well, we have to keep in mind that we're talking about a specific field here rather than general storage. Bioinformatics definitely brings some special concerns to the table, although it is a broad field itself. At organizations like the ones Joe talks about in the article, and especially with some types of data, security may matter more than it does elsewhere, if only because bioinformatics data could prove to be a more enticing target for theft. Even if these concerns aren't enough to rule out cloud storage, they could be enough to delay it, which brings us back to square one: storage is still something that needs to be planned for, and can't be taken for granted yet.
Taimoor Zubair | 4/30/2012 8:06:26 PM
Re: What about Cloud?
"But it isn't like this isn't being done."

@David: I'd say there are quite a few companies dealing with big data that may have volumes much larger than YCGA's. Organizations like YCGA should study other companies' models and see how they have resolved their storage issues. Yes, funding can be a problem if it's in limited supply.

zerox203 | 4/30/2012 8:06:10 PM
Re: Storage for Bioinformatics
The give and take here seems pretty reasonable on both sides. Those saying that storage is not a primary concern are (hopefully) aware that they're talking about an ideal, theoretical situation where things run as well as they possibly can in the present or the near future. They're also probably aware that it doesn't work this way at many organizations yet. Those who stress how important storage still is seem to agree that the goal is to make storage a given so that other things can be focused on, and that this goal is attainable; they just think we'll get there later rather than sooner, and want to make sure the "optimists" don't get ahead of themselves.

In other words, it's a classic balancing act: the trick is to keep pushing the envelope and moving forward without overextending yourself. It's to be expected that there's a little push and pull to keep it in the right spot. Personally, I'm inclined to agree with the storage folks: it's more likely that things look great at an organization because nothing has gone wrong yet than for the opposite to be true.
Taimoor Zubair | 4/30/2012 8:00:35 PM
Re: Storage Doesn't Matter for Bioinformatics? Not So Fast
@tekedge: I agree that incremental backups can be a useful way of taking daily backups rather than backing up the whole DB every day. As long as the incremental backups are limited to terabytes, that should not be a problem.