Storage Doesn't Matter for Bioinformatics? Not So Fast

Joe Stanganelli, Founder and Principal, Beacon Hill Law | 4/30/2012 | 28 comments

Last week, I wrote an article about how keynote speaker Martin Leach presented Bio-IT World Conference 2012 attendees here in Boston with a convincing argument that the biggest obstacle facing the health and life sciences industry in the age of "big-data" is not storage, but computing.

Accessibility, analysis, and integration are the sole true bugaboos, says Leach, making storage issues but a petty distraction for genomics researchers and others who do intensive bioinformatics work.

Turns out, not everyone here agrees.

Robert Bjornson is director of IT at the Yale Center for Genome Analysis (YCGA). "We spend almost no time thinking about computing. We spend all of our time thinking about storage," Bjornson told a room of a few dozen conference attendees.

In a presentation about IT infrastructure and hardware, Bjornson talked about the technological challenges YCGA and similar organizations face.

"Drives," he aptly observed, "fail."

Even Leach does not dispute this fact of IT life. The Broad Institute of MIT and Harvard, where Leach is CIO, boasts the largest genomic datacenter in the world, with over 10 petabytes of data on spinning disks -- and roughly every day to day and a half, one of those disks fails.

"When you have 1,000 drives, expect failure," confirms Bjornson, by way of advice. What's more, backing up all of a genomics organization's data -- which can number in the petabytes -- just isn't practical.
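To put Bjornson's warning in numbers, here is a back-of-the-envelope sketch. The only figures taken from the article are the 1,000-drive example and the Broad Institute's pace of one failed disk every day to day and a half. The 3 percent annualized failure rate (AFR) is an assumption chosen purely for illustration, not a number quoted by Bjornson or Leach, and the model treats drive failures as independent, which real fleets only approximate.

```python
# Back-of-the-envelope drive-failure arithmetic.
# ASSUMPTION: a 3% annualized failure rate (AFR) per drive, chosen for
# illustration only; it is not a figure quoted by Leach or Bjornson.
ASSUMED_AFR = 0.03


def expected_failures_per_year(drives: int, afr: float = ASSUMED_AFR) -> float:
    """Expected number of drive failures in one year across the whole fleet."""
    return drives * afr


def prob_at_least_one_failure(drives: int, days: float, afr: float = ASSUMED_AFR) -> float:
    """Chance that at least one drive fails within `days`.

    Assumes drives fail independently and that each drive's yearly survival
    probability (1 - afr) is spread evenly across the 365 days of the year.
    """
    fleet_survival = (1 - afr) ** (drives * days / 365)
    return 1 - fleet_survival


def implied_failures_per_year(days_between_failures: float) -> float:
    """Annual failure count implied by 'one failure every N days'."""
    return 365 / days_between_failures


if __name__ == "__main__":
    # Bjornson's 1,000-drive example under the assumed AFR.
    print(f"Expected failures/year: {expected_failures_per_year(1000):.1f}")
    print(f"P(at least one failure in 30 days): {prob_at_least_one_failure(1000, 30):.2f}")
    # The Broad Institute figure: one failure every 1 to 1.5 days.
    for interval in (1.0, 1.5):
        print(f"One failure every {interval} days -> about "
              f"{implied_failures_per_year(interval):.0f} per year")
```

Under that assumed rate, a 1,000-drive installation should expect about 30 failures a year and better than 90 percent odds of losing at least one drive in any given month. The Broad's observed pace works out to roughly 243 to 365 failed disks a year, which is why "expect failure" is a planning assumption rather than a pessimistic one.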

Cost is also a factor (Moore's Law notwithstanding) for some customers, says Bjornson -- at least psychologically. Despite falling storage prices, many enterprise and high-level organizational customers maintain a consumer-market perspective. "I can't tell you how many times people have said, 'Why does this cost $1,000 a terabyte?' " says Bjornson, recounting customers who protest that hard drives at Best Buy go for about $65 per terabyte.

Big-data customers may be their own worst enemy in more ways than one. YCGA's customers use YCGA's storage and YCGA's cluster. Cautions Bjornson, however, "It's risky to let customers into the factory." They can crash the login node. They can overload the storage. They can "do any one of a number of things that people do when they get the chance," Bjornson says, and any of those things can interfere with data management and analysis.

To be fair, this is an example of a risk that falls under both the "storage" and the "accessibility" umbrellas -- and there are others.

For instance, Bjornson himself concedes that search is a huge problem in big data genomics, as he presents a slide that reads, " 'Find' does not work on 2PB on Storage." Genome sequencing, of course, is a data-intensive field -- yet the field of genomics lacks a truly effective data identification solution ("a Google search for data," as Leach called it on Tuesday), says Bjornson. "We don't have it. We really need it."

Nonetheless, "storage, for us, is by far the hard part," maintains Bjornson.

Both sides of the accessibility vs. storage discussion raise very valid points -- and have very real concerns. Alas, the hardware presentation series here at Bio-IT has been somewhat sparsely attended compared to other sessions. Conversely, so many Bio-IT attendees clamored to see the opening keynote speeches on Tuesday that dozens were relegated to an overflow room with a live video feed.

With so many of the attendees here having heard only Leach's advocacy for accessibility, arguments like Bjornson's about the importance of storage seem to have become lost in the din of the conference -- and therefore, ironically, much less accessible.

Comments
CurtisFranklin | 5/1/2012 10:50:32 AM
Re: Storage hubris
@white.space, I remember the old statement, "Nature abhors an empty horizontal surface." I think that we could also say that, "Nature abhors an un-filled storage bit," and we'd be just as correct. I've watched storage requirements grow exponentially during the last 30 years -- I can hardly wait for the day when the desktop petabyte is commonplace!
Gigi | 5/1/2012 6:15:56 AM
Re: Storage hubris
Joe, are there any potential outcomes from collaborative bioinformatics projects for drug discovery? I heard that there are some proposals for TB, HIV, and cancer drug discovery.
Gigi | 5/1/2012 6:12:58 AM
Re: What about Cloud?
Taimoor, we had similar storage constraints for our bioinformatics projects, which were collaborative. What we did was form a virtual group and create a common repository where we kept all the data. That way, anyone interested can access the data at any time, from any location, over the Internet. This helps avoid storing the same data in multiple locations.
Joe Stanganelli | 5/1/2012 12:13:59 AM
Re: Storage hubris
Yes, Skr2011, scalability was a big part of the talks at the Conference.

Incidentally, this is why Netflix was largely unaffected by the huge AWS cloud outage a year ago even though it is a major AWS customer: as an organization that handles enormous amounts of data, it was smart enough to build in so many redundancies that it could handle an outage. It was the little guys who relied completely on the cloud, without backups, who were screwed for the following days.
Joe Stanganelli | 5/1/2012 12:07:52 AM
Re: Storage for Bioinformatics
Very nice breakdown, zerox, and very much one of the points I was going for. Both concerns ought to be heard and addressed for real progress.
Joe Stanganelli | 5/1/2012 12:04:49 AM
Re: Storage hubris
As the saying goes, LuFu, the bigger they are, the harder they fall.
Joe Stanganelli | 5/1/2012 12:03:08 AM
Re: What about Cloud?
Well, hold on there, kiddies, with all the talk about the cloud.

A recurring theme at the Conference was the unsuitability of the public cloud for a lot of bioinformatics work -- especially in the field of genomics -- because of the TREMENDOUS amounts of data.  Far too much to be sending across public cloud data lines.

One speaker related a tale of how a NY hospital ran a query that lasted four months.

And, of course, there are issues with proprietary data, HIPAA, and other confidentiality (this one's for you, Sara!) bugaboos.

Private clouds can be suitable in many instances (and, indeed, often the best option), however.

Joe Stanganelli | 4/30/2012 11:58:24 PM
Re: What about Cloud?
Hi, Dave.  Thanks for weighing in.

This reminds me of a discussion from a Dell-sponsored Webinar E2 hosted quite some time back in which the speakers discussed how many IT Departments are most concerned with just keeping the lights on -- more than anything else.

Perhaps that is another part of YCGA's struggle.  Sure, better integration and accessibility would be nice, but it's a huge effort just to deal with what they have.
white.space | 4/30/2012 11:57:23 PM
Re: Storage hubris
"If we find we have some extra space, someone is going to come up with something to fill that space."

Absolutely! You could have posters and bumper stickers with that adage! I bet you have a Dropbox and a Google Drive account (at the very least!) and several external hard drives filled with stuff. :) And you still wish there were more. The Library of Congress has roughly 20TB worth of stuff, and sometimes I wish I had as much space!
David Wagner | 4/30/2012 11:36:01 PM
Re: Storage hubris
@Skr2011- And the opposite seems to be true, too. If we find we have some extra space, someone is going to come up with something to fill that space.