Are you ready to harness the power of a piranha? New software with big-data teeth could change the way your agency looks at text.
Oak Ridge National Laboratory's Computational Data Analytics Group (CDA) has spent the last nine years working on a system that looks not only at text data but at the context around the text. It tries, in a sense, to "read between the lines" of the data to help with detailed analysis of the contents. Doing this with massive amounts of data is hard, and the CDA has pulled out all the tech stops to make the analysis happen in a reasonable amount of time. In a statement about Piranha on the CDA website, the group writes:
"We have pioneered a software agent approach to text analysis that uses a large number of agents distributed over very large computer clusters. This method works much faster than traditional approaches and provides the capability to cluster massive amounts of textual information in relatively short amounts of time, due to the scalability of the agent architecture."
Those are fine words, but what do they really mean for IT executives in government agencies and departments? On one level, they mean that the ability to make sense of giant mounds of text files, whether they're in the form of email messages, tweets, or scanned and OCR'd historical documents, has just gotten a bit closer. That's going to be good news for anyone trying to figure out how to make the "wish list" a reality for departments dealing with law enforcement, public health, or natural resource management (to name three obvious areas).
Piranha could also be good news for someone who's trying to figure out what to do with a room full of slightly out-of-date workstations or servers. Make no mistake, Piranha is designed to run best on a high-performance cluster of the sort that a place like, say, Oak Ridge National Labs probably has sitting around waiting for a task. If you're fresh out of high-performance clusters, though, you can run the Piranha server on a cluster made up of just about anything that will support the Java Runtime Environment (JRE), version 1.4.2 or higher. That simple requirement covers an awful lot of computing power in relatively modest chunks.
The modest hardware requirements and very rich set of functions make Piranha perfect for departments and agencies with more brainpower than hardware capex in the budget. You should understand going in that Piranha is not a simple system: It's powerful but has an initial learning curve that approaches vertical. Once you've scaled that curve, though, there are few big-data problems involving text that Piranha can't help you solve in a very cost-effective manner.
Big data presents a problem set that many government agencies must deal with. The payoff, though, can be high. In the strict-budget environment that most government entities must work in today, a tool like Piranha can be the perfect way to take a bite out of the large problem of big data.