Open source, open access, open content… now we have open data. India has joined a select group of over 20 countries whose governments have launched open data portals. With a view to improving government transparency and efficiency, www.data.gov.in (in public beta at present) will provide access to a valuable repository of datasets from government departments, ministries, agencies, and autonomous bodies.
Open data will consist of “non-personally identifiable data” collected, compiled, or produced during the normal course of governing. It will be released under an unrestricted license -- meaning anyone is free to use, reuse, or distribute it, provided the source is cited.
Of course, no government can make all data public, so guidelines have been passed to manage what data is available. Sensitive and restricted data will be kept out of the portal, but some of it may be accessed directly -- perhaps for a price -- from relevant agencies. India passed the Right to Information Act in 2005 and has been working since on multiple projects to ensure meaningful government-citizen engagement, democratization of information, and community collaboration. The National Data Sharing and Accessibility Policy (NDSAP) of 2012, announced by the Department of Science and Technology, also allows access to government-owned “shareable” data.
As of now there are just a few raw data sets available, though the policy mandated that every department upload at least five “high value data sets” within three months and the rest within a year. Since information is power, there is reluctance to share the data. In response, an oversight committee has been created under the NDSAP to ensure implementation. Even the protected lists of data will be periodically reviewed so that only highly sensitive, defense-related data stays outside the public domain.
The potential of the open data sources is clear. Imagine a developer integrating geo-spatial data on droughts with data from a government food assistance program to create a mashup that improves the model for food security welfare schemes. Indeed, the data portal already provides data on the outcome of a rural employment scheme in terms of employment, physical asset creation, and financial allocations. Another dataset provides the number of allopathic hospitals and dispensaries and beds in government hospitals for a 10-year period. Surely, this is a treasure-trove for the healthcare industry, healthcare management consultants, media organizations, researchers, and others.
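The drought and food-assistance mashup imagined above might be sketched roughly as follows. This is only an illustration under stated assumptions: the district names, column names (`drought_severity`, `households_covered`, `households_total`), sample values, and thresholds are all hypothetical and are not taken from data.gov.in, whose actual datasets will be structured differently.

```python
import csv
import io

# Hypothetical sample data standing in for two downloaded CSV files.
# Column names and values are invented for illustration only.
DROUGHT_CSV = """district,drought_severity
District A,0.82
District B,0.91
District C,0.35
"""

FOOD_AID_CSV = """district,households_covered,households_total
District A,40000,100000
District B,72000,90000
District C,30000,120000
"""


def load_by_district(text):
    """Parse CSV text into a dict keyed by district name."""
    return {row["district"]: row for row in csv.DictReader(io.StringIO(text))}


def underserved_districts(drought_text, aid_text,
                          severity_min=0.7, coverage_max=0.5):
    """Return districts with high drought severity but low food-aid coverage.

    Only districts present in both datasets are compared.
    """
    drought = load_by_district(drought_text)
    aid = load_by_district(aid_text)
    flagged = []
    for district in sorted(drought.keys() & aid.keys()):
        severity = float(drought[district]["drought_severity"])
        covered = int(aid[district]["households_covered"])
        total = int(aid[district]["households_total"])
        coverage = covered / total
        if severity >= severity_min and coverage <= coverage_max:
            flagged.append((district, severity, round(coverage, 2)))
    return flagged


print(underserved_districts(DROUGHT_CSV, FOOD_AID_CSV))
```

In practice the hard part is likely to be the join key: two departments may spell or code the same district differently, so a real mashup would probably need a shared identifier or a cleanup step before any comparison like this one makes sense.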
Even more enticing, an estimated 1 million-plus datasets from governments worldwide are already in the open domain. These include the locations of toxic waste dumps, regional healthcare costs, and statistics on crime, transport, education, and so on. The World Bank data portal provides downloadable access to 8,000 development indicators from its datasets. Imagine the insights that a combination of multiple worldwide datasets could provide.
The potential for combining worldwide data is clear, but the effort to put this data out is only just beginning. It poses multiple challenges to government CIOs, including publishing it in easily readable formats, making it compatible with other databases, working with developers, and doing it all cheaply so that publishing the data doesn’t become a burden. And, of course, while the data itself is open and doesn’t need to be protected, it is important to create processes to ensure that data remains anonymous and that sensitive data isn’t accidentally released. The scale of the challenge poses a real problem.
While government CIOs are trying to ramp up their efforts, private-sector CIOs should begin planning how to access the data from India and from the other open government projects in the US, Europe, and elsewhere. There’s too much valuable information to ignore.