To call what's happening a torrent or an avalanche of data is to use entirely inadequate metaphors. This is a development on an astronomical scale. And it's presenting us with a predictable but very hard problem: our capacity to collect digital data has outrun our capacity to archive, curate and – most importantly – analyse it. Data in itself doesn't tell us much. In order to convert it into useful or meaningful information, we have to be able to analyse it. It turns out that our tools for doing so are currently pretty inadequate, in most cases limited to programs such as Matlab and Microsoft Excel, which are excellent for small datasets but cannot handle the data volumes that science, technology and government are now producing.

Does this matter? Yes – for two reasons. One is that hidden in those billions of haystacks there may be some very valuable needles. We saw a glimpse of the possibilities when Google revealed that by analysing the billions of queries it handles every hour it could predict flu epidemics way ahead of conventional epidemiological methods. There's a lot more where that came from.

More importantly, we need to recognise that Big Data (as it's now called) could tip the balance between society's need for independent scientific research and the corporate world's use of data-mining to further its own interests. Tooling up to handle this stuff requires major investment in computer hardware and software and you can bet that most of the world's big corporations are making those investments now. But most PhD students working in data-intensive fields are still having to write their own analytical software and cadge computing cycles wherever they can find them.



A quote saved on Feb. 26, 2013.

#software
#billions
