Measuring The Data Mountain

From The Economist

http://www.economist.com/displaystory.cfm?story_id=S%27%298%24%28Q13%2B%21%40%20%3C%0A Sorry, behind a pay barrier.
You can see the original research here: http://www.sims.berkeley.edu/research/projects/how-much-info-2003/

A short and curious article, it immediately turns the Data of the title into a measurement of Information in the first line. This made me uncomfortable. As a programmer I know that information is structured data; a string of digits is meaningless unless you are told it is a telephone number, by presentation or implication.

That structuring might be the change from 08002355354 to 0800 235 5354 for some people, or to +44 (0)800 235 5354 for others. A contextualization.
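
Here's a minimal sketch of what I mean (my own illustration, nothing from the article): the same string of digits given two different presentations, and only then does it read as a phone number.

```python
# A sketch of "contextualization": one digit string, two presentations.
# The number is just the made-up example from above.
def as_local(digits: str) -> str:
    """Group a UK 0800 number for local readers: 0800 235 5354."""
    return f"{digits[:4]} {digits[4:7]} {digits[7:]}"

def as_international(digits: str) -> str:
    """Present the same digits for international readers: +44 (0)800 235 5354."""
    return f"+44 (0){digits[1:4]} {digits[4:7]} {digits[7:]}"

raw = "08002355354"            # just data: a string of digits
print(as_local(raw))           # 0800 235 5354
print(as_international(raw))   # +44 (0)800 235 5354
```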

On the internet the generation of context is much more complex. An HTML page containing a text article may have a considerable amount of structure to make its presentation understandable [or to position adverts neatly around its outside]. I've seen pages which are more than 50% structure in my time.
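
You could measure that roughly like this (a sketch of my own, using Python's standard HTML parser; the 50% figure above is my recollection, not a measurement from the article):

```python
# Rough measure of how much of a page is structure rather than content:
# strip the tags, then compare the remaining text to the raw HTML size.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the character data, discarding tags and attributes."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)

def structure_ratio(html: str) -> float:
    """Fraction of the page that is markup rather than readable text."""
    extractor = TextExtractor()
    extractor.feed(html)
    text_bytes = len("".join(extractor.chunks).encode("utf-8"))
    total_bytes = len(html.encode("utf-8"))
    return 1 - (text_bytes / total_bytes) if total_bytes else 0.0

page = "<html><body><div class='advert'></div><p>Short article.</p></body></html>"
print(f"{structure_ratio(page):.0%} structure")   # well over 50% for this page
```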

And this extends out into all media formats delivered digitally. Does a low quality image file contain more information than a high quality image, or a low quality sound file more than a high quality one? The high quality file might hold greater detail, but not in proportion to its increase in data size.
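
A quick way to see that, if you have Pillow handy (my choice of tool, nothing to do with the article), is to save the same image at different JPEG quality settings: the byte count climbs sharply, while the picture stays the same picture.

```python
# Sketch: one test image, three JPEG quality settings, three file sizes.
import io
from PIL import Image

image = Image.radial_gradient("L").convert("RGB")  # any test image will do

for quality in (10, 50, 95):
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", quality=quality)
    print(f"quality {quality:2d}: {buffer.tell():6d} bytes")
```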

Further, the data transferred/stored isn't quantified in the article, so I want to find out their sources. What intrigues me is how much of the wonderfully large amount of data is used in the duplication of items. Poisoned is showing me 4,898,191.48 GB of data currently available across OpenFT, FastTrack and Gnutella... how much of that is duplicates and copies [if the music industry is to be believed, the majority of it]?

If that's the case then that much information isn't created, it's stored. It's duplicated.
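
If I wanted to put a number on that for files I actually hold, I'd hash the contents and count repeats, something like this (the directory name is just a placeholder):

```python
# Sketch: count how many bytes in a directory tree are exact copies of a
# file already seen, by hashing each file's content.
import hashlib
from pathlib import Path

def duplicated_bytes(root: str) -> tuple[int, int]:
    """Return (total bytes, bytes whose content duplicates an earlier file)."""
    seen: set[str] = set()
    total = duplicates = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        digest = hashlib.sha1(data).hexdigest()
        total += len(data)
        if digest in seen:
            duplicates += len(data)   # identical content already counted once
        seen.add(digest)
    return total, duplicates

total, dupes = duplicated_bytes("shared-files")   # placeholder directory
if total:
    print(f"{dupes / total:.0%} of {total:,} bytes are straight copies")
```

Of course that only catches bit-identical copies; two different rips of the same album would slip straight past it, so the real duplication would be higher still.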

It also makes me wonder how much of it is people saying the same thing over and over again in slightly different words, rehashing the same arguments and saying "me too"? Or just for the sake of making a noise.

A cursory look at the source materials even implies that the figures are based upon sales of recording media, and that these include wide assumptions about what was stored on that media.

I'd stick this in the same category as one of my pet hates: research that is announced as newsworthy after long testing when the same conclusion was painfully obvious to anyone with their eyes open, or that states statistics without qualifying their source or relative position within a field.

This article, which goes on to say that all of the figures are pretty much guesstimates, isn't really information. It's statistics for the sake of saying something. To me it's dis-information.

published 2003.12.10 updated 2017.06.26