167 times the information contained in all the books in the Library of Congress

This week I’ve been writing a critique of the era of big data. I have noticed that many blogs, wiki articles and white papers about big data repeat this little bit of hyperbole:

“Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data – the equivalent of 167 times the information contained in all the books in the US Library of Congress.”

As far as I can tell, this can be traced back to a special report, “Data, Data Everywhere,” that The Economist ran in print in February 2010.

The idea that Walmart’s transaction databases hold more information than all the books stored at the LoC is an odd way to frame big data and its significance to the present cultural moment. From what I gather, this measurement is based on the idea that one typed alphanumeric character in computer memory is equivalent to one byte, and that 2 pages of printed type are 2 kilobytes. The LoC houses approximately 32 million cataloged books, and the report finds that when digital data are quantified, “[i]n terms of bytes, written words are insignificant.”
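To see what the comparison actually implies, here is a rough back-of-the-envelope sketch that works only from the figures quoted above. It assumes a decimal petabyte (10^15 bytes), which the report may or may not have used, and the variable names are my own. It is a sanity check on the rhetoric, not a measurement of either collection.

```python
# Figures quoted in this post; the decimal petabyte is an assumption.
PETABYTE = 10**15                  # bytes (assumed decimal definition)
walmart_bytes = 2.5 * PETABYTE     # "more than 2.5 petabytes"
loc_multiple = 167                 # "167 times ... the Library of Congress"
loc_books = 32_000_000             # approximately 32 million cataloged books
bytes_per_page = 2_000 / 2         # rule of thumb above: 2 pages are 2 kilobytes

# Size of "all the books" implied by the 167x claim.
implied_loc_bytes = walmart_bytes / loc_multiple

# Average size of one book under that claim, in bytes and in typed pages.
bytes_per_book = implied_loc_bytes / loc_books
pages_per_book = bytes_per_book / bytes_per_page

print(f"Implied size of LoC books: {implied_loc_bytes / 10**12:.1f} TB")
print(f"Implied average per book:  {bytes_per_book / 1000:.0f} KB")
print(f"Implied pages per book:    {pages_per_book:.0f}")
```

Under these assumptions the claim treats the entire book collection as roughly 15 terabytes of plain text, or a little under half a megabyte per book. Nothing but typed characters is being counted.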

What this characterization neglects to mention is the petabytes of digital resources that the Library of Congress stewards and makes available through the American Memory site. It also overlooks the fact that the LoC houses the U.S. Copyright Office, which receives the mandatory deposits of registered copyrighted works: 22,000 new publications are registered every business day, and about 10,000 of those works are added to the library’s permanent collection.

I was talking to my friend Mike about this, and he reminded me that this is not a revelatory way of measuring culture or even information: there have always been more receipts recording transactions than there have been printed books or other written works. In fact, most surviving cuneiform tablets are receipts. As we move to a ‘big data ecosystem,’ I believe that it is important to interrogate and contextualize the ways ‘bigness’ and context shape our expectations about the possibilities for analytics, measurement and inquiry with big data. We might think of comparing Walmart and the LoC as an analog/digital trope: what does this big data rhetoric do?

In the coming months, I will try to keep writing about the characterizations of big data as I encounter them.