Library of Congress Collecting Twitter Feeds

Last week the Library of Congress Blog announced that it had acquired the entire Twitter archive. If you have ever sent a public tweet, it will be housed in the Library of Congress along with billions of others. Tweets about major events such as the controversial Iranian elections, Barack Obama, and the Hudson River plane landing will be collected along with the tweet I recently sent about renewing my AHIP. (Hey Mom, something I wrote is in the Library of Congress ;).) The noteworthy and the not-so-noteworthy, all together on about 5 terabytes of digital storage.

So why on God’s green earth would the Library of Congress be interested in something like Twitter? Well, that is part of its mission. The Library collects everything, and digital material is just one area of collection. According to “Q&A: Twitter Goes to the Library of Congress” on the Wall Street Journal blogs, Matt Raymond, communications director at the LoC, said a congressional mandate requires the Library to identify and acquire materials that are “born digital.” These include web pages, blogs, government records, data sets, and so on.

Under that mandate, Twitter is clearly one of those things that was born digital. It also doesn’t hurt that the folks at Twitter actually contacted the LoC to ask whether their data would be of value.

To me there is little doubt that there are nuggets of sociological gold in there, but researchers will have to do a lot to separate the wheat from the chaff, because there is also a lot of unimportant chatter, like my AHIP tweet or tweets about somebody’s breakfast. I wonder how people will access all of this and how they will do it in a way that makes sense. Viewing President Barack Obama’s tweet after he won the presidency is relatively easy; you just look at his account. Following the thread of chatter about an event such as the Hudson River plane landing might be a little more difficult, but it is doable if people used certain common words or a hashtag. Just think, we will be able to look through the archive of tweets from the MLA meetings. However, there are a lot of other tweets floating around in those 5 terabytes of data, and I have no idea how somebody could find any logic within that mess.

It will be interesting to see how it all works out. I don’t think we will know for quite a few years what value the archived tweets hold, if any. I am glad the LoC is collecting electronic information. It makes me wonder what NLM is doing. I am not criticizing NLM; I would just like to know what kind of electronic information it is collecting.