The Horror of Google Scholar

We love Google Scholar, we hate Google Scholar, which is it?  It’s a complicated relationship.  Librarians use it to find tricky little articles that are hiding in some dark crevice of the Internet, and we probably search it just as much as, or more than, the rest of the normal world.  So why does Google Scholar draw such frustration from librarians at times?  Why do we do our best impersonations of The Scream at the reference desk when researchers, doctors and nurses claim to have done an extensive search on Scholar? Part of the reason is us (librarians) and part of it is Google.

We are nerds who know that the information in Google is not always right. Just like a Trekkie debating the merits and flaws of Star Trek, The Next Generation, Deep Space Nine, Voyager, Enterprise, and all of the movies, we debate the merits and flaws of databases and information.  Google Scholar is just one of the various topics in our librarian geek debate camps.  As important as the debate over proper Klingon verb conjugation is in the Trekkie community, it doesn’t usually impact the daily medical lives of many people.  (I know, I am going out on a limb there.)  The information that librarians, researchers, doctors, and nurses retrieve from medical and other information databases, on the other hand, usually does impact the lives of people.

As librarians we are trained to index, categorize, find and retrieve information.  As the Librarian Avenger  states, librarians “bring order to chaos.”  Nothing gets us more worked up than a database like Google Scholar that pretends to bring order to chaos.  I take that back.  Nothing gets us more worked up than when our patrons don’t realize that the Google Scholar database isn’t always right. 

Google Scholar indexes the full text of scholarly literature across many disciplines.  Yet Google keeps Scholar’s coverage and indexing a giant super secret.  Had Darth Vader employed Google Scholar creators, those pesky rebels would never have been able to sneak the plans out to destroy the Death Star.  This secrecy helps make Scholar a mess of a database.  Who knows what is in that soup of information?  Some publishers do not allow Scholar to crawl their journals, but it is difficult for searchers to know specifically which ones and adjust accordingly.  Like the unknown item in the Tupperware in the back of the fridge, we don’t even know how fresh Scholar’s content is, because it does not provide information on the frequency of its updates.

As if not knowing what is in the database and how often it is updated isn’t bad enough, a recent article in Library Journal, “Google Scholar’s Ghost Authors, Lost Authors, and Other Problems” by Peter Jacso clearly will make any librarian (and should make any researcher) shudder over the idea of doing citation analysis within Scholar. 

Some of the problems Jacso found:

  • False authors like P Login (for Please Login), P Options (for Payment Options), and a whole bunch related to author affiliation such as CA San Diego.
  • Multiple listings of the same paper
  • Mismatched citation information – Scholar ignores existing, correct publication years, page numbers, volume numbers, etc.
  • Missing authors – Scholar in many cases replaces the real authors’ names with those of the false authors.
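
For the fellow nerds who want to spot-check a batch of citation records for the first two problems on that list, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout (dictionaries with `authors` and `title`), the list of suspect words, and the function names are hypothetical and come neither from Jacso's article nor from Google.

```python
import re
from collections import Counter

# Hypothetical list of words that suggest a parser mistook page furniture
# for an author name (e.g. "P Login" from "Please Login").
SUSPECT_WORDS = {"login", "options", "payment", "subscribe", "access"}


def looks_like_false_author(name):
    """Return True if any word in the name is on the suspect list."""
    words = re.findall(r"[a-z]+", name.lower())
    return any(word in SUSPECT_WORDS for word in words)


def normalize_title(title):
    """Lowercase a title and drop punctuation so near-identical listings match."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def audit(records):
    """Flag suspicious author names and titles that are listed more than once."""
    false_authors = sorted(
        {a for r in records for a in r["authors"] if looks_like_false_author(a)}
    )
    counts = Counter(normalize_title(r["title"]) for r in records)
    duplicates = sorted(t for t, n in counts.items() if n > 1)
    return false_authors, duplicates


# Made-up sample records, not real Scholar output.
sample = [
    {"authors": ["P Login"], "title": "Citation Analysis: A Primer"},
    {"authors": ["J Smith"], "title": "Citation analysis - a primer"},
    {"authors": ["P Options", "A Jones"], "title": "Indexing Basics"},
]

false_authors, duplicates = audit(sample)
print(false_authors)  # ['P Login', 'P Options']
print(duplicates)     # ['citation analysis a primer']
```

A word list this crude would miss the affiliation-style false authors such as CA San Diego, and it cannot tell you who the real authors were. It is a way to raise a flag, not a substitute for checking the record against the publisher's own metadata.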

Really, if it weren’t true it would make a great comedy for librarians to laugh at, but with 10.2 million records added from Google Books to the millions already within Scholar, the story reads more like a B horror film. Not a campy one like Evil Dead, but more like The Blair Witch Project, which despite the rave reviews (who gave that thing such great reviews anyway?) leaves you nauseated and declaring it unworthy of even renting.

However, I wonder if our patrons will even notice.  Will they even care about the fact that Scholar developers decided not to use metadata from scholarly publishers but instead built the database around a parser created by the developers and labeled as “imbecile” by Jacso?  Call me a cynical librarian nerd but I don’t think they will.  I have about as much hope that researchers will know or care as Jacso has for the Scholar developers to correct their massive mess. 

“The parsers have not improved much in the past five years despite much criticism. GS developers corrected some errors that got negative publicity, but these were Band-Aids, where brain surgery and extensive parser training is required. Without these, GS will keep producing similar errors on a mega-scale.”

We may have a complicated relationship with Google Scholar, and we will still be using it to find things, but it is up to us, the Trekkies of the database searching world, to inform our patrons of the pitfalls as well as the promises of Google Scholar and to alert them to alternative resources and databases to use instead of Scholar. Because for every person who just wants something quick and easy, there are always those who are happy to learn something new and will be amazed at what Scholar does not really do.
