A guest post by Rachel Pinotti, MLIS, AHIP
Recently, a faculty member sent me a copy of a June 2017 editorial published in Annals of Internal Medicine entitled "Computer-Aided Systematic Review Screening Comes of Age," along with the article it accompanied. The editorial argues, in short, that machine learning algorithms generate superior results to human-designed search strategies. It asks (and answers), "Is it time to abandon the dogma that no stone be left unturned when conducting literature searches for systematic reviews? We believe so, because it has a deleterious effect on the number and timeliness of updates and, ultimately, patient health." (Hemens & Iorio, 2017)
As a librarian who conducts, consults on, and teaches systematic review searching, I found that this editorial unleashed a flood of thoughts and questions.
On a philosophical level, the authors' thesis raised a real tension that I feel with regard to so many topics I teach: the tension between teaching students the way things are now vs. the way they very likely will be in the near-to-medium-term future. As of now, I don't think GLMnet and GBM, the machine learning algorithms used in the original article the editorial accompanies (Shekelle, Shetty, Newberry, Maglione, & Motala, 2017), are widely used for systematic review searching, but they quite possibly may be in three to seven years' time (or less). Are students better off learning to design and execute comprehensive search strategies, a skill that will serve them in the immediate term and perhaps a few years hence, or better off learning how to use GLMnet and GBM, tools that may come into wide use a few years from now? The answer is probably that they are best off learning both. Unfortunately, I don't know of anyone within my institution who could teach the current cohort of students these new tools. (Maybe such people exist and I'm not familiar with them, maybe they don't exist, or maybe they exist but exercise their skills exclusively for research, not teaching, purposes.)
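For readers who, like me, had not worked with GLMnet or GBM before, the underlying idea is citation screening treated as text classification: a model is trained on citations that reviewers have already labeled as relevant or irrelevant, and it then ranks unscreened citations by predicted relevance. Below is a minimal, hypothetical sketch in Python using scikit-learn analogues (elastic-net logistic regression in place of the glmnet R package, gradient boosting in place of gbm). It illustrates the general technique only; it is not the pipeline Shekelle et al. used, and all data and parameters are made up.

```python
# Minimal sketch of citation screening as text classification, loosely analogous
# to the GLMnet (penalized regression) and GBM (gradient boosting) methods
# discussed in the article. All data below are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical training set: titles/abstracts already screened by reviewers.
texts = [
    "Randomized trial of anticoagulant X for deep vein thrombosis",
    "Letter to the editor on an unrelated cardiology topic",
    "Meta-analysis of anticoagulant X versus placebo",
    "Case report of a rare dermatologic condition",
]
labels = [1, 0, 1, 0]  # 1 = relevant to the review, 0 = not relevant

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Elastic-net penalized logistic regression (the model family glmnet fits).
glmnet_like = LogisticRegression(
    penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=5000
)
glmnet_like.fit(X, labels)

# Gradient boosting (the model family gbm fits); dense input for simplicity.
gbm_like = GradientBoostingClassifier().fit(X.toarray(), labels)

# New, unscreened citations are ranked by predicted probability of relevance,
# so reviewers can read the most likely includes first.
new = vectorizer.transform(["Trial of anticoagulant X in pulmonary embolism"])
print(glmnet_like.predict_proba(new)[:, 1])
print(gbm_like.predict_proba(new.toarray())[:, 1])
```

In practice, of course, such a model is only as good as the search that feeds it candidate citations and the human-labeled examples it is trained on, which is part of what makes the teaching question above hard to answer.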
Even once these tools come into wide use, I wonder if teaching students to design and execute comprehensive search strategies is a bit like teaching them long division: not something they are likely to use frequently, or maybe ever, in their day-to-day work, but you need to learn long division in order to understand the concept of division, so you understand what is happening when you type 48756/38 into a calculator (or enter your initial search terms into a machine learning search tool).
On a practical level, a big concern with machine learning algorithms is whether they can effectively handle multiple information sources and grey literature. Shekelle et al. acknowledge, "Although initial results were encouraging, these methods required fully indexed PubMed citations." The algorithms could likely be adapted for Embase and other databases, though this might require permission from database providers. Grey literature (conference abstracts, theses, etc.) often lacks complete abstracts and, almost by definition, is not fully indexed. Excluding grey literature from a systematic review or meta-analysis introduces a real risk that publication bias will produce a biased result, as documented by McAuley, Pham, Tugwell, and Moher (2000) and others.
I’ve always felt that some of the best practices recommended in systematic review searching, such as the recommendation to use both index terms and keywords, are redundant and unhelpful. So I don’t doubt, and am in fact quite receptive to, the argument that current best practices in systematic review searching favor sensitivity far too greatly over precision. Some filters (e.g., language) are much more effective than others (e.g., age, gender). I absolutely think that filters, especially Clinical Queries, are useful when you need to find the few most relevant articles on a topic, but I do not believe they are yet able to unearth all the studies that would have a bearing on a topic, as Hemens and Iorio suggest.
The editorial asserts, “Simple Boolean filters applied when searching increased precision 50%…while identifying 95% of the studies…” (Hemens & Iorio, 2017), with an underlying, unstated assumption that these results would hold true across fields and topics. Thinking of this in evidence-based medicine terms, these articles present case-series-level evidence for the effectiveness of machine learning search algorithms. Before advocating that machine learning search tools be widely adopted and our standard of practice changed, I would like to see larger-scale studies (perhaps cross-over studies in which the same review is conducted using both evidence found through machine learning and evidence found through traditional search methods) indicating that these tools are at least as effective as traditional search methods.
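To make those two numbers concrete: precision is the share of retrieved citations that are actually relevant, and sensitivity (recall) is the share of relevant citations that the search retrieves. The counts below are made up purely to illustrate the definitions; they are not the figures from the Shekelle study or the editorial.

```python
# Purely illustrative counts showing what "precision" and "sensitivity" mean
# in a screening context; these are NOT the counts from the cited studies.
relevant_retrieved = 95       # relevant citations the filtered search found
relevant_missed = 5           # relevant citations the filtered search missed
irrelevant_retrieved = 1900   # irrelevant citations reviewers still had to screen

precision = relevant_retrieved / (relevant_retrieved + irrelevant_retrieved)
sensitivity = relevant_retrieved / (relevant_retrieved + relevant_missed)

print(f"precision   = {precision:.3f}")   # ~0.048: most retrieved items are noise
print(f"sensitivity = {sensitivity:.3f}") # 0.950: 95% of relevant studies captured
```

The editorial’s claim, in these terms, is that modest Boolean filtering can raise precision substantially while keeping sensitivity near 95%, and my question is whether that trade-off generalizes beyond the topics studied.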
The ideal outcome would be getting to a point where machine learning is used to effectively search traditionally published material, allowing reviewers to focus their energy on searching grey literature. However, until we reach the point where these methods have been validated and can be widely used and taught, current best practices remain just that: the best methods for unearthing all evidence that may have a bearing on a given research topic.
Rachel Pinotti, MLIS, AHIP
Assistant Library Director, Education & Research Services
Levy Library, Icahn School of Medicine at Mount Sinai
Box 1102 – One Gustave L. Levy Pl.
New York, NY 10029-6574
Email: rachel[dot]pinotti[atsign]mssm[dot]edu
Phone: available via MLA members list
Follow us on Twitter @Levy_Library to learn about Levy Library events and initiatives.
References:
Hemens, B. J., & Iorio, A. (2017). Computer-aided systematic review screening comes of age. Annals of Internal Medicine, 167(3), 210-211. doi:10.7326/M17-1295
Shekelle, P. G., Shetty, K., Newberry, S., Maglione, M., & Motala, A. (2017). Machine learning versus standard techniques for updating searches for systematic reviews: A diagnostic accuracy study. Annals of Internal Medicine, 167(3), 213-215. doi:10.7326/L17-0124
McAuley, L., Pham, B., Tugwell, P., & Moher, D. (2000). Does the inclusion of grey literature influence estimates of intervention effectiveness reported in meta-analyses? The Lancet, 356(9237), 1228-1231. doi:10.1016/S0140-6736(00)02786-0
Very interesting – thanks for this post. Now that NLM is requiring publishers to provide the language tag when creating records, problems have emerged, such as articles not in English being tagged as English. https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html That’s another reason why the human element needs to stay involved.
I would like to add another complication in systematic searching. We, like other academic health science libraries, have developed a discovery tool that searches all of our bibliographic tools in a single search. My observation is that this creates a broader bibliography, including hits from databases that are not often searched. So while it creates a more comprehensive bibliography, it is harder to replicate. Has anyone explored the potential role of discovery tools in this process?