I recently attended the NLM Georgia Biomedical Informatics Course at the lovely Brasstown Valley Resort in Young Harris, GA. This week-long semiannual course is hosted by the Robert B. Greenblatt, M.D. Library, Georgia Regents University, and funded by the National Library of Medicine. If you’ve ever heard library colleagues talk about the Woods Hole course, this is the current version of that course. The content changes every session, which is necessary in such a fast-moving field.
Attendees were a nice mix of librarians, clinicians, researchers, and others involved in medical information technology. Instructors at the forefront of their fields came from around the country to teach in this prestigious course. I found it to be a great overview of current important topics in informatics, and I learned so much about the breadth of this essential field from both the instructors and the other attendees. We also did some networking and shot some pool at the local watering hole, Brassies.
Read more to see what was covered (and some cool pictures from a field trip we took).
What is biomedical informatics?
James Cimino answered this question succinctly: biomedical informatics is the representation of medical concepts in a way that computers can manipulate. This process uses computational power to turn data into information and information into knowledge.
What kinds of data do we have to deal with?
Clinical data are of major concern not only for research but also for improving patient care. If you’ve been to the doctor any time recently, you’ve probably seen your physician entering notes into an electronic health record (EHR). This results in a large amount of unstructured text that could be very useful for research. Your hospital has entirely separate systems to store clinical imaging data. These images require contextual information (metadata), like who the patient is, when the image was taken, and what region is imaged, and they take up orders of magnitude more computer storage space. These data also have to comply with meaningful use: using certified EHR technology to improve quality, safety, and efficiency and to reduce health disparities.
Now think of all of the other patients at your facility and all of the other facilities around the world collecting similar data. That’s a big data problem if I’ve ever heard one. “Big data” isn’t just about size but also complexity. DNA sequencing generates massive amounts of data as well. Donald Lindberg, Director Emeritus of the National Library of Medicine, gave us a historical perspective on genomics. He did an excellent job of explaining how our understanding of genetics has changed, from the conceptualization of inheritance to the Human Genome Project and beyond, with an emphasis on human disease. New personalized medicine initiatives propose adding genomic data to EHRs, increasing the complexity.
It’s not all about EHRs, though. Non-clinical genomic data are stored in the NCBI databases. Public health efforts generate massive amounts of data as well. Jessica Schwind introduced us to the interdisciplinary world of public health informatics and the variety of outputs from this discipline, such as disease surveillance data found in tools like HealthMap. Advances in technology allow data to be collected constantly through mobile devices. Rebecca Schnall discussed this new field (mHealth), from its origins in clinical decision support at the bedside to modern smartphone apps.
Mathematical modeling is also integral to the field of informatics. Dmitry Kondrashov made mathematical modeling fun using a modeling environment called NetLogo. We used this program to model the predicted course of a disease outbreak while altering variables like herd immunity, population size, chance of infection, and length of recovery.
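NetLogo does this with its own agent-based environment, but the underlying idea can be sketched in a few lines of code. Here is a minimal, deterministic SIR-style (susceptible/infected/recovered) simulation in Python; the parameter names and values are my own illustration, not taken from the course exercise:

```python
def simulate_sir(population=1000, initial_infected=1,
                 infection_chance=0.3, recovery_days=7, days=100):
    """Discrete-time SIR model: returns daily (susceptible, infected, recovered) counts."""
    s = population - initial_infected
    i = initial_infected
    r = 0.0
    recovery_rate = 1.0 / recovery_days  # fraction of infected who recover each day
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contact between the S and I compartments.
        new_infected = infection_chance * s * i / population
        new_recovered = recovery_rate * i
        s -= new_infected
        i += new_infected - new_recovered
        r += new_recovered
        history.append((s, i, r))
    return history

history = simulate_sir()
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print(f"Epidemic peaks on day {peak_day} "
      f"with about {history[peak_day][1]:.0f} people infected")
```

Rerunning with a lower `infection_chance` or a shorter `recovery_days` mimics the knob-turning we did in NetLogo: below a threshold, the outbreak fizzles instead of peaking.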
So, what do you need to do to create machine readable content?
Mostly, this requires a structured dataset and a language to describe it. Creating a structured dataset can be achieved by following data management best practices. Paul Harris explained the major considerations for doing this with clinical data. That said, every data type is different and requires unique metadata to really make it useful.
Having a nicely structured dataset is all well and good, but how does the computer ascribe meaning? For this you need a controlled vocabulary, a concept that librarians are familiar with. Using controlled vocabularies when designing a dataset applies meaning in a standardized way that allows datasets to be analyzed together. It’s very much the same logic as the MeSH terms applied to articles in PubMed, only for datasets with many more fields to describe. Dr. Cimino presented on what makes a good controlled terminology, based on his 1998 publication in Methods of Information in Medicine.
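As a toy illustration of the idea (the concept IDs and terms below are made up for this sketch, not real MeSH or SNOMED CT codes), a controlled vocabulary collapses free-text synonyms onto one canonical concept so that records entered by different people can be analyzed together:

```python
# Hypothetical controlled vocabulary: several free-text surface forms
# map to a single canonical concept ID. IDs are invented, not real codes.
VOCAB = {
    "heart attack": "C0001",
    "myocardial infarction": "C0001",
    "mi": "C0001",
    "high blood pressure": "C0002",
    "hypertension": "C0002",
}

def normalize(term):
    """Map a free-text term to its controlled concept ID, or None if unknown."""
    return VOCAB.get(term.strip().lower())

# Three chart entries written three different ways:
records = ["Heart Attack", "myocardial infarction", "Hypertension"]
codes = [normalize(r) for r in records]
print(codes)  # → ['C0001', 'C0001', 'C0002']
```

The first two entries look nothing alike as strings, yet they receive the same concept ID, which is exactly what makes cross-dataset analysis possible.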
But which standards do you use? Christopher Chute discussed the major standards for clinical data in EHRs. Using standards is essential for using EHR data in research, which is a major component of meaningful use. He also went over the major meaningful use terminologies and information models, along with their strengths and weaknesses.
Another option is to make the computer do the work. Wendy Chapman presented natural language processing (NLP) in a really accessible way. NLP is a process by which a computer can be trained to read unstructured text, which is common in EHR data, and ascribe meaning to it automatically. It’s not always as easy as it sounds, though: we performed an activity in which we coded text from EHR notes to demonstrate that it’s hard to get agreement among human coders, let alone between humans and computers.
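One standard way to quantify that kind of coder agreement is Cohen’s kappa, which corrects raw agreement for agreement expected by chance. The sketch below is my own illustration (the labels and categories are invented, not from the course activity):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement expected from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical coders labeling the same ten snippets from clinical notes:
coder1 = ["symptom", "drug", "symptom", "other", "drug",
          "symptom", "other", "drug", "symptom", "other"]
coder2 = ["symptom", "drug", "other", "other", "drug",
          "symptom", "other", "symptom", "symptom", "other"]
print(f"kappa = {cohens_kappa(coder1, coder2):.2f}")
```

Here the coders agree on 8 of 10 snippets, but kappa lands noticeably below 0.8 because some of that agreement would happen by chance; values like this are why human-vs-computer agreement is even harder to achieve.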
How do we handle this diversity of data?
If big data is the question, Michael Ackerman speculates that the answer might be the cloud. He discussed the many flavors of “the cloud” and how they can help “solve” the big data problem. He also discussed the paradigm shift from hypothesis-driven research to data-driven research, which necessitates open data to really work well, along with the need for better infrastructure to support data storage and analytics.
But once the data are in the cloud, how do we make sure that the right people can access them? As you already know, the NLM makes a huge effort to curate and preserve these data. PubMed contains a wealth of health information, but getting the information you need out of it is not always easy for a layperson. Kathy Davies gave a great overview of the vast number of curated collections of information from the NLM. NLM houses resource collections for AIDS information and for populations with unique health concerns, like senior citizens or American Indians.
Molecular data are stored in the National Center for Biotechnology Information (NCBI) databases. Rana Morris from the NCBI presented on the resources in those databases in a very practical way. We began with a clinical diagnosis and tracked through MedGen to the Gene database and beyond to find out about the molecular basis of the disease in question.
What problems arise?
We’ve got the standards and the infrastructure, so sharing clinical data should be easy, right? Well, not really. Implementing these systems requires the cooperation of humans, many of whom are resistant to change. Kevin Johnson presented on EHRs, health information exchange, and meaningful use. The major themes are illustrated in the documentary No Matter Where. Joan Ash tackled these issues from an organizational perspective. Her talk focused on the process of implementing change in organizations adopting new health information technologies. A video titled “HIT or miss” illustrated how failures in these systems can compromise patient care.
Have you ever thought about how you’d manage information retrieval in a disaster? Neither had I until the Disaster Informatics sessions with Steven Phillips and Jennifer Pakiam. They described the NLM resources that are useful in the event of a disaster and led us through a scenario where we used these tools to find information about an earthquake at a nuclear facility.
Where is the field headed?
Betsy Humphreys, acting NLM director, discussed research issues in biomedical informatics. She underscored the importance of access to research products, including information, data, and software, as a return on the government’s investment, to promote transparency and reproducibility, and to ensure permanent access. She also discussed the Precision Medicine Initiative and gave an update on the NLM leadership transition that is currently underway.
Jessica Tenenbaum discussed the importance of translational bioinformatics to precision medicine. She gave the topic a personal twist by discussing how her own experience with direct-to-consumer genetic testing affected her course of treatment.
Overall, I feel that this class gave me a good overview of what constitutes informatics (a lot more than I realized) and a solid grounding in the terminology involved. I’ve already gotten to put this knowledge into practice on campus in meetings about clinical data management and data science curriculum development. If you want to see what others in the course found interesting, check out the Twitter feed (#NLMGRUInformatics). You’ll even get to see some of the fun things we got to do on our afternoon break.
And on that note, I will leave you with some pictures from the trip we took to Crane Creek Vineyards.