OpenEvidence: Smart Medicine or Smart Marketing?

OpenEvidence is an AI-powered medical search platform launched in 2022 by Harvard‑affiliated founders Daniel Nadler and Zachary Ziegler, and cultivated via the Mayo Clinic Platform Accelerate program. It claims to sift through peer‑reviewed sources such as NEJM and JAMA, providing AI‑generated answers with citations. Free access is granted to verified U.S. healthcare providers, funded by advertising.

OpenEvidence reportedly scored a perfect 100% on the USMLE and has secured $210M in funding and carries $3.5B valuation which is attributed to rapid adoption by U.S. physicians according to Forbes. It seeks to address the problem of information overload and the sorting through it all that physicians face to stay up to date. Because it is free and it isn’t considered a diagnostic tool (even though I think many doctors can be using it that way), OpenEvidence doesn’t require FDA approval and it doesn’t have to compete for subscription dollars (individual or institutional) like other products such as DynaMed, UpToDate, and ClinicalKey.

Instead of relying on subscription dollars, OpenEvidence relies on advertising for income. With over 430,000 doctors registered and an additional 65,000 per month (according to the Forbes article), those are a lot of eyes and information that can be used by pharma and other medical advertisers. OpenEvidence does say, they do not share personalized information. So, while an individual user’s information is not shared, the aggregate data like the number of cardiologists using OpenEvidence might be. In September 2025, OpenEvidence bought an artificial intelligence advertising startup company, Amaro which specialized in “end-to-end advertising optimization using intelligent automation. Its technology is designed to help companies streamline deployment and maximize performance across digital channels.” So, it wouldn’t be a stretch to envision that OpenEvidence could say to a potential advertiser that they have X number of cardiologists, Y number of oncologists, Z number of endocrinologists, etc. and can offer targeted advertising to those users in those specialties. So a pharma company could have their statin drugs show up whenever a cardiologist does a search, oncologists would see their immunotherapy drugs, and insulin ads would show for endocrinologists when they search. So, the advertising would be agnostic of the search and information retrieved but would follow user profiles to target advertising.

This type of advertising in medicine isn’t new. Back in the day (I feel old saying that) when scientific journals were in print, not online. The publishers had the aggregated data of their subscribers and sold print advertising in their journals much the same way. It would not be unusual for advertising for drugs or other medical devices to be on printed on the back of research articles within the journal. Publishers would know their subscribers and would place ads directed to cardiologists in the journal that was sent to a cardiologists home or office. Advertisements for other products would go into journals sent to peoples’ homes or offices based on their discipline.

So now we know how OpenEvidence gets its money, or how it intends to. What makes it so great? Short answer…I don’t know. I am not a doctor or nurse. I don’t treat patients. So I don’t have an NPI to register for an account. I am uncertain of their rationale lock it to only a person with an NPI. Whether it is OpenEvidence’s myopic view of a “healthcare professional” or strategic opacity to maintain a competitive edge (which isn’t an unfounded concern: see how competitor impersonated physicians to access & hack OpenEvidence), that drives the NPI access requirement. But those of us who are involved in medical information and research (such as medical librarians) don’t have an NPI and can’t test or evaluate the product.

The Forbes article is full of glowing reviews from individual doctors detailing how much time it has saved them or pointed them to areas where they could expand their research. A quick and dirty search on PubMed for OpenEvidence yielded 18 citations (as of November 6, 2025), some of which review it or compare it to other products for clinical care.

The subject matter definitely “mattered” when it came to how well OpenEvidence performed. Apparently, for evaluating structural heart defects, ChatGPT performed better. (Struct Heart 2025 Jul 5;9(9):100696. doi: 10.1016/j.shj.2025.100696) But when it came to searching for the National Comprehensive Cancer Network (NCCN) guidelines for basal cell carcinoma and squamous cell carcinoma (SCC), OpenEvidence “scored significantly” better than ChatGPT. (Int J Dermatol 2025 Jun 5. doi: 10.11/ijd.1783)

Several studies evaluating OpenEvidence alone or against other resources such as ChatGPT, other AI programs, or UpToDate were somewhat limited in my opinion. One study, disappointedly only looked at 5 cases….really is that a true test? Others looked at the use of OpenEvidence for patient education material or discharge material. While OpenEvidence does provide the ability for physicians to “Write a Patient Handout,” I question whether a program that has medical information intended for clinicians is able to generate appropriate patient education material at a suitable reading level. (The AMA recommends patient education material should be at 5th-6th grade reading level.)

I think the editorial by Patel, et al. OpenEvidence: Enhancing Medical Student Clinical Rotations With AI but With Limitations brings up some important concerns.

Some of those listed in the editorial are:

  • Lack of search precision – inability to search for specific articles, authors or journals which is extremely important…especially if you know of a specific study or article on the topic.
  • Opaque curation – OpenEvidence is not clear about its article selection and ranking. This is important as medicine needs to be very clear about the evaluation of evidence.
  • Limited interface – It lacks the ability to clarify queries or work interactively like programs such as ChatGPT.
  • Inconsistent Evidence – For example, authors said it “identified buspirone usage in OCD but missed additional supporting studies, limiting scope and confidence.”

As a medical librarian, I share those same concerns. I am especially concerned about the evidence within OpenEvidence, ability to access the source material, and a lack of medical professionals and information professionals on the OpenEvidence Team.

Where and what material are they using? We know that they have content from NEJM and JAMA, but what other quality resources do they have access to? Are they able to get behind the firewall of publishers’ content to provide information.

What ability do clinicians have to access the source material? Users cannot access their institutional holdings by connecting OpenEvidence to an institutional authentication system to access the full text of the journal articles. I have talked to a lot of smart computer people who think they just need to include the DOI so people can get the article. They are very smart computer people but very unaware of how hospitals, medical schools, and libraries license journals for medical professionals to read. *Note I did field a question from a doctor using OpenEvidence about adding OpenAthens (an authentications system) to access the full text of journals. But we use a different authentication system so I was unable to help him or even verify if using OpenAthens is even possible.

Of the 26 people listed on OpenEvidence’s team, only one has an MD. The rest? A ton of computer scientists, mathematicians, and statisticians. Smart people, no doubt, but wouldn’t you want more than one medical professional curating clinical evidence? It’s deeply concerning that a product claiming to revolutionize medical decision-making has so few actual medical professionals on the Team. And let’s not even talk about the complete absence of information professionals (you know, the people trained to evaluate, organize, and retrieve medical literature).

But, I get it. Medical librarians are kind of like the power company: nobody thinks about us until the lights go out…or until they need an obscure case report, or full-text article buried behind a paywall and their “Google it” approach fails. Still, if you’re building a tool to sift through clinical evidence, maybe – just maybe- you should include the people who specialize in doing exactly that. The omission feels less like an oversight and more like a typical Tech Bro blind spot: build fast, break things, and forget the people who’ve been quietly keeping the lights on in medical research for decades.

So can I recommend OpenEvidence? I don’t know…and that’s exactly the problem. It’s the latest AI-powered darling of medicine, launched by Harvard-affiliated founders and backed by $210M in funding. It’s free for verified U.S. physicians and medical professionals with an NPI, monetized through advertising, and praised for saving doctors time. But as a medical librarian, someone trained in evidence evaluation and information retrieval, I’m locked out. No NPI, no access. That means I can’t assess its sources, search precision, transparency, or even help clinicians connect it to the full text of the citations. When the very people who specialize in evaluating medical information are excluded, it raises concerns. Until more voices from the information side of healthcare are included and kick the AI’s tires, it’s hard to fully know if OpenEvidence is smart medicine….or just smart marketing?

*I used AI to aid in my research and writing of this post

Unscrupulous Authors: Tricking AI to Promote Article

I have been meaning to write a post about this article in TechCrunch, “Researchers seek to influence peer review with hidden AI prompts” but time slipped by faster than I anticipated. However, I still feel it is an important thing to post about; if anything to open people’s eyes to how unscrupulous authors have found a way to game the system.

According to the article, researchers have been caught embedding hidden prompts in preprint manuscripts to influence AI-assisted peer review. Often this was done using white text or teeny tiny fonts that the human eye doesn’t see or notice but the AI can “see.” These white or tiny font prompts instruct AI tools to deliver positive feedback, praising papers for their “impactful contributions” and “methodological rigor.”

The TechCrunch article mentions a report by Nikkei Asia, where at least 17 papers on arXiv contained such embedded instructions. The authors of these hidden AI message papers hailed from 14 institutions across eight countries, including places in the U.S. such as Columbia University and the University of Washington.

The Nikkei Asia report interviewed a professor who said this practice was to “counter against ‘lazy reviewers’ who use AI.” IMHO I find it ironic that the professor is choosing what seems to be a lazy way to counter “lazy reviewers.” If they really wanted to counter lazy reviewers perhaps embedding a hidden message to tell the reviewer using AI to stop using AI to review the paper. But alas they chose a way to make sure that the lazy reviewer using AI was still giving great reviews, so not really combating the problem. (Again IMHO it sounds more like someone who caught got with their hand in the cookie jar.)

So why is this more than a reviewer problem?

How many researchers use AIs like ResearchRabbit, Elicit, Undermind, etc. to find articles and research from preprint servers like medRxiv and bioRxiv that have yet to be published in the journal literature? While ResearchRabbit and Undermind do not directly use data from medRxiv and bioRxiv, they draw their content from Semantic Scholar….which in turn gets content from medRxiv and bioRxiv.

I am not picking on medRxiv or bioRxiv, really it can happen to any preprint archive/database that allows the article to be uploaded as text or other format that would allow an unscrupulous author to add AI positive review instructions. This type of AI trick theoretically would not work for accepted and published articles, because the formatting it would go through to be published would typically strip the code/text. That doesn’t mean it would be detected before acceptance…the publisher would have to be on the look out for that. It just means the person utilizing an AI program to find research published in journals typically wouldn’t fall prey to AI review trickery.

But a lot of people use those tools to do research and the creators of those tools highlight the benefits of their AI finding preprints. If those tools are ingesting content from platforms like Semantic Scholar, which in turn aggregates from medRxiv, bioRxiv, arXiv, and others preprint servers, then the potential for AI manipulation extends far beyond peer review.

So it is no longer just a reviewer problem. It’s a discovery problem, a credibility problem, and ultimately a trust problem.

As a medical librarian, I find this deeply concerning. And so should anyone in the biomedical community dedicated to evidence-based practice, source transparency, and rigorous peer review. If AI tools used by clinicians, researchers, and students can be manipulated by unethical authors, then there are huge risks that many are unaware of.

What can be done? Stop using AI? No. But, AI developers need to build in safeguards against prompt injection and hidden formatting. Publishers need to scan for manipulation before and after acceptance. Librarians and educators need to raise awareness and advocate for responsible AI use.

AI isn’t going away so we must ensure that the tools we use to find and evaluate research aren’t susceptible to this type of fraud and can be tweaked to prevent it.

Can I Fast Forward in the AI Landscape?

I love technology. I love librarianship. I love the intersection of the two—where innovation meets information, and where we, as librarians, help people navigate the ever-evolving landscape of knowledge. But lately, I’m tired. Tired of what feels like the AI wars.

We’re in a moment of rapid transformation, and while I know it’s essential to stay current with advances in AI and information retrieval, the pace and fragmentation of it all is exhausting. Every week seems to bring a new tool, a new policy, a new platform—each promising to revolutionize research, while simultaneously complicating the very ecosystem we’ve spent decades building.

The Fragmentation of AI Tools

Let’s start with the publisher-created AIs. These tools are often limited to the publisher’s own content, creating silos that lack the diversity and breadth of scholarship needed for real, comprehensive research. They’re polished, yes—but narrow. They reinforce the walled information gardens we’ve been trying to break down in libraries for years.

Then there are the LLM-powered platforms that promise to assist with literature reviews and synthesis. These tools are exciting, but they often hit paywalls. They can’t access the full range of scholarly content behind institutional subscriptions, and most don’t integrate with our link resolvers—the very systems designed to connect users to full-text content we’ve already paid for. So we end up with tools that are powerful in theory but incomplete in practice leaving our users disconnected from the information.

The Copyright Conundrum

Meanwhile, users—well-meaning and curious— seem to want to upload everything into the AI of their choice. Library subscriptions, PDFs, paywalled articles, open access content… if it’s digital, they want to feed it into ChatGPT, Claude, Copilot or whatever tool they’re experimenting with. But copyright doesn’t disappear just because the interface is conversational. Additionally, any content created by AI is not subject to copyright – many users don’t realize this either.

We’re now spending more time educating users on copyright, fair use, and licensing than ever before. And it’s not just students—faculty and researchers are also navigating this new terrain, often unaware of the legal and ethical implications of how they use AI tools.

The Shifting Sands of Licensing

And just when we think we’ve got a handle on things, the publishers change the rules. Contracts that once allowed for text and data mining (TDM) are now being rewritten with restrictive AI clauses. These clauses often limit or eliminate our ability to mine content for research or innovation—something we’ve supported for years as part of open science and discovery.

Each publisher and company has their own unique rules regarding the licensing of AI (or inability to license AI). At MLA one librarian discussed trying to license a few journals (less than 100) to use with an institutional AI. Those journals had different publishers and different licensing requirements and fees, requiring hours/days/weeks of negotiating. This is impractical when dealing with the thousands of resources a library subscribes to. It feels like we’re being pushed out of the very conversation of connecting users to information that we helped start.

Wishing for the Fast Forward Button

Some days, I wish I could just hit the fast forward button—skip ahead to when the dust has settled, the standards are clearer, and the tools are interoperable. I want to get to the part where AI is a seamless part of the research process, not a battleground of competing interests, legal gray areas, and technological silos.

But I know that’s not how progress works. We’re in the messy middle. And as much as I’m tired, I also know this is where librarians are needed most.

We are the translators, the educators, the advocates. We understand metadata, licensing, access, and equity. We know how to ask the hard questions about bias, transparency, and sustainability. And we care—deeply—about helping people find, use, and trust information.

So yes, I’m tired. But I’m also still here. Still learning. Still advocating. Still believing that librarianship has a critical role to play in shaping the future of AI in research.

Let’s just hope that future gets here soon.

**1st Note** I want to be transparent that AI was used to aid in the creation of this post as I continue to attempt to learn ways to use AI better.

**2nd Note** My exhaustion is not regarding any specific company’s AI or type of AI, just tired of living in the messy middle.

Concerns Regarding AI Tools for Writing Articles and Papers

AI and Writing

In recent years, AI-powered tools like Grammarly, Quillbot, and Ginger Software have become increasingly popular for assisting with writing articles and papers. These tools offer a range of features, from grammar and spell checking to paraphrasing and style suggestions. However, their use also raises several concerns that writers and researchers should be aware of. Additionally, AI detection software like Turnitin and iThenticate plays a crucial role in maintaining academic integrity. Here, we explore the key concerns associated with using AI tools for writing and the potential pitfalls of AI hallucination in research, along with real-world examples of accusations faced by students and professionals.

1. Over-Reliance on AI Tools

One of the primary concerns with using AI tools like Grammarly, Quillbot, and Ginger Software is the risk of over-reliance. While these tools can significantly improve the quality of writing by catching errors and suggesting improvements, they can also lead to a dependency that may hinder the development of a writer’s own skills. Writers may become less attentive to their own mistakes and rely too heavily on AI to correct them, potentially stunting their growth as proficient writers.

2. Quality and Accuracy of Suggestions

AI tools are not infallible. The suggestions they provide may not always be accurate or contextually appropriate. For instance, Grammarly might flag a sentence as grammatically incorrect when it is, in fact, correct in the given context. Similarly, Quillbot’s paraphrasing might alter the original meaning of a sentence, leading to misinterpretation. Users must critically evaluate the suggestions provided by these tools and not accept them blindly.

3. Ethical Concerns and Plagiarism

The use of AI tools for paraphrasing, such as Quillbot, raises ethical concerns related to plagiarism. While these tools can help rephrase content to avoid direct copying, they can also be misused to produce work that is not genuinely original. This is where AI detection software like Turnitin and iThenticate becomes essential. These tools help identify instances of plagiarism and ensure that the work submitted is original and properly cited. However, the effectiveness of these detection tools depends on their ability to keep up with the evolving capabilities of AI writing tools.

4. Real-World Examples of Accusations

There have been instances where students and professionals have faced accusations of using AI to write their papers when they actually used tools like Grammarly, Quillbot, or Ginger Software. For example, Haishan Yang, a former Ph.D. student at the University of Minnesota, was expelled after being accused of using AI on a preliminary exam. Yang denied the allegations, stating that he used AI tools for various tasks but not on the test. Similarly, Marley Stevens, a student at the University of North Georgia, was accused of using AI in a paper. She claimed to have used Grammarly, as recommended by her school, but still faced repercussions that affected her GPA and scholarship

5. AI Hallucination in Research

AI hallucination refers to instances where AI generates information that is not based on actual data or facts. This is particularly concerning in the context of research, where accuracy and reliability are paramount. When using AI tools to assist with research, there is a risk that the AI might produce plausible-sounding but incorrect information. Researchers must be vigilant and cross-check any AI-generated content against credible sources to ensure its validity.

6. Unreliability and False Positives in AI Detection

AI detection software, while useful, can be unreliable and prone to false positives. These tools sometimes incorrectly identify human-written content as AI-generated due to algorithm limitations, complex writing styles, and specialized language. For instance, Turnitin’s AI detection software has been reported to wrongly flag parts of completely human-written academic essays as AI-generated. The accuracy of AI detectors can vary significantly across different programs, with some achieving higher accuracy rates than others. This inconsistency can lead to unfair targeting of students and professionals, harming their reputations and wasting valuable time.

7. Privacy and Data Security

Another concern with using AI writing tools is the privacy and security of the data inputted into these systems. Users often input sensitive information, including proprietary research data or personal details, into these tools. It is crucial to understand the data policies of these AI tools and ensure that the information is not being misused or stored without consent.

Conclusion

While AI tools like Grammarly, Quillbot, and Ginger Software offer valuable assistance in writing articles and papers, it is essential to be aware of their limitations and potential pitfalls. Over-reliance, accuracy of suggestions, ethical concerns, AI hallucination, and data security are critical issues that users must consider. Additionally, AI detection software like Turnitin and iThenticate plays a vital role in maintaining academic integrity by identifying plagiarism. By using these tools judiciously and critically evaluating their output, writers and researchers can harness the benefits of AI while mitigating the associated risks.

What are your thoughts on the use of AI in writing? Have you encountered any challenges or benefits that you’d like to share?

**Note** If you made it this far you should be aware this post was written by Copilot. I realize there is a bit of irony in that. But I am curious about a few things about how it did and about how it might get picked up by other bots.

BTW I had to fix all of the links to the statements. The first citation re: Yang was incorrectly attributed to this article where Yang’s name is not mentioned anywhere. Instead it found the information from this article, which actually is what inspired me for this post, but failed to cite it correctly.