This has been a topic I've discussed with peers, and I see it increasingly in local practice.
OpenEvidence is maybe at the forefront of this, but general LLMs are just as bad, if not worse.
Junior medical professionals are starting to over-rely on these outputs. And I don't care what the company CEOs say: I've seen plenty of examples of errors in OpenEvidence, ChatGPT and other tools, not necessarily hallucinations, but errors of omission where important information is partly left out.
We are maybe only in year 2 of this process, but I believe we are going to see a potentially significant dumbing down of part of the medical population. Who do I blame? Well, maybe those people themselves.
But these AI tools, which are all about predicting the next word based on ingested data, are NOT the right approach when people's health and safety are on the line.
I fully trust many or most of my medical colleagues to do their due diligence, to get the calls right, and they are only using these tools to recall knowledge they already have. But over time, I fear the net impact may be quite bad, as new generations that don't learn that recall in the same way won't know when the tools are wrong or deficient.
Am I worrying too much here? I think it's only a matter of time until we hear that some doctor relied on medical answers given by OpenEvidence and it resulted in someone being hospitalised, or worse.
It’s an appropriate concern. Tons of my peers use OE. Thankfully, the vast majority of them validate its output and don’t overly rely on it. But there are some who are reliant on it and are poorer clinicians because of that.
I always encourage the interns to use every tool available to them but to remember to critically think and cross-validate what it outputs. It’s incredibly obvious who relies on it completely versus those who don’t when you ask them to explain their decision making process.
I'd advocate for not letting interns use generative AI as a crutch - they need to develop their critical thinking and communication skills on assessment and plan before introducing a tool that could impair their learning if over-relied upon.
As much as I agree with this, you won’t be able to stop them. We just created a section on AI use in our resident handbook to try to give a framework for use. I just hope people at least read the references once in a while to double-check what the AI is saying, because I have absolutely had it say something based on a reference when the reference never even mentions the thing I was asking it about…
Given that it's available as an app, I don't exactly see a way to enforce people not using it.
I agree. I find OE helpful when you know what questions to ask - if that makes sense. It can be helpful as an adjunct to a strong clinical foundation.
Agreed it’s an appropriate concern. I failed a bunch of students last semester for their reliance on generative AI, too. I told them they could use it as a tool, but they couldn’t use it to do the bulk of the work for their final papers. Planning? Yes. Asking for help rewording things? Yes. Improvements and editing? Yes. All of it? No. And if you plan to get away with it, you had better be goddamn good at using it. I have a background in neurolinguistics and aphasia and told them that.
Trust me when I say there was no ambiguity about whether their papers were written by them or by AI. Almost all of the zeroes didn’t edit out the glaring fucking proof they used AI: not replacing placeholder text, sentences that had words but no meaning (like you might actually see with fluent aphasia patients!), emoji use next to every goddamn header. That wasn’t allowed at any point in time, so I’m not sure why they were surprised.
Understanding it’s an imperfect, continuously updated (hopefully continuously improved) tool is one thing - as you said. Relying on it to be the actual brains behind the work and then being upset someone followed through on a warning repeatedly given is another.
It is not necessarily younger physicians who are incorrectly reliant on this. I am a single-digit number of years into being an attending, but recently had a patient whose extended family member was a concierge physician. He asked me all sorts of weird questions, and it turns out he is the medical director of clinical decision support for his large multispecialty medical group, i.e. spearheading the use of OpenEvidence for hundreds of doctors.
I've posted on this sub a couple times about errors in OE, be it specifically obviously incorrect information being given as fact, or linking to an abstract that does not really back up the statement it is being used as a reference for.
I think the perception of cognitive blunting and offloading that people experience when they overuse AI in day-to-day life can apply to a profession as well, so I share the concern.
I just think these are tools. That’s it. Tools. Tools do not replace your brain and your skills.
I remember using the Epic AI to draft a note for me, and it left out critical incidental findings that could be problematic later.
They’re tools that people mistake for more useful and powerful than they are. They’re constantly changing, not always for the better from the results that the models spit out. The problem isn’t that LLMs exist or are used, it’s that the preparation for us using them is woefully inadequate.
However, the unique pitfall/danger of these tools is that they present the answer with an argument that implies it is comprehensive and accurate.
I do use OpenEvidence to help me search, but I notice that it states approaches slightly out of date relative to what my specialty recommends, or gives an answer with important caveats omitted. I find it helpful to ask questions in 2-3 different ways, BUT I mostly use it to look up definitions or explanations, or to help me find review papers on specific topics.
it's basically OG Google but it spells out your answer in an abridged narrative. I imagine the people who use it to think for them don't think much in their care anyway.
Tools allow for cognitive offloading. Cognitive offloading weakens future performance.
Calculators made us bad at mental math. Google maps made us bad at street navigation.
It becomes dangerous when the tool is unreliable or is not always available.
I agree. Most people are just resistant to change. Every time there is something new, most people’s reaction is “will X make humans overly reliant on X?” This has been the case for generations. They used to say don’t use calculators, you’ll destroy your basic math skills. Don’t use Google Maps, you’ll never learn your way around if you get lost. Don’t use autocorrect/spellcheck, you’ll never learn to spell properly.
In reality? These are just tools. After a number of years, they become normalized and we never “lose” all that much.
Edit: on second thought maybe skeptical is a better word than resistant.
Edit2: just thought of a better example. Order sets in the hospital or just electronic prescribing. “In the old days we used to memorize the doses for everything” — well ok but in the old days error rates were higher as well.
I'm a radiologist. I use OpenEvidence all the time, it's fantastic, and I tell everyone I can about it.
I’ve also seen it be frankly wrong several times.
I think it’s best used as a super-powered search engine. I use it mostly to confirm things I already knew, to help me think of some zebras I didn’t consider, or to explain new or complicated subjects to me with relevant resources so I can confirm the info it gives.
I think blind trust of it can definitely be dangerous. At the very least, people need to read the articles it cites to make sure they can be trusted and to make sure they actually say what OpenEvidence concludes.
As long as people use it responsibly, it’s an incredibly useful tool. Unfortunately I think the endpoint is going to be increasing independence of midlevels using OpenEvidence and the like as their “attending” who may over rely on it.
I agree with this to the point that often I ignore the response to my prompt and just scroll to the bottom to see the relevant articles for myself. It's a far better search tool than pubmed, Google, etc.
Occasionally it will confidently make a statement with a niche journal and an underpowered study as evidence.
Have you compared OpenEvidence to chat-gpt's extended thinking? Or perplexity?
I have never used open evidence but I use ChatGPT for combing through literature all the time and it’s dumb as fuck. I’m always having to tell it that it’s said something flat out wrong and it goes, oh you’re right, then locates the papers I want. Like everyone is saying about OpenEvidence, the results have to be critically analysed and validated.
Hard to say what sort of cortical atrophy will occur on an individual level but will AI tools continue to make the overall system dumber? Absolutely.
At a minimum, they will continue to inflate the unearned confidence administrators, insurers and other laymen have that they algorithmically know what’s best for the complex patient sitting in front of you whom they’ve never met.
Yes they make health professionals stupid
The juniors must be taught: READ THE PRIMARY SOURCE. This was always a problem, but before it used to be abstracts, guidelines, slides, and word of mouth from colleagues, while now it's LLM. It's not that different, in my view.
Reading the primary source is not practical advice for the vast majority of clinical decisions. The typical patient-facing doc makes many dozens of clinical decisions a day. We can’t be searching for, reading, and appraising RCTs for each one. That’s why professional society guidelines exist.
At some point, you should be reading some of the primary literature. I'm not saying you should be interrupting your clinical workflow, I'm saying that your education does not stop at the end of residency.
Spot on. The whole point of CDS in the clinic is fast, accurate recall and quick summaries of best practice: answers that jog the memory and can be relied on as best-in-class, peer-reviewed information.
Nobody is reading 5 sources on OpenEvidence today in the middle of clinic. Some take at face value what it says, which on the odd occasion is simply, algorithmically wrong. And that is the source of a major accident waiting to happen.
It also links me to a random abstract from a non-open-access journal, not even the full article, so how is the clinician meant to do proper due diligence?
I had a trainee give a whole journal club presentation that was obviously just Claude output recently...
it's very different b/c rather than having to skim multiple sources, you get an answer in 10s (w/ often misquoted studies)
I'll chime in with my two cents:
1) I don't use the new AI tools. I still prefer using UpToDate, LITFL, etc.
A lot of my attendings use the newer AI tools.
We'll often do parallel searches if we have a question. For things that are algorithmic or dosage/drug-of-choice related, the AI tools provide pretty good information. They get the same answers I do with my more traditional methods, and in roughly the same amount of time.
For things that are more nuanced (say management of specific orthopedic injuries in infants as compared to adults for a recent example), the output of AI seems to be very vague and not offering helpful guidance. However, my traditional sources are often similarly vague when it comes to very specific/niche questions as well. So it's usually a wash.
2) My greater concern is how it affects individual learning and retention over the long term. The literature on learning shows that writing something down helps us retain information better than typing, likely because it engages more complex motor and thought processes.
Anecdotally, the information I gain from literature deep dives is almost always better retained than information someone simply tells me when I ask, likely because I have to use more complex thought processes to parse through all the information when searching the literature.
I fear that relying on AI output will lead to a decrease in the ability to effectively parse out good from bad information, less long term retention, and maybe even an atrophy of the more complex learning pathways.
Hopefully, I'm wrong. But when I look at what the formats of Instagram and TikTok have done to attention spans, I'm not optimistic.
Did you use to read all the studies published in fields you didn't practice in to be able to extrapolate data to a clinical question? Isn't that what we've been using UpToDate and BestPractice for? Didn't the whole evidence-based medicine movement prepare us to ask clinically relevant questions and look for good data?
If you cannot determine what good data is, the issue isn't AI.
There's nuance depending on use. The consequences of cognitive offloading during training are being actively investigated, but haven't produced robust findings yet from what I've seen (most early fears about AI were about accuracy rather than impact on users, so it's still early to say).
With appropriate AI literacy education, I don't have an issue with physicians using LLMs as a supplement to an already strong knowledge foundation, but I subjectively think less is more. The bar for me, as a recent preclinical student, is that if you don't have the knowledge to assess the validity of the answer, you should not ask LLMs the question.
I’m worried about this current cohort of medical students and the availability of this to them
I use it as an advanced search tool. Can’t tell you how many articles buried in the internet I’ve found that normally I’d never find.
I’d like to think the vast majority of medical professionals back-check the answers they get. I mainly review the references, not so much the answers. And about half the time I find that the data is not so good or concrete. I’ve found references that are totally bogus, etc.
Almost everybody I work with uses OpenEvidence, but it's always in the context of having it do the busywork so that we are more efficient.
For example, Spanish speaking patient that needs help understanding their disease? Let's use OpenEvidence to generate an after visit summary about their condition.
Need to look up papers on X topic for a prior authorization? Okay, let's have it generate five citations and an appeal note so that we can get the patient their medication.
I work with mostly millennials and I think they are all pretty aware of both how awesome AI tools can be but also how wrong and stupid AI is as well.
I have noticed that boomers and older physicians tend to somewhat over-rely / over-believe in AI's capabilities.
UpToDate is pretty useful for reference and it doesn't try to think for you.
I tell my residents not to use AI. If I catch them then I make them read the entire Up-to-Date article on the subject and present it to me. It takes maybe 15 minutes to read a long one so it seems fair to me, and I'm teaching them how to acquire medical knowledge the correct way. In short yes, we agree lol
I don’t think it’s wrong to use AI, but there are wrong ways to use AI. This sounds like my old high school teachers who said it was wrong to use Internet resources as sources in my essays.
If your paper is not written in cursive no one is going to read it!
This is funny because UtD is what we used to tell the residents to avoid using. Instead, they were supposed to read this or that textbook. Up to Date was the OpenEvidence of the 2000s.
You'd like to think UtD is at least human-reviewed, though. There is no LLM that does not hallucinate, no matter what these CEOs say, and there is no LLM that knows the whole broader corpus of information to give me. You maybe still have some of that problem with normal reference tools, since nobody is reading a bazillion journals either.
Yes, you’re right, UTD is supposed to actually be edited by physicians. That’s why it has the authors, reviewers, and the date it was last updated up at the top.
and it made people dumb. That's what mid levels do...I'd expect a specialist to think about it more. Open evidence is basically like a summary of up-to-date, so it's dumbing down the dumbed down.
The AI tools also read UTD.
It’s a tool. If people aren’t trained on how to use the tool, out of fear of over-reliance, someone is going to use it incorrectly.
There is one program at one of the hospitals I work at which uses OE (and even regular ChatGPT) very heavily, and there is a clear difference between their knowledge/thinking ability and that of residents from the other programs who do not use it as much.
What advantage does open evidence have over chat-gpt's extended thinking?
The latest versions of 5.2 extended thinking are really good at literature searches.
Perplexity is also decent.
The best approach is really understanding any LLM as a sophisticated auto-complete that generates a human-like response from millions or billions of parameters rather than a nuanced colleague. I treat OE as a Google search at best.
Can you give some examples, without getting too specific?
I don't really need it for day-to-day patient care; I'd waste time using it. I use it to help me find research articles based on a clear question, or for very challenging patients (rare autoimmune presentations).
I find that OpenEvidence generally does well. I like that it links to the guidelines, and it is generally correct. It's helpful for some rare stuff (like finding case-report-level evidence quickly), but really you should be using it more as a search function than as something making decisions or interpreting labs for you. Although having it calculate how much some fluid will impact an electrolyte is pretty neat.
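For anyone curious what that electrolyte arithmetic looks like, here's a minimal sketch assuming the standard Adrogué-Madias estimate (the tool may well do something more involved); the values below are textbook illustration numbers, not clinical advice:

```python
def expected_sodium_change(serum_na, weight_kg, infusate_na, infusate_k=0.0,
                           tbw_fraction=0.6):
    """Estimated change in serum Na (mEq/L) after 1 L of infusate (Adrogue-Madias)."""
    total_body_water = weight_kg * tbw_fraction  # ~0.6 for adult men, ~0.5 for adult women
    return (infusate_na + infusate_k - serum_na) / (total_body_water + 1)

# Example: 70 kg patient, serum Na 120 mEq/L, 1 L of 0.9% saline (Na 154 mEq/L, no K)
print(round(expected_sodium_change(120, 70, 154), 2))  # ~0.79 mEq/L rise per liter
```

Which is exactly why you still want to sanity-check the number the LLM hands back rather than copy it into an order.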
Not that I have been practicing particularly long (PGY3), but I did not really start using this until this year. I think it is important to build a foundation first, and reading UpToDate is more helpful for that than OE.
It’s just a more refined search engine
I actually don't worry much about hallucinations with OE, because I don't think it functions like most LLMs. I think it does a great job summarizing the actual articles it's citing. The actual dangers here are many. My learners don't know how to ask the right questions, so they're going to get the wrong answers (which will be right answers to the wrong question). Moreover, the risk I see is that depending on even decent question wording it may not pull the best available evidence to summarize. It may pull what looks like higher quality sources that are actually outdated info.
Mid-levels and trainees will be at extremely high risk of all of these issues.
I agree with part of your concern, but no more so than with the routine misuse of clinical studies, guidelines, or decision rules. Clinicians misapply support tools all the time. This is another tool that can be used well or poorly, but it is not categorically different in that respect.
One point I do want to push back on is the claim that LLMs are “just predicting the next word.” That framing is misleading. Predicting the next word is how these models are trained, but it is not an adequate description of what they do in practice. If they were merely guessing the next word in a shallow sense, the output would be unusable gibberish. The fact that they can summarize evidence, compare treatments, and maintain long-range coherence tells you they have learned internal structure and relationships.
“Just predicting the next word” describes the surface math, not the system’s behavior. By that logic, ECG interpretation software “just measures voltages.” That statement is technically true but we all know that's just the mechanism of its final output.
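To make that concrete, here is a toy sketch of what the "predicting the next word" step literally is at decode time. The vocabulary and logit values are made up for illustration; the point is that everything interesting happens in how a real model computes those scores, not in this final step.

```python
import math

vocab = ["aspirin", "heparin", "warfarin", "the"]  # hypothetical toy vocabulary
logits = [2.1, 0.3, 1.4, -0.5]                     # hypothetical scores a model might emit

# Softmax: turn raw scores into a probability distribution over the vocabulary.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Greedy decoding: pick the single most likely next token.
next_token = vocab[probs.index(max(probs))]
print({w: round(p, 2) for w, p in zip(vocab, probs)}, "->", next_token)
```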
OE is like UTD... anyone who uses it daily to make multiple decisions on pt care was either poorly trained or is of below-average intelligence for a clinician. It might be helpful for esoteric stuff, but if you can't handle complex pathology in your field, go work at an urgent care.
And if you're a trainee who uses it hourly, you're doing your training a disservice