The AI Threat to Medical Decision Making
Will large-language models help or harm patients and doctors?
“Your medical records are a mess,” announced a Wall Street Journal headline from October 24th, 2024. It goes on to assure us: “technology can help.”
An October 26th editorial in the New England Journal of Medicine, meanwhile, sounds the exact opposite note. Co-authors Dr. Liam G. McCoy, Dr. Adam Rodman, and biomedical researcher Arjun Manrai paraphrase Thoreau in their conclusion that “we are the tools of our tools.”
For over a decade, the American medical system has been largely transformed by the government-driven imposition of particular kinds of electronic health records, or EHRs. As I detailed in National Affairs in 2019, these instruments were never designed to serve doctors or patients—at least not in an immediate, clinical sense. Medical note-keeping had traditionally been an art, meant to capture a patient’s story in a way that would be meaningful to the patient and other physicians.
Rather than provide an exhaustive list of facts—medically pertinent or otherwise—about a given patient, doctors were trained to be concise, highlighting what was most germane to the patient’s particular case and general wellbeing. EHRs, by contrast, are primarily billing instruments, designed to coordinate payment between insurers and other entities and to collect health data at scale. They were promoted almost exclusively by the companies that developed and sold them—not by doctors or patient groups. The Obama administration kicked off a series of programs meant to reward doctors who adopted EHRs—and penalize those who declined to do so—through the 2009 HITECH Act. The results, 15 years later, are troubling.
Writing in the Atlantic in 2018, Dr. Rena Xu recalled one attending doctor who, “tired of wading through a morass of irrelevant information, writes notes in the electronic chart but in parallel keeps summaries of his patients’ medical histories on hand-written index cards.”
Xu also described her own dejection after repeatedly being prevented from helping a severely ill woman—“not for any medical reason, but simply because of an inflexible computer system and a poor workflow.” Despite the good will of a team of doctors, Xu could not update the woman’s scheduled medical procedures in a way that reflected her changing needs; the computer program didn’t allow it, and the administrators with access had gone home. Xu spent hours trying to arrange things so that the patient’s care would not be delayed.
Xu goes on to note that doctors spend years studying the substance of medicine but are increasingly being yanked away from the actual diagnosis and treatment of illness, and toward the work of documenting, coordinating, and managing. A combination of new laws and technologies has redefined the doctor as a sort of data-entry clerk. Xu offers a colorful analogy:
Imagine a young chef. At the restaurant where she works, Bistro Med, older chefs are retiring faster than new ones can be trained, and the customer base is growing, which means she has to cook more food in less time without compromising quality. This tall order is made taller by various ancillary tasks on her plate: bussing tables, washing dishes, coordinating with other chefs so orders aren’t missed, even calling the credit-card company when cards get declined.
Then the owners announce that to get paid for her work, this chef must document everything she cooks in an electronic record. The requirement sounds reasonable at first but proves to be a hassle of bewildering proportions. She can practically make eggs Benedict in her sleep, but enter “egg” into the computer system? Good luck. There are separate entries for white and brown eggs; egg whites, yolks, or both; cage-free and non-cage-free; small, medium, large, and jumbo. To log every ingredient, she ends up spending more time documenting her preparation than actually preparing the dish. And all the while, the owners are pressuring her to produce more and produce faster. It wouldn’t be surprising if, at some point, the chef decided to quit.
In this scenario, the chef has been trained as a chef—instructed in the detailed craft of fine cooking—but she cannot meaningfully develop her craft in her place of work. Both medicine and the culinary arts can be properly studied only through consistent practice over the course of years—chefs must work as chefs to develop beyond the basics. Doctors, too, must engage in the detailed art of diagnosis and treatment to improve their clinical reasoning.
Xu echoes a common complaint in noting that this hurried, technocratic style of medicine contributes to doctor burnout. But it has also recharacterized the practice of medicine, and the understanding of what medicine ought to be. The problem runs deeper than simply fatigue or low morale.
So how ought we to resolve it? The authors of the October 26th editorial in the New England Journal of Medicine agree that EHRs are a blight on American medicine. They state at the outset that “[p]erhaps no artifact of modern medicine has redefined medical practice more than the electronic health record,” and they outline serious problems with EHRs—which include massive data breaches and technological shortcomings resulting in severe medical errors. But they urge caution when it comes to newer technologies promising to fix it all.
Many administrators, they write, “see EHR documentation generated by large language models (LLMs) as a potential path to salvation,” given that systems like OpenAI’s GPT-4 or Google’s Gemini “are trained on large amounts of text and have demonstrated impressive abilities in processing and generating humanlike text in many domains, including medicine.”
This has broad implications; at the very least, LLMs might execute administrative tasks such as reviewing charts and taking notes on medical appointments. They might also assume roles that entail more complex reasoning, including diagnosis and treatment recommendations. While there’s quite a bit of skepticism about LLMs assuming diagnostic authority, the news that they might lift administrative loads has been greeted with cheers.
Not so fast, warn McCoy et al. It is true that EHRs impose burdens upon doctors, many of which are unrelated to medical care. But as noted, medical note taking is an art of sorts, one not entirely lost. Recall Dr. Xu’s colleague who maintains summaries of his patients’ medical histories on handwritten index cards. That is well and good for doctors who have been trained to take such notes. But if the task of note taking is removed entirely from future physicians, it may exacerbate problems regarding physician skill and patient wellbeing.
McCoy and his colleagues list four reasons “why the rush to insert LLM-generated text into EHRs is misguided and risks cementing the mistakes of the EHR past.” The first reason is straightforward enough—much of the information in EHRs is not of value to physicians, and there is little reason to believe that this would be improved by LLMs filling out EHRs in place of doctors. The little value EHRs do contain tends to be in “extra” notes provided by doctors, outside of checklists. Could LLMs meaningfully assume this task? As McCoy et al. write,
Far from being generic transcripts of patient encounters, high-quality notes incorporate the physician’s reasoning, the patient’s values, and aspects of the clinical context that may not be represented elsewhere in the chart. The imposition of a specific structure by . . . click-through checklists has already limited the degree to which documentation reflects the subtleties of clinical reasoning and patients’ goals. If it continues on its current trajectory, LLM–EHR integration may amplify these tendencies, cementing the EHR as a billing-oriented, unrepresentative proxy of an actual human being.
Second and perhaps more important is the possibility that LLMs may actually undermine clinical reasoning. Note writing is an exercise in clinical reasoning, McCoy et al. explain; it is “a deceptively complex task to generate and justify a written ‘clinical impression’ that clearly and concisely summarizes the clinician’s thinking while reflecting an appropriately calibrated degree of confidence.” These “ostensibly clerical tasks,” they argue, are not incidental “to clinical reasoning but inherent to it.”
In other words, the process of note writing forces doctors to crystallize clinical decisions in their own minds—and to convey these decisions to patients and other clinicians.
There is a risk to removing this task from doctors entirely: “Health systems seeking to implement LLMs in note writing need to thoroughly evaluate these models’ effects on reasoning and decision making; such uses should not be automatically treated as low risk.”
The rest of the NEJM article is worth reading in full, particularly its brief history of medical note taking. Most notably, the authors conclude that using LLMs to “fix” problems with EHRs risks cementing the worst of the EHR status quo.