Abstract
Primary care physicians are likely both excited and apprehensive at the prospects for artificial intelligence (AI) and machine learning (ML). Complexity science may provide insight into which AI/ML applications will most likely affect primary care in the future. AI/ML has successfully diagnosed some diseases from digital images, helped with administrative tasks such as writing notes in the electronic record by converting voice to text, and organized information from multiple sources within a health care system. AI/ML has been less successful at recommending treatments for patients with complicated single diseases such as cancer, and at improving diagnosis, shared decision making, and treatment for patients with multiple comorbidities and social determinant challenges. AI/ML has magnified disparities in health equity, and almost nothing is known of the effect of AI/ML on primary care physician-patient relationships. An intervention in Victoria, Australia, in which an AI/ML tool was used only as an adjunct to complex medical decision making, showed promise. Putting these findings in a complex adaptive system framework, AI/ML tools will likely work when their tasks are limited in scope, have clean data that are mostly linear and deterministic, and fit well into existing workflows. AI/ML has rarely improved comprehensive care, especially in primary care settings, where data have a significant number of errors and inconsistencies. Primary care should be intimately involved in AI/ML development, and its tools carefully tested before implementation; unlike with electronic health records, it should not simply be assumed that AI/ML tools will improve primary care work life, quality, safety, and person-centered clinical decision making.
- Artificial Intelligence
- Clinical Decision-Making
- Complexity Science
- Information Technology
- Machine Learning
- Medical Informatics
- Primary Care Physicians
- Primary Health Care
- Quality Improvement
Background
Artificial intelligence (AI) and its branch machine learning (ML) have been touted as among the “10 big advances that will improve life, transform computing and maybe even save the planet.”1 (Table 1) Other less dramatic AI/ML supporters say that it will facilitate new opportunities for doctor-patient connection,2 and that “if implemented wisely, AI can free up physicians’ cognitive and emotional space for their patients, even helping them to become better at being human.”3 Some primary care clinicians are excited at the possibility of AI/ML and envision many potential uses,3–5 whereas others are more apprehensive, believing that the doctor-patient relationship is founded on communication and empathy,6 and that AI/ML cannot duplicate this.7
The recent hype over non-health care AI/ML applications demonstrates that the potential benefits and harms of AI/ML are on many peoples’ minds. Autonomous vehicles have had some successes, but also failures leading to fatal accidents that have prompted regulatory scrutiny.8 The large language model ChatGPT demonstrated success at passing standardized legal exams, but when tasked to write legal briefs, it cited nonexistent case law precedents.9 In health care, ChatGPT has successfully answered medical license test questions,10 but it has been found to produce errors both in stating facts and in synthesizing data from the medical literature,11 and emerging evidence suggests that the mass use of ChatGPT actually worsens its accuracy and reliability.12
Complexity science may provide some insight into which predictions about the future of AI/ML in primary care are full of reason or full of hype (Table 2). Briefly, complexity arises from the interconnectedness and interdependence of multiple agents in a particular context (eg, a hospital setting vs a clinic vs the community).13 The dynamics between these agents result in feedback loops that alter the nature and behavior of the agents; that is, they adapt to changing circumstances, resulting in emergent behaviors, which can be expected but not predicted.14,15 Complex systems consist of a large number of elements that in themselves can be simple. Even if specific elements only interact with a few others, the effects of these interactions are propagated throughout the system, and these interactions are nonlinear. An everyday example is the shortage of toilet paper “caused” by the COVID-19 pandemic. Other aspects of complex adaptive systems (CAS) are further explained in the Appendix.
Primary care must manage many interconnected and interdependent issues, and thus by definition is complex. Unlike specialty care, primary care and acute services face a breadth of undifferentiated patient presentations that arrive with irregular timing.16,17 Primary care clinic visits are more complex than specialists’ visits.18 Primary care providers must navigate the greatest volume of patient care information of any health care entity, drawing on hospitals, nursing homes, lab and imaging facilities, specialists, insurance companies, government agencies, pharmacies, home health agencies, and so on. AI/ML could help or harm the management of this information.
Supporters believe that AI/ML will revolutionize primary care—improving risk prediction and intervention, dispensing medical advice and triage, improving clinic workflow, broadening diagnostic capabilities, assisting in clinical decision making, assisting in clerical work, and aiding population or panel management, including risk assessments and remote patient monitoring.3 However, primary care AI/ML implementation research remains at a very early stage of maturity,19 and as with many technological advances before it, there is no guarantee that AI/ML will successfully transform care delivery or care outcomes.
The purpose of this article is to discuss the opportunities and challenges of AI/ML in primary care, seen through the lens of CAS.
The Opportunities and Challenges of AI/ML for Primary Care
Detection/Diagnosis of Single Diseases
Early examples of successful implementation of AI/ML include analyzing data from images to diagnose specific conditions: retinal scans to diagnose diabetic retinopathy,20 mammogram imaging to identify radiographic findings suggestive of breast cancer,21 wearables to detect atrial fibrillation,22 and tele-dermatology with AI/ML assistance to improve diagnostic accuracy for skin lesions.23 These AI/ML successes have been described less as rendering a certain diagnosis and more as offering a good guess at what the answer might be,21 which helps explain why the impact of AI/ML, even in a digital field such as diagnostic radiology, has been only modest so far.24 For primary care, AI/ML might augment a physician’s knowledge and confidence to diagnose rare diseases.25
However, even apparent successes of AI/ML for diagnostic outcomes have been found to be nonsensical when deeply explored post hoc. For example, an AI tool for detecting melanomas in photographs of skin lesions did so by recognizing that photographs with cancer were more likely to have small rulers in the image.27 Judging AI/ML accuracy in clinical diagnosis is particularly challenging outside of well-structured case vignettes.28 Although AI/ML-trained systems may aid the diagnostic process, they cannot determine the final diagnosis, which involves human interactions, judgments, and social systems understandings that are beyond what computers can model.29
Treat Specific Diseases
AI/ML also has the potential to inform decision making by quickly synthesizing a wide variety of information from the medical literature or electronic health record (EHR). It could also incorporate patient pathways, including hospital discharge summaries, drug databases, and drug-drug and drug-disease interactions, analyzing large amounts of data to discover correlations that may have been missed by researchers and health care providers, enhancing patient-centered care.30 In practice, however, a hospital bedside AI/ML-based consultation service affected treatment decisions in only 10 of 100 queries, mostly those involving unusual, understudied patient presentations or rare diseases.31
AI/ML has not been successful at improving cancer treatment, a task whose complexity may more closely reflect that of primary care. Studies concluded that IBM Watson did not improve on the decisions of oncologists, and the project was abandoned.32,33 Attempts involving diagnoses that require integrating clinical findings, a crucial task in primary care, have not achieved the same success as single disease efforts.34
AI/ML models based purely on historic data can only learn the prescribing habits of physicians in retrospect, which may not represent an ideal state in emerging practices.35 For example, computerized decision support systems (CDSS) designed with a high tolerance for risk favored algorithm performance over patient safety, potentially exposing patients to inappropriate medications.36 Determining whether AI/ML improves patient outcomes remains the most important test, and currently there is scant evidence of downstream benefit. AI/ML optimists consider real world data, such as pharmaceutical postmarketing surveillance, a valid source of evidence to connect treatments with outcomes.37 Skeptics believe AI/ML is no substitute for more rigorous methodologies, which may include randomized controlled trials or learning health system approaches.38
Predict Future Health Events
AI/ML may provide new opportunities to construct more accurate predictions of disease risk, which could inform smarter decision making algorithms.39 Oak Street Health implemented an ML approach that increased the number of patients identified at high risk for hospitalization compared with retrospective models, but did not improve mortality.40 ML may be particularly useful when dealing with “wide data,” where the number of input variables is greater than the number of subjects. Other observers believe most studies on ML-based prediction models show poor methodological quality and are at high risk for bias.41 Existing prediction models that use large data sets and AI/ML can give similar population level estimates of cardiovascular disease while giving vastly different answers for individual patients.42
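The sketch below illustrates this population-versus-individual discrepancy on synthetic data; the models and data are illustrative assumptions, not those of the cited studies. Two reasonable models trained on the same cohort can report nearly identical average risk while disagreeing sharply about individual patients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic cohort: 20 "clinical" features, binary outcome.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two reasonable models trained on the same data.
p1 = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
p2 = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

# Population-level averages agree closely...
print(f"mean predicted risk: {p1.mean():.3f} vs {p2.mean():.3f}")
# ...while individual predictions can diverge substantially.
print(f"max individual disagreement: {np.abs(p1 - p2).max():.3f}")
print(f"patients whose predicted risk differs by >0.2: {(np.abs(p1 - p2) > 0.2).mean():.1%}")
```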
An explanation for this discrepancy is that ML focuses on the strength of the correlation between variables rather than the direction of causality,26 and ML may add little value to predictions of future events compared with traditional methods. Even if one assumes that AI/ML applications might increase the predictive accuracy for future events, that does not mean that there is an action available to decrease the risk, which in turn raises important ethical concerns.
Decrease Administrative Burden
AI/ML using voice recognition has been implemented to listen to a physician-patient encounter and document a preliminary note, or “autocharting.”43 In a demonstration project, a tool called Suki reduced documentation time by 72% with this technology,44 a critical finding in an age when such administrative burden has been clearly linked to growing rates of reported US clinician burnout.45 An assessment of the performance of ChatGPT in generating history of present illness documentation in medical notes found that it sometimes reported information that was not present in the source dialog, an error called “hallucination.”46 Other possible uses include optimizing coding for value-based payments and automating aspects of previsit planning.3 However, a substantial proportion of patient symptoms in primary care are vague, such that even human scribes present in the room do not agree on how to document them.43 A possible distinction between these disparate findings is recording and categorizing information (converting spoken history and physical examination elements to a dictated note) versus improving understanding. Using AI/ML to automate prior authorizations has also been proposed.47
Expand Primary Care Capacity
Some AI/ML innovations aim to extend the work of the primary care team beyond the office visit, the home visit, and even beyond telehealth. Conversational agents using AI/ML improved depression scores in a small 2-week trial, a time frame that is likely clinically irrelevant.48 A recent randomized comparative effectiveness trial indicated that AI/ML could allow many more patients to be served effectively by cognitive behavioral therapy for chronic pain (CBT-CP) programs using the same number of therapists.49 Other examples of AI/ML used for single-issue primary care tasks include mental health assessments during telehealth calls,50 mental health support,51 and weight management.52
AI/ML may help with remote monitoring, for example, alerting patients and doctors when a continuous glucose monitor stops functioning, or decreasing false alarms in telemetry units.53 However, algorithms that are applied repeatedly to track a patient’s condition will likely trigger repeated false alarms or flag information the clinician is already aware of, which contributes to alarm fatigue.54
Data Issues—Signal, Noise, and Action
Data Accuracy
Some of the successes of AI/ML outside of health care have been noteworthy, for example, the progression of AI to defeat humans in the games of checkers, chess, and then Go.55 These programs were trained on tens of millions of previous games, where the input data were essentially perfect. In contrast, “noisy” data decrease classification accuracy and degrade prediction results in ML.56 In fact, which information should be classified as signal versus noise is difficult to determine even for highly focused questions in medicine, for example, determining whether heart rate variability derives more from normal physiologic events (stress), normal “abnormal” states (sinus bradycardia in a young athlete), or disease (paroxysmal tachycardias).57
Noisy data are a challenge in all potential uses of AI/ML, but will likely be an especially significant barrier to the utility of ML in primary care, where inaccurate data are already abundant, eroding the accuracy of ML predictive models.58 In a detailed investigation of EHR data fidelity, diagnostic codes for hypoglycemia were found to have moderate positive predictive value (69.2%) and moderate sensitivity (83.9%).59 In another study, accuracy rates of medical registry data ranged from 67% to 100% and completeness ranged from 30.7% to 100%.60 Important drivers of EHR inaccuracies are copied and pasted notes, a work pattern that is not likely to change going forward.61 In addition, the timeliness and accessibility of EHR data are challenging. Raw data from an EHR must be cleaned and formatted before use, but if this process is delayed, the models cannot be applied in real time for patient care.62 Whether ML approaches can sort out which data inconsistencies are errors, and which add useful information, remains unknown.
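A minimal sketch of the noise problem, using synthetic data and an arbitrary classifier as stand-ins for real EHR data and a real predictive model: flipping a fraction of training labels, analogous to miscoded diagnoses, measurably erodes test accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.1, 0.3):              # fraction of training labels flipped
    y_noisy = y_tr.copy()
    flip = rng.random(len(y_tr)) < noise   # simulate miscoded diagnoses
    y_noisy[flip] = 1 - y_noisy[flip]
    acc = RandomForestClassifier(random_state=0).fit(X_tr, y_noisy).score(X_te, y_te)
    print(f"label noise {noise:.0%}: test accuracy {acc:.3f}")
```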
Actionable Data
AI/ML has been shown to predict risks of future events in some cases. But merely providing the probability of a particular outcome, such as readmission or mortality risk, is unlikely to change physician or patient behavior in most primary care settings, as physicians incorporate their patient’s unique context and preferences into their decision making.63 Poor calibration might often be expected with application of a model from one population to another, which can lead to harmful decisions.63,64 Although modern imputation methods can mitigate some bias due to missing information, these methods are less useful in EHR settings where it is not possible to distinguish the true absence of a relevant characteristic (such as a particular comorbidity) from data incompleteness. Even if efforts are undertaken to maximize calibration, patterns seen in existing data sets should be considered as no more than hypothesis generating, and will still require classic hypothesis testing.65
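A toy sketch of the calibration problem (the simulated populations and coefficients are illustrative assumptions, not a published model): a model trained on a low-risk population systematically underestimates risk when applied to a higher-risk one, even though the underlying predictor-outcome relationships are identical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate(n, intercept):
    """Simulate a population whose baseline risk is set by `intercept`."""
    X = rng.normal(size=(n, 3))
    p = 1 / (1 + np.exp(-(intercept + X @ np.array([1.0, 0.5, -0.5]))))
    return X, (rng.random(n) < p).astype(int)

X_a, y_a = simulate(5000, intercept=-2.0)  # lower-risk source population
X_b, y_b = simulate(5000, intercept=0.0)   # higher-risk target population

model = LogisticRegression().fit(X_a, y_a)
pred_b = model.predict_proba(X_b)[:, 1]

# Discrimination may transfer, but predicted risks are systematically low.
print(f"observed event rate in population B: {y_b.mean():.2f}")
print(f"mean predicted risk in population B: {pred_b.mean():.2f}")
```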
A summary of the potential uses and features of AI/ML applications for which primary care physicians could be excited or apprehensive is provided in Table 3.
New Approaches May Emerge Using AI/ML in Complex Adaptive Systems
Complexity principles have been used to improve health system outcomes with AI/ML at the macro-level. Using hospital admissions and emergency data, Monash Health in Victoria, Australia, developed an algorithm to predict which patients were at high risk of readmission within the next 12 months. The algorithm had a positive predictive value of 33%.66,67 Monash Health leveraged the primary care teams’ preexisting relationships with patients and initiated an outreach team of medical, allied health, and community health workers (CHWs) to augment the care given by primary care physicians, using an online system to predict deteriorating patient journeys. Nonclinicians made regular monitoring phone calls (at least weekly) prompted by a clinical algorithm that continually predicted unmet needs and risk of deterioration or admission based on the most recent phone calls. The intense conversation-based outreach effort aimed to address unintended repeat investigations, loneliness, hospital infection, and posthospital syndrome, aligned with the goals of patients.68 The interventions the clinical teams developed were not defined a priori, but emerged through dynamic feedback loops between the interacting agents.
The intervention reduced readmissions by 1.1 bed-days per person per month, with a >60% participation rate among eligible patients.68 The physicians, wider teams, and CHWs could continually update their interventions, a feature of a well-functioning CAS. Clinical teams learned from their ongoing interactions with patients and adjusted their recommendations based on each patient’s personal journey. The AI/ML software looked at individual call data, records of all the calls and outcomes, and all patients in the database. It improved CHWs’ prediction of an event, that is, the likelihood that something would happen before the next weekly call, by 80%.69
In this example, the primary care teams used AI/ML to manage a large volume of data that would be difficult and expensive for humans to process, but relied on existing relationships and clinician judgment to determine the best actions. The researchers concluded that the algorithm accounted for only 10% to 20% of the success; the primary care team workflow accounted for 80% to 90%.
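As an arithmetic aside, the counts below are hypothetical and are chosen only to unpack what the reported 33% positive predictive value implies for a workflow like Monash Health’s:

```python
# Hypothetical counts chosen to yield the reported 33% PPV; the actual
# Monash Health denominators are not given in this commentary.
flagged = 300          # patients the algorithm flags as high risk
readmitted = 100       # flagged patients who are actually readmitted
ppv = readmitted / flagged
print(f"PPV = {ppv:.0%}")
# A 33% PPV means 2 of every 3 flagged patients would not have been
# readmitted, so flags are best treated as prompts for human outreach,
# not as standalone clinical decisions.
```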
Other Concerns with AI/ML
Health Equity
The US Food and Drug Administration and others recognize that AI/ML has inclusiveness problems and may exacerbate outcome differences in vulnerable populations. There are many types of bias in machine learning.70 Many data sets lack diversity and completeness of data by gender, race, ethnicity, geographic location, and socioeconomic status.71,72 For example, AI/ML was able to identify cancerous moles in White patients, but not in Black patients.73 In contrast, AI/ML has been shown to positively impact implicit racial bias in the prevention of deep vein thrombosis.74 Human health care workers certainly have biases too.
Data sets that reflect historic disparities in care related to racism and privilege have been shown to produce AI/ML results that retain these biases and thereby perpetuate structural disadvantages and disparities.75 Users of AI/ML may not even recognize the biases. Clinicians with a propensity to trust suggestions from AI/ML support systems may discount other relevant information, leading to so-called automation complacency.76 To combat this, fairness audits were used to reflect on AI/ML performance in prompts for end-of-life care planning, and found application performance differences by race/ethnicity, for example, in Hispanic/Latino males whose race was recorded as “other.”77 This particular audit required 115 person-hours and did not add clinically meaningful information because of poor demographic data quality and lack of data access. AI/ML was also largely unsuccessful at incorporating social determinants of health indicators into prospective risk adjustment for private insurance payments in the US, improving the predictive ratio by only 3%,78 and this performance may worsen over time as so-called latent biases emerge with subsequent use of an AI/ML tool.79
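A minimal sketch of the mechanics of such a fairness audit (the data, group labels, and column names are hypothetical): stratify a model’s error rates by demographic group and compare. In practice, as the 115 person-hour audit above suggests, the hard part is obtaining trustworthy demographic data, not the computation itself.

```python
import pandas as pd

# Toy audit table: one row per patient with the true outcome,
# the model's prediction, and a demographic group label.
df = pd.DataFrame({
    "outcome":    [1, 0, 1, 1, 0, 0, 1, 0],
    "prediction": [1, 0, 0, 1, 1, 0, 1, 0],
    "group":      ["A", "A", "A", "A", "B", "B", "B", "B"],
})

for group, sub in df.groupby("group"):
    tp = ((sub.prediction == 1) & (sub.outcome == 1)).sum()
    fn = ((sub.prediction == 0) & (sub.outcome == 1)).sum()
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    print(f"group {group}: sensitivity {sens:.2f} (n={len(sub)})")
```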
It is beyond the scope of this commentary to review other concerns about AI/ML such as privacy, data ownership and transferability, intellectual property, cybersecurity, and medical liability for the creators and owners of AI/ML tools.80 AI/ML was recently challenged not just to replicate human thinking processes but to aim to exceed them,65 a bar that is likely insurmountable for many aspects of complex primary care.
Discussion
Despite the emergence of intriguing AI/ML tools such as ChatGPT, successful transformation of primary care using AI/ML is far from guaranteed. Primary care should play a critical role in developing, introducing, implementing, and monitoring AI/ML tools, especially regarding common symptoms, acute diseases, chronic diseases, and preventive services.81 To avoid repeating the mistakes made when EHRs were forced onto primary care without adequate vetting, US policy makers should assume that AI/ML products will improve primary care only if primary care stakeholders are heavily involved in their development, piloting, vetting, and wider implementation.
No tool can account for the inherent complexity of primary care. It is not the existence of AI/ML tools that is the potential problem, but the way the tools are used: are they ultimately able to integrate the tacit domains required for participatory, effective, and ethical decision making? Do the potential cognitive and data-management improvements of AI/ML add any value to patient outcomes beyond the preexisting deep relationships between primary care clinicians and their patients?
Complexity science recognizes that primary care decision making emerges not only from doctor-patient relationships and knowledge of confounding factors for treating an individual patient, such as comorbidities, social determinant challenges, and unique patient attitudes and beliefs, but also from the interrelated hierarchical layers and feedback loops of health care systems.13 AI/ML approaches may harm primary care by minimizing an understanding of these complexities, often by limiting the number of features used to develop the algorithms.57 In fact, many potential AI/ML tools are described as synthesizing information (EHR data and billing data) from the front lines of the health care system hierarchy and sending conclusions to the macro administrative layers (analysts and administrators), which is the opposite of a natural and sustainable CAS.82 The series of interventions in Victoria, Australia demonstrates a CAS-consistent flow of information. AI/ML was used to collate a large amount of data to identify patients who were potentially about to “tip” into worse health states, but the synthesized information was sent not to the top levels of the system hierarchy but to the front-line clinicians. In addition, AI/ML was not used to make medical decisions about how to respond to the patients flagged as being at increased risk for hospitalization. What made this AI/ML application successful was not only the model itself (which actually played a relatively small role), but more importantly the way the model augmented existing relationships and human-driven processes of care.
A key limitation of AI/ML lies in the fact that its predictions arise from existing data. It cannot, at least at this point in time, synthesize the multiple perspectives of a health professional in the context of the patient in front of them. Creating new data upfront for unmet clinical needs and specific purposes is expensive in time and resources, but gives the best chance of producing tools that are useful in practice.
A deliberative patient-physician relationship is important for healing, particularly for complex conditions and when there is a high risk of adverse effects, because individual patients’ preferences differ.83 There are no algorithms for such situations, which change depending on emotions, nonverbal communication, values, personal preferences, prevailing social circumstances, and many other factors. For example, AI/ML will not likely reduce the uncertainty inherent in making ethical decisions about care at the end of life. AI/ML skeptics point out that algorithms and prediction instruments, ironically, exercise tyranny over the true freedom of moral agency that we claim to be respecting in our patients.84
Policy makers (and investors) should not simply assume that AI/ML tools can significantly improve the complex person-centered work of primary care physicians and their teams. Useful applications of AI/ML in primary care will undoubtedly emerge. Complexity science suggests that these tools are much more likely to assist primary care with discrete functions that have highly focused outcomes; it is very unlikely that AI/ML tools will replace complex relationship-centered decision making by physicians and their teams (though team composition may evolve if administrative burden can truly be reduced).
Acknowledgments
The authors acknowledge Jacqueline Kueper, PhD, Ginetta Salvaggio, MD, and C. J. Peek, PhD, for their participation in the original NAPCRG Forum and comments on this subject.
Appendix.
Complex Adaptive Systems Further Explained
Complex systems contain many direct and indirect feedback loops. Complex systems are open systems—they exchange energy or information with their environment (their context)—and operate dynamically. Any complex system thus has a history, and that history is of cardinal importance: the behavior of the system is influenced by its previous path. Because the interactions are rich, dynamic, fed back, and, above all, nonlinear, the behavior of the system as a whole cannot be predicted from an inspection of its components. The notion of “emergence” is used to describe this aspect. The presence of emergent properties does not provide an argument against causality, only against deterministic forms of prediction. Complex systems are adaptive. They can (re)organize their internal structure without the intervention of an external agent.1
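A standard illustration of this unpredictability, not drawn from the article, is the logistic map: a fully deterministic nonlinear feedback rule whose trajectories from two nearly identical starting states diverge within a few dozen steps, so the long-run behavior is “expected but not predicted.”

```python
# The logistic map: a one-line deterministic feedback rule whose
# long-range behavior cannot be forecast from tiny measurement errors.
x, x2 = 0.400000, 0.400001   # nearly indistinguishable initial conditions
r = 3.9                      # parameter value in the chaotic regime
for step in range(1, 31):
    x = r * x * (1 - x)
    x2 = r * x2 * (1 - x2)
    if step % 10 == 0:
        print(f"step {step:2d}: difference = {abs(x - x2):.6f}")
```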
Practice Domains
Knowledge itself is complex, and thus not all knowledge contributes equally to what we know.2 The Cynefin framework (Figure 1a) is 1 approach to understanding decision making in complex systems, and it helps visualize the medical knowledge domains (Figure 1b) that facilitate clinical decision making in a primary care context.3 In the obvious quadrant, direct cause and effect relies on explicit knowledge, such as randomized controlled trials (RCTs) and meta-analyses of RCTs. This is the domain most conducive to monitoring by simple single disease guidelines. Perhaps the recent success of large language models such as ChatGPT in answering medical license test questions fits in this domain.4 On the other hand, ChatGPT has been found to produce errors in stating facts and synthesizing data from the medical literature.5
In the complicated quadrant, cause and effect are discernable through multilayered interacting parts that also rely on explicit knowledge. In these clinical scenarios, complicating factors such as comorbidities may influence patient care recommendations, but the relationships of inputs and outputs are linear and follow parametric patterns. An example is weighing the negative impacts of comorbidities in deciding whether a major surgical procedure is more likely to benefit or harm a patient.
In the complex quadrant, there are many interacting parts whose relationships are perceivable but not fully predictable in real time, making it difficult to predict behaviors and outcomes from knowledge of the component parts. Multiple layers of system hierarchies interact through nonlinear feedback loops, with nonlinear relationships between inputs and outputs, making outcomes unpredictable and sometimes surprising. Relationships between inputs and outputs often follow log-linear or Pareto distributions. An example is caring for a dying patient and her family, balancing all the often competing organic medical, psychological, legal, and familial needs. It is often tacit knowledge rather than explicit data that directs care for these complex needs.
In the chaotic quadrant, the various components have no apparent relationship to each other, leading to a crisis with an emergent new order and/or a breakdown of the existing order, for example, the early days of the COVID-19 pandemic.
The ability of AI/ML to add value to care delivery likely diminishes as 1 moves from the obvious to the chaotic domains. A further examination of complex system understandings and the role of AI/ML is shown in Table A1.
Most existing successes of AI/ML represent small and constrained components of the complex health care system, such as measuring pixels on an image to help make a diagnosis, or using voice recognition to monitor and treat a single mental health concern. The data used are relatively confined and have linear, deterministic relationships with outcomes (a direct, predictable link from input to output). AI/ML has generally failed when more complexities are considered across different informational silos, agents, and hierarchies, and when the relationships between data inputs and desired outcomes are nonlinear with power law distributions, include feedback loops, and are nondeterministic.
Notes
This article was externally peer reviewed.
Conflict of interest: Dr. Young discloses that he is the sole owner of SENTIRE, LLC, which is a novel primary care documentation, coding, and billing system. Dr. Lin is a principal investigator working with companies and non-profit organizations through grants and sponsored research agreements administered by Stanford University. Current and previous collaborators include Amazon, American Academy of Family Physicians, American Board of Family Medicine, Center for Professionalism and Value in Health Care, Codex Health, DeepScribe, Google Health, Omada Health, Predicta Med, Quadrant Technologies, Soap Health, Society of Teachers of Family Medicine, University of California San Francisco, and Verily. With the sole exception of Codex Health, where he serves as VP of Health Sciences as a paid consultant, neither he nor any members of his immediate family have any financial interest in these organizations. Dr. Lin is the James C. Puffer/American Board of Family Medicine Fellow at the National Academy of Medicine. The opinions expressed are solely his own and do not represent the views or opinions of the National Academies. The other authors declare no conflicts.
Funding: None.
To see this article online, please go to: http://jabfm.org/content/37/2/332.full.
- Received for publication June 6, 2023.
- Revision received August 8, 2023.
- Accepted for publication August 10, 2023.