Letter / Correspondence

Re: Performance Evaluation of the Generative Pre-Trained Transformer (GPT-4) on the Family Medicine In Training Examination

Karim Hanna, MD, FAAFP, FAMIA
From the University of South Florida, Morsani College of Medicine, Department of Family Medicine, Tampa, Florida 33612
The Journal of the American Board of Family Medicine, August 2025. DOI: https://doi.org/10.3122/jabfm.2024.240404R0

To the Editor: I am writing in response to the recent article in JABFM by Ting Wang et al. exploring the ability of ChatGPT, a large language model (LLM), to pass the family medicine in-training examination.[1] As someone who has also studied this phenomenon and published similar findings in Family Medicine, where our research compared LLMs' proficiency on the family medicine in-training examination,[2] I am compelled to reflect on the implications of these advances.

Board examinations exist fundamentally to uphold public trust in physicians, particularly in their medical knowledge. These rigorous, standardized tests serve as a benchmark, offering assurance to patients, institutions, and colleagues that a physician is competent and merits the responsibility of medical care. As seasoned practitioners, we may harbor grievances with standardized exams, yet board examinations affirm an overarching principle: our patients, employers, and professional peers must be able to trust our expertise and abilities.

Historically, board certification has also provided reassurance to employers seeking to hire qualified, competent doctors. However, the landscape of certification became more nuanced when the American Board of Family Medicine (ABFM) transitioned to a longitudinal assessment model. This change emphasizes a continuous evaluation of medical knowledge through periodic questions, trusting that those who stay engaged are maintaining competence. Herein lies a potential vulnerability: the advent of publicly accessible and increasingly sophisticated LLMs.

These tools, now widely available and impressively accurate, can be leveraged to effortlessly answer board examination questions. Physicians can (with near certainty of passing) use LLMs to complete these assessments. This is not merely an abstract concern but a very real challenge to the integrity of our certification system. Of course, there has always been an implicit honor code—an expectation that physicians will not use extensive references like Harrison's Principles of Internal Medicine or similar resources while testing. However, as I consider the Harrison's textbook on my bookshelf while writing this letter, I recognize that AI models introduce an unprecedented ease and ubiquity of assistance.

For board certification to retain its credibility and fulfill its foundational purpose, I believe we must reevaluate our approach. The original high-stakes examination format, which emphasized unaided knowledge, could perhaps be reinstated. But as a proponent of AI in medicine, I recognize the incredible potential of these technologies to augment our practice and education. Nevertheless, we must draw a firm distinction: if we decide to integrate AI into our assessments, we are setting a precedent for physicians who may become overly reliant on AI and less capable of independent, critical medical decision making. This, to me, is an undesirable outcome that threatens the autonomy and reliability expected of medical professionals.

Board certification questions should evolve toward styles that test medical reasoning and decision making under uncertainty: scenarios in which, even with access to AI, the public still relies on physicians' expertise. The measurement construct of board certification should align with this goal, ensuring it truly evaluates a physician's ability to navigate complex medical situations independently. In addition, the examination format should explore testing competencies beyond medical knowledge, such as communication and teamwork, through methods like direct observation, simulated patient interactions, or case-based discussions.

Perhaps board certification should evaluate patient cases we have already cared for, or perhaps use an old-fashioned USMLE Step 2 CS-style standardized patient examination. The integration of AI into family medicine[3] is inevitable, but our certification processes must carefully weigh the implications. If our goal is to preserve public trust and professional standards, we must ensure that our examinations—and by extension, our physicians—are uncompromising in their demonstration of medical competence.

Notes

  • To see this article online, please go to: http://jabfm.org/content/38/3/607.full.

References

  1. Wang T, Mainous A, Stelter GK, O’Neill TR, Newton WP. Performance evaluation of the Generative Pre-trained Transformer (GPT-4) on the family medicine in-training examination. J Am Board Fam Med 2024;37:528–82.
  2. Hanna RE, Smith L, Mhaskar R, Hanna K. Performance of language models on the family medicine in-training exam. Fam Med 2024;56:555–60.
  3. Hanna K, Chartash, Liaw DW, et al. Family medicine must prepare for artificial intelligence. J Am Board Fam Med 2024;37:520–4.