Skip to main content

Main menu

  • HOME
  • ARTICLES
    • Current Issue
    • Abstracts In Press
    • Archives
    • Special Issue Archive
    • Subject Collections
  • INFO FOR
    • Authors
    • Reviewers
    • Call For Papers
    • Subscribers
    • Advertisers
  • SUBMIT
    • Manuscript
    • Peer Review
  • ABOUT
    • The JABFM
    • The Editing Fellowship
    • Editorial Board
    • Indexing
    • Editors' Blog
  • CLASSIFIEDS
  • Other Publications
    • abfm

User menu

Search

  • Advanced search
American Board of Family Medicine
  • Other Publications
    • abfm
American Board of Family Medicine

American Board of Family Medicine

Advanced Search

  • HOME
  • ARTICLES
    • Current Issue
    • Abstracts In Press
    • Archives
    • Special Issue Archive
    • Subject Collections
  • INFO FOR
    • Authors
    • Reviewers
    • Call For Papers
    • Subscribers
    • Advertisers
  • SUBMIT
    • Manuscript
    • Peer Review
  • ABOUT
    • The JABFM
    • The Editing Fellowship
    • Editorial Board
    • Indexing
    • Editors' Blog
  • CLASSIFIEDS
  • JABFM on Bluesky
  • JABFM On Facebook
  • JABFM On Twitter
  • JABFM On YouTube
Research ArticleOriginal Research

Performance Evaluation of the Generative Pre-trained Transformer (GPT-4) on the Family Medicine In-Training Examination

Ting Wang, Arch G. Mainous, Keith Stelter, Thomas R. O’Neill and Warren P. Newton
The Journal of the American Board of Family Medicine July 2024, 37 (4) 528-582; DOI: https://doi.org/10.3122/jabfm.2023.230433R1
Ting Wang
From the American Board of Family Medicine, Lexington, KY (TW, KS, TRO, WPN); Department of Health Services Research, Management and Policy, University of Florida, Gainesville, FL (AGM); Department of Community Health and Family Medicine, University of Florida, Gainesville, FL (AGM); Mayo Clinic Health System, Mankato, MN (KS); Department of Family Medicine, University of North Carolina, NC (WPN).
PhD
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Arch G. Mainous III
From the American Board of Family Medicine, Lexington, KY (TW, KS, TRO, WPN); Department of Health Services Research, Management and Policy, University of Florida, Gainesville, FL (AGM); Department of Community Health and Family Medicine, University of Florida, Gainesville, FL (AGM); Mayo Clinic Health System, Mankato, MN (KS); Department of Family Medicine, University of North Carolina, NC (WPN).
PhD
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Keith Stelter
From the American Board of Family Medicine, Lexington, KY (TW, KS, TRO, WPN); Department of Health Services Research, Management and Policy, University of Florida, Gainesville, FL (AGM); Department of Community Health and Family Medicine, University of Florida, Gainesville, FL (AGM); Mayo Clinic Health System, Mankato, MN (KS); Department of Family Medicine, University of North Carolina, NC (WPN).
MD
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Thomas R. O’Neill
From the American Board of Family Medicine, Lexington, KY (TW, KS, TRO, WPN); Department of Health Services Research, Management and Policy, University of Florida, Gainesville, FL (AGM); Department of Community Health and Family Medicine, University of Florida, Gainesville, FL (AGM); Mayo Clinic Health System, Mankato, MN (KS); Department of Family Medicine, University of North Carolina, NC (WPN).
PhD
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Warren P. Newton
From the American Board of Family Medicine, Lexington, KY (TW, KS, TRO, WPN); Department of Health Services Research, Management and Policy, University of Florida, Gainesville, FL (AGM); Department of Community Health and Family Medicine, University of Florida, Gainesville, FL (AGM); Mayo Clinic Health System, Mankato, MN (KS); Department of Family Medicine, University of North Carolina, NC (WPN).
MD, MPH
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • Article
  • Figures & Data
  • References
  • Info & Metrics
  • PDF
Loading

Article Figures & Data

Figures

  • Tables
  • Figure 1.
    • Download figure
    • Open in new tab
    Figure 1.

    The key element of Python code to use GPT-4 API.

  • Figure 2.
    • Download figure
    • Open in new tab
    Figure 2.

    Example of user inquiry with “Instruct” and “Prompt” components and GPT-4’s response.

  • Figure 3.
    • Download figure
    • Open in new tab
    Figure 3.

    Response pattern of GPT (GPT-4 on the left panel and GPT-3.5 on the left panel) ordered by item difficulty. Green dots indicate correct responses. Red circles indicate incorrect response.

  • Figure 4.
    • Download figure
    • Open in new tab
    Figure 4.

    One example of Chain-of-thought prompt.

  • Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab
    Figure
    • Download figure
    • Open in new tab

Tables

  • Figures
    • View popup
    Table 1.

    Correct Percentage and Scaled Score for GPT-3.5 and GPT-4, in Comparison with National Residents' Performance

    Correct PercentageScaled Score
    GPT-3.556%280
    National PGY-1 Average61%336
    National PGY-2 Average66%397
    National PGY-3 Average68%433
    GPT-484%690
PreviousNext
Back to top

In this issue

The Journal of the American Board of Family     Medicine: 37 (4)
The Journal of the American Board of Family Medicine
Vol. 37, Issue 4
July-August 2024
  • Table of Contents
  • Table of Contents (PDF)
  • Cover (PDF)
  • Index by author
  • Back Matter (PDF)
  • Front Matter (PDF)
Print
Download PDF
Article Alerts
Sign In to Email Alerts with your Email Address
Email Article

Thank you for your interest in spreading the word on American Board of Family Medicine.

NOTE: We only request your email address so that the person you are recommending the page to knows that you wanted them to see it, and that it is not junk mail. We do not capture any email address.

Enter multiple addresses on separate lines or separate them with commas.
Performance Evaluation of the Generative Pre-trained Transformer (GPT-4) on the Family Medicine In-Training Examination
(Your Name) has sent you a message from American Board of Family Medicine
(Your Name) thought you would like to see the American Board of Family Medicine web site.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
2 + 8 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.
Citation Tools
Performance Evaluation of the Generative Pre-trained Transformer (GPT-4) on the Family Medicine In-Training Examination
Ting Wang, Arch G. Mainous, Keith Stelter, Thomas R. O’Neill, Warren P. Newton
The Journal of the American Board of Family Medicine Jul 2024, 37 (4) 528-582; DOI: 10.3122/jabfm.2023.230433R1

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
Share
Performance Evaluation of the Generative Pre-trained Transformer (GPT-4) on the Family Medicine In-Training Examination
Ting Wang, Arch G. Mainous, Keith Stelter, Thomas R. O’Neill, Warren P. Newton
The Journal of the American Board of Family Medicine Jul 2024, 37 (4) 528-582; DOI: 10.3122/jabfm.2023.230433R1
Twitter logo Facebook logo Mendeley logo
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Jump to section

  • Article
    • Abstract
    • Introduction
    • Methods
    • Results
    • Discussion
    • Conclusions
    • Appendix
    • Notes
    • References
  • Figures & Data
  • References
  • Info & Metrics
  • PDF

Related Articles

  • No related articles found.
  • PubMed
  • Google Scholar

Cited By...

  • Clinically Relevant Family Medicine Research: Board Certification Updates
  • Artificial Intelligence and Family Medicine
  • Google Scholar

More in this TOC Section

  • Identifying and Addressing Social Determinants of Health with an Electronic Health Record
  • Integrating Adverse Childhood Experiences and Social Risks Screening in Adult Primary Care
  • A Pilot Comparison of Clinical Data Collection Methods Using Paper, Electronic Health Record Prompt, and a Smartphone Application
Show more Original Research

Similar Articles

Keywords

  • Continuing Education
  • Family Medicine
  • Medical Education

Navigate

  • Home
  • Current Issue
  • Past Issues

Authors & Reviewers

  • Info For Authors
  • Info For Reviewers
  • Submit A Manuscript/Review

Other Services

  • Get Email Alerts
  • Classifieds
  • Reprints and Permissions

Other Resources

  • Forms
  • Contact Us
  • ABFM News

© 2025 American Board of Family Medicine

Powered by HighWire