ORIGINAL RESEARCH
Lillie D. Williamson, PhD; Hailie C. Hayes, BS; Peter-Jon C. Williams, BS; Wanda Jirau-Rosaly, MD; Christy J.W. Ledford, PhD, FACH
Corresponding Author: Hailie C. Hayes, BS; Department of Family and Community Medicine, Medical College of Georgia at Augusta University.
Email: hahayes@augusta.edu
DOI: 10.3122/jabfm.2025.250296R1
Keywords: Academic Medical Centers, Culturally Sensitive Research, Data Accuracy, Electronic Health Records, Ethnicity, Minority Groups, Patient-Oriented Research, Research Design, Self Report
Dates: Submitted: 07-31-2025; Revised: 12-04-2025; Accepted: 12-15-2025
Status: In Press.
INTRODUCTION: Accurate and complete measurement of race and ethnicity is essential in clinical research and practice but is often undermined by inconsistent definitions and flawed data sources. Electronic health records (EHRs) regularly contain race and ethnicity information, yet some studies reveal errors, particularly for individuals from minority groups. Our study evaluates how data quality varies across three methods of collecting race and ethnicity information: EHR-extracted demographics, self-reported information via closed-ended questions, and self-reported information via open-ended questions.
METHODS: We conducted a secondary analysis of four datasets from an academic medical center in the Southeastern U.S. Each dataset included two measures of race and ethnicity, enabling cross-tabulation to evaluate alignment and missingness. Datasets varied by administration mode and data collection method, including oral and written surveys with open- and closed-ended formats, and EHR-extracted values.
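The cross-tabulation described above can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: the function names, the set of codes treated as missing, and the example records are all hypothetical and are not drawn from the study data.

```python
from collections import Counter

# Assumption: values treated as "missing" for either measure (hypothetical codes).
MISSING = {None, "", "Unknown", "Declined"}


def crosstab(pairs):
    """Count how often each (first measure, second measure) pair occurs."""
    return Counter(pairs)


def summarize(pairs):
    """Report missingness for each measure and agreement where both are present."""
    missing_first = sum(1 for a, _ in pairs if a in MISSING)
    missing_second = sum(1 for _, b in pairs if b in MISSING)
    both_present = [(a, b) for a, b in pairs if a not in MISSING and b not in MISSING]
    agree = sum(1 for a, b in both_present if a == b)
    return {
        "n": len(pairs),
        "missing_first": missing_first,
        "missing_second": missing_second,
        "both_present": len(both_present),
        "agree": agree,
        # Agreement is undefined when no record has both measures present.
        "agreement_rate": agree / len(both_present) if both_present else None,
    }


# Made-up records: (EHR-extracted value, self-reported value).
records = [
    ("White", "White"),
    ("Black", "Black"),
    ("Unknown", "Asian"),
    ("White", "Middle Eastern"),
    (None, "Black"),
    ("Asian", ""),
]

table = crosstab(records)
summary = summarize(records)
print(table)
print(summary)
```

Each cell of the resulting table counts the participants who share a given pairing of the two measures. Off-diagonal cells flag misalignment, and the missingness counts show which measure is less complete.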
RESULTS: Among 337 participants, self-reported data were generally more complete and sometimes more accurate than EHR data. Open-ended responses improved data richness and allowed participants to report more personal identities (e.g., “Vietnamese,” “Haitian-American”). These responses helped clarify identities obscured by limited preset categories and facilitated better representation of Middle Eastern and North African participants.
DISCUSSION: The way race and ethnicity questions are asked affects both the accuracy and the completeness of the data. Open-ended self-reporting allows for greater specificity and personal identity avowal. Our findings urge family medicine researchers and clinicians to critically reflect on how and why race and ethnicity data are collected, and to prioritize participant-defined identity in both research and care.

