Abstract
Introduction: Cluttered documentation may contribute adversely to physician readers’ cognitive load, inadvertently obscuring high-value information with less valuable information. We test the hypothesis that a novel, collapsible assessment, plan, subjective, objective (APSO) note design would be faster, more accurate, and more satisfying to use than a conventional electronic health record (EHR) subjective, objective, assessment, plan (SOAP) note for finding information needed for ambulatory chronic disease care.
Methods: We iteratively developed physician clinic note prototypes with features designed to emphasize more important information and de-emphasize less clinically relevant information. Sixteen primary care physicians reviewed comparable clinic notes with the 4 note styles presented in random order to find key information in the notes during timed tasks. The 4 note styles were denoted A (traditional SOAP note), B (2-column APSO note), C (collapsible APSO note), and D (2-column collapsible APSO note). The 4 unique note styles were designed to have equal amounts of information in each section. We simulated their utility for clinical practice by imposing time limits and by interrupting 1 of the tasks with a typical clinical interruption. For each session, we recorded audio, computer-screen activity, and eye tracking, and we made field notes. We obtained usability ratings (System Usability Scale) and new feature preference ratings, and we performed semistructured post-task interviews with subsequent content analysis. We compared the effectiveness of the 4 note styles by measuring time on task, task success (accuracy), and effort as measured by the NASA Task Load Index.
Results: Note styles C and D were significantly faster than A and B for the Review of Systems and Physical Examination tasks, as we expected. Notes B and C had the best success (finding requested data) scores. Users strongly endorsed all the new note features incorporated into the new note prototypes. Previously expressed concerns about temporarily hiding parts of the note (using the accordion display design pattern) were allayed. Usability ratings were worst for note A and comparably better for note styles B, C, and D.
Discussion: The new APSO note prototypes performed better than the traditional SOAP note format for speed, task success (accuracy), and usability for physician users acquiring information needed for a typical chronic disease visit in primary care. Moving Assessment and Plan to the top is 1 easily accomplished feature change. Innovative documentation displays of EHR data can safely improve information display without eliminating data from the record of the visit.
Physician clinic visit notes can be long and challenging to read. One reason for this is that currently these notes are designed to meet more than just the needs of the clinical care team. Additional stakeholders in the design of office notes are auditors, attorneys, billing staff, and insurance companies. Their tasks and information needs are often tangential to physicians’ patient-centered information needs.1
Previous work evaluating physician note content has focused on a variety of issues, including copy-paste of material from previous notes, automatic import of electronic health record (EHR) data, use of structured versus unstructured narrative data, and information overload from verbiage added to reduce risk exposure or to enhance quality measure reporting.2–6 Beasley et al7 report that information chaos, composed of information overload, information underload, information scatter, information conflict, and erroneous information, is common in primary care. Using a human factors approach, they propose a framework of information chaos and its effect on physician mental workload and situational awareness, which affects physicians’ problem-solving and decision-making capacity. “Information chaos is more than inconvenient, annoying, and frustrating; there are operational implications that can impair physician performance, increase workload, and reduce the safety and quality of care delivered.”7 They call for improved display techniques to present the data needed at the time of the patient visit. Note quality has been evaluated by a variety of measures, including completeness and correctness; 22- and 9-item validated scales (Physician Documentation Quality Instruments [PDQI and PDQI-9]); and a multi-stakeholder evaluation of quality characteristics, desired elements, and system supports to improve note quality.8–11 A physician satisfaction study sponsored by the American Medical Association found a number of dissatisfactions with EHR use, among them “degraded clinical documentation (as a consequence of template-based notes).”12
Lin et al13 compared user satisfaction among physicians at an academic health center that implemented assessment, plan, subjective, objective (APSO) notes institution-wide for authoring and creating APSO versus standard subjective, objective, assessment, plan (SOAP) notes and compared their ability to find data in both note types. They found physicians favored the change to APSO both as authors and as readers. Physicians reported the APSO notes were faster and easier to use, although objective speed measurements did not detect a difference.13 Various editorials and blog posts have endorsed moving the Assessment and Plan to the top of physician notes.14,15 Brown et al16 used an eye-tracking device to assess the visual attention patterns of 10 hospitalists as they read 3 electronic notes. They found that of all the sections of inpatient hospitalist progress notes, physicians read the Assessment and Plan preferentially and most closely; indeed, over 90% of information conveyed during a verbal handoff came from the Assessment and Plan.16
Our earlier examination of information needs of physicians, patients, and billing staff led us to develop a new model of physician office notes.17,18 We held focus groups showing physicians a hide/reveal feature that temporarily hides information needed only by nonphysician stakeholders. This feature provoked a dichotomy of reactions to the relative value and risks of such an approach. Some physicians endorsed the feature, whereas others feared that hiding some information would create a safety hazard, provoking the question, “Does hiding less-relevant information help or hinder physicians in their information-gathering task for patient care?”
To answer this important question, we built interactive prototypes that would allow us to user-test the hypothesis that the new note design would be faster, more accurate, and more satisfying to use than a standard note format for finding information needed for ambulatory chronic disease care. We based the designs on human factors as well as iterative user feedback from previous design-review sessions.
We aimed to make the most desired information more visually prominent while subduing information considered less clinically relevant, to reduce cognitive load, improve perceived ease of use, and improve speed and accuracy of information retrieval. Features that make desired information more prominent include the following:
Adding visual emphasis to abnormal data values with bold type and color to take advantage of preattentive visual processing19,
Adding abnormal data elements to the collapsed accordion header to reduce scrolling and remove visual distraction from normal text, and
Using a collapsible accordion header to hide some information considered less relevant (but retrievable with a single click) to reduce necessary reading and scrolling.
The interactive prototype incorporated the features in our model, moving the most sought-after sections of the note (Assessment and Plan; History of Present Illness) to the top of the note, while allowing the dynamic hiding/revealing of unwanted or less desired sections of the note. Those hidden sections have headers that still display key summary information while hiding more verbose and cluttered portions of each section.
Methods
We designed 4 options for presentation of information in physician clinic notes. We then used comparative usability testing to examine time to acquire information and accuracy of information acquired with the use of different formats of physician clinic notes. We also assessed cognitive load and perceived usability and usefulness of the different note formats.
Note Design
We iteratively developed physician clinic note prototypes with features designed to emphasize vital information and de-emphasize less important information, with level of importance identified by clinicians in a previous study.17 We utilized 4 different visual display models (listed in Table 1 and pictured in Figure 1) for formatting the progress notes. We developed clinical content for the 4 simulated ambulatory physician progress notes for a patient with chronic disease. We designed the notes to be of similar complexity but with different clinical content to test 4 different visual display models. The notes were written by 2 authors (JB, SP) and were reviewed by 3 of the authors (JB, SP, RK). Each note included 4 problems, 1 of which was diabetes, 1 of which was an acute illness, and 2 of which were chronic problems.
Before testing, the researchers conducted design reviews with practicing physicians to ensure that the 4 clinical designs used in the study were free from coding mistakes or design oversights. Designs were typical of an EHR note without being specific to any particular EHR.
For each collapsible note design (notes C and D), any abnormal elements from Review of Systems, Physical Examination, and Results are listed by numeric count of abnormal terms, along with the total count of organ systems (billing bullet points) for the Review of Systems and Physical Examination (see Figure 2). Abnormal terms have added visual emphasis (eg, bold type and color) in the expanded text. Abnormal text elements are also listed in the section header adjacent to the counts and truncated by an ellipsis when the characters exceed the capacity of the header to display that entire content in 1 line. Collapsed content can then be expanded to reveal the full text content of that section (see Figure 3).
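The header behavior described above can be sketched in a few lines of Python. This is only an illustration, not the prototype's code: the function name `collapsed_header`, the `(text, is_abnormal)` input shape, the header wording, and the 80-character line capacity are all assumptions introduced here.

```python
def collapsed_header(section, findings, systems_total, max_chars=80):
    """Build one-line header text for a collapsed note section.

    findings: list of (text, is_abnormal) pairs (hypothetical input shape).
    systems_total: number of organ systems documented in the section.
    max_chars: assumed capacity of one header line.
    """
    # Keep only the findings marked abnormal, in their original order.
    abnormal = [text for text, is_abnormal in findings if is_abnormal]
    header = f"{section}: {len(abnormal)} abnormal / {systems_total} systems"
    if abnormal:
        header += " - " + "; ".join(abnormal)
    # Truncate with an ellipsis when the text would overflow one line.
    if len(header) > max_chars:
        header = header[: max_chars - 3].rstrip() + "..."
    return header


findings = [("fever", True), ("no cough", False), ("ankle swelling", True)]
print(collapsed_header("Review of Systems", findings, 10))
```

Passing a smaller `max_chars` shows the truncation path; the full text remains available when the section is expanded.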
Sample
We recruited 16 physicians who practice in the ambulatory clinics of the University of Missouri. We sampled for heterogeneity in sex, years since residency graduation, and years using EHR. Family medicine attending physicians formed the sample majority; smaller numbers of internal medicine attending physicians also participated.
Testing
Each session was performed on a Windows laptop using Morae software (TechSmith Corporation, Okemos, MI) to record audio, video, on-screen activity, and keyboard/mouse input, and a Tobii X2-60 eye-tracker (Danderyd, Sweden) to record eye movement patterns. Participants evaluated 4 different model notes from the role of a physician preparing to see a patient for the first time using a colleague’s notes. They completed a series of tasks with each note sequentially, and after reviewing each note, provided an assessment of that note.
Each of the 4 visual display models was evaluated with 4 separate simulated physician progress notes with similar complexity but different clinical content. The order of exposure to the 4 notes changed with each user. To simulate the stress of clinical practice, researchers gave participants a limited amount of time to complete each task. Researchers also interrupted participants during the first task with questions common in actual clinical practice (eg, “The next patient has back pain and thinks she has a UTI. Should I do a urinalysis?”) to simulate the increased cognitive load of typical clinical-setting interruptions.
Participating physicians viewed 1 note at a time and engaged in a sequence of tasks 1 at a time, addressing the following 4 tasks for each note:
What were the diagnoses, the medications ordered or changed, and the labs ordered at the last visit? (30 seconds)
Does the patient have any abnormal Review of Systems findings from the last visit? If so, what were they? (20 seconds)
Does the patient have any abnormal Physical Examination findings from the last visit? If so, what were they? (25 seconds)
Does the patient have any abnormal Results from the last visit? If so, what were they? (15 seconds)
The eye-tracking data and audio-recorded information were used to determine time to first fixation for sections relevant to each question, total time in each section, and the accuracy of note review relating to each task question.
After the participants completed each note review, they answered 3 additional sets of questions before proceeding to the next note. First, they were asked to complete the NASA Task Load Index (NASA-TLX 7) consisting of 7 Likert-scale questions to determine task difficulty for each design.20 The second set of questions was composed of the System Usability Scale (SUS).21 These 10 questions are often used in evaluating system usability. The final set of questions asked participants to mark which sections of the physician’s note they would want to be open by default for every patient if collapsible sections were available and which features of the design they thought were the most useful. Participant sessions typically lasted 40 to 60 minutes. The University of Missouri Institutional Review Board reviewed and approved this study.
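For readers unfamiliar with the SUS, the sketch below shows the standard published scoring of its 10 items, which yields a score from 0 to 100. The study does not describe its scoring procedure, so treating its reported SUS values as scored this way is an assumption; the function name `sus_score` is also introduced here for illustration.

```python
def sus_score(responses):
    """Score one System Usability Scale questionnaire.

    responses: 10 integers from 1 (strongly disagree) to 5 (strongly agree),
    in questionnaire order. Returns a score from 0 to 100.
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for item, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    # Each item contributes 0 to 4 points; scale the 0-40 sum to 0-100.
    return total * 2.5


print(sus_score([3] * 10))  # all-neutral answers
```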
Analysis
We compared time on task for notes A, B, C, and D using a Cox proportional hazards model to test whether task completion times differed by note model, task, and a note model-task interaction term.
Because each task was time-limited to mimic the sense of urgency in actual clinical practice, a few physicians did not complete a task. This created a need to deal statistically with both the skewed distribution common with timed data and the times for noncompleters; we used the median as the summary statistic to account for skew and censored noncompleters in the time-to-completion analyses.
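One common way to obtain a median completion time when some observations are censored is the Kaplan-Meier estimator. The text does not say which estimator was used for the summary medians, so the sketch below is an assumption offered only to show how censoring enters the calculation; the function name `km_median` and the example times are hypothetical.

```python
def km_median(times, completed):
    """Kaplan-Meier median time to completion.

    times: seconds until completion or until the time limit.
    completed: True if the task was finished, False if censored.
    Returns the first time at which the estimated probability of still
    working on the task is 0.5 or less, or None if never reached.
    """
    data = sorted(zip(times, completed))
    at_risk = len(data)
    still_working = 1.0  # estimated probability of not yet having finished
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if events:
            still_working *= 1 - events / at_risk
            if still_working <= 0.5:
                return t
        # Censored participants leave the risk set without counting as events.
        at_risk -= removed
    return None


# Hypothetical times: three completers and two participants cut off at 20 s.
print(km_median([10, 12, 14, 20, 20], [True, True, True, False, False]))
```

Censored participants still count toward the number at risk until their cutoff, which is why they are not simply dropped from the data.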
Task success for items detected is expressed as percent correct. Task success for the different note models was compared using a logistic regression model with note A as the reference category. In the logistic model, we used “participant” as a random effect to account for the repeated measures of tasks taken from the same participant. We used paired t-tests to compare the means of notes B, C, and D to note A for scores from the NASA-TLX and the raw scores of the SUS. We also conducted a 1-way Analysis of Variance (ANOVA) comparing note A NASA-TLX and SUS scores to those for notes B, C, and D with “participant” as a random effect.
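As a reminder of what the paired comparison computes, here is a minimal sketch of the paired t statistic using only the Python standard library. The function name `paired_t` and the example scores are hypothetical, and the sketch returns the statistic and degrees of freedom only; obtaining a P value would require a t-distribution routine from a statistics package.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t statistic for two ratings given by the same participants.

    Returns (t, degrees_of_freedom). Raises ZeroDivisionError if every
    participant has exactly the same difference.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples of at least 2")
    # Work on within-participant differences, which removes
    # between-participant variation from the comparison.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


# Hypothetical scores for three participants under two note styles.
t_stat, df = paired_t([50, 60, 70], [51, 62, 73])
print(round(t_stat, 3), df)
```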
Results
Sample
Of the 16 physicians who participated, 56% were female. Most of the participants were faculty (81%), 13% of whom were internal medicine physicians. There was a range of time in practice since graduation, but 94% reported over 5 years’ experience using the EHR.
Task Time
Task time comparisons are influenced largely by the design features of each note section. The A&P (Assessment and Plan) design is the same in all 4 note models, and the median task times were not statistically different between any of the note models. The Results section is by far the simplest, containing the least content and the fewest visual distractions. However, the Review of Systems and Physical Examination include significant redesign of note models C (collapsible APSO) and D (2-column collapsible APSO), with several features that make the target content more visible to the subject (Figures 2 and 3). Note models C and D containing the collapsible accordion elements have substantially faster median task times of 13.4 and 12.9 seconds for the Review of Systems, versus 20.0 and 18.7 seconds for note models A (traditional SOAP) and B (2-column APSO), respectively. Note models C and D have much faster median times for the Physical Examination section as well, 12.2 and 12.6 seconds versus 23.0 and 22.0 seconds for note models A and B (Table 2), respectively. These comparisons are significant in the Cox proportional hazards regression model as shown in Table 3.
Task Success
By design, the task success (accuracy) rates for all note sections were less than perfect in our time-limited information retrieval tasks. The lowest task success rate (63% to 77%) was observed in the Assessment & Plan section (the most complex task, one that included a user interruption), while the greatest task success rate was observed in the Results section (85% to 97%), the simplest task containing the fewest visual distractions.
Regarding note model comparisons, we discovered a content error in the Physical Examination section of note D (2-column collapsible APSO) that caused note D to perform disproportionately worse in task success rates, both in comparison with the other note models and across note sections within note D (see Table 4). To adjust for the impact of the erroneous item, we recalculated the note task success rates after removing the Physical Examination section from each of the 4 note models. After adjustment, compared with note A, physicians had increased odds of task success with notes B (odds ratio [OR], 1.66; 95% CI, 1.05 to 2.63) and C (OR, 1.73; 95% CI, 1.10 to 2.72); odds of task success were not statistically different with note D (OR, 1.37; 95% CI, 0.87 to 2.15) (see Table 5).
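The relationship between these odds ratios and their intervals can be checked with a short sketch. It assumes the reported intervals are Wald intervals, symmetric on the log-odds scale, which is typical for logistic regression output but is not stated in the text; the function names are introduced here for illustration.

```python
import math


def log_or_se_from_ci(lower, upper, z=1.96):
    """Recover the standard error of the log odds ratio from a 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def wald_ci(odds_ratio, se, z=1.96):
    """Rebuild the interval from the odds ratio and its log-scale SE."""
    center = math.log(odds_ratio)
    return math.exp(center - z * se), math.exp(center + z * se)


# Reported values: (note, OR, lower, upper), each compared with note A.
reported = [("B", 1.66, 1.05, 2.63), ("C", 1.73, 1.10, 2.72), ("D", 1.37, 0.87, 2.15)]
for note, odds_ratio, lower, upper in reported:
    se = log_or_se_from_ci(lower, upper)
    low, high = wald_ci(odds_ratio, se)
    # An interval that contains 1 is not statistically different from note A.
    spans_one = low <= 1 <= high
    print(note, round(low, 2), round(high, 2), "includes 1" if spans_one else "excludes 1")
```

Under that assumption the rebuilt intervals agree with the reported ones to rounding, and only the interval for note D includes 1.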
Task Load
We measured task load by employing the NASA Task Load Index. As expected, none of the notes posed a significant physical demand (Table 6, Physical). Compared with note A, note B (2-column APSO) differed on the frustration subscale (3.31 vs 2.69; P = .01), note C (collapsible APSO) differed on the performance subscale (3.81 vs 2.69; P = .048), and note D (2-column collapsible APSO) differed on the effort subscale (3.94 vs 2.69; P = .043). However, in ANOVA with participant as a random effect, none of the subscales differed significantly by note type.
Usability (SUS)
Our physician test subjects found note A the least usable compared with the other note models, based on the SUS (Table 6). In paired t-tests, SUS scores for note A were significantly lower than for note B (58.50 vs 74.83; P = .007), note C (58.50 vs 81.83; P = .005), and note D (58.50 vs 77.50; P = .009). In 1-way ANOVA with “participant” as a random effect, there was a significant mean difference between notes A and C (difference in means = 23.3; P = .01).
Feature Preferences
Users strongly endorsed all the new note features incorporated into the new note prototypes. Previously expressed concerns16 about temporarily hiding parts of the note (using the accordion display design pattern) were allayed. A large majority of subjects preferred to display these sections open by default in the expanded position in the collapsible note models (C & D): Chief Complaint, Assessment & Plan, Present Illness, Problem List, Medication List.
Discussion
Our study shows that collapsible accordion design notes may reduce physician time spent reviewing Review of Systems and Physical Examination sections of the notes. Physicians in our study completed an increased proportion of tasks correctly with the 2-column APSO and the collapsible APSO note designs. The SOAP note was perceived to be the least usable and the collapsible accordion design the most usable by our study participants. There were no differences in task difficulty between note designs. Lin et al13 reported comparable note reading speed and accuracy with APSO notes compared with SOAP notes.
Physicians face a dilemma in balancing competing values in their documentation work, both from a consumption and from a production perspective. Finding that balance has not been easy. Methods that make production easier, such as copying forward text from a previous note, automatically importing lists and labs, and using template text designed to satisfy nonclinician stakeholder demands, all add to the information overload and visual clutter when the time comes to consume that same note. Our 3 new model note styles address the needs of primary care physicians identified in our earlier studies17,18 and incorporate a number of human factors missing from current note designs. By far, the technologically simplest change is to move Assessment & Plan to the top of the note. This can readily be done with almost any existing EHR, either by the vendor or client information technology team or even by individual clinician users.
We demonstrated alternative methods of adding visual emphasis to abnormal list items in the Review of Systems, Physical Examination, and Results sections. The simplest is to add emphasis employing colored text and bold font to abnormal items. A second method was to separately list the abnormal item count and total organ system count, and to display the abnormal text in the section header bar itself (Figure 2).
In addition, we used the collapse/expand feature from the accordion-display design pattern. Clicking a section header alternately expands or collapses the content associated with the header. This feature is technically feasible with modern text display methods such as XHTML but much less feasible with older rich text format (RTF) or PDF displays.
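As one illustration of how little markup the pattern needs in a browser, the sketch below generates a section using the standard HTML `details` and `summary` elements, which toggle between collapsed and expanded on a click without any script. The text does not say how the prototypes implemented the accordion, so this is an assumed approach, and the function name `render_section` is hypothetical.

```python
from html import escape


def render_section(title, summary, body, open_by_default=False):
    """Render one note section as a collapsible HTML block.

    title: section name shown in the header.
    summary: short text that stays visible while the section is collapsed.
    body: full section text, revealed when the header is clicked.
    """
    # The boolean "open" attribute makes the section start expanded.
    open_attr = " open" if open_by_default else ""
    return (
        f"<details{open_attr}>"
        f"<summary>{escape(title)}: {escape(summary)}</summary>"
        f"<div>{escape(body)}</div>"
        "</details>"
    )


print(render_section("Assessment & Plan", "4 problems", "Plan text here", True))
print(render_section("Review of Systems", "1 abnormal", "Fever <3 days"))
```

The `open_by_default` flag is one way to honor per-section default preferences, such as keeping the Assessment and Plan expanded for every patient.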
There are significant technical and cultural challenges to enabling the consistent display of abnormal items within the Review of Systems, Physical Examination, and Results. Abnormal values may be identified by several different mechanisms. When words are selected as discrete data elements when physicians create the note (eg, clicking on the word “fever”), it is easy and measurable if each item carries a designation as abnormal or not. Typed or transcribed text is more problematic to identify as abnormal. Natural language processing can be employed after the note is completed but adds cost, complexity, and ambiguity. Abnormal lab values can be expected to arrive with a flag denoting abnormality (eg, a serum potassium result will have a flag for high, low, or critical if outside the normal range). If the labels of abnormality are not consistently reliable, users will not trust the information display, and its utility will suffer dramatically. Our erroneous note (note model D, with the faulty Physical Examination values) vividly illustrated that erosion of trust once a test subject discovered the discrepancy between the header summary and the remainder of the section hidden by default.
Several limitations should be acknowledged. Our sample size was small, so some differences between note performance measures may not have been detected. Although we purposefully sampled both family medicine and general internal medicine physicians with different levels of experience as well as both attending and resident physicians, our study was limited to a single academic center. Our model notes were not actual notes from a current EHR, but rather were HTML page displays designed to be representative of the typical EHR note in their clinical content.
Conclusion
Starting with an understanding of physicians’ information needs for a primary care chronic disease visit and using human factors design principles, we developed innovative note models incorporating an array of display improvements. The 3 new note models offered equivalent or improved speed, accuracy, and user satisfaction over the standard note. Assessment-plan-subjective-objective (APSO) notes would be simple and inexpensive to implement for most organizations, as would adding emphasis to abnormal elements with selective use of color and typography. Two of these models hide and reveal note content strategically, reducing information chaos by reducing information overload and information scatter. Using these same human factors principles in other aspects of information transfer, such as in computerized order entry or in clinical decision support, might reduce clinicians’ burden further and deserves systematic exploration. The next step is to incorporate as many features as feasible into the local implementation of our commercial EHR.
Acknowledgments
The authors thank Clayton Hicklin at the Tiger Institute for Health Innovation for development of the HTML interactive prototypes, and Kenny Haggerty, Neeley Current, and Fatih Demir, PhD, at the Information Experience Lab for assistance with data collection and analysis. We thank Gaia Guirl-Stearley for critical review of the final manuscript.
Appendix
Appendix A. Electronic Health Record Note Model A: Subjective-Objective-Assessment-Plan (SOAP) Note
Appendix B. Electronic Health Record Note Model B: 2-column Assessment-Plan-Subjective-Objective (APSO) Note
Appendix C-1. Electronic Health Record Note Model C: Collapsible Assessment-Plan-Subjective-Objective (APSO) note, shown collapsed
Appendix C-2. Electronic Health Record Note Model C: Collapsible Assessment-Plan-Subjective-Objective (APSO) note, shown expanded
Appendix D-1. Electronic Health Record Note Model D: 2-column collapsible Assessment-Plan-Subjective-Objective (APSO) note, shown collapsed
Appendix D-2. Electronic Health Record Note Model D: 2-column collapsible Assessment-Plan-Subjective-Objective (APSO) note, shown expanded
Notes
This article was externally peer reviewed.
Funding: This work was supported by a faculty development grant from the Department of Family & Community Medicine, University of Missouri.
Conflict of interest: none declared.
To see this article online, please go to: http://jabfm.org/content/30/6/691.full.
- Received for publication February 6, 2017.
- Revision received May 25, 2017.
- Accepted for publication May 28, 2017.