Abstract
Background: Although many health care organizations require routine pain screening (eg, “5th vital sign”) with the 0 to 10 numeric rating scale (NRS), its accuracy has been questioned; here we evaluated its accuracy and potential causes for error.
Methods: We randomly surveyed veterans and reviewed their charts after outpatient encounters at 2 hospitals and 6 affiliated community sites. Using correlation and receiver operating characteristic analysis, we compared the routinely measured “5th vital sign” (nurse-recorded NRS) with a research-administered NRS (research-recorded NRS) and the Brief Pain Inventory (BPI).
Results: During 528 encounters, nurse-recorded NRS and research-recorded NRS correlated moderately (r = 0.627), as did nurse-recorded NRS and BPI severity scales (r = 0.613 for pain during the last 24 hours and r = 0.588 for pain during the past week). Correlation with BPI interference was lower (r = 0.409). However, the research-recorded NRS correlated substantially with the BPI severity during the past 24 hours (r = 0.870) and BPI severity during the last week (r = 0.840). Receiver operating characteristic analysis showed similar results. A numeric score was recorded in 98% of cases; among these, 51% of patients reported that their pain was rated qualitatively rather than with a 0 to 10 scale, a practice associated with pain underestimation (χ2 = 64.04, P < .001).
Conclusion: Though moderately accurate, the outpatient “5th vital sign” is less accurate than under ideal circumstances. Personalizing assessment is a common clinical practice but may affect the performance of research tools such as the NRS when they are adopted for routine use.
Although it ranks among the most difficult and common problems in primary care, pain often goes undetected even after patients seek medical care.1,2 In an effort to assure better pain management, many health care systems in the United States, including the Department of Veterans Affairs (VA), have required routine outpatient screening for pain.3 Nationally, the Joint Commission emphasizes routine pain assessment, and the California legislature (AB 791) has required since 1999 that all licensed health care facilities assess pain with routine vital signs.4,5 This approach has also been emphasized internationally.6 This routine emphasis places pain, as a “5th vital sign,” on a level similar to blood pressure and pulse measurement.
The most commonly used method to assess pain as a 5th vital sign is the 0 to 10 pain numeric rating scale (NRS). The NRS has robust psychometric properties in research applications,7 but how the NRS performs in routine outpatient practice is less certain. Of 2 single-site studies conducted in primary care, one small effectiveness study showed that the NRS may be only moderately sensitive and specific.8 The other, a small “before and after” comparison of the 5th vital sign, did not find changes in clinician processes of care related to pain management.9
Clinicians may also object to pain screening because it competes with many other important tasks. Screening for affective disorders and for tobacco and other substance use disorders, along with implementing routine preventive care, is increasingly common, and as these tasks proliferate it becomes increasingly necessary to prioritize them. A finding with the NRS that pain is present calls for additional action from the physician and may compete with other patient and physician priorities.
Why has widespread implementation of the pain NRS not been as successful as hoped? One explanation is the need for a better understanding of the NRS's accuracy.10,11 Clinicians find screening tests useful to the extent that they are accurate and reflect the patient's recent clinical state. Presently, clinicians are told that NRS ratings of 5 or higher are consistent with moderate or greater pain, and that pain at this level or higher should be further assessed and treated if possible.12 If these ratings are inaccurate, then other pain screening methods may be needed.
A study by Krebs and colleagues8 suggested that pain as the 5th vital sign is only moderately accurate when compared with a longer battery of pain questions; however, it could not distinguish inaccuracy caused by intrinsic characteristics of the NRS from inaccuracy caused by difficulties implementing the NRS in practice. To understand sources of potential inaccuracy, we studied pain as the 5th vital sign in 8 VA hospital- and community-based outpatient clinics in a large 3-county area. We evaluated pairwise comparisons of 3 pain ratings. We compared pain ratings gathered during vital sign intake (nurse-recorded NRS) with the identical pain scale applied under research conditions (research-recorded NRS) and with a well-accepted, multidimensional standard measure of pain, the Brief Pain Inventory (BPI).13,14
We aimed to evaluate the accuracy of routinely obtained pain ratings and to explore potential causes of inaccuracy in the NRS. We hypothesized that agreement between the nurse-administered NRS and the gold standard BPI would be lower than agreement between the NRS and the BPI when both were administered within the same research interview.
Methods
Patients
The Helping Veterans Experience Less Pain Study (HELP-Vets) enrolled a random outpatient visit-based sample of patients and their providers. HELP-Vets surveyed patients and their nursing and treatment providers (eg, physician, nurse practitioner, and physician assistant) from March 2006 to June 2007 at clinics at 2 hospitals and 6 affiliated community sites in 3 large counties (Los Angeles, Ventura, and Orange) in Veterans Integrated Service Network 22. Of the clinics based at those sites, 5 of 19 are oncology and cardiology clinics, and 14 of 19 exclusively offer primary care services. The HELP-Vets study had 2 components, only one of which is addressed in the current analysis. From March 2006 to March 2007 we surveyed a visit-based sample proportional to total visits at the clinic sites during the previous year, and from April to June 2007 we supplemented that with a convenience sample of cardiology outpatients to evaluate pain in cardiac conditions. This analysis focuses on the 528 patients in our proportional visit-based sample.
Research assistants approached patients leaving the clinic after their provider visits and queried them about eligibility. Eligible patients had vitals taken and had an examination by a consenting treatment provider that day in primary care, women's health, urgent care, cardiology, or oncology. They had to pass a brief cognitive screening test,15 have intact hearing, speak English, have not participated previously, and agree to have their medical records reviewed. To sample adequate patients with painful health conditions yet allow for inclusion of healthy patients, we selected all those who self-reported their health as fair or poor and selected every other patient among those who self-reported their health as excellent, very good, or good. Although it was impractical to formally evaluate patients more than once, the intraclass correlation among interviewers was low (0.06), supporting that most variation was subject-dependent.
After consent was given, the patient was immediately interviewed. The time between the routine vital sign assessment by the intake nurse and the interview was typically approximately 45 minutes. Chart review data, including ratings of the patient's pain as assessed by the nurse at intake, were abstracted from the VA's electronic health record and linked to the patient's self-report interview.
A total of 6138 patients in clinic waiting rooms were approached and screened for eligibility (Figure 1). Of those, 862 refused screening and 4337 were ineligible (2265 had not yet had their vital signs taken; 942 were not planning to visit a treatment provider that day; 310 were not visiting participating clinics; 103 were visiting providers who declined to participate; 171 exhibited behavioral problems or failed the cognitive screening test and thus were unable to consent; 49 were hearing impaired; and 61 had previously participated). The remaining 436 of the 4337, who reported good or excellent health on the screening test, were purposely not sampled so as to provide an approximately equal distribution with those who reported fair or poor health. Among the 939 eligible patients, 650 (69.2%) completed the interview. This analysis draws on the 528 patients from the proportional visit-based sample, 18 (3.4%) of whom did not have a pain rating recorded and one of whom did not answer the research-recorded NRS.
Measures
Pain Screening at Time of Vital Sign Measurement (Nurse-recorded NRS)
Nurse-recorded pain rating using the NRS was widely used in the study clinics for several years before the VA national pain screening policy was implemented in 2003.3 Routinely, VA clinic staff ask patients to rate the intensity of their current pain: “On a scale of 0 to 10, where 0 means no pain and 10 equals the worst possible pain, what is your current pain level?” The nurse then enters the NRS score into the electronic health record.
Pain Rating Reported on the Research Interview (Research-Recorded NRS)
At the beginning of the interview, patients were asked by the research assistant to rate their current pain, using language identical to that used in the nurse screening tool.
Brief Pain Inventory (Research BPI)
During the interview, the research assistant also administered the BPI, which included 4 items for which patients rate their “worst pain,” “least pain,” and “pain on average” during the last week as well as “pain right now.” Scales for each item range from 0 to 10, and the total score averaged the 4 items. We also asked about worst and least pain during the last 24 hours and created a 24-hour severity score by averaging these 2 items along with the rating of “pain right now.”13,14
The BPI interference score used the same 0 to 10 scales and asked patients to rate how much during the past week pain had interfered with 7 activities: general activity, mood, walking ability (including ability to transport yourself in a wheelchair or scooter), normal work activities (including both work outside the house and housework), interpersonal relations, sleep, and enjoyment of life. A total score averaged these 7 items.
Based on previous research, for the BPI severity score, cutpoints of ≥5 and ≥7 were used to indicate moderate and severe pain, respectively. These cutpoints were established for the BPI severity scale in both general population and clinical samples. For sensitivity analysis, we used a cutpoint of ≥4 for moderate pain.16–19
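The scoring described above reduces to simple averages of 0 to 10 item ratings. The following minimal sketch (our own illustration; the function and variable names are not part of the BPI instrument) shows how the past-week severity, 24-hour severity, and interference scores and the cutpoint classification could be computed.

    def mean(items):
        return sum(items) / len(items)

    def bpi_scores(worst_wk, least_wk, average_wk, now, worst_24h, least_24h, interference_items):
        """All inputs are 0-10 ratings; interference_items holds the 7 interference ratings."""
        severity_week = mean([worst_wk, least_wk, average_wk, now])  # 4-item severity, past week
        severity_24h = mean([worst_24h, least_24h, now])             # worst and least in 24 hours plus pain right now
        interference = mean(interference_items)                      # 7 interference items, past week
        return severity_week, severity_24h, interference

    def pain_level(severity):
        """Apply the >=5 (moderate) and >=7 (severe) cutpoints used in the primary analysis."""
        if severity >= 7:
            return "severe"
        if severity >= 5:
            return "moderate or greater"
        return "below moderate"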
Other Measures
Other interview measures included demographics (age, sex, ethnicity); a single item about the general rating of health (from excellent to poor); a single item about pain treatments used during the last week; and patient reports of whether or not the nurse asked about pain and used a 0 to 10 rating scale. The Patient Health Questionnaire-2 and medical problems gathered from chart review were used to score the Seattle Index of Comorbidity and to provide indicators of the presence of clinically relevant depression, anxiety, and cardiovascular and musculoskeletal problems.20,21
Analysis
To shed light on sources of disagreement in pain ratings, we compared nurse to researcher administration of the same tool (eg, nurse-recorded NRS to research-recorded NRS) with nurse-researcher and researcher-researcher administration of different tools (eg, NRS to BPI). Agreement between the nurse documentation of pain (nurse-recorded NRS) and the researcher documentation of pain (research-recorded NRS) was assessed by the interclass correlation (referred to subsequently as “correlation”). Correlations among research ratings (research-recorded NRS and BPI) were used to distinguish tool from measurement differences. In addition, to determine the sensitivity and specificity of cutpoints on the nurse-recorded NRS for clinically significant pain, we fit receiver operating characteristic (ROC) curves for the nurse-recorded NRS compared with the reference standards for moderate and more severe pain cutpoints on the BPI severity (last week and past 24 hours) and research-recorded NRS scales and calculated the area under the curve (AUC) for each comparison. The AUC reflects overall accuracy between measures, where 0.5 indicates a test that performs no better than chance and 1.0 indicates a perfect, 100% accurate test.22
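This accuracy analysis can be made concrete with a short sketch. The article does not specify its statistical software, so the scikit-learn dependency, the variable names, and the paired data below are illustrative assumptions only; the sketch simply shows how the AUC of the nurse-recorded NRS against a dichotomized BPI reference could be computed at each cutpoint.

    from sklearn.metrics import roc_auc_score

    # Hypothetical paired observations: the nurse-recorded NRS (screening test)
    # and BPI severity during the past week (reference standard), both on 0-10 scales.
    nurse_nrs    = [0, 2, 7, 5, 0, 9, 3, 6, 1, 8]
    bpi_severity = [0.5, 3.0, 6.5, 5.5, 1.0, 8.5, 2.0, 7.0, 0.0, 6.0]

    for cutpoint in (4, 5, 7):  # sensitivity-analysis, moderate, and severe pain cutpoints
        reference = [1 if score >= cutpoint else 0 for score in bpi_severity]  # pain at or above the cutpoint on the BPI
        auc = roc_auc_score(reference, nurse_nrs)  # 0.5 = chance performance, 1.0 = perfect discrimination
        print(f"BPI cutpoint >= {cutpoint}: AUC = {auc:.2f}")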
Informal qualitative preparatory work suggested several sources of variation that we evaluated. We examined whether agreement between the research-recorded and nurse-recorded NRS varied according to the patient's report of whether the nurse had used a 0 to 10 rating scale to ask about pain when vital signs were taken. The role of interview timing and intrinsic variation in pain intensity was examined by evaluating the association between the change in pain intensity since arrival in the clinic, as reported by the patient during the research interview, and the difference between the nurse-recorded and research-recorded NRS. This study was approved by the Institutional Review Board of the VA Greater Los Angeles Health Care System.
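As an illustration of the first of these planned comparisons (the counts below are invented and the use of SciPy is our assumption, not a description of the study's actual analysis code), an association between the nurse's reported use of the 0 to 10 scale and underestimation of pain could be tested with a chi-square statistic of the kind reported in the Results.

    from scipy.stats import chi2_contingency

    # Hypothetical 2 x 2 table: rows = patient reported the nurse used the 0-10 scale (yes / no),
    # columns = nurse-recorded NRS underestimated pain relative to the research interview (yes / no).
    observed = [[20, 150],   # scale used:     20 underestimated, 150 not
                [90,  60]]   # asked informally: 90 underestimated, 60 not

    chi2, p, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.1f}, df = {dof}, P = {p:.3g}")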
Results
Patient descriptive statistics are presented in Table 1. Consistent with national VA users, our sample was older (mean age, 62 years) and more likely to be male (94%) than the US national average. Ethnicity was mixed, with 50% white, 24% black, and 16% Hispanic or multiple ethnicities including Hispanic. As expected, because of intentional oversampling of patients with fair or poor health, many patients were ill. Substantial numbers had depression, anxiety, or posttraumatic stress disorder (44%), whereas 12% had cancer, 28% had cardiovascular disease, and 45% had musculoskeletal disease. The mean Seattle Comorbidity score was 5.6 (range, 0–15).20
Pain ratings varied between the nurse-recorded NRS and the research-recorded NRS. Only 192 (38%) patients reported pain (ie, NRS >0) on the nurse-recorded NRS, whereas 311 (61%) reported some pain on the research-recorded NRS. Among those veterans experiencing pain, the mean nurse-recorded NRS score was 5.9 and the mean research-recorded NRS score was 5.1, consistent with nurses’ routine screening detecting pain in the most severely affected patients.
The modest correspondence between the 2 ratings, as well as with the BPI, was substantiated in further analyses. The correlation between the nurse-recorded NRS and research-recorded NRS was 0.653 (Table 2); between the nurse-recorded NRS and the BPI severity scales it was 0.633 for the 24-hour version and 0.612 for the version that referenced pain during the last week, which suggests only moderate to fair accuracy. As expected, because it measures a somewhat different construct (ie, interference rather than severity), the correlation with BPI interference was even lower (0.417). In contrast, the research-recorded NRS correlated substantially with the BPI severity during the past 24 hours (0.892) and the BPI severity during the last week (0.847).
The area under the receiver operating characteristic curve for the nurse-recorded NRS compared with the research-recorded NRS was 0.80 for a pain rating score cutpoint of 5 and 0.80 for a cutpoint of 7. Similar AUC results for both cutpoints were found for BPI severity during the last week (0.78 and 0.80), BPI severity during the past 24 hours (0.78 and 0.81), and BPI interference (0.69 and 0.65). AUC results for a cutpoint of 4 were almost identical to those for a cutpoint of 5.
We explored the nature of the discrepancy between the nurse-recorded NRS and the research-recorded NRS by examining the direction and magnitude of differences between pain scores (Table 3). Though nurse and research ratings agreed 55% of the time, almost 20% of patients rated their pain 3 or more points higher (on the 0–10 scale) in the research interview than in the nurse screening. Approximately one-third of ratings differed by 2 or more points, irrespective of the direction of the difference. Nurse ratings underestimated patient pain 33% of the time and overestimated pain 12% of the time.
Discrepancies in pain ratings could not be explained by nurses failing to ask about pain during routine vital sign intake because patients reported that nurses asked them about pain during 92% of encounters. However, patients further reported that nurses used a 0 to 10 scale to quantify pain in only half of encounters (66% among those with pain greater than 0). Among patients with pain, nurses were more likely to underestimate pain if they had not used the 0 to 10 scale (χ2 = 100.6; P < .001). The nurse and research ratings agreed in 95% of cases in which the patient reported that the nurse used an NRS; when the nurse and research ratings were discrepant, patients reported that the nurse had not used an NRS in 45% of cases.
We evaluated whether the patient's pain had changed from the time of the nurse assessment to the research interview. Agreement between the nurse-recorded NRS and the research-recorded NRS was somewhat less when the patient's pain changed, especially when it worsened between the nurse rating and the research interview; this accounts for part of the nurses’ underestimation of pain. Of patients reporting pain, 80% reported that their pain stayed the same, and agreement between NRS ratings was within 1 point for 70% (versus 60% of 42 patients whose pain improved and only 48% of 36 patients whose pain worsened). Among these patients nurses underestimated pain by 2 or more points for 25% and overestimated for only 6%.
Discussion
Our multisite study of patients in hospital- and community-based outpatient care found only moderate accuracy of pain as the “5th vital sign” routinely obtained by nurses as compared with research administration of the identical measure. However, a research-administered 5th vital sign was highly accurate compared with the gold standard BPI, especially its severity subscale. We found that lower accuracy of the nurse-administered 5th vital sign was associated with the use of informal qualitative screening instead of adherence to the standardized quantitative NRS. This informal approach occurred in approximately half of encounters. The NRS, which measures only severity, underestimated pain relative to the overall BPI (which also captures interference), and informal rating procedures were associated with underestimation.
Our findings extend the results of a recent single-site primary care-based study that also found moderate accuracy for the 5th vital sign but was not designed to distinguish inherent NRS inaccuracy from inaccuracy related to its implementation.8 These findings suggest that the NRS is indeed a reasonable tool for routine pain screening, consistent with previous literature that evaluated it in research settings.7 Still, to ensure its accuracy in practice, more research is needed to understand the barriers that nurses face when applying the NRS.
However, another recent study also highlighted the lack of effect of implementing routine primary care pain screening on clinical care,9 suggesting that screening needs to be better integrated with treatment and follow-up. A recent surgical case review raised concerns about pain screening based on overmedication of trauma patients.21 Because overestimation of pain is rare when the NRS is administered as intended, such medication errors bolster the need to improve pain management competence, not to abandon pain screening.19 Our findings indicate that efforts to improve routine pain management can confidently include the NRS as part of that strategy. However, especially because there are many competing demands, pain screening algorithms should provide actionable information; perhaps other simple questions about pain, including its chronicity and impact, would improve the clinical relevance of screening.
Nurses using pain screening as the 5th vital sign may not perceive the NRS to have specific meaning and, as all clinicians do, they personalize tools to suit their own practice styles or adopt informal approaches to save time. In other words, it would be more common to ask, “Mr. Jones, is your sore knee feeling good today?” than to use a 0 to 10 scale to evaluate pain, especially if a personal, informal inquiry suggests that significant pain is absent. This same phenomenon probably applies to depression, substance abuse, or many other types of screening for which clinicians initially apply formal criteria and methods.
Indeed, we found that use of the NRS was more common among patients who reported pain (66% of cases) than among those without pain, suggesting that nurses use the NRS as a second-stage screening tool. If that adaptation is inevitable, then perhaps the NRS should be used in a self-administered rather than a nurse-administered manner (eg, waiting room kiosks). Self-administered techniques would reduce variability in administration caused by item adaptation and interrater effects.
Although the current analysis cannot shed light on why nursing staff do not use the NRS as intended, a number of factors may be at work. Previous research suggests that more experienced nurses are more likely to underestimate pain.20,21 Some staff may have been employed before the institution of routine pain screening, or their training may reflect less emphasis on pain compared with that of more recently trained staff. Previous research also showed that routine pain screening activity declined when screening practices were not monitored; regular feedback and clinician mentoring may be needed to sustain appropriate screening behavior.
Whatever approach is taken, variability in how these tools are used in practice needs to be better understood, and providers would benefit from ongoing support in their pain assessment practices. Depending on the factors that influence adherence to the NRS, specific approaches that may be helpful include training, mentoring, and monitoring. With regard to monitoring, recent research has demonstrated the usefulness of a variety of quality indicators for monitoring pain, and regular feedback of performance information may provide a useful approach to fostering better routine symptom management.23
Limitations of our study include the focus on VA sites only, although our approach provides insight into the challenges of implementation in a system known for excellence in chronic illness care, one that has institutionalized routine pain screening over the last decade and uses an electronic health record to capture ratings.3,24 As such, it may conservatively estimate the challenges of implementing routine pain screening in non-VA settings. We limited our study to outpatient evaluation, and the factors that affect variability may differ in inpatient and nursing home settings; our findings underscore the need for additional research within those settings. Because pain intensity fluctuates, our findings reflect both the performance of the test and patients’ changing pain, although a relatively small proportion of patients reported a change in pain between the nurse evaluation and the research interview.
There are also limitations to the study design that could influence our findings. Because the research-recorded NRS and the BPI were both research administered and administered close together (in the same survey, only a few questions apart), their correlation may be inflated, whether because of better recall or because the entire survey focuses on pain. This elaboration may also draw the patient's attention more intently to their pain (rather than to the primary reason for the examination), which could lead to higher agreement between the two. However, the degree of pain we found was very similar to that in other studies of routine pain assessment in the primary care setting.8,9
Conclusion
Though the accuracy of the 5th vital sign for pain assessment is moderate, it is much lower in practice than under ideal research circumstances. Uniquely, we found that nurses may not always use the 0 to 10 scale to properly quantify pain levels and that informal screening practices lead to underestimation. Efforts to improve routine pain management can confidently use the NRS, but provider training, education, and monitoring in screening techniques are needed, as are efforts to link the 5th vital sign to clinician action for better pain management.
Acknowledgments
The authors would like to thank Dr. Kurt Kroenke for his generous, helpful suggestions, which greatly improved this project.
Notes
This article was externally peer reviewed.
Funding: Veterans Administration Health Services Research and Development IIR 03–150.
Conflict of interest: none declared.
- Received for publication August 1, 2008.
- Revision received November 21, 2008.
- Accepted for publication November 26, 2008.