Evidence-Based Clinical Medicine
Department of Family Medicine and Community Health (IMB, JCC, JN), University of Pennsylvania School of Medicine, Philadelphia
Department of Psychiatry (JCC), University of Pennsylvania School of Medicine, Philadelphia
Leonard Davis Institute of Health Economics (IMB), University of Pennsylvania School of Medicine, Philadelphia
Lancaster General Hospital Department of Family and Community Medicine and Louise von Hess Medical Research Institute (AC, MH, SR), Lancaster, PA
Department of Liaison Psychiatry, Leicester General Hospital (AJM), Leicestershire, UK
Good Samaritan Family Practice Residency Program (EJ), Lebanon, PA
Correspondence: Ian M. Bennett, MD, PhD, Family Medicine and Community Health, University of Pennsylvania School of Medicine, 2nd Floor Gates Pavilion, 3400 Spruce Street, Philadelphia, PA 19104 (E-mail: ian.bennett{at}uphs.upenn.edu)
Abstract
Methods: Cross-sectional assessments were conducted at 15 weeks' gestation (n = 414), 30 weeks' gestation (n = 334), and 6 to 16 weeks postpartum (n = 193) among racially and ethnically diverse women participating in the IMPLICIT maternal care quality improvement network. An Edinburgh Postnatal Depression Scale score of ≥13 was used as the criterion measure for the PHQ-2.
Results: A positive 2-item screen had sensitivities of 93%, 82%, and 80% and specificities of 75%, 80%, and 86% for an Edinburgh Postnatal Depression Scale score of ≥13 at 15 and 30 weeks' gestational age and postpartum, respectively. The positive/negative predictive values for the PHQ-2 were 44/98, 38/97, and 24/99 at these time points, respectively. Areas under the receiver operating characteristic curves suggested that the 2-item assessment had approximately equal diagnostic validity at each time point.
Conclusions: Two questions efficiently ruled out depression and removed the need for further screening in approximately 60% to 80% of women, depending on the point in pregnancy or postpartum. Women who screen positive still require a follow-up diagnostic interview.
In this study, we assess the ability of the 2-item version of the Patient Health Questionnaire (PHQ-2), with dichotomous (yes/no) responses, to increase the efficiency of screening for risk of major depression during pregnancy and postpartum. Women were evaluated at 2 time points during pregnancy and once in the postpartum period, using the Edinburgh Postnatal Depression Scale (EPDS) as the criterion. The EPDS is a widely used first step in identifying perinatal depression.15,16 Women who screen positive on the EPDS are most appropriately followed up with an in-depth diagnostic interview, but the EPDS is efficient at ruling out further assessment for women who screen negative; with that caveat in mind, Milgrom et al17 found that the EPDS had a positive predictive value (PPV) of 76% when measured against Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria. The goal of the current study was to determine whether the 2-item screening measure could adequately identify women with elevated perinatal depressive symptoms, so that a more extensive assessment of women unlikely to have major depression could be avoided.
Methods
Measures of Depression
Depressive symptomatology was measured using the English and Spanish versions of the EPDS, a 10-item instrument developed for use in the postpartum period and validated for use during pregnancy, with scores ranging from 0 to 30.18,19 The EPDS has been used in a wide range of populations, including low-income women similar to those in the current study.2,20,21 A number of cut points have been used and proposed for identifying perinatal depression, although considerable methodological inconsistencies and inadequacies in the available studies preclude a definitive choice among them.22 We chose to evaluate an EPDS score of ≥13 because it has the strongest support for identifying major depression.13 Although studies are limited, the pooled sensitivity of this cut point for identifying major depression in high-quality studies, both in pregnancy and postpartum, has been found to be >95%, with a specificity of >80%.13,23 The potential pre-screening tool was the PHQ-2, modified so that response choices were dichotomous (yes/no) rather than the standard 4 ordinal response options.10,24,25 Responding yes to either of the following items was considered a positive result: "During the past month have you often been bothered by feeling down, depressed, or hopeless?" and "During the past month have you often been bothered by little interest or pleasure in doing things?" Both the PHQ-2 and the EPDS were initially given to patients as printed self-report questionnaires. Physicians then reviewed the responses orally with each patient. History of depression was determined by abstracting documentation of previous depression from prenatal and postpartum clinical charts.
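The two scoring rules just described can be sketched in Python as follows. This is a minimal illustration, not the study's software: it assumes each EPDS item has already been coded 0 to 3 (with any reverse-scored items already reversed), and the function names are hypothetical.

```python
from typing import Sequence

EPDS_CUT_POINT = 13  # an EPDS total of 13 or more counts as a positive screen


def epds_total(item_scores: Sequence[int]) -> int:
    """Sum the 10 EPDS items; assumes each is already coded 0-3 (reverse scoring done)."""
    if len(item_scores) != 10:
        raise ValueError("the EPDS has exactly 10 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each EPDS item must be coded 0-3")
    return sum(item_scores)  # possible range: 0-30


def epds_positive(item_scores: Sequence[int]) -> bool:
    """Criterion measure: EPDS total >= 13."""
    return epds_total(item_scores) >= EPDS_CUT_POINT


def phq2_positive(feeling_down: bool, little_interest: bool) -> bool:
    """Modified yes/no PHQ-2: a 'yes' to either item is a positive pre-screen."""
    return feeling_down or little_interest


if __name__ == "__main__":
    print(epds_positive([2, 1, 1, 2, 1, 1, 2, 1, 1, 1]))  # total of 13, so True
    print(phq2_positive(False, True))  # True
```

Keeping the EPDS threshold in one named constant makes it easy to re-run the same logic at another of the proposed cut points.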
Descriptive Variables
Descriptive variables were selected to characterize the sample because of their established associations with depression during pregnancy and postpartum. The demographic variables selected were age, race/ethnicity, educational level, marital status, parity (previous live births), and health insurance status.26–33 To differentiate between women who were still in school and those who did not complete school, we divided the <high school completion category by age: ≤18 years old (potentially still in school) and >18 years old (no longer eligible for high school). Self-reported lifetime history of depression is associated with depression during pregnancy and postpartum and was included for evaluation as a potential additional screening item for depression.34–36
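The age-based split of the <high school completion category might be coded as below. This is a sketch with hypothetical category labels; the finer education levels used for women who completed high school are not described here, so they are collapsed into one placeholder category.

```python
def education_category(completed_high_school: bool, age_years: int) -> str:
    """Assign a (hypothetical) education category, splitting non-completers at age 18."""
    if completed_high_school:
        return "high school or more"  # placeholder; finer levels are not modeled
    if age_years <= 18:
        return "<high school, age <=18 (potentially still in school)"
    return "<high school, age >18 (no longer eligible for high school)"
```

For example, a woman aged exactly 18 who has not completed high school falls in the ≤18 group, and a 19-year-old falls in the >18 group.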
Statistical Analysis
Analyses for each study time point were conducted independently, so they represent distinct cross-sectional assessments. Bivariate associations between descriptive variables at each time point and a positive EPDS (≥13) were assessed using Student's t test and the χ2 statistic, with the appropriate extension for variables with more than 2 categories.
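The study used SPSS for these tests. Purely to show what the two statistics compute, the sketch below implements a pooled-variance (Student's) two-sample t statistic and an uncorrected Pearson χ2 statistic for an r × c table using only the Python standard library. Whether a continuity correction was applied in the study is not stated, so none is assumed here; a closed-form p-value is given only for the 1-degree-of-freedom case, and other p-values would come from a statistics package. The function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def student_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance; returns (t, degrees of freedom)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    standard_error = math.sqrt(pooled * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / standard_error, nx + ny - 2


def chi_square(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson chi-square for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, (len(table) - 1) * (len(col_totals) - 1)


def chi_square_p_df1(chi2: float) -> float:
    """Upper-tail p-value for 1 df: P(Z**2 > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    chi2, df = chi_square([[10, 20], [20, 10]])  # toy 2 x 2 table, not study data
    print(chi2, df, chi_square_p_df1(chi2))
```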
Receiver Operating Characteristic Curve Analysis
Receiver operating characteristic (ROC) curve analysis was used to evaluate the discrimination capacity of the predictive test (the PHQ-2 alone or with history of depression), that is, its ability to accurately classify women into the dichotomized categories (EPDS positive or negative at the cut point of ≥13).37 ROC curves plot sensitivity (the true-positive rate) against 1 − specificity (the false-positive rate) for a series of thresholds, or cut points, established by responses to the instrument items. These thresholds describe the test characteristics, which can be used to determine the relative usefulness of the test and the specific threshold that maximizes the desired characteristics for a particular clinical setting (emphasizing either the sensitivity or the specificity of the instrument). For a 2-item screening instrument, requiring a positive response to only one of the items (cut point = 1 of 2) maximizes the likelihood of identifying cases (increased sensitivity) but also increases the likelihood of false positives (reduced specificity). Requiring both screening items to be positive to identify a case (cut point = 2) increases the likelihood of missing true positives but reduces the likelihood of false positives. The area under the ROC curve is an overall measure of the screening test's performance, with 1.0 indicating a perfect test and 0.5 a test with no discriminating capacity.38 All analyses were conducted using SPSS version 15.0 (SPSS, Inc., Chicago, IL).
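For a 2-item yes/no screen, the number of "yes" answers (0, 1, or 2) is the only score available, so the ROC curve has just two informative points, one for each cut point. The sketch below computes those points and the area under the curve using the Mann-Whitney form of the AUC (ties count one half), which equals the trapezoidal area under the ROC points joined by straight lines. The scores and EPDS labels are made-up toy data, not study data, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[int], labels: Sequence[bool]) -> Tuple[List[int], List[int]]:
    """Separate the scores of EPDS-positive (label True) and EPDS-negative women."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    return pos, neg


def roc_points(scores: Sequence[int], labels: Sequence[bool]) -> List[Tuple[int, float, float]]:
    """Return (cut point k, 1 - specificity, sensitivity) for the rule 'score >= k'."""
    pos, neg = _split(scores, labels)
    points = []
    for k in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= k for s in pos) / len(pos)
        false_positive_rate = sum(s >= k for s in neg) / len(neg)
        points.append((k, false_positive_rate, sensitivity))
    return points


def auc(scores: Sequence[int], labels: Sequence[bool]) -> float:
    """AUC = P(positive scores higher) + 0.5 * P(tie), over all positive/negative pairs."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: number of PHQ-2 'yes' answers, and whether the EPDS total was >= 13.
    yes_count = [2, 1, 1, 0, 0, 1, 0, 0, 2, 0]
    epds_pos = [True, True, False, False, False, False, False, True, False, False]
    for k, fpr, tpr in roc_points(yes_count, epds_pos):
        print(f"cut point >= {k}: 1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
    print(f"AUC = {auc(yes_count, epds_pos):.3f}")
```

Moving the cut point from ≥2 to ≥1 can only add positives, so sensitivity and 1 − specificity both rise, which is the trade-off described above.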
Results
Diagnostic Accuracy
The characteristics of the PHQ-2 for identifying positive EPDS cases, alone and in combination with a history of depression, are shown in Table 2 and Figure 1. The cut points for the combined PHQ-2 and history of depression screening instrument are labeled 1/2 (positive for either the PHQ-2 or history of depression) and 2 (positive for both the PHQ-2 and history of depression). The overall performance of the measure, as determined by the area under the ROC curve, was similar across the 3 study time points, with similar point estimates and overlapping confidence intervals whether the PHQ-2 was used alone or in combination with history of depression (Table 2, column 2).
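The two cut points for the combined instrument amount to counting positive components. The hypothetical helper below is a sketch, not study code: a cut point of 1 reproduces the "1/2" rule (either component positive), and a cut point of 2 requires both.

```python
def combined_screen_positive(phq2_positive: bool, history_positive: bool, cut_point: int) -> bool:
    """Combined PHQ-2 + history-of-depression screen.

    cut_point=1 -> positive if either component is positive (labeled '1/2' in the text)
    cut_point=2 -> positive only if both components are positive
    """
    if cut_point not in (1, 2):
        raise ValueError("cut_point must be 1 or 2")
    return int(phq2_positive) + int(history_positive) >= cut_point
```

Raising the cut point from 1 to 2 can only turn positives into negatives, which is why it trades sensitivity for specificity.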
(2) Screening at week 30 of estimated gestational age (EGA).
The PHQ-2 detected 36 of 44 women with depression, defined as a score of 13 or above on the EPDS, for a sensitivity of 81.8%. Specificity was 80.0%, with a PPV of 38% and an NPV of 97%. When the PHQ-2 was combined with a history of depression, sensitivity improved to 93.2%, but at the expense of specificity (63.1%).
(3) Postpartum Screening (6 to 16 weeks after birth)
The PHQ-2 identified 8 of 10 women with postpartum depression, defined as a score of 13 or above on the EPDS, for a sensitivity of 80.0%. Specificity was 85.8%, with a PPV of 24% and an NPV of 99%. When the PHQ-2 was combined with a history of depression, sensitivity improved to 90.0%, but at the expense of specificity (67.2%).
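The reported percentages follow from a 2 × 2 table of PHQ-2 result against EPDS result. Only the true-positive and false-negative counts are given above; in the sketch below, the true-negative and false-positive counts are back-calculated (our assumption, not reported figures) from the sample sizes in the abstract (334 at 30 weeks, 193 postpartum) and the stated specificities. With those counts, the standard formulas round to the sensitivities, specificities, PPVs, and NPVs reported for both time points.

```python
from dataclasses import dataclass


@dataclass
class ScreenAccuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def screen_accuracy(tp: int, fp: int, fn: int, tn: int) -> ScreenAccuracy:
    """Standard 2 x 2 accuracy measures, with the EPDS (>= 13) as the criterion."""
    return ScreenAccuracy(
        sensitivity=tp / (tp + fn),  # EPDS-positive women flagged by the PHQ-2
        specificity=tn / (tn + fp),  # EPDS-negative women cleared by the PHQ-2
        ppv=tp / (tp + fp),          # PHQ-2 positives who are EPDS positive
        npv=tn / (tn + fn),          # PHQ-2 negatives who are EPDS negative
    )


# tp/fn from the text; tn/fp back-calculated from n and specificity (assumed).
week30 = screen_accuracy(tp=36, fp=58, fn=8, tn=232)
postpartum = screen_accuracy(tp=8, fp=26, fn=2, tn=157)

for label, a in (("Week 30", week30), ("Postpartum", postpartum)):
    print(f"{label}: sensitivity {a.sensitivity:.1%}, specificity {a.specificity:.1%}, "
          f"PPV {a.ppv:.0%}, NPV {a.npv:.0%}")
```

The low prevalence postpartum (10 of 193) is what pulls the PPV down to about 24% even though specificity is higher than at 30 weeks.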
Discussion
A positive response to either item of the modified PHQ-2 identified 80% to 93% of women with a score of ≥13 on the Edinburgh Postnatal Depression Scale. The addition of a question regarding history of depression improved the sensitivity of this screening at the expense of its specificity and PPV. Out of 100 unselected women seen at 15 weeks, the PHQ-2 alone would correctly flag 16 of 17 depressed patients (missing only 1) and correctly reassure 62 of 83 patients, incorrectly flagging the remaining 21. Similarly, out of 100 unselected patients seen postpartum (where the prevalence is 5%), the PHQ-2 alone would correctly flag 4 of 5 depressed patients and correctly reassure 82 of 95 patients, misidentifying the remaining 13. Thus, when the PHQ-2 is negative, there is a 97% to 99% chance that depression has been correctly ruled out, with very few false negatives. This rapid screening procedure may therefore be useful as part of a multistage case-finding strategy to rule out women who would otherwise require further time-consuming assessment for maternal depression. Based on the rate of positive 2-item screens in the current study, 60% to 80% of the women being assessed (depending on the particular point in pregnancy and postpartum) can be ruled out with high certainty of not having depression. When the history of depression item is added, the number of women quickly ruled out is slightly higher but at a cost of approximately 10% in case-finding accuracy. It is important to keep in mind that even with an NPV of 90% or higher, some women with a negative PHQ-2 may actually have major depression. For this reason, we suggest that a negative screen should not overrule clinical concern, and that any woman judged clinically to be symptomatic despite a negative pre-screen or screen should have a further concurrent or follow-up examination.
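The "out of 100 women" figures in the paragraph above are natural frequencies derived from prevalence, sensitivity, and specificity. A minimal sketch, using rounded values reported in this article as inputs (17% prevalence, 93% sensitivity, and 75% specificity at 15 weeks; 5% prevalence, 80% sensitivity, and 85.8% specificity postpartum); the function name is hypothetical, and rounding each group to whole women (so that each group's cells sum to its total) is our choice.

```python
def per_100(prevalence: float, sensitivity: float, specificity: float, n: int = 100) -> dict:
    """Expected screening outcomes among n unselected women, rounded to whole women."""
    depressed = round(n * prevalence)
    flagged = round(depressed * sensitivity)        # true positives
    not_depressed = n - depressed
    reassured = round(not_depressed * specificity)  # true negatives
    return {
        "true_pos": flagged,
        "false_neg": depressed - flagged,
        "true_neg": reassured,
        "false_pos": not_depressed - reassured,
    }


print("15 weeks:", per_100(0.17, 0.93, 0.75))
print("Postpartum:", per_100(0.05, 0.80, 0.858))
```

With these inputs the sketch gives 16/1/62/21 (true positives, false negatives, true negatives, false positives) at 15 weeks and 4/1/82/13 postpartum.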
It may be somewhat surprising that adding a history of depression question did not substantially improve the performance of pre-screening, particularly given the recognition that depression is a recurrent, episodic disorder.39 One explanation is that many women in our sample were young enough to be experiencing their first episode of depression. A second is that the PHQ-2 alone may have reached a ceiling in rule-out accuracy. A third concerns the validity of simple self-report questions for assessing history: unlike an interview, they offer no opportunity to explain the questions or probe the responses, a limitation that has been reported elsewhere.40
Importantly, we have shown that the modified PHQ-2 used here functions well as a first pre-screening step in early and late pregnancy and postpartum. Efficiently ruling out women who do not have depression allows resources to be targeted at those at highest risk of a major depressive disorder. However, the value of this strategy depends on diagnostic follow-up of women who screen positive on both the pre-screen and the EPDS. A positive EPDS score is by itself an inadequate basis for clinical decision making; if it is used as such, scarce resources might be spent treating women who are not truly depressed or in need of help. There is some evidence that busy primary care clinicians tend not to follow up on the results of screening questionnaires, instead accepting a positive score as a sufficient basis for diagnosis.41 It would be unfortunate if the introduction of a pre-screen further encouraged this practice; before recommending this pre-screening strategy clinically, it would be important to confirm that it would not be used inappropriately.
Although we assessed the PHQ-2 as a pre-screen, it has been recommended for use alone as a screening instrument in general primary care practice.10 It is possible that this instrument, without the EPDS, is accurate enough to identify women at risk (or not at risk) for depression during pregnancy and postpartum. Currently, the EPDS is well validated and highly accepted in perinatal settings, but little work to date has justified the use of the PHQ-2. A 3-question approach (essentially the PHQ-2 plus a help question) has been recommended by the British health system's National Institute for Clinical Excellence guidelines but has not been formally evaluated in perinatal settings.12 In one study, Olson et al42 found that 17% of 1398 mothers answered positively to at least one question of the PHQ-2, although validity was not reported. Recently, Dubowitz et al43 adapted the PHQ-2 into the "Parent Screening Questionnaire." The authors evaluated this questionnaire among 216 mothers in a primary care clinic, compared with the Beck Depression Inventory II completed 2 months later. When a positive response to either or both of the 2 questions was counted as a positive screen, the sensitivity was 74%, the specificity was 80%, and the NPV was 95%, but the PPV was only 36%.
An unresolved issue is how many women who screen positive are actually willing to accept professional help. In one study,44 only 30% of those who screened positive agreed to be contacted for further help. In a second study,45 only 23% of high scorers on the Beck Depression Inventory took up the offer of psychological therapy, and 10% agreed to medication. This highlights that screening alone is rarely sufficient to improve quality of care; enhanced detection should be paired with enhanced treatment and follow-up.46
A number of limitations of the current study should be noted. First, we used the EPDS screening instrument, rather than a clinical diagnosis of depression, as the criterion measure. However, the goal of the current analysis was to reduce the burden of screening for depression through a pre-screen step, not to diagnose major depressive disorder. Because the formal diagnosis of depression requires significant diagnostic expertise and valuable clinical time, the rapid identification of women at risk (and not at risk) for depression is critical to the efficient delivery of maternal care. Second, the women included were primarily low income, as indicated by the high rate of Medicaid insurance, so the findings may not be generalizable to higher-income and other less vulnerable populations. However, low-income women have a higher risk of depressive symptomatology and are less likely to receive mental health services, making them a particularly important group to target for screening.4,47 One recent study48 found that the sensitivity of the PHQ-2 for a positive EPDS score was lower for women with a high school education or less than for those with more education. We did not find any difference in sensitivity across levels of educational attainment in our sample. This difference may be due to our use of a simpler dichotomous (yes/no) response choice for the PHQ-2 and a follow-up oral review of the items, which reduces concerns about inaccuracies related to low education and literacy when a written, more complex instrument is used.11 Finally, although we had complete data for more than 50% of the women eligible for analysis at each time point, it is possible that missing data were related to depression risk status. For example, women with more obvious symptoms may have been either more or less likely to have both the pre-screen (PHQ-2) and the screen (EPDS) conducted (both were required for inclusion in the analysis). Because of the protocol-driven system of data collection in this study, we believe it is unlikely that any such bias was widespread enough to greatly influence our findings.
Based on the results of these analyses, it is reasonable for maternal care providers who currently use or plan to use the EPDS to adopt the PHQ-2 pre-screen, with simplified (yes/no) response items, as an initial component of a multistep case-finding strategy for depression during pregnancy and postpartum. Although this approach should not overrule clinical concern, women who answer "no" to both PHQ-2 questions can, with high confidence, forgo the longer EPDS and a diagnostic clinical interview. Women who answer "yes" to either item should proceed to the EPDS, followed by an in-depth clinical interview for those with a positive EPDS (≥13). Although further research is needed, we recommend that the screening process use a direct oral interview to elicit or review the screening questions, rather than relying entirely on a written questionnaire, to avoid errors related to low education and literacy.
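The recommended multistep pathway can be summarized as a small decision function. This is an illustrative sketch only: the function and enum names are hypothetical, the EPDS threshold of ≥13 comes from this article, and a `clinical_concern` flag models the caveat that a negative result should not overrule clinical judgment. What happens after a positive pre-screen but a negative EPDS is not spelled out above; the sketch assumes no further screening unless there is clinical concern.

```python
from enum import Enum
from typing import Optional


class NextStep(Enum):
    NO_FURTHER_SCREENING = "no further screening at this visit"
    ADMINISTER_EPDS = "administer the EPDS"
    DIAGNOSTIC_INTERVIEW = "in-depth diagnostic interview"
    CLINICAL_FOLLOW_UP = "further concurrent or follow-up examination"


def next_step(feeling_down: bool, little_interest: bool,
              epds_score: Optional[int] = None,
              clinical_concern: bool = False) -> NextStep:
    """Hypothetical multistep case finding: PHQ-2 pre-screen, then EPDS, then interview."""
    # A negative result never overrides clinical concern.
    negative = NextStep.CLINICAL_FOLLOW_UP if clinical_concern else NextStep.NO_FURTHER_SCREENING
    if not (feeling_down or little_interest):
        return negative  # "no" to both PHQ-2 items
    if epds_score is None:
        return NextStep.ADMINISTER_EPDS  # "yes" to either item: proceed to the EPDS
    if epds_score >= 13:
        return NextStep.DIAGNOSTIC_INTERVIEW  # positive EPDS
    return negative  # positive pre-screen, negative EPDS (assumed handling)
```

Encoding the clinical-concern override at every negative exit keeps the rule "a negative screen should not overrule clinical concern" in one place.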
Notes
Funding: This work was supported by a grant from the March of Dimes. IMB was supported by grants from the NICHD (1K23HD048915-01A2) and the NIMH (1R03MH074750-01).
Prior presentation: This work was presented in part at the 2007 North American Primary Care Research Group annual meeting, Vancouver, Canada, October 20 through October 23, 2007.
Conflict of interest: none declared.
Received for publication February 23, 2008. Revision received April 28, 2008. Accepted for publication April 29, 2008.