Abstract
Objective: Systems for efficient case finding of women with major depression during pregnancy and postpartum are needed. Here we assess the diagnostic accuracy of a modified 2-item patient health questionnaire (PHQ-2) as a pre-screen in assessing depression.
Methods: Cross-sectional assessments at 15 weeks' gestation (n = 414), 30 weeks' gestation (n = 334), and 6 to 16 weeks postpartum (n = 193) among women from a diverse set of races/ethnicities, participating in the IMPLICIT maternal care quality improvement network. The Edinburgh Postnatal Depression Scale score (≥13) was used as the criterion measure for the PHQ-2.
Results: A positive 2-item screen had sensitivity of 93%, 82%, and 80% and specificity of 75%, 80%, and 86% for Edinburgh Postnatal Depression Scale score of ≥13 for assessment at 15 and 30 weeks gestational age and postpartum, respectively. The positive/negative predictive values for the PHQ-2 were 44/98, 38/97, and 24/99 for each time point, respectively. Analysis of the areas under the receiver operating characteristic curves suggested that 2-item assessments at each time point had approximately equal diagnostic validity.
Conclusions: Two questions were efficient to rule out depression and reduced the need for further screening of approximately 60% to 80% of women, depending on the point in pregnancy or postpartum. A diagnostic interview follow-up of women screening positive is still required.
The identification of perinatal major depression occurring during pregnancy and the first year postpartum is considered a critical goal of the maternal care system but is poorly conducted.1–4 Only a minority of women suffering from perinatal depression are identified by health care providers of either maternal or infant care.5–8 Although limitations of mental health service delivery play a role in this discrepancy, a major contributor to low rates of depression case finding is the difficulty of administering depression screening in busy clinical settings.9 A range of instruments have been used during pregnancy and postpartum, but they generally take at least 5 minutes to complete or rely on adequate literacy of patients for self-administration. In the general primary care setting a 2-item pre-screen has been validated and has reasonable psychometric properties for ruling out patients at low risk for depression, allowing them to avoid further assessment.10–12 This approach has obvious advantages during pregnancy and has been recommended, with the addition of a third item concerning a request for help, for use by maternal care providers in the British health system.13,14 However, to our knowledge this instrument has not been validated in the prenatal and postpartum settings where the normal symptoms of pregnancy can overlap with those of depression.
In this study, we assess the ability of the 2-item version of the patient health questionnaire (PHQ-2), with a dichotomous (yes/no) response, to increase the efficiency of screening for risk of major depression during pregnancy and postpartum. Women were evaluated at 2 times during pregnancy and once in the postpartum period using the Edinburgh Postnatal Depression Scale (EPDS) as the criterion. The EPDS is a widely used first step in identifying perinatal depression.15,16 The EPDS is most appropriately followed up with an in-depth diagnostic interview of women screening positive, but it is efficient in ruling out further assessment for women screening negative; with that caveat in mind, Milgrom et al17 found that the EPDS had a positive predictive value (PPV) of 76% when measured against Diagnostic and Statistical Manual of Mental Disorders-IV criteria. The goal of the current study was to determine whether the 2-item screening measure could adequately identify women with elevated perinatal depressive symptoms so that a more extensive assessment of women unlikely to have major depression could be avoided.
Methods
Setting
The study sample included patients receiving prenatal and postpartum care from 11 family medicine residency clinical practices in the IMPLICIT (Interventions to Minimize Preterm and Low birth weight Infants through Continuous Improvement Techniques) continuous quality improvement network, which includes rural (n = 7) and urban (n = 4) sites in Pennsylvania (n = 9) and New York (n = 2); institutional review board approval was received from all participating institutions. Women were recruited sequentially to the study from November 2004 through February 2007. Eligibility criteria included English- or Spanish-speaking ability and a singleton intrauterine pregnancy. Women were enrolled in the study at any of 3 time points: twice during pregnancy (15 and 30 weeks' gestation) and once postpartum (6 to 16 weeks after delivery). All eligible women who had complete depression screening data (scores for PHQ-2 and EPDS, see below) for a particular time point were assessed. Women with missing depression screening data were excluded because of the requirements of the statistical analyses described below. At the 15-week prenatal assessment, 414 of the 785 eligible women had complete data and were used for analysis (52%); at 30 weeks' gestation, 334 eligible women had complete data (58%); and at the postpartum assessment, 193 of the 274 eligible women had complete data (70%). Women with missing depression screening data did not differ significantly from those included in the analysis by any demographic variables (P > .05). Women were assessed by physicians or nurses during routine maternal care. Data were recorded using standardized forms and, after being de-identified, entered into an Internet-based centralized database for the IMPLICIT network.
Measures of Depression
Depressive symptomatology was measured using the English and Spanish versions of the EPDS, a 10-item instrument developed for use in the postpartum period and validated for use during pregnancy, with scores ranging from 0 to 30.18,19 The EPDS has been used in a wide range of populations, including low-income women similar to those in the current study.2,20,21 A number of cut points have been used and proposed for the identification of perinatal depression, although there are considerable methodological inconsistencies and inadequacies in available studies, precluding a definitive choice among them.22 We have chosen to evaluate EPDS score ≥13 because it has the strongest support for identifying major depression.13 Although studies are limited, the pooled sensitivity for the identification of major depression in high-quality studies at this cut point both in pregnancy and postpartum has been found to be >95% with a specificity of >80%.13,23 The potential pre-screening tool was the 2-item version of the PHQ (PHQ-2), modified so that response choices were dichotomous (yes/no) rather than 4 ordinal response options.10,24,25 Responding yes to either of the items was considered a positive result: “During the past month have you often been bothered by feeling down, depressed, or hopeless?” and “During the past month have you often been bothered by little interest or pleasure in doing things?” Both the PHQ-2 and EPDS were initially provided to the patients in printed form as self-response questionnaires. Physicians then orally reviewed the responses to these assessments with patients individually. History of depression was determined by abstracting documentation of previous depression from prenatal and postpartum clinical charts.
Descriptive Variables
Descriptive variables were selected to aid in the characterization of the sample because of their established associations with depression during pregnancy and postpartum. Demographic variables selected were age, race/ethnicity, educational level, marital status, parity (previous live births), and health insurance status.26–33 To differentiate between women who are still in school and those who did not complete school, we divided the <high school completion category based on age: ≤18 years old (potentially still in school) and >18 years old (no longer eligible for high school). Self-reported lifetime history of depression is associated with depression during pregnancy and postpartum and was included for evaluation as a potential additional screening item for depression.34–36
Statistical Analysis
Analyses for each of the study time points were conducted independently of the others and are therefore distinct cross-sectional assessments. Bivariate associations between descriptive variables at each time period and a positive EPDS (≥13) were assessed using Student's t test and the χ2 statistic, with appropriate extension when assessing variables with more than 2 (dichotomous) categories.
Receiver Operating Characteristic Curve Analysis
Receiver operating characteristic (ROC) curve analysis was used to evaluate the discrimination capacity of the predictive test (the PHQ-2 alone or with history of depression), or its ability to accurately identify those women in the dichotomized categories (EPDS positive or negative at the cut point of ≥13).37 ROC curves plot sensitivity (true positive ratio) against 1 − specificity (false positive ratio) for a series of thresholds or cut points established by responses to the instrument items. These thresholds provide information regarding the test characteristics, which can be used to determine the relative usefulness of the test and the specific threshold that maximizes the desired characteristics for a specific clinical setting (emphasizing either the sensitivity or specificity of the instrument). For a 2-item screening instrument, requiring a positive response from only one of the items (cut point = 1 of 2) maximizes the likelihood of identifying cases (increased sensitivity) but also increases the likelihood of false positives (reduced specificity). Requiring both screening items to be positive to identify a case (cut point = 2) increases the likelihood of missing true positives but also reduces the likelihood of false positives. The area under the ROC curve represents an overall measurement of performance of the screening test, with 1.0 a perfect test and 0.5 representing a test with no discriminating capacity.38 All analyses were conducted using SPSS version 15.0 (SPSS, Inc., Chicago, IL).
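The cut-point logic described above can be illustrated with a short Python sketch. It is not the SPSS procedure used in the study: the function names are illustrative, the counts are hypothetical values chosen only to show the mechanics, and the area is estimated with the trapezoidal rule, which is an assumption about how the empirical curve is integrated.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL counts (not study data): number of women by count of "yes"
# answers (0, 1, or 2) on the 2-item screen, split by criterion status.
HYPOTHETICAL_CASES: Dict[int, int] = {0: 10, 1: 30, 2: 60}       # EPDS >= 13
HYPOTHETICAL_NONCASES: Dict[int, int] = {0: 300, 1: 70, 2: 30}   # EPDS < 13


def roc_points(cases: Dict[int, int],
               noncases: Dict[int, int]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) for every cut point.

    A woman screens positive at cut point c when her score is >= c.
    The strictest cut point is listed first, so the curve runs from
    (0, 0) to (1, 1).
    """
    n_cases = sum(cases.values())
    n_noncases = sum(noncases.values())
    points = [(0.0, 0.0)]
    for cut in sorted(set(cases) | set(noncases), reverse=True):
        true_pos = sum(n for score, n in cases.items() if score >= cut)
        false_pos = sum(n for score, n in noncases.items() if score >= cut)
        points.append((false_pos / n_noncases, true_pos / n_cases))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the empirical ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    pts = roc_points(HYPOTHETICAL_CASES, HYPOTHETICAL_NONCASES)
    for fpr, sens in pts:
        print(f"1 - specificity = {fpr:.3f}, sensitivity = {sens:.3f}")
    print(f"AUC = {auc(pts):.3f}")
```

With only two items there are just two informative cut points, so the empirical curve is made of three line segments; lowering the cut point from 2 to 1 moves up and to the right along it, trading specificity for sensitivity.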
Results
Sample Characteristics
The description of the study sample is included in Table 1. Overall the sample was composed primarily of young, low-income women with demographic characteristics consistent with that setting. More than half of the participants were between the ages of 19 and 24 years and were white, with approximately one quarter having less than high school education (only one quarter had any college or more); nearly half of the women in the study (45%) were having their first child, and nearly all had Medicaid insurance or were self-pay. The distribution of these demographic variables did not vary significantly between the 3 study periods except for race/ethnicity (P < .05); there was a smaller proportion of African-American women in the postpartum sample than in the 2 pregnancy groups.
The percentage of women in the 15- and 30-week prenatal and postpartum study periods who were positive by EPDS score for current depressive symptomatology was 17% (72 of 414), 13% (44 of 334), and 5% (10 of 193), respectively. Of the demographic variables, only Medicaid insurance status was associated with a positive screen with the EPDS. Approximately 30% of women at each of the 3 time points reported a history of depression. Depression history was significantly associated with a positive EPDS screen for the 15-week and 30-week assessments but not the postpartum assessment. The PHQ-2, both alone and when combined with history of depression, was highly associated with a positive EPDS at all 3 of the study assessment points.
Diagnostic Accuracy
The characteristics of the PHQ-2 for identifying positive EPDS cases, alone and in combination with a history of depression, are shown in Table 2 and Figure 1. The cut points for the combined PHQ-2 and history of depression screening instrument are labeled as 1/2 (positive for either the PHQ-2 or history of depression) and 2 (positive for both PHQ-2 and history of depression). The overall performance of the measure, as determined by the area under the ROC curve, was similar across the 3 study points, with similar point estimates and overlapping confidence intervals whether the PHQ-2 was used alone or in combination with the history of depression (Table 2, column 2).
(1) Screening at week 15 (estimated gestational age [EGA])
The PHQ-2 allowed the detection of 67 of 72 women with depression, defined by a score of 13 or above on the EPDS, with a sensitivity of 93.1%. Specificity was 75.1%, with a PPV of 44% and negative predictive value (NPV) 98%. When the PHQ-2 was combined with a history of depression, sensitivity was improved to 97.2% but at the expense of specificity (61.1%).
(2) Screening at week 30 (EGA)
The PHQ-2 facilitated the detection of 36 of 44 women with depression, defined by a score of 13 or above on the EPDS, with a sensitivity of 81.8%. Specificity was 80.0%, with a PPV of 38% and an NPV of 97%. When the PHQ-2 was combined with a history of depression, sensitivity was improved to 93.2% but at the expense of specificity (63.1%).
(3) Postpartum Screening (6 to 16 weeks after birth)
The PHQ-2 identified 8 of 10 women with postpartum depression, defined by a score of 13 or above on the EPDS, with a sensitivity of 80.0%. Specificity was 85.8%, with a PPV of 24% and an NPV of 99%. When the PHQ-2 was combined with a history of depression, sensitivity was improved to 90.0% but at the expense of specificity (67.2%).
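The predictive values at the three time points can be approximately re-derived from the counts given in the text. The Python sketch below is a reader's check, not the authors' analysis: it assumes the number of true negatives can be recovered by rounding the reported specificity multiplied by the number of EPDS-negative women, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Reported:
    label: str
    n: int                # women with complete data
    epds_positive: int    # EPDS >= 13
    detected: int         # EPDS-positive women with a positive PHQ-2
    specificity: float    # as reported in the text


TIME_POINTS = [
    Reported("15 weeks", 414, 72, 67, 0.751),
    Reported("30 weeks", 334, 44, 36, 0.800),
    Reported("postpartum", 193, 10, 8, 0.858),
]


def two_by_two(r: Reported) -> Tuple[int, int, int, int]:
    """Rebuild (TP, FP, FN, TN) from the reported figures.

    ASSUMPTION: TN is the nearest whole number to
    specificity * (number of EPDS-negative women).
    """
    false_neg = r.epds_positive - r.detected
    noncases = r.n - r.epds_positive
    true_neg = round(noncases * r.specificity)
    false_pos = noncases - true_neg
    return r.detected, false_pos, false_neg, true_neg


def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    for r in TIME_POINTS:
        m = metrics(*two_by_two(r))
        summary = ", ".join(f"{k} {v:.1%}" for k, v in m.items())
        print(f"{r.label}: {summary}")
```

Because prevalence of a positive EPDS falls from the prenatal to the postpartum assessment, the PPV falls with it even though specificity rises, which is why the predictive values need to be read alongside the case counts.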
Discussion
In this analysis of women during pregnancy and postpartum we determined that pre-screening with the 2-item PHQ generally had good sensitivity and specificity for identifying women with positive scores ≥13 on the Edinburgh Postnatal Depression Scale. The addition of a question regarding history of depression improved the sensitivity, at the expense of specificity, and the PPV of this screening. Out of 100 unselected cases seen at 15 weeks, the PHQ-2 alone would help correctly detect 16 out of 17 depressed patients (missing only one patient) and correctly reassure 62 out of 83 patients, falsely diagnosing 21. Similarly, out of 100 unselected patients seen postpartum, the PHQ-2 alone would help correctly identify 4 out of 5 depressed patients (the prevalence is 5% in this case) and correctly reassure 82 out of 95 patients, misidentifying 13. This demonstrates that when the PHQ-2 is negative it has a 97% to 99% chance of accurately ruling out depression with very few false negatives. Thus, this rapid screening procedure may be useful as part of a multistage case finding strategy to rule out women who would otherwise require further time-consuming assessment for the presence of maternal depression.
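The per-100 illustration in the preceding paragraph follows from prevalence, sensitivity, and specificity alone. A minimal Python sketch of that arithmetic is given here; the function name is hypothetical, and rounding each group to whole women is an assumption about how the illustration was constructed.

```python
from typing import Dict


def per_cohort(prevalence: float, sensitivity: float, specificity: float,
               cohort: int = 100) -> Dict[str, int]:
    """Expected screening outcomes in a cohort of unselected women.

    ASSUMPTION: each group is rounded to the nearest whole woman.
    """
    cases = round(cohort * prevalence)
    noncases = cohort - cases
    detected = round(cases * sensitivity)
    reassured = round(noncases * specificity)
    return {
        "cases": cases,
        "detected": detected,
        "missed": cases - detected,
        "noncases": noncases,
        "correctly_reassured": reassured,
        "false_positives": noncases - reassured,
    }


if __name__ == "__main__":
    # Prevalence, sensitivity, and specificity as reported in the Results.
    print("15 weeks:", per_cohort(0.17, 0.931, 0.751))
    print("postpartum:", per_cohort(0.05, 0.800, 0.858))
```

Expressing the results as counts per 100 women makes the trade-off concrete: few cases are missed, but a sizeable group of women without a positive EPDS still go on to the longer instrument.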
Based on the rate of positive 2-item screens in the current study, 60% to 80% of the women being assessed (depending on the particular point in pregnancy and postpartum) can be ruled out with high certainty of not having depression. When the history of depression item is added, the number of women quickly ruled out is slightly higher but at a cost of approximately 10% in case-finding accuracy. It is important to keep in mind that even when the PHQ-2 is negative and the NPV is 90% or higher, it is possible that these women actually have major depression. For this reason we suggest that a negative screen should not overrule clinical concern and that anyone judged to be symptomatic clinically but with a negative pre-screen or screen for depression should have a further concurrent or follow-up examination. It may be somewhat surprising that adding history of depression questions did not substantially improve the performance of prescreening, particularly given the recognition that depression is a recurrent episodic disorder.39 One explanation is that the women in our sample were young enough that they were still experiencing their first episode of depression. A second reason might be that the PHQ-2 alone has reached a ceiling regarding rule-out accuracy. A third possible reason questions the validity of simple self-report questions for assessing history without a basis for explaining the questions or probing the responses that an interview provides, a finding that has been reported elsewhere.40
Importantly, we have shown that the modified PHQ-2 used here functions well as a first pre-screening step in early and late pregnancy and postpartum. By efficiently ruling out those who do not have depression, resources can be targeted at the women at highest risk of a major depressive disorder. However, the value of this strategy depends on diagnostic follow-up of the women registering positive with both the pre-screening and the EPDS. A positive score on the EPDS is by itself an inadequate basis for clinical decision making and, if used as such, scarce resources might be used on the treatment of women who are not truly depressed and in need of help. There is some evidence that busy primary care clinicians are inclined not to follow up on the results of screening questionnaires, instead accepting a positive score on a screening questionnaire as a sufficient basis for diagnosis.41 It would be unfortunate if the introduction of a pre-screen further encouraged this practice; before recommending this pre-screening strategy clinically it would be important to determine that it would not be used inappropriately.
Although we assessed the PHQ-2 as a pre-screen it has been recommended for use alone as a screening instrument in general primary care practice.10 It is possible that this instrument, without the use of the EPDS, has sufficient accuracy to identify women at risk (or those not at risk) for depression during pregnancy and postpartum. Currently, the EPDS is well validated and highly accepted in perinatal settings, but little work has justified the use of the PHQ-2 to date. A 3-question approach (essentially the PHQ-2 plus a help question) has been recommended by the British health system's National Institute for Clinical Excellence guidelines but has never been formally evaluated in perinatal settings.12 In one study, Olson et al42 found that 17% of 1398 mothers answered positively to at least one question of the PHQ-2, although validity was not reported. Recently, Dubowitz et al43 adapted the PHQ-2 into the “Parent Screening Questionnaire.” The authors evaluated this questionnaire among 216 mothers in a primary care clinic compared with the Beck Depression Inventory II completed 2 months later. When a positive response to either or both of the 2 questions was considered, the sensitivity was 74%, the specificity was 80%, and the NPV was 95%, but the PPV was only 36%.
An unresolved issue is how many of those who screen positive actually are willing to accept professional help. In one study,44 only 30% of those who screened positive agreed to be contacted for further help. In a second study,45 only 23% of high scorers on the Beck Depression Inventory took up the offer for psychological therapy and 10% agreed to medication. This highlights that screening alone is rarely sufficient for improvements in quality of care, and enhanced detection should be paired with enhanced treatment and follow-up.46
A number of limitations of the current study should be reviewed. First, we use as a criterion measure the EPDS screening instrument rather than the clinical diagnosis of depression. However, the goal of the current analysis was to reduce the burden of screening for depression through a pre-screen step, not to diagnose major depressive disorder. Because the formal diagnosis of depression requires significant diagnostic expertise and the investment of valuable clinical time, the rapid identification of women at risk for depression (and not at risk) is critical to the efficient delivery of maternal care. Another limitation of the current study is that the women included were primarily low income as indicated by the high rate of Medicaid insurance. The findings from this study may not be generalizable to higher-income and other less vulnerable populations. However, women with low incomes have higher risk for depressive symptomatology and are less likely to receive mental health services and so are a particularly important group to target for screening.4,47 One recent study48 found that the sensitivity of the PHQ-2 for a positive EPDS score was lower for women with a high school education or less than for those who went on beyond high school. We did not find any difference in sensitivity across levels of educational attainment in our sample. This difference may be because of our use of a simpler dichotomous yes/no response choice for the PHQ-2 and a follow-up oral review of the items, which reduces concerns of low education- and literacy-related inaccuracy of a written and more complex instrument.11 Finally, although we had complete data for more than 50% of the women eligible for analysis at each of the time points, it is possible that missing data were related to depression risk status. 
For example, women with more obvious symptoms may have been either more or less likely to have both the pre-screen (PHQ-2) and screening (EPDS) conducted (both were required to be included in the analysis). Because of the protocol-driven system of data collection in this study, we believe that it is unlikely that any such bias was widespread enough to greatly influence our findings.
Based on the results of these analyses it is reasonable for maternal care providers who are currently using or are planning to use the EPDS to use the PHQ-2 pre-screen, with simplified response items (yes/no), as an initial component of a multistep case-finding strategy for depression during pregnancy and postpartum. Although this approach should not overrule clinical concern, women who answer “no” to both of the questions in the PHQ-2 can, with high confidence, avoid the longer EPDS and a diagnostic clinical interview. Women who answer “yes” to either of these items should proceed to the EPDS, followed by an in-depth clinical interview for women who have a positive EPDS (≥13). Although further research is needed, we recommend that the screening process make use of a direct oral interview rather than relying entirely on a written questionnaire to elicit or review screening questions to avoid error caused by low education and literacy.
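The multistep strategy recommended above can be written out as a small decision function. This Python sketch is only an illustration of the sequence described in the text; the function names, the returned labels, and the way clinical concern is represented as a flag are assumptions, not part of the study protocol.

```python
from typing import Optional

EPDS_CUT_POINT = 13  # criterion used in this study (EPDS score >= 13)

# Illustrative labels for the next step in the pathway (hypothetical names).
NO_FURTHER_SCREENING = "no further screening"
ADMINISTER_EPDS = "administer EPDS"
DIAGNOSTIC_INTERVIEW = "diagnostic interview"
CLINICAL_EXAMINATION = "further clinical examination"


def phq2_positive(down_depressed_hopeless: bool, little_interest: bool) -> bool:
    """Modified PHQ-2: a 'yes' to either item is a positive pre-screen."""
    return down_depressed_hopeless or little_interest


def next_step(down_depressed_hopeless: bool, little_interest: bool,
              epds_score: Optional[int] = None,
              clinical_concern: bool = False) -> str:
    """Return the next step for one woman at one assessment."""
    if epds_score is not None and not 0 <= epds_score <= 30:
        raise ValueError("EPDS scores range from 0 to 30")

    if phq2_positive(down_depressed_hopeless, little_interest):
        if epds_score is None:
            return ADMINISTER_EPDS
        if epds_score >= EPDS_CUT_POINT:
            return DIAGNOSTIC_INTERVIEW

    # Negative pre-screen or negative EPDS: a negative result should not
    # overrule clinical concern.
    return CLINICAL_EXAMINATION if clinical_concern else NO_FURTHER_SCREENING


if __name__ == "__main__":
    print(next_step(False, False))
    print(next_step(True, False))
    print(next_step(True, True, epds_score=15))
```

Keeping clinical concern as a separate input reflects the point that the pre-screen narrows who needs the EPDS but does not replace the clinician's judgment.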
Acknowledgments
Thanks to all the contributing members of the IMPLICIT network and the Family Medicine Educational Consortium for contributions that made this work possible.
Notes
This article was externally peer reviewed.
Funding: This work was supported by a grant from the March of Dimes. IMB was supported by grants from the NICHD (1K23HD048915-01A2) and the NIMH (1R03MH074750-01).
Prior presentation: This work was presented in part at the 2007 North American Primary Care Research Group annual meeting, Vancouver, Canada, October 20 through October 23, 2007.
Conflict of interest: none declared.
- Received for publication February 23, 2008.
- Revision received April 28, 2008.
- Accepted for publication April 29, 2008.