Abstract
Objectives: To compare the Edinburgh Postnatal Depression Scale (EPDS) and Patient Health Questionnaire (PHQ-9) as screening tools for postpartum depression.
Methods: The study population included the first 500 women to enroll and return their packets during an ongoing study of postpartum depression.
Results: The primary outcome was the rate of concordance and discordance between the EPDS and PHQ-9 categories of “normal” and “increased risk for major depressive disorder.” Overall, 97% of eligible women enrolled and 70% returned the packets that included the EPDS and PHQ-9. Four hundred eighty-one of the first 500 packets had complete data, with elevated EPDS or PHQ-9 scores in 138 and 132 women, respectively. The EPDS and PHQ-9 were concordant in 399 women (83%): 326 (67.8%) had “normal” scores on both, and 73 (15.2%) had elevated scores on both. Discordant scores in 82 women included 17 with elevated PHQ-9 scores but normal EPDS scores and 65 with elevated EPDS scores and PHQ-9 scores <10. In multivariate logistic regression modeling, only age >30 and low education level were predictive of discordant scores, using EPDS and PHQ-9 scores of ≥10 as elevated (odds ratio, 1.9 and P = .02; and odds ratio, 2.3 and P = .01, respectively). PHQ-9 scores of 5 to 9 have been referred to as consistent with “mild depressive symptoms” and appropriate for “watchful waiting” and repeat PHQ-9 at follow-up. Using this follow-up approach would require re-evaluation of 120 (25%) of the women screened.
Conclusions: Postpartum depression screening is feasible in primary care practices, and for most women the EPDS and PHQ-9 scores were concordant. Further work is required to identify reasons for the 17% discordant scores as well as to provide definitive recommendations for PHQ-9 scores of 5 to 9.
Primary care office-based screening for postpartum depression (PPD) has been shown to increase recognition and treatment of PPD1–8 but has not been shown to improve outcomes—such as lower levels of depressive symptoms among the women, greater parenting comfort, or increased relationship satisfaction between the parents—at 12 months postpartum. Although inadequate outcome data have prevented national recommendations for routine screening for PPD, some large health care and professional organizations, plus a few state legislatures, are recommending or requiring routine PPD screening.9–12 Most organizations are recommending use of the well-validated Edinburgh Postnatal Depression Scale (EPDS), developed specifically for PPD screening.13–18
Routine depression screening has been recommended for all adults19–24 using tools, such as the Patient Health Questionnaire (PHQ-9),24 that have been validated in primary care practices.24–28 None of the tools used for adults have been adequately assessed during the postpartum period. A recent report suggests that a modification of the first 2 questions of the PHQ-9 that had dichotomized responses could be used as a prescreening tool for PPD, but the number of patients in the study was modest and the outcomes inconclusive.29 If the PHQ-9 could be used for PPD screening, physicians and health care systems might be able to use a single tool for screening all adults for depression.
Here we report methods and initial results from the Translating Research into Practice for Postpartum Depression (TRIPPD) study, a large randomized, controlled trial of PPD screening and follow-up in primary care practice-based research network (PBRN) practices. These initial results focus on the concordance and discordance of EPDS and PHQ-9 scores. In addition to the total PHQ-9 and EPDS scores, concordance and discordance of suicidal ideation answers are also discussed. These findings identify issues to consider when using the PHQ-9 for PPD screening.
Methods
TRIPPD is a randomized, controlled trial assessing the feasibility and impact of a practice change for screening, diagnosis, and follow-up of PPD in 29 PBRN practices associated with the American Academy of Family Physicians National Research Network.30 Institutional Review Board approval was obtained for all investigators and all sites.
Design
The 29 practices were selected from PBRN practices that provide maternity or newborn care for at least 50 women or infants each year. Practice sites were randomized to continue usual care or to introduce a PPD practice change that included routine screening of all women 5 to 12 weeks postpartum. The PPD screening led to either a simple review of a low score on the EPDS and return to usual care or additional diagnostic evaluation when the EPDS score was elevated and, if depression was diagnosed, selection of appropriate therapy and a recommended follow-up program (see Figure 1). At the time of enrollment, women were given a packet that included both the EPDS and PHQ-9 for them to complete and return by mail to the central study site, the Olmsted Medical Center. The return rate for the enrollment packet was 70.4%.
Setting
The 29 practices in the TRIPPD study are all family medicine-based practices, including 15 rural and 14 metropolitan sites. Sixteen of the sites were family medicine residency sites. The 29 practices were in 20 states from Oregon to Mississippi, representing all regions of the country.
Patients
In both intervention and usual care practices, enrollment was offered to all women 18 years of age or older who spoke and read either English or Spanish (self-reported) and who were 5 to 12 weeks postpartum at the time of the visit. Women who did not return the packet were called and asked if it would be helpful to have someone read the packet to them to allow those women with low reading levels to participate without having to admit they were unable to read either English or Spanish. Women who did not intend to continue care at the participating practice, those who had an emergency condition, those who were outside the window of interest (5 to 12 weeks postpartum), or those who could not read English or Spanish were excluded. The requirement of continuity care was necessary to assure that sites could provide depression management for any mothers identified with PPD.
Intervention
In preparation for implementation of the intervention, the intervention practices received education about diagnosing, treating, and monitoring PPD, as well as information about the specific intervention procedures and tools that were to be used in this study. Physicians were advised to follow elevated screening scores with a short physician diagnostic interview.31 Recommendations regarding depression follow-up care and monitoring were based on the national depression management guidelines32,33 and our work with adult depression in primary care practices34–38 (Figure 1). Practices assigned to usual care continued to follow their usual approach to recognizing and managing PPD; ie, they did not use standardized screening tools or follow-up procedures.
Main Instruments
The EPDS is considered a sensitive (96%) but only moderately specific (82%) screening tool for PPD (positive predictive value, 23%) when a score of ≥10 is used as an indication for further assessment.39 The EPDS has been validated against in-depth interview and mental health assessment for use in postpartum women in all types of care settings and is responsive to improvement in depressive symptoms.27,40,41
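The relationship among the sensitivity, specificity, and positive predictive value quoted above can be sketched with Bayes' rule. This is only an illustration: the prevalence in the cited validation sample is not given here, so the sketch back-calculates the prevalence that the three quoted figures would imply. The function names are hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of the condition given a positive screen."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_prevalence(sensitivity, specificity, ppv):
    """Solve the PPV formula above for prevalence."""
    false_pos_rate = 1.0 - specificity
    return ppv * false_pos_rate / (sensitivity * (1.0 - ppv) + ppv * false_pos_rate)


# Figures quoted in the text for the EPDS at a cutoff of >=10.
prevalence = implied_prevalence(sensitivity=0.96, specificity=0.82, ppv=0.23)
print(f"Implied prevalence: {prevalence:.3f}")
print(f"PPV check: {positive_predictive_value(0.96, 0.82, prevalence):.2f}")
```

Under these assumptions the quoted values correspond to a prevalence of roughly 5%, which shows why a sensitive but only moderately specific tool can still have a low positive predictive value.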
The PHQ-9 has been validated against in-depth mental health interviews24,31,41,42 and is reported to be specific (>86% at scores of >10) for identification of people with major depressive disorders (MDD).24,31,41,42 No studies or subanalyses of published studies have assessed use of the PHQ-9 during the postpartum period. The suggested interpretations of PHQ-9 scores in the general primary care adult population are shown in Tables 1 and 2.28,42 The PHQ-9 has been described as both a screening and severity measure for MDD that is responsive to change over time.42 To make PHQ-9 scoring comparable to EPDS scoring, we explored use of PHQ-9 cut points of ≥5 and ≥10 to trigger further assessment by the clinician without requiring functional impairment. We also explored the impact of scoring the PHQ-9 with and without the requirement that either sadness or anhedonia be present more than half the time during the past 2 weeks.
Data Collection
Both an EPDS and a PHQ-9 were included in a packet of questions completed by the enrolled women at both control and intervention sites when the women came for their 5- to 12-week postpartum visits. After completing the packets women mailed them directly to the central study site so they would not interfere with the control arm of the study. This process allowed us to compare EPDS and PHQ-9 scores for all enrolled women in both arms of the study.
Other data for the larger study aims were also gathered through the enrollment packet, including demographic information and measures of parent-parent dyad relationship satisfaction (Dyad Assessment of Satisfaction)43 and parenting comfort (Parenting Stress Inventory),44 and re-assessed at 6 and 12 months postpartum. To assure patient safety, all sites were informed by telephone and email of very high EPDS and PHQ-9 scores (>19) or a positive response to the suicidal ideation questions that were included in both the EPDS and PHQ-9.
The demographic information and EPDS and PHQ-9 scores collected from the enrollment packets were machine scored and linked for each woman. A 10% sample was hand scored to assess reliability of the machine scoring. No scoring differences were found.
Data Analysis
For the aims of this study we developed a comparison of EPDS and PHQ-9 scoring categories (Table 1) and used it to determine concordance and discordance in the EPDS and PHQ-9 scores for each woman. Concordance means that both scores are “normal” or that both scores are in the elevated range. Discordant scores have one in the elevated range and one in the “normal” range.
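The classification rule described above can be sketched as follows. This is a minimal illustration, assuming the cutoff of ≥10 on both instruments that is used in the Results; the function names and the example scores are hypothetical and are not study data.

```python
ELEVATED_CUTOFF = 10  # assumption: scores of >=10 count as elevated on both tools


def is_elevated(score, cutoff=ELEVATED_CUTOFF):
    """Return True when a total score falls in the elevated range."""
    return score >= cutoff


def classify_pair(epds_score, phq9_score, cutoff=ELEVATED_CUTOFF):
    """Label one woman's pair of scores as concordant or discordant."""
    epds_high = is_elevated(epds_score, cutoff)
    phq9_high = is_elevated(phq9_score, cutoff)
    if epds_high and phq9_high:
        return "concordant: both elevated"
    if not epds_high and not phq9_high:
        return "concordant: both normal"
    if epds_high:
        return "discordant: EPDS elevated only"
    return "discordant: PHQ-9 elevated only"


# Invented example scores, for illustration only.
for epds, phq9 in [(4, 3), (12, 14), (11, 6), (7, 10)]:
    print(epds, phq9, classify_pair(epds, phq9))
```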
Because this is a first step in exploratory analysis, we used primarily simple summary statistics. χ2 statistics were used to assess age trends in other demographic categories such as marital status, income above the poverty level, and completion of a high school education. Demographic factors were examined for association with discordant scores using logistic regression. Results are presented in tables and plots.
Results
From the first 500 packets returned, 481 packets (96.2%) had both the EPDS and the PHQ-9 as well as demographic information completed and could be included in the analysis. Demographic information for the women is presented in Table 3. The percentage of women who reported being married, being employed outside the home, having an income of >200% of the poverty level, having completed high school, and living in a house with more than one child increased with the increasing age of the woman (P < .05 for all categories). Overall, 76% lived with at least one other adult, 77% reported being white, 23% reported being black, and 2% answered the questionnaires in Spanish.
When all EPDS and PHQ-9 scores were divided into 2 (dichotomous) categories (scores <10 or scores of ≥10), 399 (83%) of the women had concordant scores. Overall, 326 women (67.8%) had both EPDS and PHQ-9 scores in the low risk of depression range and 73 (15.2%) had both scores in the increased risk of MDD range. This left 17% of the women with discordant scores, meaning that one score was <10 and one was ≥10. In univariate modeling, age, household income, marital status, Hispanic ethnicity, and working outside the home were not found to be statistically significant predictors of discordant scores (P > .13 for each). However, not having completed high school was significant (odds ratio [OR], 2.0; P = .03). In multivariate logistic modeling, age >30 and low education level were both significant (OR, 1.9; P = .02 and OR, 2.3; P = .01, respectively).
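The dichotomous concordance figures above follow directly from the four cell counts reported in the text. The sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
# Counts reported in the text for the 481 women with complete data,
# using scores of >=10 as elevated on both instruments.
TOTAL = 481
both_normal = 326    # EPDS <10 and PHQ-9 <10
both_elevated = 73   # EPDS >=10 and PHQ-9 >=10
phq9_only = 17       # PHQ-9 >=10, EPDS <10
epds_only = 65       # EPDS >=10, PHQ-9 <10


def pct(n, total=TOTAL):
    """Percentage of all women screened, rounded to one decimal place."""
    return round(100.0 * n / total, 1)


concordant = both_normal + both_elevated
discordant = phq9_only + epds_only
assert concordant + discordant == TOTAL  # the four cells cover everyone

epds_elevated = both_elevated + epds_only
phq9_elevated = both_elevated + phq9_only

print("concordant:", concordant, pct(concordant))
print("discordant:", discordant, pct(discordant))
print("EPDS elevated:", epds_elevated, pct(epds_elevated))
print("PHQ-9 elevated:", phq9_elevated, pct(phq9_elevated))
```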
When the low-risk PHQ-9 category is further broken down into 2 groups, with “normal” (scores 0–4) and “slightly increased risk of depressive symptoms” (scores of 5–9) separated, the rate of concordance is lower (n = 322; 66.9%) (see Table 4 and Figure 2). Separation of this “watchful waiting” category43 from the “normal” category for the PHQ-9 affects the concordance for “normal” (EPDS <10 and PHQ-9 <5), lowering it to only 51.8% (n = 249). In our study, 120 women (25% of all women screened) were included in this watchful waiting category (scores 5–9). Among these 120 women, most (77 of 120; 64.2%) had normal EPDS scores whereas 43 (35.8%) had EPDS scores of >10 and only 4 had scores of >15. Adding a second level of subgroup analysis, by requiring that one of the 2 major criteria for depression (questions 1 and 2 of the PHQ-9) be present at least half the days before putting women into the watchful waiting group, would remove 69 of the 77 women with PHQ-9 scores of 5 to 9 and normal EPDS scores from further evaluation. In contrast, of the 43 women with PHQ-9 scores in the 5 to 9 range and elevated EPDS scores, 34 have at least one of the 2 major criteria for depression.
For 8.1% of the women the scores were highly discordant, meaning that the EPDS was ≥10 but the PHQ-9 was in the normal range of <5 (n = 22; 4.6%) or the PHQ-9 was in the increased risk of depression range (≥10) but the EPDS was “normal” (n = 17; 3.5% of all women).
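The three-level breakdown can be checked by laying the counts reported in the text out as a 2 × 3 table (EPDS category by PHQ-9 band). The sketch below is only a consistency check on those reported counts; the dictionary layout and the band labels are illustrative.

```python
# Cell counts taken from the text, keyed by (EPDS category, PHQ-9 band).
cells = {
    ("EPDS <10", "PHQ-9 0-4"): 249,
    ("EPDS <10", "PHQ-9 5-9"): 77,
    ("EPDS <10", "PHQ-9 >=10"): 17,
    ("EPDS >=10", "PHQ-9 0-4"): 22,
    ("EPDS >=10", "PHQ-9 5-9"): 43,
    ("EPDS >=10", "PHQ-9 >=10"): 73,
}
TOTAL = sum(cells.values())


def pct(n, total=TOTAL):
    """Percentage of all women screened, rounded to one decimal place."""
    return round(100.0 * n / total, 1)


# Concordant when both are in the lowest or both in the highest category.
concordant = cells[("EPDS <10", "PHQ-9 0-4")] + cells[("EPDS >=10", "PHQ-9 >=10")]
# Highly discordant when one is elevated and the other is in the lowest band.
highly_discordant = cells[("EPDS >=10", "PHQ-9 0-4")] + cells[("EPDS <10", "PHQ-9 >=10")]
# Watchful waiting band: PHQ-9 scores of 5 to 9, regardless of EPDS.
watchful_waiting = cells[("EPDS <10", "PHQ-9 5-9")] + cells[("EPDS >=10", "PHQ-9 5-9")]

print("total:", TOTAL)
print("concordant:", concordant, pct(concordant))
print("highly discordant:", highly_discordant, pct(highly_discordant))
print("watchful waiting:", watchful_waiting, pct(watchful_waiting))
```

The six cells sum to the 481 women analyzed, and the derived totals agree with the percentages reported above.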
Both the EPDS and the PHQ-9 have questions that address suicidal ideation (Tables 2 and 5). The question in the EPDS is active, asking about self harm, whereas the PHQ-9 question is passive, asking if “you have thought you would be better off dead.” These suicidal ideation questions were as likely to be completed as any of the other questions (96% vs. 97%, respectively). Overall, 435 women (90.4%) answered “never” to suicidal ideation questions on both the EPDS and PHQ-9; 5 responded positively to both screening questions. Thirty women (6.2%) had answers that seemed clinically disparate on the 2 instruments.
Discussion
Postpartum women in this study were willing and able to complete both the PHQ-9 and the EPDS without significant missing data. When separated into only 2 categories of normal versus increased risk of MDD, the EPDS and PHQ-9 scores were concordant for the vast majority of women screened (83%). The number of women who would require additional evaluation after screening varied from 138 (28.7%) when using the EPDS to 90 to 210 women (18.7% to 43.7%) when using the PHQ-9, depending on how the scores of 5 to 9 are categorized (as part of “normal” vs “watchful waiting”). Limited data have been published on the outcomes of these 2 different approaches to the PHQ-9 scores of 5 to 9, yet the decision of categorization is crucial to determining the number and burden of the follow-up evaluation of women with “abnormal” scores as well as the rates of false normal screening results. Questions regarding suicidal ideation did not seem to be a barrier to completion of the screening tools; levels of concordance were very high for the active suicide questions on the EPDS and the passive suicide questions on the PHQ-9, with only 30 women (6.2%) having clinically discordant responses.
The decision of how to deal with PHQ-9 scores in the range between 5 and 9 becomes very important in determining how many women the PHQ-9 recommends for further evaluation. If a score of <10 is considered normal, 90 women in our study would be recommended for further evaluation of depression. However, if PHQ-9 scores of 5 to 9 are followed by repeat assessment, then 210 women in our study would be appropriate for further evaluation. Kroenke and Spitzer42 state that an elevated score cutoff of ≥10 provides a sensitivity of 88% for MDD, but they offer little discussion of what is meant by “watchful waiting and possible repeat PHQ-9 at follow-up” for scores of 5 to 9. Suggesting that each woman with a score of 5 to 9 be reassessed adds a substantial burden to PPD screening. Because many postpartum women do not plan another visit to their physician for 12 months or more, the reassessment would require additional health care utilization. Adding a requirement of positive response to PHQ-9 questions about anhedonia or questions about feeling sad (first 2 questions of the PHQ-9) to the 5 to 9 scores would cut the percentage of required reassessments by approximately 45%. Although this additional requirement has been reported to be used in practice, few data about using just the PHQ-2 as a screening tool have been published.42
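The effect of the cut point on follow-up workload can be illustrated with the PHQ-9 band totals implied by the counts reported in the Results. This is a sketch of the arithmetic only; the function name and the band labels are hypothetical.

```python
TOTAL = 481

# PHQ-9 band totals, built from the cell counts reported in the Results
# (EPDS <10 count + EPDS >=10 count within each PHQ-9 band).
PHQ9_BAND_COUNTS = {
    "0-4": 249 + 22,
    "5-9": 77 + 43,
    ">=10": 17 + 73,
}


def women_flagged(treat_5_to_9_as_positive):
    """Number of women referred for further evaluation under each approach."""
    flagged = PHQ9_BAND_COUNTS[">=10"]
    if treat_5_to_9_as_positive:
        flagged += PHQ9_BAND_COUNTS["5-9"]
    return flagged


for label, include_band in [("cutoff >=10", False), ("cutoff >=5", True)]:
    n = women_flagged(include_band)
    print(f"{label}: {n} women ({100.0 * n / TOTAL:.1f}%)")
```

Moving the cut point from ≥10 to ≥5 adds the entire 5 to 9 band to the follow-up pool, which is why the choice more than doubles the number of women flagged.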
The EPDS was developed specifically to avoid over-identification of PPD based on “physical” symptoms such as fatigue, weight and appetite changes, and problems with sleeping that can be suggestive of depression but are a normal part of postpartum recovery.13,18,45 The PHQ-9 includes these physical symptoms and may therefore over-identify women in this early period of motherhood.18 Our data are compatible with this explanation. The comparison of the EPDS and PHQ-9 scores should be assessed later in the postpartum period, when most women have completed breastfeeding and have returned to normal eating and near-normal sleeping habits. These data may also help to determine the value of the “watchful waiting” category of the PHQ-9 for postpartum women. Future data from the TRIPPD study will allow us to complete this long-term comparison.
Our regression analysis suggests that older age and lower levels of completed education may also lead to greater discordance between the EPDS and the PHQ-9. Bennett et al29 also reported that lower levels of education attained (specifically “did not complete high school”) seemed to increase discordance between the EPDS results and a modification of the first 2 questions of the PHQ-9 used as a “prescreening tool.” These differences require further assessment to determine whether the differences were based on the concepts presented in the questions or the language used to address that content.
The EPDS has been compared with screening tools for depression other than the PHQ-9.46 Lee et al47 found that several tools used for non-PPD screening were comparable for identification of women at a high risk of depression. The Chinese cohort was small (n = 145) and the study used the Chinese version of all the instruments, but comparisons were based on results of the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders III-R. In comparison to the Beck Depression Inventory, the EPDS was reported to be “superior.”16 We found no studies that compared the EPDS and PHQ-9 for postpartum women or any studies that used the PHQ-9 as a screening tool for PPD. A recent study reported on the sensitivity and specificity of modifications of the first 2 PHQ-9 questions (anhedonia and feeling sad or down) for prediction of EPDS scores. The rate of positive response to these modified PHQ-2 questions was reported as compared with rates of EPDS scores of ≥13 versus ≤12. Thirteen is a high cutoff score for risk of depression in community-based screening18 and the assessment of the PHQ-2 “sensitivity” is based on identifying 4 of 5 women with high EPDS scores who were called “depressed.”29 The results do not provide information about the possibility of replacing the EPDS with the PHQ-9 or the PHQ-2, which could allow all depression screening for adults in primary care to use a single tool and might facilitate greater integration of routine depression screening into daily primary care practice.48
Addressing suicidal ideation is often uncomfortable for health professionals. Among the 481 women who did complete and return their packets, it was reassuring that we did not find any apparent reluctance to answer the suicidal ideation questions. Using either the PHQ-9 or the EPDS may help facilitate discussions between patients and their health professionals, who might otherwise be reluctant or unsure of how to broach this important topic. Although questions about suicidal ideation were answered at the same rate as the other questions in the depression screeners, it is possible that some women did not return their packets at all because they did not want to answer the suicidal ideation questions. The 30 discordant responses to the suicidal ideation questions may be because of the difference in the questions: the EPDS asks about harming oneself (an active question) and the PHQ-9 asks about being better off dead (a passive approach to suicidal ideation).
Certain limitations of this study should be recognized. Although the study population may not have been representative of the full spectrum of postpartum women, a substantial proportion reported incomes of <$30,000 a year, the group believed to have the greatest number of stressors. Further, forms could be completed in English or Spanish and women who did not return the packet within 21 days received a call offering help with completion of the forms. This provided those uncomfortable with form completion because of literacy or cultural concerns the opportunity to participate. Finally, identification of depression risk was by means of PHQ-9 and EPDS only. No formal diagnostic interviews for depression were required. The addition of a formal interview, such as the Structured Clinical Interview for DSM Disorders, is impractical in this type of translational study. To make PHQ-9 scoring comparable to EPDS scoring for this analysis we used PHQ-9 scores of ≥5 as well as ≥10 without regard to other Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition49 criteria to trigger clinician assessment of depression. This is in keeping with the focus of this translational study, completed in community practices. Both the EPDS and the PHQ-9 have been validated against formal psychiatric interviews in several previous studies.13–18,25,26,28,39,41,42,46
Conclusion
These preliminary data from a large randomized controlled trial of PPD screening and follow-up show that the PHQ-9 and the EPDS have good concordance in identifying those women not at increased risk of PPD. The large number of women (25%) who may require further follow-up because of PHQ-9 scores in the range of 5 to 9 seems excessive but would be reduced substantially if, to be considered positive, the PHQ-9 results were required to include anhedonia or feeling sad more than half the days during the past 2 weeks. Whether the cutoff for the PHQ-9 should be 5 or 10 when used for screening, and whether reevaluation of women with scores in the range of 5 to 9 is a benefit or an additional cost without a benefit, require further evaluation.
Acknowledgments
We thank Dawn Littlefield for her help in preparing this manuscript and the 29 practices and 500 women who participated.
Notes
This article was externally peer reviewed.
Funding: Funding provided by Agency for Healthcare Research and Quality grant R01 HS014744-01.
Conflict of interest: none declared.
- Received for publication July 23, 2008.
- Revision received November 12, 2008.
- Accepted for publication November 21, 2008.