Validation of the Patient Health Questionnaire–9 and Patient Health Questionnaire–2 in the General Korean Population
Abstract
Objective
The Patient Health Questionnaire–9 (PHQ-9) and PHQ-2 have not been validated in the general Korean population. This study aimed to validate and identify the optimal cutoff scores of the PHQ-9 and PHQ-2 in screening for major depression in the general Korean population.
Methods
We used data from 6,022 participants in the 2011 Korean Epidemiological Catchment Area Study for Psychiatric Disorders. Major depression was diagnosed using the Korean version of the Composite International Diagnostic Interview. Reliability, validity, and receiver operating characteristic curve analyses were performed using the PHQ-9 and the EuroQol 5-dimension (EQ-5D).
Results
Of the 6,022 participants, 150 were diagnosed with major depression (2.5%). Both the PHQ-9 and PHQ-2 demonstrated relatively high reliability, and among the EQ-5D items their scores correlated most strongly with the “anxiety/depression” item. The optimal cutoff score of the PHQ-9 was 5, with a sensitivity of 89.9%, specificity of 84.1%, positive predictive value (PPV) of 12.6%, negative predictive value (NPV) of 99.7%, positive likelihood ratio (LR+) of 5.6, and negative likelihood ratio (LR-) of 0.12. The optimal cutoff score of the PHQ-2 was 2, with a sensitivity of 85.3%, specificity of 83.2%, PPV of 11.6%, NPV of 99.5%, LR+ of 5.1, and LR- of 0.18.
Conclusion
The PHQ-9 and PHQ-2 are valid tools for screening major depression in the general Korean population, with suggested cutoff values of 5 and 2 points, respectively.
INTRODUCTION
Major depressive disorder (MDD) is a serious yet common mental disorder that lowers patients’ quality of life and increases the risk of suicide [1]. In addition, MDD imposes an economic burden on patients and their families owing to impaired functioning and medical expenses [2]. Therefore, accurate assessment and treatment of MDD are among the most important tasks in public mental health.
The prevalence of MDD in Korea is 6.7% [3], which is lower than that in the United States (10.4%) [4] and Europe (11.32%) [5]; however, Korea’s suicide rate is 24.6 per 100,000 people [6], and Korea had the highest suicide rate among Organisation for Economic Co-operation and Development (OECD) member countries from 2003 to 2019. This gap between the relatively low prevalence of MDD and the high suicide rate is notable, given that psychological autopsy studies of suicide victims reported comorbid MDD or other mood disorders in approximately 60% of cases [7,8]. Early screening and management of depression in the community have also been reported to lower the risk of suicide [9]. Therefore, it is important to develop a concise and valid tool for screening depression in the general Korean population.
The Patient Health Questionnaire–9 (PHQ-9), developed by Spitzer et al. [10] in 1999, is a self-report instrument designed to detect depression in primary care. The PHQ-9 has several advantages: it is brief, multipurpose, free, and easy to score [11]. The PHQ-2, which consists of the first two items of the PHQ-9, was developed in response to the need for briefer measures [12]. The PHQ-9 and PHQ-2 are now used worldwide, and validation studies have established cutoff scores in many countries [13,14]. In Korea, the PHQ-9 or PHQ-2 has been validated only in specific populations, such as older adults [15], patients with migraine [16], and psychiatric patients [17]. For the general population, one study obtained normative data for the PHQ-9 using nationwide cross-sectional survey data from the 2014–2016 Korea National Health and Nutrition Examination Survey [18]; however, criterion validity was not assessed and no cutoff scores were established.
The present study therefore aimed to validate the PHQ-9 and PHQ-2 and to determine their cutoff scores for major depression screening in the general Korean population. Using a nationally representative Korean mental health survey, we calculated the sensitivity, specificity, and optimal cutoff scores of the PHQ-9 and PHQ-2 through receiver operating characteristic (ROC) analysis.
METHODS
Participants and procedures
The Korean Epidemiological Catchment Area Study for Psychiatric Disorders (KECA)-2011 [19], a nationally representative mental health survey of the general population aged ≥18 years, was conducted between March and December 2011. Multistage stratified sampling was conducted across 12 catchment areas, with the sample in each area drawn independently of the others. Based on data from the 2010 Population and Housing Census [20], 14,204 households were selected. Selected households were visited in advance to identify those eligible for the survey and to select interviewees. Households found unsuitable during this previsit were excluded, for example, when no actual household existed at the address, the area was under redevelopment, no one lived there, no resident met the eligibility criteria, or it was impossible to confirm who should be interviewed. In total, 7,650 adults were contacted, and 6,022 completed the interview (response rate: 78.7%) [3,19].
All participants were informed of the methods and purpose of the survey and provided written informed consent prior to participation. This study was approved by the Institutional Review Board of the Seoul National University Hospital (IRB No. C-1104-092-359).
Measures
In KECA-2011, structured interviews were conducted using the Korean version of the Composite International Diagnostic Interview 2.1 (K-CIDI 2.1) and various auxiliary instruments, including the PHQ-9 and the EuroQol 5-dimension (EQ-5D). Before the interviews, 78 interviewers completed a 5-day training session based on the standard protocol and training materials developed by the World Health Organization (WHO).
K-CIDI
The CIDI is a fully structured interview used to identify mental disorders [21]. In the CIDI, an algorithm automatically makes a diagnosis based on the answers to each question. The K-CIDI was translated and validated by Cho et al. [22] according to the WHO guidelines [23]. The K-CIDI can generate diagnoses according to both the International Classification of Diseases, 10th revision (ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV). In this study, only DSM-IV MDD was diagnosed, using Chapter E of the K-CIDI. The K-CIDI algorithm can diagnose MDD over lifetime, 1-year, 1-month, and 2-week periods. Because the PHQ-9 covers the past 2 weeks, the 2-week K-CIDI MDD diagnosis was used for comparison with the PHQ-9.
PHQ-9 and PHQ-2
The PHQ-9 is a self-report scale designed to identify depression in primary care settings [24]. Its nine items correspond to the nine DSM-IV diagnostic criteria for MDD [25]. Each item is rated for the past 2 weeks, including the day the questionnaire is completed, on a scale of 0 to 3 (0=not at all, 1=several days, 2=more than half the days, and 3=nearly every day). The total score ranges from 0 to 27, with higher scores indicating more severe depressive symptoms. Kroenke et al. [24] developed the PHQ-9 and validated it in primary care and obstetrics-gynecology clinics, where a cutoff score of ≥10 detected major depression with a sensitivity and specificity of 88% each.
Kroenke et al. [12] also adapted the PHQ-9 into a two-item questionnaire, the PHQ-2, for use in busy clinical settings. The PHQ-2 assesses depressed mood and loss of interest or pleasure, the two core symptoms of DSM-IV MDD, at least one of which is required for the diagnosis. As in the PHQ-9, each PHQ-2 item is scored from 0 to 3, giving a total score of 0 to 6. At a cutoff score of ≥3, the PHQ-2 detected major depression with a sensitivity of 83% and a specificity of 92% [12]. We used the Korean version of the PHQ-9 translated by Park et al. [26] in 2010, who demonstrated its reliability and validity in 86 psychiatric outpatients at a university hospital.
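The scoring rules described above can be illustrated with a short sketch. It assumes that item responses are already coded 0 to 3 and stored in questionnaire order; the function names are hypothetical and are not part of any published PHQ software. Screening is treated as positive when the total is at or above the chosen cutoff.

```python
from typing import Sequence

# Each PHQ-9 item is scored 0-3 for the past two weeks
# (0 = not at all ... 3 = nearly every day).
ALLOWED_ITEM_SCORES = {0, 1, 2, 3}


def phq9_total(items: Sequence[int]) -> int:
    """Return the PHQ-9 total (0-27) from nine item scores in questionnaire order."""
    if len(items) != 9:
        raise ValueError("the PHQ-9 has exactly nine items")
    if any(score not in ALLOWED_ITEM_SCORES for score in items):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(items)


def phq2_total(items: Sequence[int]) -> int:
    """Return the PHQ-2 total (0-6): the sum of the first two PHQ-9 items."""
    phq9_total(items)  # reuse the same input checks on the full response set
    return items[0] + items[1]


def screens_positive(total: int, cutoff: int) -> bool:
    """A total at or above the cutoff counts as a positive screen."""
    return total >= cutoff


if __name__ == "__main__":
    # Hypothetical response pattern, not participant data.
    responses = [1, 1, 1, 0, 1, 0, 1, 0, 0]
    phq9 = phq9_total(responses)  # 5
    phq2 = phq2_total(responses)  # 2
    print(phq9, phq2)
    # Conventional cutoffs (PHQ-9 >= 10, PHQ-2 >= 3) vs. lower cutoffs (>= 5, >= 2)
    print(screens_positive(phq9, 10), screens_positive(phq9, 5))
    print(screens_positive(phq2, 3), screens_positive(phq2, 2))
```

With this response pattern, neither conventional cutoff flags the respondent, whereas the lower cutoffs reported for the Korean general population in this study do.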
EQ-5D
The EQ-5D was developed by the EuroQol Group as a tool to measure health-related quality of life [27]. Its descriptive system covers five dimensions of health status: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension is rated on three levels, where 1, 2, and 3 represent no problems, some problems, and extreme problems, respectively, and a single EQ-5D index value can be calculated from these ratings using a published formula [28]. The EQ-5D also includes a visual analog scale (EQ-VAS), on which respondents mark their current health on a vertical line ranging from 0 (worst imaginable health) to 100 (best imaginable health). The Korean version of the EQ-5D has been developed, and its reliability and validity have been demonstrated in several clinical populations [29].
Statistical analysis
Sociodemographic characteristics were summarized descriptively for the total sample and separately for participants with and without a diagnosis of major depression. Cronbach’s α coefficients were computed to assess the internal consistency of the PHQ-9 and PHQ-2, and α was recalculated after each PHQ-9 item was deleted in turn. Pearson’s correlation coefficients between PHQ-9 and PHQ-2 scores and the EQ-5D were used to assess construct validity. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), and negative likelihood ratio (LR-) of the PHQ-9 and PHQ-2 were calculated using ROC analysis, with the 2-week K-CIDI diagnosis of major depression as the reference standard. The optimal cutoff score was defined as the score with the highest Youden’s index (sensitivity + specificity − 1). All statistical analyses were performed using the Statistical Package for the Social Sciences (IBM SPSS version 26.0; IBM Corp., Armonk, NY, USA) for Windows. A p-value <0.05 was considered statistically significant.
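The internal-consistency analysis can be illustrated with the standard formula for Cronbach’s α, α = k/(k − 1) × (1 − Σ item variances / variance of the total score), where k is the number of items. The sketch below is a minimal illustration, not the SPSS routine the authors used; it assumes the item scores are stored as one row per respondent (nine columns scored 0 to 3 for the PHQ-9), and the function names are hypothetical.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Sample variances (n - 1 denominator) are used for both terms; the ratio is
    unchanged with population variances as long as one convention is used.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*rows))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def alpha_if_item_deleted(rows: Sequence[Sequence[float]]) -> List[float]:
    """Recompute alpha with each item dropped in turn (needs at least three items)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[x for i, x in enumerate(row) if i != j] for row in rows])
        for j in range(k)
    ]


if __name__ == "__main__":
    # Toy two-item data (not study data): alpha = 12/13, about 0.923
    print(cronbach_alpha([[0, 1], [1, 1], [2, 3]]))
    # Three perfectly consistent items: alpha stays 1.0 whichever item is dropped
    print(alpha_if_item_deleted([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

In an item analysis such as the one reported for the PHQ-9, an item whose deletion raises α noticeably above the full-scale value would be a candidate for review.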
RESULTS
Sociodemographic characteristics and scores of PHQ-9 and PHQ-2
Sociodemographic and clinical characteristics are shown in Table 1, divided according to major depression as diagnosed using the K-CIDI. Among the participants included in the analysis, 2,308 (38.3%) were male and 3,714 (61.7%) were female. The mean participant age was 47.88 years. The number of participants diagnosed with major depression using the K-CIDI was 150 (2.5%) and the number of undiagnosed participants was 5,867. The mean scores of the PHQ-9 and PHQ-2 were 12.93 (±8.05) and 3.41 (±2.01), respectively, in the major depression group, and 2.08 (±3.25) and 0.56 (±1.01), respectively, in the undiagnosed group.
Reliability and item analysis
The Cronbach’s α coefficients of the PHQ-9 and PHQ-2 were 0.882 and 0.807, respectively. As shown in Table 2, all items of the PHQ-9 were significantly and positively associated with the total PHQ-9 score. Cronbach’s α coefficient did not decrease, even when each item was deleted individually from the PHQ-9.
Validity analysis
To assess the construct validity of the PHQ-9 and PHQ-2, their correlations with the EQ-5D were examined; the results are summarized in Table 3. All five items of the EQ-5D descriptive system, the EQ-5D index, and the EQ-VAS were significantly correlated with the PHQ-9 and PHQ-2. Among the EQ-5D items, the anxiety/depression item had the highest correlation with the PHQ-9 and PHQ-2, with correlation coefficients of 0.471 (p<0.001) and 0.451 (p<0.001), respectively. The correlation coefficients of the EQ-5D index with the PHQ-9 and PHQ-2 were 0.311 (p<0.001) and 0.288 (p<0.001), respectively, and those of the EQ-VAS with the PHQ-9 and PHQ-2 were -0.317 (p<0.001) and -0.272 (p<0.001), respectively.
ROC analysis
The results of the PHQ-9 and PHQ-2 ROC analyses are shown in Table 4, and the ROC curves of the PHQ-9 and PHQ-2 are shown in Figure 1. The area under the curve (AUC) of the PHQ-9 was 0.909 (95% confidence interval [CI]=0.879–0.939; standard error [SE]=0.015; p<0.001). A cutoff score of 5 yielded the highest Youden’s index. At this cutoff score, the sensitivity and specificity of the PHQ-9 were 89.9% and 84.1%, respectively, with a PPV of 12.6%, NPV of 99.7%, LR+ of 5.6, and LR- of 0.12. The AUC of the PHQ-2 was 0.895 (95% CI=0.864–0.925; SE=0.015; p<0.001). A cutoff score of 2 yielded the highest Youden’s index. At this cutoff score, the sensitivity and specificity of the PHQ-2 were 85.3% and 83.2%, respectively, with a PPV of 11.6%, NPV of 99.5%, LR+ of 5.1, and LR- of 0.18.
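The cutoff selection in this ROC analysis, choosing the score with the highest Youden’s index J = sensitivity + specificity − 1 and treating a score at or above the cutoff as a positive screen, can be sketched as follows. This is a minimal sketch assuming one total score and one 2-week K-CIDI major depression indicator per participant; the function names are hypothetical. The AUC here is computed nonparametrically as the probability that a randomly chosen case scores higher than a randomly chosen non-case (ties counted as one half), which equals the area under the empirical ROC curve. The sketch does not reproduce the standard errors or confidence intervals reported by SPSS.

```python
from typing import Optional, Sequence, Tuple


def confusion_counts(
    scores: Sequence[int], has_mdd: Sequence[bool], cutoff: int
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) when a score >= cutoff counts as screen-positive."""
    tp = fp = fn = tn = 0
    for score, case in zip(scores, has_mdd):
        positive = score >= cutoff
        if case and positive:
            tp += 1
        elif case:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def youden_cutoff(
    scores: Sequence[int], has_mdd: Sequence[bool], max_score: int
) -> Tuple[int, float, float, float]:
    """Scan cutoffs 1..max_score; return (cutoff, J, sensitivity, specificity) with max J.

    Ties in J keep the lowest cutoff (an assumption; tie handling is not reported).
    """
    if max_score < 1:
        raise ValueError("max_score must be at least 1")
    best: Optional[Tuple[int, float, float, float]] = None
    for cutoff in range(1, max_score + 1):
        tp, fp, fn, tn = confusion_counts(scores, has_mdd, cutoff)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sensitivity, specificity)
    return best


def auc_mann_whitney(scores: Sequence[int], has_mdd: Sequence[bool]) -> float:
    """Empirical AUC: P(case score > non-case score), counting ties as 1/2."""
    cases = [s for s, c in zip(scores, has_mdd) if c]
    non_cases = [s for s, c in zip(scores, has_mdd) if not c]
    wins = 0.0
    for a in cases:
        for b in non_cases:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


if __name__ == "__main__":
    # Toy data (not study data): PHQ-9 totals and a hypothetical MDD indicator
    scores = [0, 1, 2, 3, 5, 8, 12, 15]
    has_mdd = [False, False, False, False, True, False, True, True]
    print(youden_cutoff(scores, has_mdd, max_score=27))  # PHQ-2 would use max_score=6
    print(auc_mann_whitney(scores, has_mdd))
```

On the toy data, cutoffs of 4 and 5 tie on J, and the sketch keeps 4; with real data, the published Table 4 should be consulted for how adjacent cutoffs compare.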
DISCUSSION
We analyzed KECA-2011 data collected from participants selected as representative of Korean society through multistage stratified cluster sampling to confirm the validity and cutoff scores of the PHQ-9 and PHQ-2 in screening for MDD in the general population. We examined the reliability and construct validity of the PHQ-9 and PHQ-2, calculated their sensitivity and specificity, and obtained the AUC and optimal cutoff scores through ROC analysis.
The Korean versions of the PHQ-9 and PHQ-2 showed high reliability and validity. The Cronbach’s α coefficients of the PHQ-9 and PHQ-2 were 0.882 and 0.807, respectively, indicating relatively high internal consistency according to Nunnally’s guideline, which regards α ≥0.70 as acceptable [30]. These results are consistent with those of previous studies of general populations, although few such studies have been conducted. The Cronbach’s α of the PHQ-9 in general-population samples was 0.79 in Korea (the normative study mentioned above) [18], 0.82 in Hong Kong [31], and 0.87 in Germany [32], and that of the PHQ-2 in the general population of Hong Kong was 0.76 [33]. In addition, to assess the construct validity of the PHQ-9 and PHQ-2, we examined their correlations with the EQ-5D; the correlation coefficients between the EQ-5D index and the PHQ-9 and PHQ-2 were 0.311 and 0.288, respectively. The highest correlation was with the anxiety/depression item of the EQ-5D (PHQ-9: r=0.471, p<0.001; PHQ-2: r=0.451, p<0.001), similar to the results of a previous study that assessed construct validity through the correlation between the PHQ-9 and EQ-5D [18].
In our study, the optimal cutoff scores for the PHQ-9 and PHQ-2 in the general population were 5 and 2, respectively. In a previous systematic review, PHQ-9 cutoff scores ranged from 5 to 15 points depending on the study population and country, but were generally around 10 points [13]. Our cutoff values were lower than those of most other studies. Such low cutoffs are common in East Asian studies of the PHQ-9, which usually report optimal cutoff scores of 5–7 [34-36]. In East Asian cultures, reporting mental health difficulties is sometimes viewed as an acknowledgment of personal weakness [37], and mental disorders are often viewed as stemming from personal or family conflicts [38,39]. Therefore, East Asians may be reluctant to endorse depression-related symptoms on self-report questionnaires because of this cultural stigma [34,36]. This tendency may also have been present in our sample and may have contributed to the low cutoff score. In addition, the low cutoff score might reflect the community-based nature of the present data. The cutoff score for the PHQ-9 obtained from Korean psychiatric outpatient data was 10 points [17]. A previous meta-analysis has shown that the optimal cutoff score for the PHQ-9 varies by population and setting [40]. Furthermore, a study that used the CIDI in a community-based sample, such as ours, suggested a cutoff score of 8 [41], whereas a study that used a structured diagnostic interview in a psychiatric sample suggested a cutoff score of 13 [42]. Because the PHQ-2 has only two items, its cutoff scores vary less across studies; a previous review found that 19 of 21 studies suggested a cutoff score of 3 [14]. Like the PHQ-9, the PHQ-2 appears to be influenced by culture and setting, and our optimal cutoff was lower than in most other studies. A Taiwanese study suggested a PHQ-2 cutoff score of 2 [43], and a study that used the CIDI also indicated a cutoff score of 2 [44]. The present study’s results highlight that different cutoffs should be applied depending on culture, population, and setting.
The PHQ-9 cutoff score of 5 provides adequate sensitivity and specificity (89.9% and 84.1%, respectively); however, its low PPV (12.6%) may be a concern. The PHQ-2 shows a similar pattern (cutoff score of 2; sensitivity, 85.3%; specificity, 83.2%; PPV, 11.6%). PPV is the probability that a person who screens positive actually has the disorder, and it is strongly affected by prevalence [45]. Therefore, the low PPV in this study was likely due to the low prevalence of major depression in the sample (2.5%). Our results showed an LR+ of 5.6 and LR- of 0.12 for the PHQ-9, and an LR+ of 5.1 and LR- of 0.18 for the PHQ-2. In general, an LR+ between 5 and 10 is considered to provide moderate evidence for ruling in a diagnosis, and an LR- between 0.1 and 0.2 is considered to provide moderate evidence for ruling out a diagnosis [46]. Therefore, the likelihood ratios of the PHQ-9 and PHQ-2 in this study were acceptable for screening.
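Because PPV depends on prevalence whereas the likelihood ratios do not, the figures above can be checked with Bayes’ theorem. The sketch below is a minimal illustration that recomputes PPV, NPV, LR+, and LR- from the rounded PHQ-9 sensitivity and specificity, taking 150/6,022 as the prevalence; the function names are hypothetical, and the 10% prevalence used at the end is a hypothetical setting for comparison, not a value from this study.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-); unlike PPV and NPV, these do not depend on prevalence."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Convert a pre-test probability to odds, multiply by the LR, and convert back."""
    odds = pre_test_prob / (1 - pre_test_prob) * lr
    return odds / (1 + odds)


if __name__ == "__main__":
    prevalence = 150 / 6022  # major depression cases in this sample
    sens, spec = 0.899, 0.841  # reported PHQ-9 values at a cutoff of 5
    ppv, npv = predictive_values(sens, spec, prevalence)
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"PPV={ppv:.3f} NPV={npv:.3f} LR+={lr_pos:.2f} LR-={lr_neg:.2f}")
    # Post-test probability after a positive screen equals the PPV
    print(f"post-test={post_test_probability(prevalence, lr_pos):.3f}")
    # Same test in a hypothetical setting with 10% prevalence
    print(f"PPV at 10% prevalence={predictive_values(sens, spec, 0.10)[0]:.3f}")
```

With these inputs, the PPV is about 12.6% and the NPV about 99.7%, matching the Results; the recomputed LR+ of about 5.65 differs slightly from the reported 5.6, which likely reflects rounding of the input percentages. Raising the prevalence to a hypothetical 10% lifts the PPV to roughly 39% without any change in sensitivity or specificity.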
Our study has several limitations. First, major depression was diagnosed using the K-CIDI rather than by a medical professional. However, previous research has shown moderate to good individual-level concordance between the CIDI and clinician-administered structured interviews [47]. Second, although a large number of participants were investigated, the number of major depression cases was small, which may have affected the estimates of sensitivity and specificity as well as the PPV. Third, because this was a cross-sectional study, test–retest reliability was not assessed and requires further evaluation; however, previous studies have found PHQ-9 and PHQ-2 scores to be generally consistent over time [26,33].
In conclusion, the results of this study suggest that the Korean versions of the PHQ-9 and PHQ-2 are valid screening tools for major depression in Korea. In a representative sample of the Korean general population from the Korean Epidemiological Catchment Area Study (KECA-2011), the optimal cutoff scores for screening major depression were ≥5 for the PHQ-9 (sensitivity 89.9%, specificity 84.1%) and ≥2 for the PHQ-2 (sensitivity 85.3%, specificity 83.2%).
Notes
Availability of Data and Material
The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.
Conflicts of Interest
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Sanghyup Jung, Bong-Jin Hahm. Data curation: Jee Hoon Sohn, Su Jeong Seong, Jin Pyo Hong. Investigation: Bong-Jin Hahm, Chan-Woo Yeom. Methodology: Minah Kim, Sanghyup Jung, Bong-Jin Hahm. Software: Sanghyup Jung. Writing—original draft: Minah Kim. Writing—review & editing: Minah Kim, Sanghyup Jung, Jee Eun Park, Byung-Soo Kim, Sung Man Chang, Bong-Jin Hahm, Chan-Woo Yeom.
Funding Statement
This study was supported by a research grant from the Korean Ministry of Health and Welfare.