Development and Validation of a Screening Scale for Depression in Korea: The Lee and Rhee Depression Scale
Abstract
Objective
The aim of this study was to develop a culturally sensitive instrument that addressed how individuals express and experience depression to detect this disorder in Koreans. We also assessed the validity, reliability, and diagnostic utility of this scale (Lee and Rhee Depression Scale; LRDS).
Methods
The sample consisted of 3,697 normal adults selected from 12 administrative districts (Do) and 448 Korean patients diagnosed with depression using the Structured Clinical Interview for DSM-IV Axis I disorders (SCID-I). Reliability was calculated using Cronbach's α. Construct validity, discriminant validity, and concurrent validity were also measured. Receiver operating characteristic (ROC) analysis was employed to evaluate diagnostic efficiency.
Results
The LRDS was found to be a reliable instrument (Cronbach's α=0.95) consisting of six factors: negative thinking about the future, negative thinking about the self, worry and agitation, depressed mood, somatization, and loss of volition. Comparison of LRDS scores discriminated the group of patients with depression from the normal individuals in the control group. The measure showed good concurrent validity in that scores were significantly and strongly correlated with scores on established scales such as the Beck Depression Inventory (BDI), the Hamilton Depression Rating Scale (HAM-D), and the D scale of the Minnesota Multiphasic Personality Inventory-second edition (MMPI-2). Diagnostic efficiency was 77.7%, and the cut-off scores were 65 for males and 70 for females.
Conclusion
To our knowledge, this is the first study to develop a depression-screening scale on the basis of Korean patients' complaints about the disorder. As a culturally sensitive tool, the LRDS will be useful in clinical and research settings in Korea.
INTRODUCTION
Depressive disorder, one of the most common psychiatric disorders, tends to result in functional impairment. Mortality rates related to suicide are higher in patients with depression than in those with other mental diseases, and depression tends to recur. For these reasons, depression is considered to occupy an important position among mental disorders.1
A recent epidemiological study in South Korea in 2006 found that the prevalence of major depressive disorder was 3.6% higher than the estimate from 2 years earlier.2
According to a cross-national epidemiological study of major depression, the lifetime prevalence of this disorder varies from 1.5% in Taiwan and 2.9% in Korea to 19.0% in Beirut.3 Ustun et al. also showed various prevalence rates in 15 countries, finding the highest rate in South America, the middle rate in the US and Europe, and the lowest rate in East Asia.4 Lower estimated prevalence rates in Asian than in Western countries reflect the influences of ethnicity, culture, and different research methodologies.5 The social stigma attached to depression contributes to the underestimation of the prevalence of depression in Korea. Indeed, the cultural context affects the way individuals express the disorder. Asian people tend to express depression as somatic symptoms, whereas Europeans and North Americans are prone to emphasizing more affective symptoms.6 Japanese women are likely to express emotional complaints by referring to physical problems or worries about childcare rather than by expressing depressed feelings.7 Chinese women frequently mention a "wind inside the head" and a "wind illness" as physical symptoms of depression.8
Using data based on the Korean version of the Center for Epidemiologic Studies Depression scales (CES-D), Kim et al.9 found a factor structure that differed from those reported in studies in Western countries.10 Namely, among Korean patients, somatic symptoms and affective symptoms were combined into one factor, and emotional hardship and interpersonal issues constituted another factor. The unique factor structure of depressive symptoms in Korean individuals suggests the need for different approaches to the diagnosis and treatment of depression in Korea.
According to a study investigating the current use of depression rating scales in a Korean mental health setting, psychiatrists and clinical psychologists use the Beck Depression Inventory (BDI)11 most frequently.12
The BDI, which is characterized by good reliability and validity, is widely favored in many countries. However, its items have limited ability to accurately screen for the symptoms of depression as expressed by Korean people. For instance, if questions addressing physical dimensions of depression are limited to topics such as sleep disturbances, changes in appetite and weight, and somatic concerns, they cannot detect somatic symptoms such as chest pain, hot flashes, and dizziness that are often reported by Korean patients with depressive disorder. The use of standardized Western tools may be culturally insensitive and increase the risk of overlooking symptoms or signs that are prevalent in non-Western cultures.13,14 Therefore, it is necessary to develop a Korean tool for screening for depressive symptoms.
Many domestic studies have attempted to standardize depression rating scales developed in Western cultures such as the BDI, the CES-D, the Zung Self-Rating Depression Scale (SDS),15 and the Geriatric Depression Scale (GDS).16 However, no attempts to develop a domestic scale for depression have been reported thus far.
This study was designed to develop a culturally sensitive self-report scale, the Lee and Rhee Depression Scale (LRDS), to measure depressive symptoms and screen for depressive disorder in Korea.
METHODS
Development of preliminary items
To develop the questions to be included in the LRDS, we translated existing depression scales into Korean and then performed structured interviews with patients suffering from depression to investigate their symptoms. The detailed procedure is described below.
Collection, analysis, and translation of existing depression scales
We collected depression scales developed in other countries. These included the BDI, CES-D, SDS, GDS, Hamilton Rating Scale for Depression (HAM-D),17 Carroll Depression Scale (CDS),18 Montgomery-Asberg Depression Rating Scale (MADRS),19 Depression Self-Rating Scale (DSRS),20 Depression scale in the MMPI-2,21 Depressive Experiences Questionnaire (DEQ),22 and Beck Hopelessness Scale (BHS).23
We analyzed the content, reliability, validity, and other characteristics of each scale. Several scales were translated into Korean. Translation was done jointly by two psychiatrists and a clinical psychologist, and questions deemed unsuitable for the Korean culture were modified through group discussions to fit the national context. Translated questions were collected to form the item pool.
Collection of information about symptoms of depression among Koreans
The symptoms identified by Korean patients with depression were very important elements in composing the questions that constitute the LRDS. Thus, we examined inpatients and outpatients in the Neuropsychiatric Department who were being treated for depression, focusing on their symptoms and on life events prior to the onset of depression. The patients' symptoms and life events were examined through content analysis, and the results were used to create a candidate item pool. Holsti offers a broad definition of content analysis as "any technique for making inferences by objectively and systematically identifying specified characteristics of messages."24
A survey of 302 patients with depression (85 males and 217 females) at the psychiatric clinic of Anam Hospital of Korea University Medical Center concerning their main symptoms and life events identified 81 primary types of symptoms and 975 total symptoms. The rank order of frequency (n, % of total) of individual symptoms was as follows: insomnia (112, 11.5%), anxiety (89, 9.1%), headache (85, 8.7%), indigestion (49, 5.0%), poor appetite (45, 4.6%), palpitations and rapid pulse (41, 4.2%), chest pain (37, 3.8%), heavy chest (35, 3.7%), agitation (36, 3.7%), nausea and vomiting (35, 3.6%), loss of volition (34, 3.5%), depressed mood (33, 3.4%), stomach ache and abdominal discomfort (32, 3.3%), and inertia (31, 3.2%). Additionally, some patients reported symptoms such as widespread pain, feelings of brachial and crural palsy, vertigo, weakness, and fatigue.
Preliminary item selection
As a result of discussions within the research team of senior psychiatrists and clinical psychologists, we selected 127 preliminary items from 357 candidate items to address the ways in which Koreans experience depression.
Final item selection
Of the 127 preliminary questions, items with low construct validity and reliability were eliminated through factor and reliability analyses. The final scale contained 30 questions, including five questions for each of the six dimensions: negative thinking about the self, negative thinking about the future, worry and agitation, depressed mood, somatization, and loss of volition. We used a 5-point self-report Likert scale ranging from 0 ("not at all") to 4 ("absolutely yes"), yielding a total score from 0 to 120 points; higher scores indicate more severe depression. The procedure for selecting items is depicted in the following diagram:
Sampling of subjects
Normative data
The current study aimed to develop a scale to measure depression in non-clinical populations and to screen for depression in clinical settings. To this end, nationwide data were collected from the non-clinical population of Korean adults through multistage mixed sampling that relied on area sampling, proportional stratified sampling, and quota sampling. By combining these sampling methods, we enhanced the representativeness of the sample and minimized sampling error. A total of 4,000 normal adults aged over 18 were recruited from throughout the country. The questionnaire included items pertaining to history of treatment for psychiatric disorders, age at onset, and duration of illness to exclude those with mental disorders.
Patient data
A total of 448 outpatients visiting the psychiatric clinic of Anam Hospital of Korea University participated in this study. Trained psychiatrists examined all subjects using the Structured Clinical Interview for DSM-IV Axis I disorders (SCID-I).25 The patient group included those with major depressive disorder without psychotic symptoms and dysthymic disorder. We obtained written informed consent from all patients and control subjects, and the review committee of Korea University Medical Center approved this study.
Reliability and validity
Reliability was measured in terms of internal consistency using Cronbach's alpha coefficient. Exploratory factor analysis was performed to evaluate construct validity. Confirmatory factor analysis was performed to identify the six dimensions constituting the full scale.
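As an illustration of the internal-consistency index named above, the following minimal sketch computes Cronbach's α from a respondents-by-items matrix of scores. The function name and the tiny data set are hypothetical and are not taken from the study, which used SPSS.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    # Variance of each item across respondents
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the summed (total) score across respondents
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of four people to two items scored 0-4
example = [[0, 1], [1, 0], [2, 2], [3, 3]]
alpha = cronbach_alpha(example)
print(round(alpha, 3))
```

The ratio of summed item variances to total-score variance is unchanged whether population or sample variances are used, provided the same choice is made for both.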
Concurrent validity was assessed using Pearson correlation coefficients for scores on scales known to detect depressive symptoms: the BDI, the 21-item HAM-D, and the MMPI-2.
The BDI is a widely used self-report scale designed to measure the severity of depressive symptoms and the extent of clinical change over the treatment period. The HAM-D, considered by many to be the "gold standard," is a clinician-rated scale for depression. The 21-item version of the HAM-D used in the present study was modified from the original 17-item version by the addition of four items (diurnal variation, paranoid ideation, obsessive/compulsive symptoms, and depersonalization/derealization). The MMPI-2, an upgraded and restandardized version of the MMPI, consists of 567 true/false self-report items. It contains numerous scales to assess personality constructs within the normal range as well as psychopathological symptomatology. As one of the most commonly used instruments, it has been translated into numerous languages around the world. The Korean version of the MMPI-2 was used in this study.26
Discriminant validity was evaluated by comparing the LRDS scores in the depressed and control groups via analysis of variance (ANOVA). Finally, receiver-operating-characteristic (ROC) analysis was performed to test the diagnostic utility of the scale for the identification of depression. ROC analysis, a method often used for evaluating the accuracy of diagnostic tests, provides a graph of the true-positive (sensitivity) versus the false-positive (1-specificity) rates.27
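The ROC procedure described above can be sketched as follows: each observed score is tried as a threshold, the resulting false-positive and true-positive rates are plotted, and the area under the curve (AUC) is obtained with the trapezoidal rule. The function names and the six example scores are hypothetical; they are not data from this study.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs, one per threshold.

    labels: 1 = depressed, 0 = control. A case is flagged when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    # Lower the threshold step by step through the observed scores
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scale scores for three patients (1) and three controls (0)
scores = [80, 70, 65, 60, 50, 40]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
print(round(area, 3))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.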
Data were analyzed using SPSS 15.0 for Windows and AMOS 7.0.
RESULTS
A final sample of 3,697 control subjects and 448 patients with depression was included in the analysis after data with random responses or unanswered questions were excluded.
Construct validity and reliability
Exploratory factor analysis was performed to identify factors reflected by each dimension of the scale. First, we calculated the measure of sampling adequacy (MSA) for each dimension. MSA is an index that compares the sum of the squares of the correlation coefficients with the sum of the squares of the partial correlation coefficients. Kaiser28 described an MSA below 0.50 as unacceptable, rendering factor analysis impossible; 0.50-0.59 was described as miserable, 0.60-0.69 as mediocre, 0.70-0.79 as middling, 0.80-0.89 as meritorious, and 0.90 or higher as marvelous. Reliability was measured by a coefficient of internal consistency (Cronbach's α). Table 1 shows the results of factor analysis, reliability analysis, and the MSA for each dimension.
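Kaiser's verbal labels for the MSA can be expressed as a simple lookup. The sketch below is illustrative only; the function name is hypothetical, and it assumes that a value of exactly 0.50 falls in the "miserable" band.

```python
def kaiser_label(msa):
    """Map a measure of sampling adequacy (0-1) to Kaiser's verbal label."""
    # Lower bound of each band, checked from the highest band downward
    bands = [
        (0.90, "marvelous"),
        (0.80, "meritorious"),
        (0.70, "middling"),
        (0.60, "mediocre"),
        (0.50, "miserable"),  # assumption: exactly 0.50 counts as miserable
    ]
    for lower_bound, label in bands:
        if msa >= lower_bound:
            return label
    return "unacceptable"


print(kaiser_label(0.85))
print(kaiser_label(0.77))
```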
Negative thinking about the future
The MSA for this dimension was 0.85; because this value met Kaiser's criterion for meritorious, the data constituted an adequate correlation matrix for factor analysis. One factor was extracted for this dimension; with an eigenvalue of 2.63, it explained 52.60% of the total variance. The reliability of this dimension was good (Cronbach's α=0.84).
Negative thinking about the self
The MSA for this dimension was 0.82, which met Kaiser's criterion for identification as meritorious. One factor was extracted for this dimension; its eigenvalue was 2.12, and it explained 42.40% of the total variance. The reliability of this dimension was satisfactory (Cronbach's α=0.78).
Worry and agitation
The MSA for this dimension was 0.83, which met Kaiser's criterion for meritorious. One factor was extracted for this dimension; its eigenvalue was 2.83, and it explained 56.60% of the total variance. The reliability of this dimension was good (Cronbach's α=0.81).
Depressed mood
The MSA for this dimension was 0.84, which qualifies as meritorious according to Kaiser. One factor was extracted for this dimension; its eigenvalue was 2.32, and it explained 46.40% of the total variance. The reliability of this dimension was good (Cronbach's α=0.81).
Somatization
The MSA for this dimension was 0.83, which met Kaiser's criterion for meritorious. One factor was extracted for this dimension; its eigenvalue was 2.47, and it explained 49.40% of the total variance. The reliability of this dimension was good (Cronbach's α=0.83).
Loss of volition
The MSA of this dimension was 0.77, a middling level for factor analysis. One factor was extracted for this dimension; its eigenvalue was 1.88, and it explained 37.60% of the total variance. The reliability of this dimension was adequate (Cronbach's α=0.74).
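The percentages of variance reported for the six dimensions follow directly from the eigenvalues if one assumes, as the figures suggest, that each analysis was run on the correlation matrix of the five items in a dimension, so that the total variance equals 5. The sketch below reproduces that arithmetic; the function and variable names are hypothetical.

```python
def percent_variance(eigenvalue, n_items=5):
    """Share of total variance explained by one factor, in percent.

    Assumes standardized items, so total variance equals the number of items.
    """
    return 100 * eigenvalue / n_items


# Eigenvalues reported in the text for each five-item dimension
eigenvalues = {
    "negative thinking about the future": 2.63,
    "negative thinking about the self": 2.12,
    "worry and agitation": 2.83,
    "depressed mood": 2.32,
    "somatization": 2.47,
    "loss of volition": 1.88,
}

explained = {name: round(percent_variance(ev), 2) for name, ev in eigenvalues.items()}
for name, pct in explained.items():
    print(f"{name}: {pct:.2f}%")
```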
Verification of the six-factor model
We performed confirmatory factor analysis to verify that the LRDS was composed of six dimensions, as theoretically proposed. This hypothesis was tested by maximum-likelihood estimation using covariance matrix data. The results showed that the overall fit was good, with the following values for the representative agreement indices: χ2=3381.59, RMSEA=0.06, RMR=0.04, standardized RMR=0.05, NFI=0.92, NNFI=0.91, CFI=0.92, and GFI=0.92. These agreement indices supported the six-factor model.
Discriminant validity
The means and standard deviations of scores on the LRDS are presented in Table 2. The mean±standard deviation LRDS score was 63.99±22.87 in the depressed group (n=448) and 42.30±16.97 in the control group (n=3,697). The ANOVA showed that the scores of women were significantly higher than those of men on the total scale and on the dimensions of worry and agitation, depressed mood, somatization, and loss of volition (F=11.8, p<0.001). Hence, the comparison between patient and control groups was performed with sex as a covariate. The results of the multivariate analysis of covariance (MANCOVA) are presented in Table 3. Scores on each factor were significantly higher in patients with depression than in the control group.
Concurrent validity
It is recommended that validation of a new scale include evidence of the relationship between the new scale and other measures that have already been extensively validated. In this study, all 448 subjects with depression completed the BDI and HAM-D. Of these 448 subjects, 97 also completed the MMPI-2. Table 4 presents the results of the correlation analysis using the aforementioned data.
The Pearson correlation coefficients between the scores on the LRDS and those on the BDI and HAM-D were 0.78 (p<0.001) and 0.59 (p<0.001), respectively. These significant correlations suggest that the LRDS has good convergent validity as a depression-screening instrument.
The MMPI-2 has 10 clinical scales: Hypochondriasis, Depression, Hysteria, Psychopathic Deviate, Masculinity-Femininity, Paranoia, Psychasthenia, Schizophrenia, Hypomania, and Social Introversion. Each scale assesses different personality constructs and psychopathological symptoms. The Depression (D) scale was designed to measure various symptoms associated with depression. The results of the statistical analysis in this study showed that LRDS scores were most strongly correlated with scores on the D scale (Pearson correlation coefficient=0.5, p<0.001), whereas no significant correlation was observed between scores on the LRDS and those on the Hypomania (Ma) scale, which measures hypomanic symptoms (Pearson correlation coefficient=0.19, p>0.05).
Diagnostic utility of the scale
ROC analysis was performed to test the diagnostic utility of this scale and to identify the appropriate cut-off score. Table 5 presents the results of the ROC analysis, and Figure 1 shows the data related to diagnostic efficiency (area under the curve: AUC) for the overall scale and its dimensions. The diagnostic utility of the full scale was significant at 77.7% (p<0.001); the diagnostic utility of each dimension was also significant.
Preparation of national norms
Because scores on the overall scale and its dimensions differed significantly according to age and sex, we constructed norm tables by age and sex. For each norm score, we used the T-score at which the mean was 50 and the standard deviation was 10. Norm tables for men and women and for age groups are presented in the Appendix.
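The conversion of a raw score to a T-score with mean 50 and standard deviation 10 can be sketched as below. For illustration only, the example uses the overall control-group mean and standard deviation reported in the Results (42.30 and 16.97); the actual norm tables are stratified by age and sex, so these parameters are an assumption, and the function name is hypothetical.

```python
def t_score(raw, norm_mean, norm_sd):
    """Convert a raw scale score to a T-score (mean 50, SD 10) for a given norm group."""
    z = (raw - norm_mean) / norm_sd  # standard (z) score within the norm group
    return 50 + 10 * z


# Illustrative norm parameters: the unstratified control group from the Results
NORM_MEAN, NORM_SD = 42.30, 16.97

at_mean = t_score(42.30, NORM_MEAN, NORM_SD)
plus_1_5_sd = t_score(42.30 + 1.5 * 16.97, NORM_MEAN, NORM_SD)
print(at_mean, round(plus_1_5_sd, 1))
```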
DISCUSSION
All currently used tools for screening for depression in Korea originated in Western countries. Despite the good validity and reliability of those scales, they are limited in their ability to detect depressive symptoms in Korean patients given the many ethnic and cultural differences that affect mental disorders. Hence, the limitations of currently used Western tools may increase the risk of missing symptoms or signs in non-Western populations. This study attempted to develop a culturally sensitive measure to rate depression and to assess the validity and reliability of this measure.
The LRDS addresses six dimensions with five questions each, for a total of 30 questions. The results of the exploratory factor analysis showed that each dimension represented a single factor, and each dimension and the full scale reached satisfactory levels of reliability. The confirmatory factor analysis demonstrated a six-factor model consisting of negative thinking about the future, negative thinking about the self, worry and agitation, depressed mood, somatization, and loss of volition. We used the following overall measures of fit: 1) The χ2 tests the degree of agreement between a theoretical model and observed data but is sensitive to sample size.29,30 Thus, some researchers suggest that the χ2 may be reported but should not be used as a main index.31 2) The root mean square error of approximation (RMSEA) estimates, from sampled data, how well the model would fit the whole population. An RMSEA value of 0.05 or less indicates good fit or high agreement between the whole population and the sample, 0.05-0.08 indicates moderate agreement, and 0.10 or higher indicates low agreement.32,33 3) The comparative fit index (CFI) ranges between 0 and 1. A value of 0.90 or higher indicates high agreement between data and hypothesis.30,31,34 4) The root mean square residual (RMR) is the square root of the average squared amount by which the sample variances and covariances differ from their estimates. It reflects the average discrepancy between the observed and hypothesized data that is not explained by the model. A low RMR indicates a high level of agreement between the hypothesis and the observed data. The measures of overall fit calculated in this study were all adequate (χ2=3381.59, RMSEA=0.06, RMR=0.04, standardized RMR=0.05, NFI=0.92, NNFI=0.91, CFI=0.92, and GFI=0.92). Specifically, the RMSEA was less than 0.10, the standardized RMR was close to 0.05, and the CFI and GFI were over 0.90.
These results suggest that the data reflected the six-factor model of the LRDS.
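The interpretive thresholds cited in this paragraph can be written as a small check. This is only a sketch of the rules of thumb as stated here; the function names are hypothetical, and because the text does not classify RMSEA values between 0.08 and 0.10, that band is returned as unclassified.

```python
def rmsea_label(rmsea):
    """Label an RMSEA value using the thresholds quoted in the text."""
    if rmsea <= 0.05:
        return "good"
    if rmsea <= 0.08:
        return "moderate"
    if rmsea < 0.10:
        return "unclassified"  # the text gives no label for values between 0.08 and 0.10
    return "low"


def incremental_index_ok(value, threshold=0.90):
    """True if an index on a 0-1 scale (e.g., CFI) meets the 0.90 rule of thumb."""
    return value >= threshold


# Fit indices reported for the six-factor model
reported = {"RMSEA": 0.06, "CFI": 0.92, "GFI": 0.92}
summary = {
    "RMSEA": rmsea_label(reported["RMSEA"]),
    "CFI": incremental_index_ok(reported["CFI"]),
    "GFI": incremental_index_ok(reported["GFI"]),
}
print(summary)
```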
These data also indicated good concurrent validity in that they revealed significant and strong correlations with previously validated scales such as the BDI, HAM-D, and MMPI-D. According to a survey of mental health professionals about the current use of depression rating scales in Korean mental health settings, psychiatrists usually prefer the BDI and HAM-D, in that order, and clinical psychologists usually prefer the BDI and MMPI. This survey suggests that only a limited range of depression rating scales is used in clinical practice. Reasons for not using particular scales included doubts about reliability and validity, lack of familiarity with the scale, and the ability to diagnose and assess via psychiatric interviews without the assistance of scales.12 Given the current state of clinical practice in Korea, the development of a valid and reliable standardized depression rating scale for the Korean population should not be postponed. In this context, the LRDS, which is correlated with depression scales that have been validated on an international level, can be helpful in screening for depression.
The LRDS is distinguished from existing scales such as the BDI with respect to items addressing somatic issues. Whereas the BDI contains items on sexual activity, work, fatigue, and health concerns, the LRDS contains the following items regarding somatic symptoms: 5) "My head aches and is heavy;" 11) "I feel pressure in my chest;" 17) "I have cold sweats and chills;" 23) "I have a fever in my whole body;" and 29) "My mouth is dry, and I have a bitter taste." These somatic symptoms are observed frequently among depressed people in Korea. Social stigma, such as the belief that "depression is a sign of a weak mind," leads individuals to express distress in somatic rather than emotional terms. Thus, many patients with depression are referred to psychiatry departments via departments of neurology and internal medicine. Therefore, the use of the LRDS to screen for depression will be useful not only in psychiatric but also in other departments of hospitals. Compared with the BDI, the LRDS can avoid response bias through positively worded items such as "My future will be happier than my life is now" and "I still feel that life is worth living." However, given that the LRDS contains no items pertaining to changes in appetite or sleep disturbances, which constitute common symptoms of depression, the use of both the BDI and the LRDS will increase true positives for depression.
In actual clinical practice, the accuracy with which depression is detected can be enhanced by using this scale with other depression scales. Lewinsohn and Teri35 reported that the hit rate was higher when BDI was used with their depression scale. Lee and Song36 also reported that the simultaneous use of both BDI and another depression scale decreased false positives compared with the use of BDI alone. Completing the 30 items of the LRDS in addition to the BDI-21 may be burdensome for patients in clinical settings, suggesting that future studies should attempt to decrease the number of items and assess the validity and reliability of the abbreviated instrument.
According to the ROC analysis, the diagnostic utility of this scale was 77.7%, indicating that it can be used to diagnose depression and to identify subjects experiencing depression for purposes of research. Use of this scale for research purposes will require specification of an objective cut-off score for classifying individuals into the depression group. It would be possible to find the relative position of each individual's depression score in a norm table prepared by calculating standard scores based on the means and standard deviations of groups by condition and then converting the scores into T-scores in which the mean is 50 and the standard deviation is 10.
In general, a T-score of 70 or higher is suggested as the threshold for depression, but we propose that scores +1.5 SD higher than the average T-scores be considered as indicative of depression. Indeed, use of this cut-off point in the present study would have placed 6.7% of the subjects into the depressed group. That is, if an individual's score was 65 or higher, he or she would be placed in the depression group; lower scores would place the individual in the non-depressed group. However, the cut-off score is not an absolute standard. The tester can use the norm table to choose an appropriate standard according to the purpose of the test. Indeed, the cut-off score can be set depending on the relative importance of false positives and false negatives. For example, if misclassifying non-depressed people as depressed (false positive) resulted in a serious situation, the possibility of false positives can be minimized by setting the cut-off score higher, and vice versa.
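The 6.7% figure corresponds to the upper tail of a normal distribution beyond +1.5 SD. The sketch below checks this under the assumption that T-scores are normally distributed with mean 50 and standard deviation 10, and shows how a stricter cut-off of 70 shrinks the flagged proportion; the variable and function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: T-scores follow a normal distribution with mean 50 and SD 10
t_dist = NormalDist(mu=50, sigma=10)


def share_at_or_above(cutoff):
    """Expected proportion of the norm group scoring at or above a T-score cut-off."""
    return 1 - t_dist.cdf(cutoff)


share_65 = share_at_or_above(50 + 1.5 * 10)  # +1.5 SD, i.e. T = 65
share_70 = share_at_or_above(70)             # +2 SD, i.e. T = 70
print(f"T>=65: {100 * share_65:.1f}%  T>=70: {100 * share_70:.1f}%")
```

Raising the cut-off lowers the share of people flagged, which reduces false positives at the cost of more false negatives, in line with the trade-off described above.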
Individuals are classified on the LRDS based on the T-score table according to sex and age. According to the table presented in the Appendix, the cut-off score is 65 for males and 70 for females. It should be remembered that an individual with a high score should not necessarily be diagnosed with depression.
This study has the following limitations. First, this study attempted discriminant validation of the scale by comparing the total scores of a group of patients with depression with those of a control group. However, comparisons with other psychiatric disorders, such as an anxiety disorder, are needed to strengthen the discriminant validity. Second, we tested only the six-dimension model that we assumed would represent a good fit for the scale. Despite these limitations, the current study has important implications for future research. Although the internal consistency of this scale was satisfactory, its test-retest reliability should be studied to enhance its reliability. Furthermore, future investigations should examine other possible models including three-, four-, and five-factor models.
The current study is significant in that it is the first to develop a culturally sensitive scale for screening for depression in Korea. The LRDS is not only valid and reliable but also offers cut-off scores on the basis of age and sex; it is a useful tool that is easy to administer in clinical and research settings.
Acknowledgments
This study was supported by grants of the Korea Health 21 R & D Project, Ministry of Health and Welfare, Republic of Korea (A050047 and 03-PJ10-PG13-GD01-0002).