Reliability and Validity of the Korean Version of the Empathy Quotient Scale
Abstract
Objective
The Empathy Quotient (EQ) is a self-report questionnaire developed by Baron-Cohen et al. (2004) to measure the cognitive and affective aspects of empathy. The purpose of this study was to develop a Korean version of the EQ and to establish its psychometric properties in a representative Korean sample.
Methods
The Korean version of the EQ and its correspondence with another popular measure of empathy, the Korean version of the Interpersonal Reactivity Index (IRI), were evaluated in a sample of 478 volunteers (156 men, 322 women; mean age, 27.2 years). Test-retest reliability was assessed over a 1-month interval in a subsample of 20 participants. Correlation and confirmatory factor analyses were conducted.
Results
Test-retest reliability was good (r=0.84), and internal consistency was acceptable (Cronbach's alpha=0.78). The total EQ score correlated positively with three IRI subscales (perspective taking, empathic concern, and fantasy) and negatively with the personal distress subscale. Confirmatory factor analyses suggested that a three-factor structure offered a good fit to the data.
Conclusion
These findings support the reliability and validity of the Korean version of the EQ.
Introduction
Empathy is essential to our comprehension of social behavior. It allows us to understand the intentions of others, predict their behavior, and experience emotions triggered by their emotions. Successful social interaction presumably depends at least in part on empathy. Indeed, it is widely accepted that empathic skills support long-term social commitment and are an essential prerequisite for higher social functioning.
The word "empathy" is of comparatively recent origin; it was coined by Titchener1 as a translation of the German word "Einfühlung," which had its roots in German aesthetics. The term "Einfühlung" was first used by Robert Vischer in 1873 in his discussion of the psychology of aesthetic appreciation, which involved a projection of the self into an object of beauty.2 The concept was taken up in psychology by Theodor Lipps,3,4 who conceptualized it as a kind of "inner imitation." Titchener borrowed Lipps's notion of "Einfühlung" and rendered it as "empathy," from the Greek "empatheia," which means, literally, "in" (en) "suffering or passion" (pathos). Although various terms have been used to describe empathy, there is general consensus that affect is central to it; that is, empathy is the act of "feeling into" another's affective experience.
Empathy is a complex form of psychological inference in which observation, memory, knowledge, and reasoning are combined to yield insight into the thoughts and feelings of others.5 Given the complexity of this construct, numerous definitions of empathy exist.6 However, there is broad agreement on three primary components: 1) a cognitive capacity to take the perspective of the other person; 2) an affective response to another person that entails sharing that person's emotional state; and 3) regulatory mechanisms that keep track of the origins of self- and other-feelings.7-15 Empathy is founded on an awareness, understanding, or knowledge of another's feelings or emotions. Some refer to this as role taking or perspective taking, while others use the term "cognitive empathy." By contrast, the empathy experienced by a person who witnesses the pain or intense distress of another frequently differs from cognitive empathy. This form of empathy, ordinarily designated affective empathy, can involve in-depth cognitive processing of another's condition or consciousness. Thus, affective empathy may have greater motivational force in altruistic and prosocial behavior. Meanwhile, the regulatory mechanisms, also called parallel empathy16 or inhibitory empathy, are frequently reactive, with thoughts and feelings that arise in response to the other's experience, but they help to maintain self-other awareness and to distinguish one's own emotions from another's.
Just as various concepts of empathy exist, diverse methods to measure empathy have also emerged. These include questionnaires, picture-story methods, and non-verbal methods (e.g., facial expression, behavioral, and physiological measures). Self-report questionnaires are among the most commonly used instruments because they are easy to administer and can assess multiple dimensions more straightforwardly than other methods. Hogan's empathy scale17 attempted to measure empathy understood in a cognitive sense; however, a factor analysis suggested that it actually reflects social self-confidence, even-temperedness, sensitivity, and non-conformity.18 Critics also argue that it simply measures social skills rather than empathy itself.9 Mehrabian and Epstein conceived of empathy as an exclusively affective phenomenon and developed the Questionnaire Measure of Emotional Empathy,19 which was designed to assess an individual's tendency to react strongly to the experiences of another person. However, it has been suggested that this instrument may measure emotional arousability in general, rather than responses to others' emotions in particular.20 The Interpersonal Reactivity Index (IRI)21 is a multidimensional questionnaire measure of empathy. It includes subscales measuring perspective taking, which fits traditional definitions of cognitive empathy; empathic concern, which addresses the respondent's capacity for warm, concerned, compassionate feelings for others, a facet of affective empathy; fantasy, which measures a tendency to identify with fictional characters; and personal distress, which is designed to tap self-oriented responses to others' negative experiences. However, personal distress, although an important dimension, is not empathy itself, and it is unclear whether the fantasy subscale taps empathy in a pure form.
The 60-item Empathy Quotient (EQ)22 is the most recent addition to self-report measures of empathy. Unlike previous questionnaires, it was explicitly designed for clinical applications and was intended to be sensitive to a lack of empathy as a feature of psychopathology. The original, Japanese,23 and French24 versions of the EQ have been validated in samples of university students and the general population, in adults with high-functioning autism or Asperger's disorder, and in adults with depersonalization disorder.25 A further series of studies revealed that the EQ could be successfully reduced to three factors: 1) cognitive empathy, 2) emotional reactivity, and 3) social skills. Moreover, the EQ was found to have high test-retest reliability over a period of 12 months.
Thus, the aim of the present study was to develop a complete Korean version of the EQ and to establish its psychometric properties based on a representative Korean sample. We sought not only to examine the reliability and validity of the Korean version of the EQ, but also to use confirmatory factor analysis to evaluate several previously proposed models of the EQ.
Methods
Participants
Participants in this study included 478 volunteers (156 men and 322 women; mean age, 27.2 years). Of these, 208 (44%; 91 men, 117 women) were students at Kyungpook National University School of Medicine, and the remaining 270 (56%; 65 men, 205 women) were recruited from among graduate students and non-medical staff at Kyungpook National University Hospital. A test-retest study of the Korean version of the EQ was conducted over a 1-month interval in a subsample of 20 participants from the original group. Ten subjects with Asperger's disorder, diagnosed by psychiatrists using established criteria,26 were also recruited through the psychiatric outpatient and inpatient departments of Kyungpook National University Hospital. Their mean age was 19.2 years [standard deviation (SD)=2.7; range, 16-25], and their mean intelligence quotient was 109 (SD=15.6; range, 93-133).
Measures
Empathy Quotient
The EQ22 was designed to be short, easy to use, and easy to score. It consists of 60 questions: 40 items tapping empathy and 20 filler items. The filler items were included to distract participants from a relentless focus on empathy. An initial attempt to separate items into purely affective and purely cognitive categories was abandoned because, in most instances of empathy, the affective and cognitive components co-occur and cannot be readily disentangled. Each empathy item scores 1 point if the respondent endorses the empathic response mildly ("slightly agree" or "slightly disagree," depending on the item's direction) and 2 points if the respondent endorses it strongly ("strongly agree" or "strongly disagree"); otherwise, it scores 0. To avoid response bias, approximately half the items were worded so that empathy is indicated by a "disagree" response and half so that it is indicated by an "agree" response, and item order was randomized. The EQ uses a forced-choice format, can be self-administered, and is straightforward to score because scoring requires no interpretation.
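The scoring rule above can be sketched in Python as follows. This is a minimal illustration, assuming each response is recorded as one of the four forced-choice options ("strongly agree", "slightly agree", "slightly disagree", "strongly disagree") and that the direction of each item is supplied by the caller; the function names and the three-item key are hypothetical and are not the published EQ scoring key.

```python
from typing import Dict, Optional

# The four forced-choice response options of the EQ.
OPTIONS = ("strongly agree", "slightly agree", "slightly disagree", "strongly disagree")


def score_item(response: str, keyed: Optional[str]) -> int:
    """Score one EQ item as 0, 1, or 2 points.

    keyed is "agree" when agreement indicates empathy, "disagree" when
    disagreement does, and None for a filler item (never scored).
    """
    if response not in OPTIONS:
        raise ValueError(f"unknown response: {response!r}")
    if keyed is None:
        return 0  # filler item
    strength, direction = response.split()
    if direction != keyed:
        return 0  # answer in the non-empathic direction
    return 2 if strength == "strongly" else 1


def score_eq(responses: Dict[int, str], key: Dict[int, Optional[str]]) -> int:
    """Total score: sum of item scores over all items in `responses`."""
    return sum(score_item(resp, key[item]) for item, resp in responses.items())


# Hypothetical three-item key for illustration only (not the published EQ key).
demo_key = {1: "agree", 2: "disagree", 3: None}
demo_responses = {1: "strongly agree", 2: "slightly disagree", 3: "strongly agree"}
print(score_eq(demo_responses, demo_key))  # 2 + 1 + 0 = 3
```

With all 40 empathy items scored this way, the total can range from 0 to 80.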
Lawrence et al.25 used a principal components analysis to identify key dimensions of the original scale and identified 28 items that showed reasonable communalities and loaded onto three factors: factor 1, cognitive empathy (EQ-CE); factor 2, emotional reactivity (EQ-ER); and factor 3, social skills (EQ-SS).
With the permission of the authors, the EQ was translated into Korean by an experienced psychiatrist and a clinical psychologist. It was then back-translated by a bilingual individual, and modifications were made. The final version was approved by the two original translators.
Interpersonal Reactivity Index
The IRI is a 28-item self-report scale designed to measure both cognitive and emotional components of empathy.21 The subscales of the IRI were derived by factor analysis and consist of perspective taking (IRI-PT), fantasy (IRI-FS), empathic concern (IRI-EC), and personal distress (IRI-PD). Items are presented as statements, and participants are asked to express their own degree of agreement on a 5-point Likert-type scale ranging from 1 ("does not describe me well") to 5 ("describes me well").
Items of the IRI-PT scale address one's tendency to take another's point-of-view, akin to the "theory of mind" (e.g., "When I am upset at someone, I usually try to 'put myself in his shoes' for a while."). IRI-FS scale items address the tendency to identify with fictional characters (e.g., "I really get involved with the feelings of the characters in a novel."). IRI-EC items relate to feelings of empathy toward others (e.g., "When I see someone being taken advantage of, I feel kind of protective towards them."), and IRI-PD addresses the tendency to experience distress in stressful situations (e.g., "In emergency situations, I feel apprehensive and ill at ease.").
The IRI has demonstrated good intrascale and test-retest reliability, and convergent validity is indicated by correlations with other established empathy scales.21,27
Statistical analyses
We used the Kolmogorov-Smirnov goodness-of-fit test to assess the normality of the distribution of EQ scores. Independent-samples t-tests were used to examine gender effects on the self-report scores. The internal consistency of the EQ scale and its subscales was estimated using Cronbach's alpha. Test-retest reliability was assessed using Pearson's correlation coefficient, as were the correlations among the EQ scales and subscales. To test the discriminant validity of the EQ (that is, whether individuals categorized as low in empathy score lower on other measures of empathy), we conducted an analysis of variance with empathy group as the between-subjects factor and the EQ and the four IRI subscale scores as dependent variables. These statistics were calculated using the Statistical Package for the Social Sciences (SPSS) software (version 13; SPSS Inc, Chicago, IL, USA). To test whether our EQ data fitted a three-factor structure,12 we conducted confirmatory factor analysis with LISREL 8.80 (Scientific Software International Inc, Lincolnwood, IL, USA). The chi-squared test was evaluated in two ways. First, a non-significant chi-squared value suggests that the model does not deviate significantly from the data. Second, if the chi-squared statistic is significant but less than twice the degrees of freedom, the model is considered a good representation of the data. However, chi-squared values are very sensitive to sample size and tend to overstate model misfit in large samples. Thus, fit statistics that minimize the influence of sample size and model complexity, namely the Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA), were determined in addition to the more traditional chi-squared statistic and Goodness of Fit Index (GFI).
Among these fit indices, the CFI was regarded as the most informative because it has relatively small sampling variability and negligible downward bias compared with other indices. As a conventional rule, GFI values greater than 0.85, CFI values greater than 0.90, and RMSEA values of 0.08 or lower are considered satisfactory, with CFI values greater than 0.95 indicating an excellent fit.
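As an illustration, the conventional cut-offs listed above can be applied mechanically to a set of fit statistics. The sketch below is a hypothetical helper (`assess_fit` is our name, not a LISREL function); it leaves out the non-significant chi-squared criterion because that depends on the model's p-value rather than on a fixed threshold.

```python
def assess_fit(chi2: float, df: int, gfi: float, cfi: float, rmsea: float) -> dict:
    """Check CFA fit statistics against the conventional cut-offs.

    The non-significant chi-squared criterion is omitted because it needs
    the chi-squared p-value, which is reported separately.
    """
    return {
        "chi2/df < 2": chi2 / df < 2,
        "GFI > 0.85": gfi > 0.85,
        "CFI > 0.90": cfi > 0.90,
        "CFI > 0.95 (excellent)": cfi > 0.95,
        "RMSEA <= 0.08": rmsea <= 0.08,
    }


# Values reported for the 28-item, three-factor model in the Results.
for rule, ok in assess_fit(929.4, 347, gfi=0.87, cfi=0.92, rmsea=0.064).items():
    print(f"{rule:<24} {'met' if ok else 'not met'}")
```

For those values, only the chi-squared:df criterion and the stricter 0.95 CFI threshold are not met.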
Results
Mean total and subfactor EQ scores for men and women are presented in Table 1. Mean EQ scores were similar to, albeit lower than, those reported by Baron-Cohen and Wheelwright.22 The Kolmogorov-Smirnov test indicated that the EQ scores did not deviate significantly from a normal distribution [D(478)=0.039, p>0.05; skewness=0.114; kurtosis=0.152] (Figure 1). There was no significant gender difference in total EQ or EQ-CE scores (t=-1.24, df=476, p=0.216, and t=0.38, df=476, p=0.705, respectively), whereas significant gender differences were found on the EQ-ER and EQ-SS scores (t=-3.15, df=476, p=0.002, and t=3.90, df=476, p<0.001, respectively).
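The normality statistics above can be sketched in plain Python. This is a minimal illustration, not the SPSS procedure: it computes the Kolmogorov-Smirnov D statistic against a normal distribution whose mean and SD are estimated from the data, together with bias-adjusted sample skewness (G1) and excess kurtosis (G2), which we assume correspond to the skewness and kurtosis values reported. Because the mean and SD are estimated from the same data, a p-value for D should come from Lilliefors-corrected critical values (for example, `lilliefors` in `statsmodels.stats.diagnostic`) rather than the standard KS table, so the sketch stops at D. Function names are illustrative.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_d_normal(data: Sequence[float]) -> float:
    """One-sample KS D against a normal with the sample mean and SD."""
    xs = sorted(data)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def skew_kurt(data: Sequence[float]) -> Tuple[float, float]:
    """Bias-adjusted skewness G1 and excess kurtosis G2 (needs n >= 4)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return G1, G2


scores = [1, 2, 3, 4, 5]  # toy data, not EQ scores
print(round(ks_d_normal(scores), 3), skew_kurt(scores))
```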
Reliability
The internal consistency of the EQ, measured by Cronbach's alpha coefficient, was 0.78, which is in the acceptable range. For the 20 participants who completed the EQ on two occasions 4 weeks apart, test-retest reliability, measured by Pearson's correlation coefficient, was r=0.84 (p<0.001).
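Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / total-score variance), for k items. The function below is a minimal plain-Python sketch of this standard formula, assuming a complete respondents-by-items matrix with no missing responses; `cronbach_alpha` is an illustrative name, and the toy data are invented for the example.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    columns = list(zip(*rows))  # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Toy data: three respondents answering two items.
print(round(cronbach_alpha([[1, 1], [2, 3], [3, 2]]), 3))  # 0.667
```

Sample variances are used throughout; because the same divisor appears in the numerator and denominator of the ratio, using population variances instead would give the same alpha.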
The 40-item scale, excluding the 20 filler items, had a Cronbach's alpha of 0.83. However, a number of items had low item-total correlations, suggesting that they contribute little to the measurement of empathy: items 37, 39, 57, and 59 had item-total correlations of less than 0.1.
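The text does not specify whether each item was correlated with the full total or with the total of the remaining items. The sketch below assumes the corrected (item-rest) form, which SPSS reports as "Corrected Item-Total Correlation", and flags items falling below the 0.1 threshold used above. The function names, item numbers, and toy data are hypothetical.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corrected_item_total(rows: Sequence[Sequence[float]]) -> List[float]:
    """Correlate each item with the sum of the remaining items."""
    totals = [sum(row) for row in rows]
    result = []
    for j in range(len(rows[0])):
        item = [row[j] for row in rows]
        rest = [t - v for t, v in zip(totals, item)]  # total minus this item
        result.append(pearson_r(item, rest))
    return result


def weak_items(rows, item_numbers, threshold=0.1):
    """Item numbers whose corrected item-total correlation is below threshold."""
    rs = corrected_item_total(rows)
    return [num for num, r in zip(item_numbers, rs) if r < threshold]


# Toy data: the fourth item runs against the other three.
toy = [[0, 0, 0, 2], [1, 1, 1, 1], [2, 2, 2, 0]]
print(weak_items(toy, item_numbers=[10, 20, 30, 40]))  # [40]
```

Removing the item from the total avoids the inflation that occurs when an item is correlated with a sum that contains itself, which matters most for short subscales.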
The Cronbach's alpha for the EQ-CE in our data was 0.85, which is acceptable for an 11-item scale. All items correlated with the total score above 0.25 (lowest, 0.48).
The Cronbach's alpha for the EQ-ER was 0.65, which is acceptable for an 11-item scale. All items except 29, 48, and 59 showed correlations with the total score above 0.25.
The Cronbach's alpha for the EQ-SS was 0.55, which is low but still acceptable given that it has only six items. Item 57 ("I do not consciously work out the rules of social situations") correlated negatively with the total score. It is worth noting that Lawrence et al.25 retained item 57 despite its low communality because it loaded onto the third factor.
Validity
Correlations of the total EQ and subscale scores with one another and with the IRI are shown in Table 2. The EQ total score was positively correlated with the EQ-CE (r=0.79, p<0.001), EQ-ER (r=0.82, p<0.001), and EQ-SS scores (r=0.60, p<0.001). The subscales were also positively intercorrelated: EQ-CE with EQ-ER (r=0.49, p<0.001), EQ-CE with EQ-SS (r=0.37, p<0.001), and EQ-ER with EQ-SS (r=0.33, p<0.001). These associations were as expected; however, the coefficients were not so high as to preclude discriminant validity among the subscales.
Weak to moderate positive correlations were found between the total EQ and three subscales of the IRI (r=0.33, p<0.001 for 'perspective taking'; r=0.25, p<0.001 for 'empathic concern'; r=0.20, p<0.001 for 'fantasy'), whereas the total EQ score was negatively correlated with the 'personal distress' subscale (r=-0.17, p<0.001).
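Correlations of this size reach p<0.001 largely because of the sample size. Under the null hypothesis of zero correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom. The sketch below computes this t for n=478 and, as a simplifying assumption, approximates the two-sided p-value with the standard normal distribution, which is very close to the t distribution at 476 degrees of freedom. Function names are illustrative.

```python
import math


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing rho = 0; it has n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value from the standard normal, adequate when df is large."""
    return math.erfc(abs(t) / math.sqrt(2))


for label, r in [("perspective taking", 0.33), ("empathic concern", 0.25),
                 ("fantasy", 0.20), ("personal distress", -0.17)]:
    t = correlation_t(r, 478)
    print(f"{label:<20} r={r:+.2f}  t={t:+.2f}  p~{approx_two_sided_p(t):.1e}")
```

Even the smallest coefficient, r=-0.17, gives |t| of roughly 3.8, so statistical significance here says little about the strength of the association.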
Correlations between the IRI scores and the individual factor scores were also calculated to explore concurrent validity. The EQ-CE score was positively correlated with the IRI-PT, IRI-EC, and IRI-FS, but negatively correlated with the IRI-PD. The EQ-ER score was positively correlated with all IRI subscale scores except the IRI-PD. The EQ-SS score was positively correlated with the IRI-PT and negatively correlated with the IRI-PD, but was not correlated with the IRI-FS or IRI-EC.
Regarding known-group validity (controls vs. patients), all participants with Asperger's disorder (n=10) had an EQ score of 30 or below (mean±SD, 20.7±4.4; range, 15-30). As expected, patients with Asperger's disorder scored significantly lower than male control participants (t=-8.6, df=16.7, p<0.001).
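The non-integer degrees of freedom (df=16.7) suggest that this comparison used Welch's unequal-variance t-test rather than a pooled-variance test, which would have 10 + 156 - 2 = 164 df. A minimal plain-Python sketch of Welch's statistic and the Welch-Satterthwaite degrees of freedom follows; the function name and the toy data are illustrative only.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1, 2, 3], [4, 5, 6, 7, 8])
print(f"t={t:.3f}, df={df:.2f}")  # t=-4.382, df=5.88
```

When the smaller group contributes most of the standard error, as is likely with 10 patients against 156 controls, the Welch degrees of freedom fall close to the smaller group's size.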
Confirmatory factor analysis
The fit indices for several models are presented in Table 3. For the three-factor structure reported by Lawrence et al.,25 most, but not all, fit statistics indicated a satisfactory fit. The chi-squared value was significant (χ²(347)=929.4, p<0.001) and exceeded the desired 2:1 ratio of chi-squared to degrees of freedom (929.4/347≈2.68). Nevertheless, the other fit indices met their conventional adequacy standards: RMSEA=0.064 (90% confidence interval, 0.059-0.068; p-value for the test of close fit [RMSEA<0.05] <0.01), GFI=0.87, CFI=0.92, and non-normed fit index (NNFI)=0.91.
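Two quantities in this paragraph can be checked by hand. Assuming a simple-structure model (each of the 28 items loads on one factor, factor variances fixed to 1, factor covariances free, residuals uncorrelated), there are 28 × 29 / 2 = 406 observed variances and covariances and 28 + 28 + 3 = 59 free parameters, giving the 347 degrees of freedom reported. The sketch below reproduces that count and the textbook RMSEA point estimate, sqrt(max(χ² - df, 0) / (df × (N - 1))). Applied to χ²=929.4 and N=478, that formula gives about 0.059 rather than the 0.064 reported; LISREL prints more than one chi-squared statistic and may base its RMSEA on a different one, so the sketch is only an approximate check. Function names are illustrative.

```python
import math


def cfa_df(n_items: int, n_factors: int) -> int:
    """Degrees of freedom of a simple-structure CFA.

    Assumes each item loads on one factor, factor variances are fixed to 1,
    all factor covariances are free, and residuals are uncorrelated.
    """
    moments = n_items * (n_items + 1) // 2      # unique variances and covariances
    free = 2 * n_items                          # loadings + residual variances
    free += n_factors * (n_factors - 1) // 2    # factor covariances
    return moments - free


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-squared, its df, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


print(cfa_df(28, 3))                     # 347
print(round(rmsea(929.4, 347, 478), 3))  # 0.059
```

The same count gives 87 degrees of freedom for a 15-item, three-factor short form.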
We also tested the short three-factor version of the EQ proposed by Muncer and Ling.28 These authors extracted five items for each subscale: items 25, 26, 44, 52, and 54 for 'cognitive empathy'; items 4, 8, 12, 14, and 35 for 'emotional reactivity'; and items 6, 27, 32, 50, and 59 for 'social skills.' The fit indices for this model in our data, which were comparable to those reported by Muncer and Ling,28 suggest that it fits the data reasonably well (chi-squared:df ratio=2.43, RMSEA=0.056, GFI=0.94, CFI=0.93).
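Scoring the short form only requires summing the 0/1/2 item scores of the 15 items listed above. The sketch below assumes item scores have already been computed with the standard EQ scoring rule and are keyed by the item numbers of the 60-item EQ; the dictionary and function names are illustrative. Each subscale then ranges from 0 to 10, and item 57 does not enter any of them.

```python
from typing import Dict

# Item numbers of the short three-factor EQ (Muncer and Ling), as listed above.
SHORT_FORM = {
    "cognitive_empathy": (25, 26, 44, 52, 54),
    "emotional_reactivity": (4, 8, 12, 14, 35),
    "social_skills": (6, 27, 32, 50, 59),
}


def short_form_scores(item_scores: Dict[int, int]) -> Dict[str, int]:
    """Sum 0/1/2 item scores (60-item EQ numbering) into the three subscales."""
    return {name: sum(item_scores[i] for i in items) for name, items in SHORT_FORM.items()}


# Example: a respondent who earned 1 point on every item.
print(short_form_scores({i: 1 for i in range(1, 61)}))
# {'cognitive_empathy': 5, 'emotional_reactivity': 5, 'social_skills': 5}
```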
Discussion
The present study examined the reliability, validity, and factor structure of a Korean version of the EQ. Mean EQ scores in this study (35.4±9.6) were slightly lower than those observed in previous Western studies.22,24,25,28 Interestingly, however, they were quite similar to those from studies carried out in East Asian samples, such as a preliminary study of 371 Korean college students29 and another of 1,250 Japanese college students (Table 4).28 Further research is required to clarify any effect of cultural differences on the EQ.
As expected, participants with Asperger's disorder scored significantly lower on the EQ. More importantly, all of them had an EQ score of 30 or below, which corresponds to the cut-off score found to be most useful in differentiating adults with autism spectrum disorders (ASD) from controls.22,24 This provides some support for the view of ASD as an empathy disorder.30 In fact, many EQ items tap 'theory of mind' abilities, which previous studies have found to be impaired in ASD.22,31,32
The Korean version of the EQ proved stable and reliable. Its internal consistency was acceptable (Cronbach's alpha=0.78), and its test-retest reliability was good (r=0.84 over a 4-week interval), consistent with the reports of Lawrence et al.25 and Berthoz et al.24
The correlations observed between the EQ and IRI scores further support the concurrent validity of the EQ. As in the original work of Lawrence et al.,25 moderate positive correlations were found between the EQ and the IRI-PT and IRI-EC, and a weak inverse correlation was observed between the EQ and the IRI-PD. However, unlike in Lawrence et al.'s study,25 the IRI-FS score was associated with the EQ score. Because this positive correlation, although statistically significant, was the weakest among the IRI subscales (r=0.20, p<0.001 in this study; r=0.28, p<0.001 in Berthoz et al.24), the notion that 'fantasy' is not empathy per se should still be considered.
The confirmatory factor analyses suggested that, for the Korean version of the EQ as well, a three-factor structure with a 28-item scale offered a more satisfactory fit to the data than did the 40-item unifactorial scale. The Korean version also showed a satisfactory fit to the short three-factor version of the EQ proposed by Muncer and Ling,28 suggesting that this version could be used as a short form. However, this derived short version of the EQ requires confirmation in a new sample of participants.
In the present study, items 37, 39, 57, and 59 had very low item-total correlations. Items 37, 39, and 57 have also been reported previously as poorly correlated items. Item 37 ("When I talk to people, I tend to talk about their experiences rather than my own.") showed a negative correlation in the original work as well,25 so its poor performance is not specific to the Korean version. In addition, talking about other people's experiences may be perceived as socially ill-mannered. For item 39 ("I am able to make decisions without being influenced by people's feelings."), the phrase 'being influenced by people's feelings' could be understood as being unprincipled, irresolute, or unstable in a decision-making situation. Similarly, in item 59 ("I tend to get emotionally involved with a friend's problem."), the Korean translation of 'emotionally' may have implied 'unreasonably' or 'irrationally' rather than 'empathically.' This item typifies 'emotional reactivity,' the tendency to react emotionally to others' mental states. Such self-oriented emotional reactions may not only enhance empathic abilities but also cause personal distress and impede other-oriented thinking.33 Thus, subjects in this study may have tended to understand this item in the latter rather than the former sense. Lawrence et al.25 noted that the lack of control over 'personal distress' was a drawback of the EQ. Item 57 ("I don't consciously work out the rules of social situations.") may also be understood in a manner opposite to that intended by its developers.25,28 That is, it may be interpreted as 'I sometimes break social rules' rather than 'I am flexible or natural in social situations.' Indeed, Muncer and Ling28 omitted item 57 from the five EQ-SS items of their short version because it showed the lowest correlation with the total score of any item.
As in previous reports,22,24,25,28 females' total EQ scores were higher than males' in this study, although here the difference was not statistically significant. Regarding the three factors, we found no gender difference on the EQ-CE factor, consistent with Berthoz et al.,24 and the same was true of the corresponding IRI-PT subscale. Although an initial report indicated female superiority on the EQ-CE,25 the gender difference on the EQ-CE was smaller than that on the EQ-ER. Hoffman34 concluded that consistent differences between males and females probably exist with respect to affective responses to others' experiences, whereas no consistent gender difference has been found for role taking or recognition of affect in others. Females scored significantly higher on the EQ-ER factor, consistent with previous studies.22,24,25,28 On the EQ-SS, males scored higher in our study when the six items originally proposed by Lawrence et al.25 were used. However, a further analysis using the five short-form items,28 which exclude item 57, revealed that females (5.3±1.7) scored higher than males (4.7±1.8) (t=-3.2, p=0.001). This discrepancy again suggests that item 57 was interpreted differently from its intended meaning. Regarding the sex difference on the EQ-SS, Berthoz et al.24 found higher female scores after adjusting for the gender difference in emotional state, whereas three previous studies reported no gender difference.22,25,28 Our five-item result is consistent with the assumption that a higher female EQ will be manifested in greater sensitivity to social situations and correspondingly better social skills. However, Muncer and Ling28 raised the possibility that males overestimate their social skills on self-report measures and regarded this as a caveat for the EQ-SS factor.
The current study has some limitations. First, caution is needed in generalizing our findings because most subjects were university students or non-medical staff at our hospital; further studies in more representative population samples are needed. Second, only ten participants with Asperger's disorder were included, which is a small number of participants with ASD; larger ASD samples are needed to determine EQ cut-off scores for this group. Third, although the current findings provide some evidence for the factorial validity of the EQ, more studies are needed to establish its discriminant validity.
In conclusion, despite these limitations, the present study supports the reliability, validity, and sound psychometric properties of the Korean version of the EQ. The Korean version of the EQ can therefore be readily administered to clinical populations, allowing mental health workers to assess multiple dimensions of empathy.