INTRODUCTION
Autism Spectrum Conditions are a group of neurodevelopmental conditions characterized by persistent difficulties in social interaction and communication, and the presence of repetitive and restricted patterns of interests, behavior, and activities [1]. Early identification of autism spectrum disorder (ASD) is important because it allows early intervention, etiologic investigation, and counseling regarding the risk for recurrence [2]. Screening refers to the use of standardized tools at specific intervals to support and refine risk assessment [3]. A screening procedure during regular well-baby checkups has been recommended [4], with the aim of detecting the warning signs of Autism Spectrum Conditions. The process should involve early screening of warning signs and subsequent diagnosis based on clinical judgment, in combination with the application of reliable and standardized gold standard measures [5]. Screening measures suitable for use in young children (i.e., <24 months of age) are available and can be classified as either Level 1 or Level 2 instruments [6].
The Checklist for Autism in Toddlers (CHAT) is the most widely used autism screening tool [7], followed by the Modified Checklist for Autism in Toddlers (M-CHAT) [8] and the Modified Checklist for Autism in Toddlers, Revised with Follow-up (M-CHAT-RF) [9]. All these tools conceptualize ASD-specific traits as categorical variables using a binary scoring system with dichotomous “yes/no” items. This categorical approach can identify the most obvious signs of ASD among children but may miss less evident cases [10]. The Quantitative Checklist for Autism in Toddlers (Q-CHAT), developed by the Autism Research Centre (ARC) of Cambridge University, was devised to address the limitations of dichotomous items.
The CHAT was originally developed based on behaviors typically exhibited by children around 18 months of age. Its primary purpose is to help physicians identify possible cases of autism, pervasive developmental disorder, and related disorders in children at 18 months [7]. The American Academy of Pediatrics autism expert panel recommends administering the CHAT during preventive care visits for all children at both the 18- and 24-month check-ups [2].
Park et al. [11] conducted a preliminary Korean Q-CHAT validation study with 24 toddlers and preschool children in a clinical group (2 to 5 years old; mean age, 47.71±13.14 months) and 80 toddlers and children in an unselected group (2 to 4 years old; mean age, 46.18±8.57 months). They reported test-retest reliability, internal reliability, and the optimal cutoff using a receiver operating characteristic (ROC) curve. In the Korean sample, the optimal cutoff was 33.5, with a sensitivity of 0.75 and specificity of 0.73 [11]. In the selected clinical group, the mean total Q-CHAT score was 39.1, which was lower than the score of 51.8 reported in the initial study conducted by Allison et al. [12]. The Korean version of the Q-CHAT has a lower Cronbach’s alpha (0.83) [11]. The total Q-CHAT score tended to decrease with increasing age [11].
This study aimed to further evaluate the clinical utility of the Korean version of the Q-CHAT in hospital settings by examining the distribution of Q-CHAT scores in both a community sample of toddlers and preschool children and a clinical sample of toddlers and preschool children presenting with developmental delays with a high probability of ASD. Similar to the initial study conducted by Allison et al. [12], we analyzed toddlers and preschool children aged 12-54 months. Finally, we investigated which Q-CHAT items were more frequently endorsed in the hospital sample, aiming to identify items with greater sensitivity for detecting ASD traits across age groups.
RESULTS
Q-CHAT total score distribution for the hospital and community groups across age groups
The mean Q-CHAT total score in the hospital sample (mean=42.05, SD=13.62) was significantly higher than in the community sample (mean=29.88, SD=7.83; p<0.001). The Kolmogorov-Smirnov test indicated no significant departure from normality in either group (hospital group: K-S[62]=0.097, p>0.01; community group: K-S[144]=0.061, p>0.01). The total score distribution of the Q-CHAT is shown in Supplementary Figure 1. A large percentage of the participants had Q-CHAT scores ranging from 21 to 40. Overall, the hospital group obtained higher total Q-CHAT scores.
The ROC analysis yielded an area under the curve (AUC) of 0.794, indicating fair diagnostic validity. The sensitivity and specificity associated with different cutoff scores for children with ASD-specific traits were calculated. Youden’s index was used to determine the optimal cutoff score for the Korean version of the Q-CHAT. Based on the ROC analysis, the Q-CHAT total score that best differentiated children with ASD-specific traits from typically developing children, maximizing sensitivity while maintaining adequate specificity, was 32.5 (sensitivity=0.811, specificity=0.687).
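As an illustration of how an optimal cutoff can be selected with Youden’s index (J = sensitivity + specificity - 1), the following minimal Python sketch scans candidate cutoffs and keeps the one that maximizes J. The function name, the toy scores and labels, and the convention that a score at or above the cutoff counts as screen-positive are assumptions made for illustration; they are not the study data or the authors' analysis code.

```python
def youden_optimal_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity, J) for the cutoff maximizing Youden's J.

    Assumption: a score >= cutoff counts as screen-positive; labels are 1 (case) or 0 (non-case).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    unique_scores = sorted(set(scores))
    # Candidate cutoffs: midpoints between adjacent observed scores.
    candidates = [(a + b) / 2 for a, b in zip(unique_scores, unique_scores[1:])]
    best = None
    for c in candidates:
        sensitivity = sum(s >= c for s in positives) / len(positives)
        specificity = sum(s < c for s in negatives) / len(negatives)
        j = sensitivity + specificity - 1
        if best is None or j > best[3]:
            best = (c, sensitivity, specificity, j)
    return best


# Hypothetical toy data (not from the study).
scores = [25, 28, 30, 31, 34, 36, 33, 38, 41, 45, 50, 29]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
cutoff, sens, spec, j = youden_optimal_cutoff(scores, labels)
print(f"cutoff={cutoff}, sensitivity={sens:.3f}, specificity={spec:.3f}, J={j:.3f}")

# Youden's J implied by the sensitivity and specificity reported in the text.
reported_j = 0.811 + 0.687 - 1
print(f"reported J = {reported_j:.3f}")
```

Using midpoints between observed scores as candidates is one common convention and is consistent with a half-point cutoff such as 32.5 arising from integer total scores.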
Accuracy of the Q-CHAT for predicting ASD and Q-CHAT cut-off under 54 months of age
For children aged 12-54 months, the Q-CHAT total score that best differentiated children with ASD-specific traits from typically developing children, maximizing sensitivity while maintaining adequate specificity, was 32.5 (sensitivity=0.811, specificity=0.687). The AUC for the Korean version of the Q-CHAT total score for the hospital and community groups under 54 months is shown in Figure 1.
Q-CHAT total scores and SCQ, CARS, and CBCL among participants under 54 months of age
Descriptive statistics of age, total Q-CHAT scores, and mean scores for other ASD screening instruments (SCQ, CARS, and CBCL) among participants under 54 months of age are shown in Table 2. Because socioeconomic status (SES; parent education level, occupation, and perceived economic status), as obtained from the CBCL, did not differ significantly between the samples, independent t-tests were performed to compare the hospital and community groups and the sexes. The mean total Q-CHAT score for all the participants was 31.62 (SD=11.26). Specifically, the hospital sample (M=42.05, SD=13.62) demonstrated a significantly higher Q-CHAT score than the community sample (M=29.88, SD=7.83; t[49.45]=4.99, p<0.001; equal variances not assumed). With respect to the CBCL subscales, the hospital sample had a higher CBCL-pervasive developmental (PD) problem score (M=66.5, SD=9.25) than the community sample (M=54.03, SD=7.41; t[99]=6.83, p<0.001).
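Fractional degrees of freedom, such as those in t[49.45] above, typically arise from Welch's unequal-variance t-test, in which the degrees of freedom are estimated with the Welch-Satterthwaite approximation. The minimal sketch below computes the statistic and its degrees of freedom from summary statistics. The function name and the input values are hypothetical and chosen only for illustration; the group sizes underlying the reported comparison are not restated here, so the sketch does not attempt to reproduce the reported result.

```python
import math


def welch_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    m = group mean, s = group standard deviation, n = group size.
    """
    v1 = s1 ** 2 / n1  # squared standard error of the first mean
    v2 = s2 ** 2 / n2  # squared standard error of the second mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics (not the study values).
t_stat, df = welch_t_from_summary(m1=40.0, s1=12.0, n1=20, m2=30.0, s2=8.0, n2=60)
print(f"t[{df:.2f}] = {t_stat:.2f}")
```

When the two groups differ markedly in size and variance, as in a small clinical group compared with a larger community group, the estimated degrees of freedom fall well below n1 + n2 - 2.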
In the hospital group, the mean score for boys (M=40.96, SD=13.99) did not differ significantly from that for girls (M=43.86, SD=13.30), whereas in the community group, the mean score for boys (M=33.17, SD=7.79) was significantly higher than that for girls (M=26.06, SD=6.02). Considering all the ASD screening instruments, including the CARS, SCQ, and CBCL, no significant differences were observed between the boys and girls in the hospital sample (all non-significant [ns]). In contrast, boys in the community sample (M=55.63, SD=9.03) had higher CBCL-PD problem scores than girls (M=52.16, SD=4.36; t[52.06]=2.05, p=0.045). A similar pattern was observed for the CBCL-oppositional defiant disorder (ODD) problem scale, on which boys (M=56.72, SD=8.81) scored higher than girls (M=52.74, SD=5.73; t[62.73]=2.15, p=0.030).
Convergent validity of the Q-CHAT with the SCQ, CARS, and CBCL under 54 months of age
The correlations among the total Q-CHAT score, age, and other stable screening instruments are shown in
Table 3. Age was not significantly correlated with the total Q-CHAT score in the hospital sample, whereas a significant negative correlation was observed in the community sample (r=-0.45, p<0.001).
The total scores of the three screening instruments (SCQ-lifetime, SCQ-current, and CARS) were obtained from the hospital sample. The SCQ-lifetime and -current total scores were positively correlated with the Q-CHAT total scores in the hospital sample (r=0.508 and r=0.565, respectively; both p<0.001). However, no significant correlation was observed between the total Q-CHAT and CARS scores (r=0.278, ns).
The total Q-CHAT scores of the hospital sample were significantly positively correlated with only one CBCL subscale t-score, that of the PD problem subscale (r=0.387, p<0.01), whereas the total Q-CHAT scores of the community sample were significantly correlated with all the subscales of the CBCL: CBCL-affective (r=0.404, p<0.001), CBCL-anxiety problem (r=0.388, p<0.001), CBCL-PD problem (r=0.470, p<0.001), CBCL-attention-deficit/hyperactivity disorder problem (r=0.337, p<0.001), and CBCL-ODD problem (r=0.568, p<0.001).
Q-CHAT internal consistency and item-total correlations under 54 months of age
The internal consistency of the total Q-CHAT items was good in the overall sample as well as in the hospital sample (Cronbach’s alpha=0.764 and 0.825, respectively). In the hospital sample, removing lines objects up (item 3) or interest maintained by spinning objects (item 7) slightly improved the internal consistency of the remaining items (Cronbach’s alpha increased from 0.825 to 0.830 and 0.827, respectively).
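The internal consistency figures above rest on Cronbach's alpha, alpha = k/(k-1) x (1 - sum of item variances / variance of the total score), recomputed with each item removed in turn. The following minimal sketch shows both calculations. The function names and the small response matrix are hypothetical and are not the Q-CHAT data; sample variances (n-1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a response matrix (rows = respondents, columns = items)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(rows):
    """Alpha recomputed after dropping each item in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in rows])
        for i in range(k)
    ]


# Hypothetical responses: the first three items move together, the fourth does not.
responses = [
    [0, 1, 0, 4],
    [1, 1, 1, 0],
    [2, 2, 2, 3],
    [3, 3, 4, 1],
    [4, 4, 4, 2],
]
alpha_all = cronbach_alpha(responses)
alpha_deleted = alpha_if_item_deleted(responses)
print(f"alpha (all items) = {alpha_all:.3f}")
for index, value in enumerate(alpha_deleted, start=1):
    print(f"alpha without item {index} = {value:.3f}")
```

An item whose removal raises alpha, like the fourth item in this toy matrix, contributes little shared variance to the scale, which is the pattern described for items 3 and 7 in the hospital sample.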
Table 4 summarizes the item-total correlations for the hospital and community samples. A Bonferroni correction was applied to the significance threshold for the item-total correlations (p=0.05/25 items=0.002). In the hospital sample, items 5, 6, 8, 9, 10, 12, 15, 17, 19, 23, and 25 were positively correlated with the total Q-CHAT score, and the item-total correlation was good (0.8>r>0.5). Among the correlated items, items 5 (protoimperative pointing), 6 (protodeclarative pointing), 10 (follow the direction of eye gaze), 12 (use hand of others as a tool), 19 (use gestures), and 25 (stare at nothing for long time) were positively correlated with the Q-CHAT total score only in the hospital sample. In the community sample, items 1, 2, 7, 9, 11, 13, 15, 17, 20, 22, and 23 were positively correlated with the total Q-CHAT score, with satisfactory correlations (0.5>r>0.2).
Behaviors, including number of words (item 8), pretend play (item 9), offer comfort to others (item 15), typicality of first words (item 17), and twiddle objects repetitively (item 23), were positively correlated with the Q-CHAT total score in both samples, and all of these items were more strongly correlated in the hospital sample. Behaviors, such as lines objects up (item 3), adapt to change in routine (item 14), repeat exactly what they heard (item 18), check your reaction in unfamiliar situations (item 21), and oversensitive to noise (item 24), were not significantly correlated with the Q-CHAT total score in either sample. Item 18 (repeat exactly what they heard) showed a non-significant negative correlation with the Q-CHAT total score in both samples (r=-0.299 in the hospital sample and r=-0.066 in the community sample, ns).
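The Bonferroni-adjusted threshold used for the item-total correlations is simple arithmetic: the nominal alpha of 0.05 divided by the 25 items tested. The sketch below computes that threshold, defines a plain Pearson correlation, and applies the threshold to a few p-values. The function name, the item labels, and the p-values are hypothetical and serve only to show how an item can pass the nominal 0.05 level yet fail the corrected one.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


n_items = 25
nominal_alpha = 0.05
bonferroni_alpha = nominal_alpha / n_items  # 0.05 / 25 = 0.002

# Hypothetical item scores and total scores for four respondents.
r_example = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Hypothetical p-values for three illustrative items (not study results).
p_values = {"item A": 0.0004, "item B": 0.004, "item C": 0.21}
significant = [name for name, p in p_values.items() if p < bonferroni_alpha]
print(f"threshold = {bonferroni_alpha:.3f}, r = {r_example:.2f}, significant = {significant}")
```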
Q-CHAT items scored above 2 endorsed for the hospital and community groups
We categorized the ages of the toddlers and children into the following four groups: 1) 12-36 months (n=30), 2) 37-48 months (n=52), and 3) 49-54 months (n=22) (
Table 5). The mean ratings of most Q-CHAT items were significantly different between the hospital and community samples of toddlers and children under the age of 54 months, except for items 11, 13, 16, 19, 20, 23, and 24 (ns). Upon comparing the frequency of items scored above 2 in the 12-36-month-old (n=30) and 37-48-month-old (n=52) groups, the hospital sample more frequently reported the following behaviors: look when call name (item 1), eye contact (item 2), others can comprehend child’s speech (item 4), protoimperative pointing (item 5), protodeclarative pointing (item 6), number of words (item 8), pretend play (item 9), follow the direction of eye gaze (item 10), use of hand of others as a tool (item 12), adapt to change in routine (item 14), offer comfort to others (item 15), typicality of first words (item 17), check your reaction in unfamiliar situations (item 21), and stare at nothing for long time (item 25).
However, the ratings of lines up objects (item 3), interest maintained by spinning objects (item 7), sniff/lick unusual object (item 11), walk on tiptoes (item 13), repeat same action over and over again (item 16), repeat exactly what they heard (item 18), unusual finger movements (item 20), maintenance of interest in one or two objects (item 22), twiddle objects repetitively (item 23), and oversensitive to noise (item 24) in the community sample were higher than or similar to those in the hospital sample.
Q-CHAT item score distribution for the hospital and community groups
The item score distribution of the Q-CHAT for each age group is shown in Supplementary Figure 2. A one-way analysis of variance was conducted to examine the effect of age group on the Q-CHAT item scores. Significant mean differences among the four age groups were observed for the following items: look when called by name (item 1) (F[3,193]=4.045, p=0.008), eye contact (item 2) (F[3,193]=4.797, p=0.003), others can comprehend child’s speech (item 4) (F[3,193]=13.706, p<0.001), number of words (item 8) (F[3,193]=14.832, p<0.001), follow the direction of eye gaze (item 10) (F[3,192]=3.036, p=0.03), use of hand of others as a tool (item 12) (F[3,193]=6.575, p<0.001), adapt to change in routine (item 14) (F[3,193]=7.360, p<0.001), offer comfort to others (item 15) (F[3,193]=4.279, p=0.006), repeat same action over and over again (item 16) (F[3,193]=5.879, p=0.001), typicality of first words (item 17) (F[3,193]=5.471, p=0.001), and maintenance of interests in one or two objects (item 22) (F[3,193]=3.247, p=0.023).
The effects of age on the Q-CHAT item scores were analyzed separately for the hospital and community samples. No difference in the Q-CHAT item scores was present among age groups in the hospital sample (Table 6), whereas the Q-CHAT item scores of others can comprehend child’s speech (item 4), number of words (item 8), use of hand of others as a tool (item 12), adapt to change in routine (item 14), repeat same action over and over again (item 16), typicality of first words (item 17), unusual finger movements (item 20), and oversensitive to noise (item 24) decreased with advancing age in the community sample (Table 7).
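For readers less familiar with the F[df1,df2] notation used for the age-group comparisons, the following minimal sketch computes a one-way ANOVA F statistic and its degrees of freedom (df1 = number of groups - 1, df2 = number of observations - number of groups). The function name and the toy groups are hypothetical; the sketch is not the authors' analysis and does not use the study data.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical item scores in three toy groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(toy_groups)
print(f"F[{df_b},{df_w}] = {f_stat:.3f}")

# Degrees of freedom of the form F[3,193] correspond to four groups and
# 3 + 1 + 193 = 197 observations contributing to that comparison.
implied_groups = 3 + 1
implied_n = 193 + implied_groups
```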
DISCUSSION
This study aimed to further evaluate the clinical utility of the Korean version of the Q-CHAT in hospital settings by examining the distribution of Q-CHAT scores in both a community sample of toddlers and preschool children and a clinical sample of toddlers and preschool children presenting with developmental delays with a high probability of ASD. The validity of the Q-CHAT for assessing the risk of ASD was tested against the CARS, SCQ, and CBCL, which are commonly used for the evaluation of autism in toddlers. The ROC analysis also demonstrated fair screening accuracy (AUC=0.794). Our results showed that the Korean version of the Q-CHAT had good internal consistency, with a Cronbach’s alpha of 0.764. Internal consistency, as reported by Allison et al. [12], showed a Cronbach’s alpha of 0.83 in the ASD group (n=160), which is similar to the value found in the present study (0.825, n=68).
Although the Q-CHAT was developed to screen for autistic traits in community samples aged 18-24 months, we wanted to test whether it could be used as an initial screening tool for children presenting with developmental delay in hospital settings. In the initial study conducted by Allison et al. [12], the mean age of the ASD group was 44.5 months (SD=10.2 months; range, 19-63 months). In this study, we analyzed toddlers and preschool children aged 12-54 months. Similar to previous studies, we observed a normal distribution of the Q-CHAT scores in both groups [14-16], suggesting the unique potential of the Q-CHAT as a dimensional measure of ASD-specific traits along a continuum in the population.
Consistent with the findings reported by Allison et al. [12], children in the hospital sample (M=42.05) scored significantly higher than those in the typically developing community sample (M=29.88). The mean score of the hospital sample on the Korean version of the Q-CHAT (42.05) was higher than that in a previous Korean study (39.1) but still lower than the mean score of 51.8 reported by Allison et al. [12]. This could be partly due to bias resulting from the small sample size of the hospital group. Additionally, the relatively higher mean age of participants (37.5 months) may have contributed to the lower total Q-CHAT score, as previous studies have reported that Q-CHAT scores tend to decrease with increasing age [11,12].
Significant differences in the total Q-CHAT score and in the CBCL-PD problem and CBCL-ODD problem t-scores were observed between male and female toddlers in the community sample. This implies that typically developing toddlers follow sex-specific developmental characteristics. In contrast, no sex-specific differences were observed in the scores of the ASD screening instruments in the hospital sample. Because an ASD-specific pathological process likely applies to these children, a sex-typical developmental trajectory was not apparent in the hospital sample. This result also suggests that different cutoff scores according to the toddler’s sex do not need to be considered.
Similar to the results of a previous study, the Korean version of the Q-CHAT showed good discriminant validity. The total Q-CHAT scores in the hospital group showed significant positive correlations with the SCQ scores and the CBCL-PD problem score, although the correlation with the CARS score was not significant. In contrast, the total Q-CHAT scores in the community sample showed significant positive correlations not only with the PD problem scores of the CBCL but also with all other subscales (affective, anxiety, attention-deficit hyperactivity, and oppositional defiant behavior). This suggests that the Q-CHAT is an effective screening tool for distinguishing PD problems from other affective and behavioral problems in children with ASD.
Finally, we investigated which Q-CHAT items were more frequently endorsed in the hospital sample, aiming to identify items with greater sensitivity for detecting ASD traits across age groups. When the effects of age on the Q-CHAT item scores were analyzed separately for the hospital and community samples, no difference in the Q-CHAT item scores was present among age groups in the hospital sample, whereas the Q-CHAT item scores of others can comprehend child’s speech (item 4), number of words (item 8), use of hand of others as a tool (item 12), adapt to change in routine (item 14), repeat same action over and over again (item 16), typicality of first words (item 17), unusual finger movements (item 20), and oversensitive to noise (item 24) decreased with advancing age in the community sample.
Upon comparing the frequency of items scored above 2 in the 12-36-month-old (n=30) and 37-48-month-old (n=52) groups, the hospital sample more frequently reported items on social interaction and reciprocity: look when call name (item 1), eye contact (item 2), others can comprehend child’s speech (item 4), protoimperative pointing (item 5), protodeclarative pointing (item 6), number of words (item 8), pretend play (item 9), follow the direction of eye gaze (item 10), use of hand of others as a tool (item 12), adapt to change in routine (item 14), offer comfort to others (item 15), typicality of first words (item 17), check your reaction in unfamiliar situations (item 21), and stare at nothing for long time (item 25).
However, the ratings of repetition and sensory items in the community sample were higher than or similar to those in the hospital sample, suggesting that these items were less specific. These items included lines up objects (item 3), interest maintained by spinning objects (item 7), sniff/lick unusual object (item 11), walk on tiptoes (item 13), repeat same action over and over again (item 16), repeat exactly what they heard (item 18), unusual finger movements (item 20), twiddle objects repetitively (item 23), and oversensitive to noise (item 24).
Using the M-CHAT-RF, Guo et al. [17] also noted that certain items may better differentiate patients with ASD from those without ASD. According to Guo et al. [17], the items that were more sensitive to an ASD diagnosis were “declarative pointing,” “brings objects to show,” “responses to name,” “gaze following,” and “understand what is said,” all of which assess social interaction and reciprocity. However, the “abnormal finger movement” and “hypersensitivity to noise” items, which evaluate repetitive/stereotypical behaviors, were not endorsed more frequently for children who were later diagnosed with ASD than for children who were presumed not to have a diagnosis of ASD [17]. Other previous studies assessing the M-CHAT have identified “declarative pointing” and “brings objects to show” as the most sensitive items [8,18]. “Response to name” was also considered a sensitive item in the M-CHAT [8].
The findings of this study suggest that when screening for ASD, social interaction and reciprocity should be assessed more carefully than repetitive and sensory behaviors. Moreover, the items endorsed for the hospital sample did not change among age groups, suggesting that the sensitivity of these items is stable across age groups in the hospital group. Narvekar et al. [19] tested the effect of atypical responsiveness on autism traits, including both under-responsiveness and hyper-responsiveness of infants to external stimuli. They found that, in infants with a high risk of autism, hyper-reactivity at 14 months was positively associated with fear at 24 months, and hyper-reactivity at 24 months was longitudinally associated with restricted and repetitive behavior and differences in social interaction at 36 months. When under-responsiveness was added to the analysis of hyper-responsiveness, the longitudinal associations of hyper-responsiveness with restricted and repetitive behavior and with social interaction and reciprocity became statistically non-significant, suggesting that various types of reactivity must be considered to understand the developmental path of autism and item specificity.
This study has certain limitations that should be acknowledged. First, the two sample groups were unequal in size, with the hospital group comprising half the sample size of the typically developing (community) group. Second, children in the hospital sample were significantly younger than those in the community sample. Replication with a larger and better age-matched sample of children with ASD and typically developing children is recommended. Third, although the hospital group participated in the study prior to receiving an autism diagnosis, parents’ responses may have been influenced by their recognition of their child’s delays in language, social development, and other developmental concerns, as well as by the fact that they had already made appointments with specialists. Finally, although most participants in the hospital sample showed a high probability of ASD, the sample was not divided into groups based on subsequent diagnoses. As a result, it was not possible to compare the Q-CHAT scores with longitudinal diagnostic outcomes.
In conclusion, the Korean version of the Q-CHAT demonstrates good validity and reliability and is effective in discriminating autistic traits even in children older than 24 months. This supports its utility as an initial screening tool for autistic traits in Korean clinical settings. Moreover, the items endorsed for hospital samples varied from those endorsed for community samples, implying item-specific sensitivity for hospital samples.