A Study on the Screening of Children at Risk for Developmental Disabilities Using Facial Landmarks Derived From a Mobile-Based Application

Article information

Psychiatry Investig. 2024;21(5):496-505
Publication date (electronic) : 2024 May 23
doi : https://doi.org/10.30773/pi.2023.0315
1Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
2University of Ulsan College of Medicine, Seoul, Republic of Korea
3Department of Psychiatry, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
Correspondence: Yu Rang Park, PhD Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea Tel: +82-2-2228-2493, E-mail: yurangpark@yuhs.ac
Correspondence: Hyo-Won Kim, MD, PhD Department of Psychiatry, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Songpa-gu, Seoul 05505, Republic of Korea Tel: +82-2-3010-3414, Fax: +82-2-485-8381, E-mail: shingubi@amc.seoul.kr
*These authors contributed equally to this work.
Received 2023 October 7; Revised 2024 January 21; Accepted 2024 February 1.



Early detection of and intervention for developmental disabilities (DDs) are critical to improving the long-term outcomes of affected children. In this study, our objective was to use facial landmark features from a mobile application to distinguish between children with DDs and typically developing (TD) children.


The present study recruited 89 children, including 33 diagnosed with DD and 56 TD children. The aim was to examine the effectiveness of a deep learning classification model using facial video collected from children through a mobile-based application. The study participants underwent comprehensive developmental assessments: the children completed the Korean Psychoeducational Profile-Revised, and their caregivers completed the Korean version of the Vineland Adaptive Behavior Scale, the Korean version of the Childhood Autism Rating Scale, the Social Responsiveness Scale, and the Child Behavior Checklist. We extracted facial landmarks from videos recorded with the mobile application and performed DD classification using long short-term memory with stratified 5-fold cross-validation.


The classification model showed an average accuracy of 0.88 (range: 0.78–1.00), an average precision of 0.91 (range: 0.75–1.00), and an average F1-score of 0.80 (range: 0.60–1.00). Upon interpreting the prediction results using SHapley Additive exPlanations (SHAP), we verified that the most crucial variable was the nodding head angle variable, with a median SHAP score of 2.6. All of the top 10 contributing variables exhibited significant differences in distribution between children with DD and TD children (p<0.05).


The results of this study provide evidence that facial landmarks, utilizing readily available mobile-based video data, can be used to detect DD at an early stage.


Developmental disabilities (DDs) are conditions that manifest as impairments in physical, learning, language, or behavioral development, and they encompass autism spectrum disorder (ASD), language disorder (LD), intellectual disability (ID), learning disorder, and cerebral palsy [1]. The prevalence of children with DD has shown a marked increase worldwide, affecting 17.8% of children in the United States from 2015 to 2017 [2]. According to a recent study, the prevalence of DD in South Korea increased steadily, by more than 4 times, from 2003 to 2017 [3]. DDs significantly and negatively affect the quality of life of both the affected individuals and their families, as a substantial amount of medical and social support is needed for children with DD [4]. The social costs of DD are significant and expected to grow, considering the increasing prevalence of DD [5,6]. Among DDs, ASD, ID, and LD often manifest as initial language delays at similar ages. These conditions can be challenging to distinguish, and sometimes a diagnosis changes within the course of early development [7].

Early childhood is a critical period for physical, cognitive, and socioemotional development, which forms the building blocks of children’s later development [8]. Early identification of DDs is crucial for children to receive early evidence-based interventions, which have been shown to be highly effective in improving the outcomes of children with DD [9-13]. Although many studies emphasize the importance of early detection and intervention, difficulties in universal screening lead to delays in the age of identification and diagnosis [14].

Currently, DD is usually screened using caregiver questionnaires such as the Quantitative Checklist for Autism in Toddlers (Q-CHAT), Autism Behavior Checklist (ABC), Ages and Stages Questionnaire (ASQ), and the Parents’ Evaluation of Developmental Status (PEDS). However, parents’ expectations or concerns can sometimes influence their responses, making it difficult to ensure reliable reporting [15]. ASD, ID, and LD are diagnosed using standardized instruments that evaluate language, cognition, or social development, including the Bayley Scales of Infant Development, Psychoeducational Profile-Revised (PEP-R), Autism Diagnostic Observation Schedule (ADOS), and Autism Diagnostic Interview-Revised (ADI-R) [9,16,17]. However, these standardized tests are usually time-consuming and need to be conducted by trained professionals [18]. This highlights the need for user-friendly and time-efficient mobile screening tools for children with DD.

As mobile devices have become widely distributed, it has been suggested that mobile-based screening tools could be useful in the early identification of DD. Mobile-based screening programs are easily accessible, are less time-consuming, and do not require trained professionals, and may thus help speed up the identification of DD. Different types of mobile-based screening measures for DD are emerging, including the Cognoa ASD Screener, a platform in which analysts examine home videos [19]; iTracker, an eye-tracking algorithm [20]; and ASDTests, an application based on standardized tests such as the Autism Quotient 10 and Q-CHAT [21]. Video-analyzing platforms [19,22] and eye-tracking algorithms [23] are the most frequently studied mobile device-based methods for identifying DDs. In contrast, screening tools based on facial expression have only recently been introduced.

Facial expressions are indispensable in the sharing of emotions and in human-to-human interactions. People with DD, especially those with ASD, have difficulties understanding and expressing facial emotions [24-26]. Although much research has been done on screening DD using facial expressions, not much research has been done on screening ASD [27]. Similarly, only a few studies have been performed on facial expression in individuals with ID despite there being several studies regarding facial expression in individuals with ID [28,29]. Moreover, only a few studies have been conducted on LD and facial expression. Recently, some studies reported a significant difference in facial expression between children with DD and typically developing (TD) children using models that categorize facial expression with facial landmarks [30,31]. Manfredonia and colleagues [31] recorded videos of participants and divided them into image frames. These studies suggest that programs examining the facial expressions of children based on face recording video data could be useful for identifying children with DD [30,31]. Thus, we aimed to identify facial expressions and head pose based on facial landmark data extracted from face recording videos and to differentiate between the characteristics of children with DD and those of TD children.


Mobile-based application

This study aimed to examine the effectiveness of a deep learning classification model using facial video data of children collected from a mobile-based application operating on a tablet PC. The goal was to identify the facial expressions and head poses of children and to compare children with DD with TD children. To achieve this, we integrated four animations and four paintings into the mobile application (Supplementary Figure 1 in the online-only Data Supplement). When children view these animations or paintings in the mobile app, their facial and head movements are recorded by the tablet PC camera. The app displays four different paintings, featuring friendly animal characters, blocks, different animal characters, and animal characters with blocks. The animation content presents the four pictures from the previous test in animated form, testing whether more children engage when objects move and whether the results align with those of the non-animated tests.

Participant enrollment

From May 2020 to July 2020, children with DD were recruited from special education centers, and a comparison group of typically developing children was recruited from community-based childcare settings (daycare centers and kindergartens). The children were between 34 and 77 months of age. We obtained the participants’ previous medical history from their parents. For participants with a medical history, we reviewed the child’s medical record provided by the parent. Children were not included if they had 1) a history of neurologic diseases such as cerebral palsy, 2) any sensory disturbances (i.e., vision, hearing, taste, or smell), or 3) severe gross or fine motor problems that prevented them from participating in the psychometric tests. Of the 124 children, data from 35 children could not be analyzed because 1) facial data could not be extracted due to the use of facial masks (n=25), 2) facial data during video games were not available (n=5), or 3) landmark points were incorrectly extracted (n=5). As a result, a total of 89 children were analyzed in this study (Figure 1).

Figure 1.

Study population selection flow.

The participants of this study underwent a comprehensive developmental assessment. The children performed the Korean version of PEP-R, and their caregivers completed the Korean version of Vineland Adaptive Behavior Scales (VABS), the Korean version of the Childhood Autism Rating Scale (K-CARS), Social Responsiveness Scale (SRS) and Child Behavior Checklist (CBCL). Additionally, participants of this study underwent the most commonly used parent questionnaires for DD screening: Q-CHAT, ABC, ASQ, and the PEDS. Diagnoses of DD were made before the participants were enrolled, and the medical records were reviewed together with the results of developmental assessment by clinicians. If there was inconsistency between the clinician who interviewed the child and his or her caregiver, a diagnosis was made according to the Diagnostic and Statistical Manual of Mental Disorders Fifth Edition. Among TD children, those with confusing test results or developmental concerns were seen by clinicians and confirmed as TD controls. Informed consent was obtained from the parents of each child. Additionally, this study was performed in accordance with the principles of Good Clinical Practice and the Helsinki Declaration. The study was approved by the Institutional Review Board of Asan Medical Center (2020-0386).

Data collection and preprocessing

Video recording

We designed our experimental tasks to best elicit and capture facial expressions in a structured environment to ensure reproducibility. The children were individually tested in a quiet room. Each child was seated on a height-adjustable chair with their right elbow and wrist resting on a table. At the beginning, the child was assisted or instructed to adjust the distance to the tablet PC so that the face could be recorded. Once the experimental environment was set up, the child watched the content on the tablet PC through the mobile app while the app simultaneously recorded the child’s facial expressions. Throughout this procedure, a trained examiner sat across from the child, closely observing their behavior. If a child did not cooperate or refused to view the content through the mobile app, the examiner halted the video recording.

Facial landmark extraction

Facial landmarks are standard reference points, such as the inner and outer corners of the eye fissure where the eyelids meet. In many cases, the landmarks used in computational face analysis are very similar to the anatomical soft tissue landmarks used by physicians. These extracted landmark points can be applied in various fields, such as human emotion recognition, gaze detection, and face conversion. In our work, we used the 2D-Face Alignment Net (FAN), a convolutional neural network-based method, to recognize children’s faces in videos and extract 68 facial landmarks. The algorithm was trained on the LS3D-W dataset, which consists of approximately 230,000 face photographs of adult men and women and children, and it showed higher performance in extracting facial landmarks than other algorithms [32]. We decomposed each child’s video into frame images at 33-ms intervals, recognized the child’s face in each image, and extracted the facial landmark points (Figure 2).

Figure 2.

Diagram of the 68 facial landmark points (A) and head pose estimations (B).


The extracted facial landmark points are stored as 68 coordinate values. For frames in which the extraction algorithm did not recognize a face, we interpolated the coordinate values of the landmark points using the information from the preceding and subsequent frames. In video data, when a child bows their head or moves out of the video frame, the facial landmarks may not be properly extracted. In this case, the video frame was regarded as an abnormal frame and removed (Figure 2).
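
The interpolation step described above can be illustrated with a minimal sketch. The text does not state which interpolation scheme was used, so linear interpolation between the nearest detected frames is an assumption here, and the names `Frame` and `interpolate_missing` are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Frame = Optional[List[Point]]  # None marks a frame where no face was detected


def interpolate_missing(frames: List[Frame]) -> List[Frame]:
    """Fill undetected frames linearly between the nearest detected neighbours.

    Assumption: linear interpolation. Gaps at the very start or end of the
    video have no neighbour on one side and are left as None.
    """
    filled = list(frames)
    known = [i for i, f in enumerate(frames) if f is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span  # fraction of the way from left to right
            filled[i] = [
                (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
                for (x0, y0), (x1, y1) in zip(frames[left], frames[right])
            ]
    return filled
```

Frames that remain `None` after this step would correspond to frames that cannot be recovered from their neighbours.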

Feature extraction

We used the OpenCV library to estimate the head pose in each frame from the landmark coordinate values. We measured three head pose angles (pitch, roll, and yaw) by specifying six landmark points (i.e., the eyes, nose, chin, and left and right corners of the mouth) and computing the Euler angles that describe how far these points were rotated in the frame relative to assumed 3D coordinates of a forward-facing head. Pitch was measured as the angle of nodding the head up and down, roll as the angle of tilting the head sideways to the left and right, and yaw as the angle of rotating the head from left to right. Each value was measured within the range of -90 to 90 degrees (Figure 2).
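
As a hedged illustration of the last step of such a pipeline, the sketch below converts a head rotation matrix into pitch, yaw, and roll. In an OpenCV workflow the rotation matrix would typically be obtained from `cv2.solvePnP` followed by `cv2.Rodrigues`; that part is omitted here so that the example stays dependency-free. The axis convention (R = Rz(roll) · Ry(yaw) · Rx(pitch)) and the function names are assumptions, not details taken from the study.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def rotation_matrix(pitch: float, yaw: float, roll: float) -> Matrix:
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch); angles in radians (assumed convention)."""
    ca, sa = math.cos(pitch), math.sin(pitch)
    cb, sb = math.cos(yaw), math.sin(yaw)
    cg, sg = math.cos(roll), math.sin(roll)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]


def euler_angles_deg(r: Matrix) -> Tuple[float, float, float]:
    """Recover (pitch, yaw, roll) in degrees; valid while |yaw| < 90 degrees."""
    yaw = math.asin(max(-1.0, min(1.0, -r[2][0])))  # clamp against rounding error
    pitch = math.atan2(r[2][1], r[2][2])
    roll = math.atan2(r[1][0], r[0][0])
    return (math.degrees(pitch), math.degrees(yaw), math.degrees(roll))
```

Building a matrix with `rotation_matrix` and passing it to `euler_angles_deg` returns the original angles, which is a quick way to check that a chosen convention is applied consistently.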

To measure how much a child’s face moved between the previous frame and the current frame, we took the average position of the 68 landmark coordinate values in a frame as the center point of the face and calculated the Euclidean distance between the center points of consecutive frames. Given that one frame was captured every 33 ms, the distance the face traveled was measured by taking this frame interval into account.
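
The movement measure can be sketched as follows. How the frame interval enters the calculation is not specified in the text; dividing the distance by the 33-ms interval to obtain a speed is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
FRAME_INTERVAL_S = 0.033  # one frame every 33 ms, as stated in the text


def face_center(landmarks: Sequence[Point]) -> Point:
    """Mean position of all landmark points in one frame."""
    n = len(landmarks)
    return (sum(x for x, _ in landmarks) / n, sum(y for _, y in landmarks) / n)


def center_movement(frames: Sequence[Sequence[Point]]) -> List[Tuple[float, float]]:
    """Return (distance, speed) of the face center for each consecutive frame pair.

    Assumption: speed = distance / frame interval (pixels per second).
    """
    centers = [face_center(f) for f in frames]
    result = []
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        distance = math.hypot(x1 - x0, y1 - y0)
        result.append((distance, distance / FRAME_INTERVAL_S))
    return result
```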

The changes in the distances between the 68 landmark points were measured in the video. Pairing all landmark points yields a total of 2,278 distance variables (68×67/2). We calculated the Euclidean distances for all of these pairs and selected the frames in each child’s video in which the child was facing forward. Taking the landmark-pair distances in this forward-facing (normal) frame as the reference, we obtained the ratio of each distance in the remaining frames to the corresponding distance in the normal frame. Among all the ratio variables obtained, the top 40 were selected as derived variables based on significant differences in their distribution between children with DD and TD children.
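
A minimal sketch of the pairwise distance and ratio computation is given below. It assumes that a single forward-facing reference frame has already been chosen for the child; the selection of that frame and the subsequent statistical filtering to 40 variables are not shown, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Pair = Tuple[int, int]


def pairwise_distances(landmarks: Sequence[Point]) -> Dict[Pair, float]:
    """Euclidean distance for every unordered pair of landmark indices."""
    return {
        (i, j): math.hypot(landmarks[j][0] - landmarks[i][0],
                           landmarks[j][1] - landmarks[i][1])
        for i, j in combinations(range(len(landmarks)), 2)
    }


def distance_ratios(frame: Sequence[Point],
                    reference: Sequence[Point]) -> Dict[Pair, float]:
    """Ratio of each pair distance in `frame` to the same pair in `reference`.

    Pairs whose reference distance is zero are skipped to avoid division by zero.
    """
    ref = pairwise_distances(reference)
    cur = pairwise_distances(frame)
    return {pair: cur[pair] / d for pair, d in ref.items() if d > 0}
```

A ratio close to 1 indicates that the pair of landmarks is about as far apart as in the reference frame, so the ratios capture relative changes in facial geometry rather than absolute pixel distances.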

Model algorithm

The data used in the current analyses were time-series data consisting of frames of video recording the children’s faces. Accordingly, we used the long short-term memory (LSTM) model for the binary classification of developmental disability. As a type of recurrent neural network (RNN), the LSTM model determines whether the weight value is maintained by adding a cell state to each LSTM cell. The state obtained from an LSTM cell is used as input to the next LSTM cell, so the state of an LSTM cell affects the operation of the subsequent cells. The final target output at the end of the sequence represents a label classifying the developmental disability. The LSTM model can remove information from or add information to the cell state, carefully regulated by structures called gates, which are a way of optionally letting information through. The LSTM model retains information longer than a conventional RNN because it can control long-term memory [33]. Since the lengths of the seven videos vary, we created an individual LSTM model for each video. We used variables such as sex and age as inputs to an additional deep neural network (DNN) model. Subsequently, we constructed a model that integrated the results of the seven LSTM models with those of the single DNN model to predict DD (Figure 3).
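
To make the gating mechanism concrete, the following sketch implements one update of a standard LSTM cell with scalar input and state. It is purely illustrative: the study's models would be vector-valued and built with a deep learning framework, and the weight layout and names used here are hypothetical.

```python
import math
from typing import Dict, Tuple

Weights = Dict[str, Tuple[float, float, float]]  # gate name -> (w_x, w_h, bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, w: Weights) -> Tuple[float, float]:
    """One scalar LSTM update; returns the new hidden state and cell state."""
    def pre(gate: str) -> float:
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to add
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate information to add
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

When the forget gate is near 1 and the input gate is near 0, the cell state passes through almost unchanged, which is how the cell can carry information across many frames.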

Figure 3.

Deep-learning model architecture. LSTM, long short-term memory.

We also performed stratified K-fold cross-validation to assess the robustness of the model. The K-fold cross-validation technique splits the dataset into K folds; the model is trained on K-1 folds and validated on the remaining fold, and this process is repeated until every fold has been used for validation once. Stratification rearranges the data so that each fold preserves the class proportions of the entire dataset and is therefore a good representative of the whole, which supports parameter fine-tuning and helps the model better classify DD [34]. In our study, we used K=5. For evaluating the trained model, the following standard machine learning metrics were employed:

1) Accuracy: Percentage of correctly classified data frames in the given test dataset.

$$\text{Accuracy}=\frac{\text{True positives}+\text{True negatives}}{\text{True positives}+\text{True negatives}+\text{False positives}+\text{False negatives}}$$

2) Precision: Measure of exactness or quality of model prediction.

$$\text{Precision}=\frac{\text{True positives}}{\text{True positives}+\text{False positives}}$$

3) Recall: Measure of completeness or quantity of model prediction.

$$\text{Recall}=\frac{\text{True positives}}{\text{True positives}+\text{False negatives}}$$

4) F1 score: Harmonic mean of the precision and the recall.

$$\text{F1 score}=\frac{2\times\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}$$

True positives are data samples that are classified as positive by the model and are actually positive. True negatives are data samples that are classified as negative by the model and are actually negative. False positives are data samples that are classified as positive by the model but are actually negative. False negatives are data samples that are classified as negative by the model but are actually positive. When false positives and false negatives are close to zero, both the precision and the recall approach 1, and hence the F1-score also approaches 1. In the worst case, the F1-score is 0, which means that the model did not correctly identify any positive sample.
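
The validation scheme and the four metrics can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the study's code: the round-robin fold assignment, the fixed random seed, and the function names are hypothetical, and an established library routine such as scikit-learn's `StratifiedKFold` could serve the same purpose.

```python
import random
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds so that each class is spread evenly."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal out each class round-robin
    return folds


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With 33 DD and 56 TD labels and K=5, each fold produced by this sketch holds 6 or 7 children with DD and 11 or 12 TD children, so every validation fold roughly mirrors the overall class balance.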

After developing the DD classification model, we calculated the SHapley Additive exPlanations (SHAP) value to identify the main contributing variables of the model [35]. We calculated the SHAP values using the DeepExplainer of the SHAP package on five folds, which were divided from the training dataset. We then calculated the mean of the absolute SHAP values across all folds.
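
The aggregation step can be sketched as below, assuming the SHAP values for each fold are already available as a samples-by-features table (for example, as returned by an explainer). Averaging within each fold first and then across folds is one reasonable reading of the description; the function and variable names are hypothetical.

```python
from typing import List, Sequence, Tuple


def mean_abs_shap(
    fold_shap_values: Sequence[Sequence[Sequence[float]]],
    feature_names: Sequence[str],
) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP|, averaged within each fold and then across folds."""
    per_fold = []
    for fold in fold_shap_values:
        n_samples = len(fold)
        per_fold.append([
            sum(abs(row[j]) for row in fold) / n_samples
            for j in range(len(feature_names))
        ])
    n_folds = len(per_fold)
    averaged = {
        name: sum(values[j] for values in per_fold) / n_folds
        for j, name in enumerate(feature_names)
    }
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)
```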

Statistical analysis

To compare the differences in the distribution of variables in the two groups, we tested the normality of the variables using the Shapiro–Wilk test. Variables satisfying normality were assessed using Student’s t-test, while those not meeting normality criteria were compared between groups using the Mann–Whitney U test. The statistical analysis was conducted using Python software (version 3.7, Python Software Foundation).


Overall population

A total of 89 children were included in this study. Among them, 33 children had DD and 56 were TD children. Children with DD were significantly older than TD children (56.7±9.4 vs. 52.3±8.2 months; p=0.024) (Table 1). Sex distribution was not significantly different between the two groups, as 20 (60.6%) of the children with DD and 28 (50.0%) of the TD children were boys (p=0.589). Of the children with DD, 25 (75.8%) were diagnosed with ASD, 23 (69.7%) with ID, and 7 (21.2%) with LD. The mean scores of the PEP-R Developmental Quotient, VABS Adaptive Behavior Composite score, SRS total score, and K-CARS all differed significantly between children with DD and TD children (p<0.001). The mean scores for internalizing, externalizing, and total problems on the CBCL for children with DD were not significantly different from those of TD children, indicating that comorbid psychiatric symptoms did not differ between the two groups.

Demographic characteristics of children with developmental disabilities and typically developing children

Model evaluation

We conducted a stratified 5-fold cross-validation, in which the mean accuracy was 0.88 (range, 0.78–1.00). The mean precision was 0.91 (range, 0.75–1.00). Recall exhibited the lowest performance among the evaluation metrics, with a mean value of 0.72 (range, 0.50–1.00). The mean F1-score was 0.80 (range, 0.60–1.00) (Table 2). When only landmark variables around the eyes and nose were used for model training, reflecting the mask-wearing situation during the coronavirus disease-2019 pandemic, the average values of accuracy, precision, recall, and F1-score were 77%, 83%, 67%, and 76%, respectively (Supplementary Table 1 in the online-only Data Supplement).

Performance of the stratified 5-fold cross validation

Model interpretation

After validating the performance of the model through cross-validation, we used SHAP to interpret the model to find the variables that contributed significantly to DD prediction. After ranking the contributing variables by SHAP score, we found that the “nodding head angle” variable scored the highest by a wide margin with a median SHAP score of 2.6, making it the strongest contributor to DD prediction (Figure 4). With a median of 1.3, the “left lip and left face corner area” variable was the next largest contributor.

Figure 4.

Box plot showing the absolute average SHAP values of the top 10 variables. SHAP, SHapley Additive exPlanations.

In addition, differences in the distribution of the top 10 high-contributing variables between children with and without DD were analyzed. The Mann–Whitney U test revealed significant differences in the distributions of these variables between the two groups (p<0.05) (Figure 5) (Table 3).

Figure 5.

Head pose and facial landmark distributions in children with and without DDs. DD, developmental disability.

Mann–Whitney U test results for top 10 variables of importance


In this study, we present the effectiveness of a deep learning classification model based on facial landmarks extracted from mobile-based video data. The model predicts the presence of DD by distinguishing facial characteristics between children with DD and TD children. This is achieved by extracting 68 facial landmarks from the faces and generating derived variables, including head pose estimates (pitch, yaw, roll) and landmark point distances. The deep learning classification model, utilizing mobile-based video data, predicted the presence of DD with an average accuracy of 88%. For the pitch (head nodding) variable, children with DD exhibited a significantly broader distribution than TD children. In the model’s interpretation process, we identified important predictive variables, such as the pitch variables, all of which exhibited statistically significant differences in distribution between children with DD and TD children.

Caregiver questionnaires including Q-CHAT, ABC, ASQ, and PEDS are the most commonly used screening methods for DD. Of them, Q-CHAT showed a sensitivity of 71%–75% and a specificity of 63%–65% [21,36]. ABC was reported to have a sensitivity of 78.4% [37]. PEDS, which consists of two open-ended questions and eight yes/no questions completed by parents, showed sensitivities of 78.9% and 54.9% in severe and moderate-to-severe delays, respectively, and a specificity of 79.6% [38]. ASQ-3 showed sensitivities of 60.0% and 53.1% in severe and moderate-to-severe delays, respectively, and a specificity of 89.4% [39]. Thus, in terms of the accuracy of detection, our classification model (88%) seems to perform comparably to the existing screening methods.

Previous studies have suggested several digital screening methods for DDs [40-45]. Most web-based developmental surveillance programs involve trials of online versions of established questionnaires [42-45]. An example is the web-based Modified Checklist for Autism in Toddlers with Follow-up interview (M-CHAT/F), which is scored by parents. It operates as a two-stage screening test, wherein a positive result triggers a follow-up interview to clarify or correct the failed items. When administered by primary care pediatricians, the web-based M-CHAT/F demonstrated a sensitivity of 59% and a specificity of 71% [42]. In another study utilizing the digital M-CHAT-revised with follow-up, accurate documentation of screening results in the electronic health record increased from 54% to 92%. Moreover, appropriate action for children screening positive rose from 25% to 85%, as compared to the results obtained from the paper form of the M-CHAT [43]. In addition, the smartphone application PEDS, operated by community healthcare workers, was shown to have a close correspondence with the gold standard paper-based PEDS tools operated by health professionals [44]. Most smartphone screening applications also focus on developing questionnaires answered by parents or medical professionals [45]. ASDTests is an application that is based on the autism-spectrum quotient and Q-CHAT, evaluating the possibility of having autistic traits [21]. Cognoa is a mobile screening application that includes both parental questionnaires and home video recording, with a sensitivity of 75% and a specificity of 62% [19,45]. These studies suggest that web-based or mobile-based screening tools could be reliably used for screening DD. Because they are quicker, cheaper, and more accessible, web-based or mobile-based screening tools could be helpful in improving the early identification of DD.

Some recent studies evaluated DD using digital observational methods, analyzing gazes, faces, or behaviors. Eye-tracking algorithms have shown progress in their potential use for screening ASD in rural areas [22,23]. Vargas-Cuentas and colleagues [22] recorded videos of participants watching social or non-social videos and analyzed the image frames from the video. Fujioka and colleagues [23] used infrared light sources and cameras to record the eye position. In one study from Bangladesh, a machine learning classifier trained on data from ADOS and ADI-R was able to detect developmental delay and autism by analyzing the behavior portrayed in home videos, and showed a sensitivity and accuracy of 76% [46]. Strobl and colleagues [20] also developed a smartphone application in which the participants’ gaze was analyzed by an eye-tracking algorithm. These studies show that digital methods could be used for the screening of DD.

Our study showed that facial landmark analysis, among mobile-based methods, could play a significant role in the detection of DD. In our study, among the variables derived from the 68 facial landmarks, the nodding head angle, left lip and face corner, and right eye and lip were the most important variables. These variables are consistent with existing analyses of head movement and facial expression for DD or ASD screening, and they yield results similar to those of previous studies [47,48]. In previous studies examining head pose and facial expressions, Happy and Routray [30] utilized the Facial Action Coding System (FACS), which classifies facial expressions using salient facial patches, and demonstrated a 94.14% accuracy in expression recognition. This study differs from ours in that the FACS extracts a maximum of 19 facial patches, while our study extracts 68 facial landmarks. Additionally, they utilized facial expression databases comprising a total of 329 images, whereas our study directly collected data from 89 children. Another study employed a computer vision-based head tracking program (Zface) to illustrate differences between TD children and children with ASD [47]. In contrast to our findings, they found differences in the speed and quantity of head movement in yaw and roll, but not in pitch, between TD children and children with ASD. In another study, children with ASD and those with attention-deficit/hyperactivity disorder were differentiated with an accuracy of 94% using a red-green-blue-depth sensor from a depth measurement camera [48]. This study is similar to our work in that there was a difference in facial expressions using FACS. However, it is different from our results in that the study targeted adults aged 18 and older, and there was a difference in head movements in yaw. While these studies used computer-based programs that require special-purpose equipment, our study used a mobile-based application, making it more convenient and easier to use [30,47,48].
In one study, children watched movies on a smart tablet while the embedded camera recorded their facial expressions. Then, the computer vision analysis automatically tracked the facial landmarks and used them to classify the facial expressions into three types (positive, neutral, and other) with a maximum sensitivity of 73%, with different results depending on the type of movie being shown; notably, children with ASD displayed neutral expressions more often than children without ASD [49]. This study differs from ours in that we evaluated not only children with ASD but also those with other DDs.

Based on our results, we cautiously suggest that facial landmarks and head poses may be used as screening tools for children with DD. A recent study that quantified head movement dynamics (displacement and velocity) showed that children with ASD had greater head movement dynamics than those without ASD [47]. Several papers hypothesized that turning away may be an adaptive strategy for individuals with ASD to regulate the overwhelming amount of information, which may explain the atypical head movement of individuals with ASD [50,51]. Therefore, using facial landmarks as a method of screening could aid the early identification of children with DD.

There are several limitations to this study. First, we were unable to find significant differences in facial landmarks or head pose between when children were shown social videos and when they were shown non-social videos. Second, our study did not analyze the results for the subgroups of DD (i.e., ASD, ID, LD). Third, since children with incorrect data were excluded, the sample size is relatively small, and the findings thus have limited generalizability. Fourth, the diagnosis of ASD was not based on structured instruments such as the ADI-R or ADOS. However, the diagnosis of DD was based on a thorough evaluation by clinicians as well as a comprehensive developmental assessment including the PEP-R, VABS, K-CARS, SRS, and CBCL. Fifth, we do not know whether these findings are limited to certain age groups. Sixth, CBCL data did not differ between children with DDs and TD children. However, in our study, significant differences were observed in developmental assessments such as the PEP-R, VABS, SRS, and K-CARS between the DD and TD groups. Finally, our study did not consider body motion information because we used videos that recorded only the children’s faces.

Despite these caveats, our study evaluated the utility of digital methods, especially facial landmarks from deep learning-based methods, in the screening of DD among community-based preschool children. Our results provide preliminary evidence that a deep-learning classification model utilizing mobile-based video of children could be employed for early detection of DD.

Supplementary Materials

The online-only Data Supplement is available with this article at https://doi.org/10.30773/pi.2023.0315.

Supplementary Table 1.

Performance verification result of the model using only landmarks in the eyes and nose area

Supplementary Figure 1.

Screen captures of the video games in the mobile-based screening kit.



Availability of Data and Material

The datasets generated or analyzed during the study are not publicly available because they contain sensitive facial data of children, but they are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors have no potential conflicts of interest to disclose.

Author Contributions

Conceptualization: Yu Rang Park, Hyo-Won Kim. Data curation: Sang Ho Hwang, Yeonsoo Yu. Formal analysis: Sang Ho Hwang. Funding acquisition: Yu Rang Park, Hyo-Won Kim. Investigation: Jichul Kim, Taeyeop Lee, Hyo-Won Kim. Methodology: Sang Ho Hwang, Yu Rang Park. Project administration: Hyo-Won Kim. Resources: Jichul Kim, Taeyeop Lee, Hyo-Won Kim. Software: Sang Ho Hwang, Yu Rang Park. Supervision: Yu Rang Park, Hyo-Won Kim. Validation: Jichul Kim, Taeyeop Lee, Hyo-Won Kim. Visualization: Sang Ho Hwang. Writing—original draft: Sang Ho Hwang, Yeonsoo Yu. Writing—review & editing: Yu Rang Park, Hyo-Won Kim.

Funding Statement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the South Korean government (Ministry of Science and ICT) (NRF-2020R1A5A8017671).


Acknowledgments

We thank all the participants and their caregivers for their participation in this study.


1. Rubin IL, Crocker AC. Developmental disabilities: delivery of medical care for children and adults. Philadelphia: Lea & Febiger; 1989.
2. Zablotsky B, Black LI, Maenner MJ, Schieve LA, Danielson ML, Bitsko RH, et al. Prevalence and trends of developmental disabilities among children in the United States: 2009-2017. Pediatrics 2019;144:e20190811.
3. Rah SS, Hong SB, Yoon JY. Prevalence and incidence of developmental disorders in Korea: a nationwide population-based study. J Autism Dev Disord 2020;50:4504–4511.
4. Rattaz C, Michelon C, Roeyers H, Baghdadli A. Quality of life in parents of young adults with ASD: EpiTED cohort. J Autism Dev Disord 2017;47:2826–2837.
5. Cakir J, Frye RE, Walker SJ. The lifetime social cost of autism: 1990-2029. Res Autism Spectr Disord 2020;72:101502.
6. Genereaux D, van Karnebeek CD, Birch PH. Costs of caring for children with an intellectual developmental disorder. Disabil Health J 2015;8:646–651.
7. Lee T, Park KJ, Shon SH, Kim S, Kim HW. Predictors of developmental outcome in 4- to 6-year-olds with developmental disability. Psychiatry Investig 2022;19:519–526.
8. Kakchapati S, Pratap KCS, Giri S, Sharma S. Factors associated with early child development in Nepal – a further analysis of multiple indicator cluster survey 2019. Asian J Soc Health Behav 2023;6:21–29.
9. McKenzie K, Megson P. Screening for intellectual disability in children: a review of the literature. J Appl Res Intellect Disabil 2012;25:80–87.
10. von Suchodoletz W. [Early identification of children with developmental language disorders – when and how?]. Z Kinder Jugendpsychiatr Psychother 2011;39:377–385. German.
11. Zwaigenbaum L, Bauman ML, Stone WL, Yirmiya N, Estes A, Hansen RL, et al. Early identification of autism spectrum disorder: recommendations for practice and research. Pediatrics 2015;136(Suppl 1):S10–S40.
12. Lee KY, Chen CY, Chen JK, Liu CC, Chang KC, Fung XCC, et al. Exploring mediational roles for self-stigma in associations between types of problematic use of internet and psychological distress in youth with ADHD. Res Dev Disabil 2023;133:104410.
13. Chan Y, Chan YY, Cheng SL, Chow MY, Tsang YW, Lee C, et al. Investigating quality of life and self-stigma in Hong Kong children with specific learning disabilities. Res Dev Disabil 2017;68:131–139.
14. Shattuck PT, Durkin M, Maenner M, Newschaffer C, Mandell DS, Wiggins L, et al. Timing of identification among children with an autism spectrum disorder: findings from a population-based surveillance study. J Am Acad Child Adolesc Psychiatry 2009;48:474–483.
15. Miller LE, Perkins KA, Dai YG, Fein DA. Comparison of parent report and direct assessment of child skills in toddlers. Res Autism Spectr Disord 2017;41-42:57–65.
16. McCarty P, Frye RE. Early detection and diagnosis of autism spectrum disorder: why is it so difficult? Semin Pediatr Neurol 2020;35:100831.
17. Tippelt S, Kühn P, Grossheinrich N, von Suchodoletz W. [Diagnostic accuracy of language tests and parent rating for identifying language disorders]. Laryngorhinootologie 2011;90:421–427. German.
18. Akshoomoff N, Corsello C, Schmidt H. The role of the autism diagnostic observation schedule in the assessment of autism spectrum disorders in school and community settings. Calif School Psychol 2006;11:7–19.
19. Kanne SM, Carpenter LA, Warren Z. Screening in toddlers and preschoolers at risk for autism spectrum disorder: evaluating a novel mobile-health screening tool. Autism Res 2018;11:1038–1049.
20. Strobl MAR, Lipsmeier F, Demenescu LR, Gossens C, Lindemann M, De Vos M. Look me in the eye: evaluating the accuracy of smartphone-based eye tracking for potential application in autism spectrum disorder research. Biomed Eng Online 2019;18:51.
21. Thabtah F. An accessible and efficient autism screening method for behavioural data and predictive analyses. Health Informatics J 2019;25:1739–1755.
22. Vargas-Cuentas NI, Roman-Gonzalez A, Gilman RH, Barrientos F, Ting J, Hidalgo D, et al. Developing an eye-tracking algorithm as a potential tool for early diagnosis of autism spectrum disorder in children. PLoS One 2017;12:e0188826.
23. Fujioka T, Inohara K, Okamoto Y, Masuya Y, Ishitobi M, Saito DN, et al. Gazefinder as a clinical supplementary tool for discriminating between autism spectrum disorder and typical development in male adolescents and adults. Mol Autism 2016;7:19.
24. Deutsch SI, Raffaele CT. Understanding facial expressivity in autism spectrum disorder: an inside out review of the biological basis and clinical implications. Prog Neuropsychopharmacol Biol Psychiatry 2019;88:401–417.
25. Shanok NA, Jones NA, Lucas NN. The nature of facial emotion recognition impairments in children on the autism spectrum. Child Psychiatry Hum Dev 2019;50:661–667.
26. Zaja RH, Rojahn J. Facial emotion recognition in intellectual disabilities. Curr Opin Psychiatry 2008;21:441–444.
27. Faso DJ, Sasson NJ, Pinkham AE. Evaluating posed and evoked facial expressions of emotion from adults with autism spectrum disorder. J Autism Dev Disord 2015;45:75–89.
28. Memisevic H, Mujkanovic E, Ibralic-Biscevic I. Facial emotion recognition in adolescents with disabilities: the effects of type of disability and gender. Percept Mot Skills 2016;123:127–137.
29. Scotland JL, Cossar J, McKenzie K. The ability of adults with an intellectual disability to recognise facial expressions of emotion in comparison with typically developing individuals: a systematic review. Res Dev Disabil 2015;41-42:22–39.
30. Happy SL, Routray A. Automatic facial expression recognition using features of salient facial patches. IEEE Trans Affect Comput 2015;6:1–12.
31. Manfredonia J, Bangerter A, Manyakov NV, Ness S, Lewin D, Skalkin A, et al. Automatic recognition of posed facial expression of emotion in individuals with autism spectrum disorder. J Autism Dev Disord 2019;49:279–293.
32. Bulat A, Tzimiropoulos G. How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks). Proceedings of the IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22-29; Venice, Italy. New York: IEEE, 2017, p.1021-1030.
33. Kwon DH, Kim JB, Heo JS, Kim CM, Han YH. Time series classification of cryptocurrency price trend based on a recurrent LSTM neural network. J Inf Process Syst 2019;15:694–706.
34. Sinha J, Manollas M. Efficient deep CNN-BiLSTM model for network intrusion detection [Internet]. Available at: https://doi.org/10.1145/3430199.3430224. Accessed April 7, 2023.
35. A unified approach to interpreting model predictions [Internet]. Available at: https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html. Accessed April 7, 2023.
36. Raza S, Zwaigenbaum L, Sacrey LR, Bryson S, Brian J, Smith IM, et al. Brief report: evaluation of the short quantitative checklist for autism in toddlers (Q-CHAT-10) as a brief screen for autism spectrum disorder in a high-risk sibling cohort. J Autism Dev Disord 2019;49:2210–2218.
37. Juneja M, Sharma S, Mukherjee SB. Sensitivity of the autism behavior checklist in Indian autistic children. J Dev Behav Pediatr 2010;31:48–49.
38. Hamilton S. Screening for developmental delay: reliable, easy-to-use tools. J Fam Pract 2006;55:415–422.
39. Sheldrick RC, Marakovitz S, Garfinkel D, Carter AS, Perrin EC. Comparative accuracy of developmental screening questionnaires. JAMA Pediatr 2020;174:366–374.
40. Dias DC, Rondon-Melo S, Molini-Avejonas DR. Sensitivity and specificity of a low-cost screening protocol for identifying children at risk for language disorders. Clinics (Sao Paulo) 2020;75:e1426.
41. Jain R, Arora A, Anand R, Malhotra S, Mittal M, Juneja M. Designing and validation of a Hindi-language parent self-report developmental screening tool. Indian Pediatr 2017;54:550–555.
42. Sturner R, Howard B, Bergmann P, Morrel T, Andon L, Marks D, et al. Autism screening with online decision support by primary care pediatricians aided by M-CHAT/F. Pediatrics 2016;138:e20153036.
43. Campbell K, Carpenter KLH, Espinosa S, Hashemi J, Qiu Q, Tepper M, et al. Use of a digital modified checklist for autism in toddlers–revised with follow-up to improve quality of screening for autism. J Pediatr 2017;183:133–139.e1.
44. Maleka BK, Van Der Linde J, Glascoe FP, Swanepoel W. Developmental screening—evaluation of an m-Health version of the parents evaluation developmental status tools. Telemed J E Health 2016;22:1013–1018.
45. Abbas H, Garberson F, Glover E, Wall DP. Machine learning approach for early detection of autism by combining questionnaire and home video screening. J Am Med Inform Assoc 2018;25:1000–1007.
46. Tariq Q, Fleming SL, Schwartz JN, Dunlap K, Corbin C, Washington P, et al. Detecting developmental delay and autism through machine learning models using home videos of Bangladeshi children: development and validation study. J Med Internet Res 2019;21:e13822.
47. Martin KB, Hammal Z, Ren G, Cohn JF, Cassell J, Ogihara M, et al. Objective measurement of head movement differences in children with and without autism spectrum disorder. Mol Autism 2018;9:14.
48. Jaiswal S, Valstar MF, Gillott A, Daley D. Automatic detection of ADHD and ASD from expressive behaviour in RGBD data. Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017); 2017 May 30-June 3; Washington, DC, USA. New York: IEEE, 2017, p.762-769.
49. Carpenter KLH, Hashemi J, Campbell K, Lippmann SJ, Baker JP, Egger HL, et al. Digital behavioral phenotyping detects atypical pattern of facial expression in toddlers with autism. Autism Res 2021;14:488–499.
50. Mottron L, Mineau S, Martel G, Bernier CS, Berthiaume C, Dawson M, et al. Lateral glances toward moving stimuli among young children with autism: early regulation of locally oriented perception? Dev Psychopathol 2007;19:23–36.
51. Hellendoorn A, Langstraat I, Wijnroks L, Buitelaar JK, van Daalen E, Leseman PP. The relationship between atypical visual processing and social skills in young children with autism. Res Dev Disabil 2014;35:423–428.


Figure 1.

Study population selection flow.

Figure 2.

Diagram of the 68 facial landmark points (A) and head pose estimations (B).

Figure 3.

Deep-learning model architecture. LSTM, long short-term memory.

Figure 4.

Box plot showing the absolute average SHAP values of the top 10 variables. SHAP, SHapley Additive exPlanations.

Figure 5.

Head pose and facial landmark distributions in children with and without DD. DD, developmental disability.

Table 1.

Demographic characteristics of children with developmental disabilities and typically developing children

| Variable | Children with developmental disabilities (N=33) | Typically developing children (N=56) | p |
| --- | --- | --- | --- |
| Age (months) | 56.7±9.4 | 52.3±8.2 | 0.024 |
| Male | 20 (60.6) | 28 (50.0) | 0.589 |
| ASD | 25 (75.8) | | |
| ID | 23 (69.7) | | |
| LD | 7 (21.2) | | |
| PEP-R DQ | 52.0±27.0 | 107.1±10.8 | <0.001 |
| VABS ABC | 58.1±12.2 | 90.2±11.7 | <0.001 |
| SRS total score (N=86) | 69.2±15.7 | 46.4±8.1 | <0.001 |
| K-CARS | 29.4±6.7 | 15.9±1.4 | <0.001 |
| CBCL total behavior problems (N=86) | 56.9±8.0 | 45.9±8.8 | 0.907 |
| CBCL internalizing score (N=86) | 55.5±8.3 | 46.5±8.3 | 0.794 |
| CBCL externalizing score (N=86) | 54.9±8.2 | 45.9±9.1 | 0.533 |

Values are presented as mean±SD or number (%). ASD, autism spectrum disorder; ID, intellectual disability; LD, language disorder; PEP-R DQ, Psychoeducational Profile Revised Developmental Quotient; VABS ABC, Vineland Adaptive Behavior Scale Adaptive Behavior Composite; SRS, Social Responsiveness Scale; K-CARS, Korean version of the Childhood Autism Rating Scale; CBCL, Child Behavior Checklist; SD, standard deviation

Table 2.

Performance of the stratified 5-fold cross-validation

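| Fold | Accuracy | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- |
| 1 | 0.78 | 0.75 | 0.50 | 0.60 |
| 2 | 0.78 | 0.80 | 0.57 | 0.67 |
| 3 | 0.89 | 1.00 | 0.71 | 0.83 |
| 4 | 1.00 | 1.00 | 1.00 | 1.00 |
| 5 | 0.94 | 1.00 | 0.83 | 0.89 |
| Average | 0.88 | 0.91 | 0.72 | 0.80 |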
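The "Average" row of Table 2 can be reproduced from the per-fold values. The sketch below is illustrative only: the per-fold numbers are transcribed from the table, while the helper names `f1_score` and `fold_averages` are assumptions introduced for this example and are not taken from the study's code.

```python
from statistics import mean

# Per-fold values transcribed from Table 2: (accuracy, precision, recall, F1).
FOLDS = [
    (0.78, 0.75, 0.50, 0.60),
    (0.78, 0.80, 0.57, 0.67),
    (0.89, 1.00, 0.71, 0.83),
    (1.00, 1.00, 1.00, 1.00),
    (0.94, 1.00, 0.83, 0.89),
]


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fold_averages(folds):
    """Average each metric column over the folds, rounded to two decimals."""
    return tuple(round(mean(column), 2) for column in zip(*folds))


averages = fold_averages(FOLDS)  # (accuracy, precision, recall, F1)
```

Averaging the columns gives 0.88, 0.91, 0.72, and 0.80, matching the reported row. Because the table entries are already rounded, an F1 recomputed with `f1_score` from the rounded precision and recall of a single fold may differ slightly from the F1 reported for that fold.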

Table 3.

Mann–Whitney U test results for top 10 variables of importance

| Feature | Mann–Whitney U | p |
| --- | --- | --- |
| Nodding head angle | 823.0 | <0.001 |
| Left lip and left face corner angle | 612.5 | 0.012 |
| Right eye and right lip area | 673.0 | <0.001 |
| Moving distance of face center point compared to previous time | 592.0 | <0.001 |
| Tilting head angle | 649.5 | 0.035 |
| Nose tip and left nose area | 522.0 | <0.001 |
| Right nose and right lip area | 582.0 | 0.011 |
| Shaking head angle | 664.0 | <0.001 |
| Moving speed of face center point compared to previous time | 724.0 | <0.001 |
| Right eyebrow area | 649.0 | 0.033 |

Mann–Whitney U: the values of a variable in both groups are pooled and sorted by size to assign ranks. The U value is then calculated from the sum of the ranks of each group, the average rank, and the number of observations in each group.
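The ranking procedure described in this footnote can be sketched as follows. This is a minimal illustration, assuming the standard rank-sum formulation with average ranks for tied values; the function names `average_ranks` and `mann_whitney_u` are hypothetical, the sample values are made up, and the source does not state which of the two U values is reported in Table 3.

```python
def average_ranks(values):
    """Assign 1-based ranks to values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) computed from the rank sum of the pooled sample."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a  # the two U values always sum to n_a * n_b
    return u_a, u_b


# Made-up values with a tie, for illustration only.
u_a, u_b = mann_whitney_u([1, 2, 2], [2, 3])
```

For the made-up sample, the pooled ranks are 1, 3, 3, 3, and 5, so the first group has a rank sum of 7 and U values of 1 and 5 result. With group sizes of 33 and 56, as in this study, the two U values would sum to 1,848.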