Classification of Adolescent Psychiatric Patients at High Risk of Suicide Using the Personality Assessment Inventory by Machine Learning
Abstract
Objective
There is growing interest in suicide risk screening in clinical settings, and identifying groups with suicidal ideation who are at high risk of suicide is crucial for more effective preventive intervention. Previous statistical techniques were limited because they tried to predict suicide risk using a simple algorithm. Machine learning differs from traditional statistical techniques in that it generates an optimal algorithm from various predictors.
Methods
We aimed to analyze the Personality Assessment Inventory (PAI) profiles of child and adolescent patients who received outpatient psychiatric care using machine learning techniques, namely logistic regression (LR), random forest (RF), artificial neural network (ANN), support vector machine (SVM), and extreme gradient boosting (XGB), to develop and validate a classification model for individuals with high suicide risk.
Results
We developed prediction models using seven relevant features selected by the Boruta algorithm and subsequently tested all models using the testing dataset. The areas under the ROC curve of these models were above 0.9, and the RF model exhibited the best performance.
Conclusion
Suicide must be assessed based on multiple aspects, and although the Personality Assessment Inventory for Adolescent assesses an array of domains, further research is needed to predict high suicide risk groups.
INTRODUCTION
Suicide is understood as a series of actions including suicidal ideation, suicidal planning, and suicide attempt [1]. Suicidal ideation refers to immersion in thoughts about suicide-related behaviors [2], and suicidal planning refers to making specific plans to attempt suicide, which can lead to a suicide attempt and ultimately death. Whereas suicidal ideation concerns thinking, a suicide attempt is its outward expression as a specific action. Suicide attempts include suicidal behaviors carried out without the intention of dying in order to achieve other goals, as well as cases in which an individual tried to kill themselves but failed.
According to OECD suicide statistics, Korea had the highest suicide rate among 35 OECD countries in 2017, at 23.0 per 100,000 population [3]. The 2019 suicide statistics published by Statistics Korea showed that suicide is the leading cause of death among individuals aged 10–39 years [4], and suicide has continuously been the leading cause of death among teens aged 10–19 years since 2008.
Not all adolescents with suicidal ideation actually attempt suicide, and adolescents without suicidal planning may still attempt suicide [5]. However, because suicidal ideation is highly likely to progress to suicide attempt and death by suicide, suicide attempts can be reduced by reducing suicidal ideation [6]. In fact, approximately one-third of adolescents with suicidal ideation are known to make a suicide attempt [5], and preventive interventions are crucial because suicidal ideation is an important predictor of suicide attempt [7,8]. Hence, there is growing interest in suicide risk screening in clinical settings, and identifying groups with suicidal ideation who are at high risk of suicide is crucial for more effective preventive intervention [9].
Several factors influence adolescents’ suicide risk [10], and personality traits are one such risk factor [11]. Some past studies have analyzed features of groups with a suicide risk using the Personality Assessment Inventory (PAI) [12,13], but these studies were limited to adult subjects, and our literature review found little relevant research on children and adolescents. In addition, most studies simply examined the association of suicide risk with symptoms such as depression and anxiety or identified relevant risk factors [14]. Recently, some studies have applied machine learning to the prediction of suicide [15], and there have been attempts to classify or predict adolescents with a suicide risk using machine learning [15-17]. However, none of these studies attempted to predict adolescents with high suicide risk using machine learning in consideration of their personality traits. Thus, this study aims to classify groups at risk of suicide by including personality traits measured with the PAI.
Previous statistical techniques were limited because they tried to predict suicide risk, a complex problem, using a simple algorithm [18]. Machine learning differs from traditional statistical techniques in that it generates an optimal algorithm from various predictors, and on this basis it has been proposed to be useful in the prediction and classification of suicide risk [19]. In the present study, we aim to analyze the PAI profiles of child and adolescent patients who received outpatient psychiatric care using machine learning techniques, namely logistic regression (LR), random forest (RF), artificial neural network (ANN), support vector machine (SVM), and extreme gradient boosting (XGB), to develop and validate a classification model for individuals with high suicide risk.
METHODS
Participants
Data were obtained from a retrospective chart review of psychometric assessments performed on 158 patients aged 12–17 years who visited the outpatient psychiatric clinic at Wonkwang University Hospital between January 2011 and December 2019. A comprehensive psychological evaluation, including the PAI-A and an intelligence test, was conducted as the initial assessment if the patient had not been evaluated before. Because the reliability of self-reported data from individuals with an overall intelligence and language comprehension score below 70 is low, 34 patients were excluded, and a total of 124 patients were enrolled. The psychometric assessment results were interpreted by a clinical psychologist. The diagnoses of all patients were grouped according to DSM-IV-TR and DSM-5. This study was approved by the Institutional Review Board of Wonkwang University Hospital (WKUH 2021-02-013-004).
Instruments
Personality Assessment Inventory for Adolescent (PAI-A)
The PAI-A is an objective test for assessing personality and psychopathology developed by Morey [20] and standardized for use in Korea by Kim et al. [21]. The test was developed to provide important information about the patient or client in clinical settings, and according to the standardization study, it is a useful multiscale personality inventory that enables differential diagnosis and identifies specific areas of discomfort among individuals. The PAI-A retains the same scale structure as the PAI, and it was developed for use with middle and high school students by modifying items that were deemed to be inappropriate for adolescents. It comprises 264 items and 22 scales, including four validity scales, 11 clinical scales, five treatment consideration scales, and two interpersonal scales. Ten of these scales include subscales designed to facilitate a comprehensive and in-depth assessment of complex clinical constructs.
Lim et al. [22] conducted a restandardization study of the Korean PAI-A in 2018, and the internal consistency of each scale was reported as follows: NIM (Cronbach’s α=0.76), PIM (α=0.75), SOM (α=0.80), ANX (α=0.84), ARD (α=0.70), DEP (α=0.87), MAN (α=0.74), PAR (α=0.75), SCZ (α=0.80), BOR (α=0.88), ANT (α=0.77), ALC (α=0.51), DRG (α=0.64), AGG (α=0.80), SUI (α=0.82), STR (α=0.79), NON (α=0.66), RXR (α=0.73), DOM (α=0.67), and WRM (α=0.80).
The SUI scale evaluates thoughts of death and suicide. A score below 60 indicates little thought of death or suicide. A score of 60–69 indicates transient and periodic suicidal ideation and a tendency to be pessimistic about one’s future, and a score of 70 or higher indicates significant suicidal ideation. In this study, subjects with an SUI score of 60 or higher were defined as the high suicide risk group.
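The cut-offs described above can be written out as a small labeling rule. The following Python sketch is only an illustration: the function names, the band labels, and the example scores are assumptions made here, not part of the study's analysis code (which used R).

```python
def sui_band(score: float) -> str:
    """Return the interpretive band for an SUI scale score.

    Band names are illustrative labels chosen for this sketch.
    """
    if score >= 70:
        return "significant"   # 70 or higher
    if score >= 60:
        return "periodic"      # 60-69: transient and periodic ideation
    return "minimal"           # below 60


def is_high_risk(score: float) -> bool:
    """High suicide risk group as defined in this study: SUI of 60 or higher."""
    return score >= 60


# Made-up example scores, not patient data.
example_scores = [48, 59, 60, 69, 70, 85]
example_labels = [is_high_risk(s) for s in example_scores]
example_bands = [sui_band(s) for s in example_scores]
```

Note that the binary outcome used for classification merges the two upper bands, so a score of 60 and a score of 85 receive the same label.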
Machine learning
The prediction pipeline was developed as shown in Figure 1. The pipeline was built with five machine learning methods, namely LR, RF, ANN, SVM, and XGB, using the caret package provided in the R statistical software version 3.6.3 (R Studio, Inc., Boston, MA, USA). The pipeline randomly split the input dataset into training (n=87; 70% of 124 samples) and testing (n=37; 30% of 124 samples) datasets while maintaining the class ratio in each split. We developed five final machine learning models to predict the high suicide risk group in the training dataset by tuning the hyper-parameters using the caret package. We used ten-fold cross-validation to prevent overfitting. The relative importance of each feature, provided in arbitrary units, was calculated using the Boruta algorithm, which is a variable selection method built around the random forest [23]. The receiver operating characteristic (ROC) curves were plotted, and the area under the ROC curve (AUROC) was obtained to assess each model’s performance. The AUROCs were compared using the DeLong test.
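The class-preserving 70/30 split and the ten-fold partition of the training set can be sketched in a few lines. This is a minimal standard-library Python illustration, not the caret code used in the study. The function names and seed are hypothetical, and rounding the per-class training count up is an assumption made here; with the 44 high-risk and 80 low-risk participants reported in the Results it happens to reproduce the stated sizes of 87 and 37.

```python
import random


def stratified_split(labels, train_percent=70, seed=0):
    """Split indices into train/test while keeping the class ratio.

    Assumption of this sketch: the training count per class is rounded up.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # Integer ceiling of len * train_percent / 100 (avoids float rounding).
        n_train = (len(indices) * train_percent + 99) // 100
        train.extend(indices[:n_train])
        test.extend(indices[n_train:])
    return sorted(train), sorted(test)


def k_fold_indices(indices, k=10, seed=0):
    """Shuffle the indices and deal them into k nearly equal folds."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


# Class counts taken from the paper: 44 high-risk (1) and 80 low-risk (0).
labels = [1] * 44 + [0] * 80
train_idx, test_idx = stratified_split(labels)
folds = k_fold_indices(train_idx, k=10)
```

In each cross-validation round, one fold is held out for validation and the remaining nine are used for fitting, so every training sample is used for validation exactly once.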
Statistical analysis
Statistical analysis was conducted using the R software (R Studio). The demographic data were analyzed using Student’s t-test and the χ² test, as appropriate. Two-sided p-values <0.05 were considered statistically significant.
RESULTS
Differences in demographic characteristics and intelligence scores between high-risk and low-risk groups
A total of 124 participants were divided into the high suicide risk group (suicide ideation, SUI ≥60) or low suicide risk group (SUI <60) based on their SUI score. There were no significant differences in the demographic factors and intelligence scores between the two groups (Table 1).
Comparison of PAI-A scale scores according to suicide risk
There were statistically significant differences in the inconsistency (ICN) scale, negative impression (NIM) scale, positive impression (PIM) scale, somatic complaints (SOM) scale, anxiety (ANX) scale, anxiety-related disorder (ARD) scale, depression (DEP) scale, mania (MAN) scale, paranoia (PAR) scale, schizophrenia (SCZ) scale, borderline features (BOR) scale, drug problems (DRG) scale, aggression (AGG) scale, stress (STR) scale, nonsupport (NON) scale, and treatment rejection (RXR) scale of PAI-A between the two groups. There were no statistically significant differences in the infrequency (INF) scale, antisocial features (ANT) scale, alcohol problems (ALC) scale, dominance (DOM) scale, and warmth (WRM) scale between the two groups (Table 2).
Development of a prediction model using machine learning techniques
The observed proportion of high-risk participants was 35.5% (44/124), indicating that the data were imbalanced (Table 1). Therefore, we applied an oversampling method to balance the training dataset (Figure 1). First, we developed the prediction models with sex, age, and all PAI-A scales and subsequently tested all models using the testing dataset. The AUROCs of RF, SVM, and XGB were >0.8, indicating that these models performed effectively in the testing dataset (Figure 2). Then, the relative importance of all features was calculated using the Boruta algorithm. Seven features, namely the ARD, NON, DEP, RXR, STR, ANX, and AGG scales, were determined to be relevant for predicting the high suicide risk group, and ARD showed the highest relative importance (Figure 3). Finally, we developed prediction models using the seven relevant features selected by the Boruta algorithm and subsequently tested all models using the testing dataset. The AUROCs of these models were above 0.9, and the RF model exhibited the best performance (Figure 4 and Table 3). The performance of the RF model was significantly superior to that of ANN (Table 4).
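Two steps in this section lend themselves to a short illustration: balancing the training set by oversampling and scoring a model by AUROC. The paper does not specify which oversampling variant was used, so the Python sketch below assumes simple random duplication of minority-class samples. The AUROC is computed from its pairwise-ranking definition (the probability that a randomly chosen high-risk case receives a higher score than a randomly chosen low-risk case, with ties counted as one half). All names, the class counts in the demo, and the demo scores are assumptions for illustration, not study data.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate samples of smaller classes until all classes match.

    Assumption of this sketch: plain random oversampling with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label in sorted(by_class):
        members = by_class[label]
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Illustrative training set: 31 high-risk (1) and 56 low-risk (0) row ids.
train_rows = list(range(87))
train_labels = [1] * 31 + [0] * 56
balanced_rows, balanced_labels = oversample_minority(train_rows, train_labels)

# Made-up predicted scores for six test cases.
demo_auc = auroc([0.9, 0.8, 0.35, 0.6, 0.2, 0.1], [1, 1, 1, 0, 0, 0])
```

Oversampling is applied to the training data only; the testing dataset keeps its original class ratio so that the reported AUROC reflects the observed distribution.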
DISCUSSION
This study attempted to develop and validate a model for predicting high suicide risk in child and adolescent psychiatric patients using the scales of PAI-A and applying machine learning techniques.
In this study, approximately 35.5% (n=44) of the participants were classified as the high suicide risk group, as defined by suicide ideation scale score of 60 or higher. Jeon et al. [24] reported the lifetime prevalence of suicidal ideation to be 18.9% among adolescents aged 12–18 years, and the higher prevalence in our study may be attributable to the fact that the study population consisted of patients who sought psychiatric care, who have various risk factors and psychopathologies, as opposed to the general population.
According to a study by Heo et al. [25] comparing the psychometric variables of high and low suicide risk groups, the high suicidal ideation group had significantly higher scores on the clinical scales of the PAI, namely the SOM, ANX, ARD, DEP, MAN, PAR, BOR, ANT, and DRG scales, compared to the low suicidal ideation group. In the study of suicide attempts by Sinclair et al. [12], a total of nine indicators (ICN, NIM, STR, MAN-G, SCZ-T, BOR-N, BOR-S, ANT-A, and ANT-E) among 42 PAI indicators were found to differ between the multiple suicide attempt group and the single/no suicide attempt group, and six of these indicators (NIM, STR, MAN-G, BOR-N, BOR-S, and ANT-A) still differed after considering factors such as age, sex, education, and race. The scales found to be significantly associated with suicide risk classification in our study, namely the ARD, NON, DEP, ANX, RXR, AGG, and STR scales, differed from those reported by previous studies, and this is presumed to be due to the adolescent study population and the application of machine learning methods in our study.
AGG was found to be a significant factor in classifying the high suicide risk group. Aggression has been suggested as a developmental factor of suicide in a previous study [26], and it has also been reported to be associated with ADHD, ODD, and conduct disorders among children and adolescents; adolescents with a history of suicide attempt showed higher impulsivity than those without such a history [27]. From the developmental perspective of neuromaturation, adolescents’ aggression can be interpreted in the context that structural and functional maturation of the prefrontal cortex is still in progress, which makes consistent impulse control difficult [28,29]. Adolescents with conduct disorder showed a high suicide risk, and some studies have linked this to severe emotional dysregulation and a higher prevalence of mood disorder [30]. Substance abuse is also a risk factor for suicide attempt in adolescents, and one study reported that heavy drinking increases psychological distress and aggressiveness and hinders adaptive coping strategies, thereby increasing suicide risk [31].
The NON scale indicates a lack of social support, and it can be used to examine the degree of support provided by family members and friends. A previous study reported that female adolescents who are socially isolated from their friends report more suicidal ideation, while belonging to a tightly networked school community has a protective effect against suicide attempt among male adolescents [32]. Furthermore, this is in line with the findings that family stress or familial conflict directly and indirectly influences adolescents’ suicide risk behaviors [33-35].
Anxiety disorder and PTSD are known to increase suicide risk [26]. In addition, patients with depression have a 13–26 times higher risk of suicide compared to the general population [36], and a psychological autopsy study of 229 deaths by suicide found that 59% of these people had major depressive disorder [37]. In our study, anxiety as well as depression was identified as a risk factor for suicide. In this regard, a previous study [38] showed that the coexistence of depressive and anxiety symptoms was associated with an increased risk of suicide compared to having either one alone.
The RXR scale evaluates interest in psychological and emotional change, and high scores indicate a lack of therapeutic motivation. According to a study of untreated adolescent depression [39], those who were not treated had a 4.19 times higher risk of suicide than those who were treated or who were in their first depressive episode.
This study has several strengths. First, the study population comprised child and adolescent psychiatric patients. Second, data were analyzed using machine learning techniques. Suicide is a complex psychological phenomenon that occurs as a result of numerous factors. A past study confirmed the value of machine learning techniques, compared to existing methodologies that use data to test a priori hypotheses, when examining the interactions among various complex factors [19], and the predictive accuracy was higher with machine learning techniques. In this study, we developed and validated a classification model for the high suicide risk group using various psychopathologies, including personality traits, measured with a standardized instrument. No previous studies have analyzed the risk factors of suicide in adolescents by both assessing them with a single standardized tool that can evaluate comorbid psychopathologies and applying machine learning techniques. The Boruta algorithm has the advantage of removing the researcher’s subjective intent from feature selection, and we were able to confirm that the predictive model performed better with the features it selected. Third, in light of past studies on the association between suicide risk and intelligence, we set an intelligence score of 70 or higher as an inclusion criterion and included intelligence in the high suicide risk classification model for analysis. A previous study reported an inverse association between IQ and suicide risk in children and adolescents, and it has been found that those with a low IQ who lack appropriate coping and cognitive skills are vulnerable to suicidal ideation upon a stressful life event; as a result, suicidal ideation and planning can ultimately lead to a suicide attempt [40,41].
There are some limitations to this study. First, generalizability is limited because the subjects were selected from patients who visited the child and adolescent psychiatric clinic of a university hospital. Second, we used a self-report test. A retrospective self-report test is vulnerable to recall bias or underreporting. Subsequent studies should supplement the self-reported data with structured interviews. Third, causation cannot be inferred due to the cross-sectional design. However, this study has clinical significance in that various facets of personality and psychopathology related to suicide risk were included in the analysis.
In conclusion, this study developed a model to predict high suicide risk groups by applying machine learning techniques to PAI-A data. The model had an AUC of 0.936, confirming its potential for classifying and predicting high suicide risk groups with an “excellent” diagnostic accuracy [42]. In comparison to previous study findings using the PAI, AGG, NON, and RXR were added as important considerations in this study, and attention to elevations on these scales in clinical settings may help detect suicide risk. However, suicide must be assessed based on multiple aspects, and although the PAI-A assesses an array of domains, further research is needed to predict high suicide risk groups. In addition, large-scale data obtained from multiple institutions and further research will be needed to enhance the performance of the developed model and improve its applicability in clinical settings. Such a model could be used for assessing and screening patients with high suicide risk in clinical practice.
Notes
Availability of Data and Material
The datasets generated or analyzed during the study are available from the corresponding author on reasonable request.
Conflicts of Interest
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Chan-Mo Yang. Data curation: Kyung-Won Kim, Chan-Mo Yang. Formal analysis: Kyung-Won Kim, Jae Seok Lim. Investigation: Kyung-Won Kim, Chan-Mo Yang. Methodology: Kyung-Won Kim, Jae Seok Lim, Chan-Mo Yang. Project administration: Chan-Mo Yang. Resources: Chan-Mo Yang. Supervision: Sang-Yeol Lee. Validation: Chan-Mo Yang, Seung-Ho Jang. Visualization: Jae Seok Lim. Writing—original draft: Kyung-Won Kim, Jae Seok Lim. Writing—review & editing: Kyung-Won Kim, Chan-Mo Yang.
Funding Statement
This paper was supported by Wonkwang University in 2021.