INTRODUCTION
Suicide is a significant global public health concern, accounting for over 720,000 deaths annually, equivalent to one in every 100 deaths worldwide [1]. South Korea has the highest suicide rate among the Organization for Economic Cooperation and Development countries, with 27.3 deaths per 100,000 people in 2023, an increase of 2.2 per 100,000 from the previous year [2,3].
Among middle-aged adults, suicide ranks as the fifth leading cause of death globally [4], the eighth in the United States [5], and the fifth in South Korea [4]. While mental health issues are well-established risk factors for suicide, middle-aged adults often face additional burdens such as health problems, job loss, and financial stress, making them more vulnerable than other age groups [6,7]. Furthermore, middle-aged individuals often experience familial stress due to their dual caregiving roles for both older and younger generations, earning them the label of the “sandwich generation” [8,9].
In suicide research, accurately predicting suicidal behavior is critically important for timely intervention and prevention. However, despite decades of dedicated effort, traditional prediction models have consistently struggled to achieve meaningful accuracy, largely due to the complex, multifactorial nature of suicidality [10]. As highlighted by Franklin et al.’s [11] meta-analysis, predictions using traditional models were only slightly better than chance and showed no significant improvement over 50 years of research. Machine learning (ML) has emerged as a promising tool in suicide research, offering improved predictive accuracy compared with traditional statistical models. Interest in ML applications for suicide prediction has surged in recent years, with more than 4,000 articles published on this topic as of 2020 [12]. Despite these advances, accurately predicting suicidal thoughts and behaviors remains challenging [13,14].
Most ML-based studies used clinical datasets, such as electronic medical records, to assist in clinical decision-making [15]. However, population-based studies are crucial for informing broader public health-oriented suicide prevention strategies. Despite global calls for a “whole-of-society” approach to suicide prevention, ML studies that use population samples are rare [15,16]. A review by Heckler et al. [12] found that only 5 of 54 ML studies on suicidal ideation (SI) involved population-based samples.
Identifying early-stage risk and protective factors for suicidal thoughts and behaviors, especially in individuals without clear psychiatric diagnoses, can significantly enhance preventive efforts. Factors such as sociodemographic characteristics, socioeconomic conditions, and family dynamics have consistently been associated with suicidal thoughts and behaviors, along with comorbid psychological disorders and adverse life experiences [17,18]. As SI is a key precursor of suicide attempts [19], identifying and addressing modifiable social and familial influences may offer effective interventions [17]. Notably, most ML-based studies on SI have focused on adolescents, older adults, or the general population, leaving middle-aged adults underexamined [12].
In addition, although complex ML models offer strong predictive power, their limited interpretability poses challenges for practical application in healthcare [13]. An explainable ML approach is essential for enhancing transparency, trust, and accountability in artificial intelligence systems. It allows users to understand the rationale behind model predictions, which is crucial for model validation and regulatory compliance. A representative method is SHapley Additive exPlanations (SHAP), which quantifies the contribution of each feature to a prediction based on cooperative game theory [20].
The present study had two objectives: 1) to accurately predict future and concurrent SI in middle-aged Korean adults and compare the performance of four ML models; and 2) to identify and compare key predictors of future and concurrent SI using SHAP plots, thereby enhancing interpretability and informing preventive strategies.
METHODS
Study population
We used nationally representative data from the 7th (2011) to the 18th (2022) waves of the Korea Welfare Panel Study (KoWePS). Households were selected from 16 provincial districts in proportion to each district’s population size, and all individuals aged 15 years or older in these households were interviewed.
The final sample consisted of 8,992 participants aged 40-64 years who provided complete information on SI from 2011 to 2022. Two datasets were constructed for analysis: one comprising 55,373 time points for predicting concurrent SI, and another with 46,381 time points (excluding participants’ first-year data) for predicting future SI (Figure 1). The study protocol was approved by the Institutional Review Board of Korea University (IRB approval number: KUIRB-2021-0376-01).
Dependent variable: SI experience
In the KoWePS dataset, SI was assessed annually using the question, “Have you ever thought about suicide in the past year?” To predict concurrent and future SI, we used SI data from the same year as the predictors and from the following year, respectively, as the dependent variables.
Predictive variables
Based on previous studies, we selected 38 predictors of SI in middle-aged individuals. These were grouped as follows: 1) sociodemographic factors (6): age, sex, marital status, religion, number of household members, and educational level; 2) socioeconomic factors (12): household income, low-income status, satisfaction with income, household economic conflict, employment status, satisfaction with occupation, residential area, satisfaction with housing environment, and four types of insurance (public pension, employment, industrial accident, and retirement insurance); 3) health-related factors (7): smoking status, alcohol consumption, Alcohol Use Disorders Identification Test score, chronic disease status, self-rated health, disability status, and satisfaction with health; 4) psychological factors (2): depression (measured using the Center for Epidemiologic Studies-Depression Scale 11) and self-esteem (measured using the Rosenberg Self-Esteem Scale); 5) family factors (8): experiences of quarrels, threats, violence, and satisfaction with various family relationships (general family, family life, spouse, children, and siblings); and 6) life satisfaction factors (3): satisfaction with social relationships, leisure activities, and overall life satisfaction.
Predictors of future SI were drawn from the previous year’s data. For concurrent SI, predictors were selected for the same year. Furthermore, previous-year SI was included as a predictor in the model for future SI.
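The pairing of predictors and outcomes described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (`pid`, `year`, `si`, `depression`) is hypothetical, and pairing each observation with the same participant's immediately preceding observation is an assumption about how the lag was built.

```python
from collections import defaultdict

# Hypothetical column names; the real KoWePS variable names are not given here.
ID_COLS = ("pid", "year", "si")

def build_datasets(records):
    """Return (concurrent_rows, future_rows) from person-year records."""
    by_person = defaultdict(list)
    for rec in records:
        by_person[rec["pid"]].append(rec)

    concurrent, future = [], []
    for rows in by_person.values():
        rows = sorted(rows, key=lambda r: r["year"])
        for i, row in enumerate(rows):
            feats = {k: v for k, v in row.items() if k not in ID_COLS}
            # Concurrent SI: same-year predictors, same-year outcome.
            concurrent.append({**feats, "target": row["si"]})
            if i == 0:
                continue  # a participant's first observation has no lagged data
            prev = rows[i - 1]
            prev_feats = {k: v for k, v in prev.items() if k not in ID_COLS}
            # Future SI: lagged predictors plus lagged SI, current-year outcome.
            future.append({**prev_feats, "prev_si": prev["si"], "target": row["si"]})
    return concurrent, future

records = [
    {"pid": 1, "year": 2011, "si": 0, "depression": 3},
    {"pid": 1, "year": 2012, "si": 1, "depression": 9},
    {"pid": 1, "year": 2013, "si": 0, "depression": 4},
    {"pid": 2, "year": 2011, "si": 0, "depression": 1},
    {"pid": 2, "year": 2012, "si": 0, "depression": 2},
]
concurrent, future = build_datasets(records)
```

Under this construction the future-SI dataset has one row fewer per participant than the concurrent-SI dataset, which matches the reported sizes (55,373 minus 8,992 participants gives 46,381).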
Statistical analysis and ML modeling
The characteristics of the features are presented as means and standard deviations for continuous variables and as frequencies and proportions for categorical variables. To evaluate the bivariate association of each predictor with SI, the independent t-test or the chi-square test was applied, as appropriate. Statistical analysis was performed using SAS version 9.4 (SAS Institute Inc.).
We employed four ML methods to develop prediction models: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGB). Traditional models such as LR and DT were included for their simplicity and interpretability. RF is known to be robust against overfitting, and XGB offers superior predictive accuracy [21,22]. Prior to modeling, we addressed missing values using multiple imputation, and all variables were scaled to a range between 0 and 1.
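As a small illustration of the 0-to-1 scaling step, the sketch below applies min-max scaling to a single column. It assumes missing values have already been imputed, and learning the minimum and maximum from the training data only is a common precaution rather than something the text states. The function names are hypothetical.

```python
def fit_min_max(train_column):
    """Learn the scaling range from the training data."""
    return min(train_column), max(train_column)

def apply_min_max(column, lo, hi):
    """Map values to [0, 1] using a previously learned range."""
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

ages = [40, 46, 52, 64]          # illustrative values within the 40-64 range
lo, hi = fit_min_max(ages)
scaled = apply_min_max(ages, lo, hi)
```

Scaling matters most for LR here, since its coefficients (and any penalty applied to them) depend on feature units; the tree-based models are largely insensitive to it.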
Recursive feature elimination with cross-validation was used to identify the most relevant predictors. For the future SI model, 14 variables were initially selected, and the previous-year SI was added as a final predictor. For the concurrent SI model, 22 predictors were selected from the total pool of 38 candidate predictors.
The modeling procedure was as follows: 1) the dataset was randomly split into training and test sets in an 80:20 ratio, while preserving the class distribution; 2) to mitigate overfitting, 10-fold cross-validation was used, and the training folds in each iteration were oversampled using the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance; 3) the remaining fold was used for validation; 4) the optimal hyperparameters for each model were determined using a grid search, and the best-performing model was identified based on its accuracy (Table 1); and 5) the selected model was then applied to the imbalanced holdout test set, and model performance was evaluated.
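Steps 2 and 3 can be sketched as below. This is a simplified, hand-rolled SMOTE-style interpolation in plain Python, written only to show that synthetic samples are generated from the training folds while the validation fold is left untouched. It is not the library or the exact settings used in the study, and the data, the `k=5` neighbour count, and all function names are illustrative assumptions.

```python
import random

def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic

def stratified_folds(y, n_folds, rng):
    """Assign indices to folds so each fold keeps the class proportions."""
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return folds

def oversampled_cv_splits(X, y, n_folds=10, seed=0):
    """Yield (X_train, y_train, X_val, y_val); only the training part is oversampled."""
    rng = random.Random(seed)
    for held_out in stratified_folds(y, n_folds, rng):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        minority = [x for x, t in zip(X_tr, y_tr) if t == 1]
        n_new = max(0, (len(y_tr) - len(minority)) - len(minority))
        extra = smote(minority, n_new, rng=rng)
        yield (X_tr + extra, y_tr + [1] * len(extra),
               [X[i] for i in held_out], [y[i] for i in held_out])

# Toy data: 20 positives among 200 samples (10%), two features.
rng = random.Random(42)
y = [1] * 20 + [0] * 180
X = [[rng.gauss(1.0 if t else 0.0, 1.0), rng.gauss(0.0, 1.0)] for t in y]
splits = list(oversampled_cv_splits(X, y))
```

Oversampling before splitting would let synthetic points derived from a validation sample leak into training, which inflates validation scores; doing it inside each iteration avoids that.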
We assessed model performance using the area under the receiver operating characteristic curve (AUC), a measure of the ability of the model to distinguish between the classes. The interpretation thresholds were: <0.6 (poor), 0.6-0.75 (moderate), 0.75-0.9 (good), and >0.9 (excellent) [23]. Additional metrics such as accuracy, sensitivity, specificity, precision, and F1 score were derived from the confusion matrix. Accuracy represents the proportion of total correct predictions. Sensitivity, or recall, reflects the ability to correctly identify positive cases, while precision, or positive predictive value, indicates the proportion of predicted positives that are actually positive. F1 score is the harmonic mean of sensitivity and precision, making it useful when classes are imbalanced.
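To make the AUC concrete, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count as half), and maps a value onto the interpretation bands above. How values exactly on a band boundary (0.6, 0.75, 0.9) are labeled is an assumption, and the function names are hypothetical.

```python
def auc(y_true, scores):
    """Rank-based AUC: share of positive/negative pairs ordered correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def interpret_auc(value):
    """Map an AUC value to the bands used in the text (boundary handling assumed)."""
    if value < 0.6:
        return "poor"
    if value < 0.75:
        return "moderate"
    if value <= 0.9:
        return "good"
    return "excellent"

example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because the AUC depends only on how cases are ranked, it is unaffected by the classification threshold and by the base rate of SI, unlike accuracy or precision.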
In the confusion matrix, true positives and true negatives refer to positive and negative cases that the model correctly predicts. A false positive occurs when a negative case is incorrectly classified as positive, while a false negative occurs when a positive case is incorrectly classified as negative.
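The metrics defined above follow directly from the four confusion-matrix cells. The sketch below shows the arithmetic; the counts are hypothetical and are not taken from Table 4.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the standard metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for a rare outcome (50 positives among 1,000 cases).
metrics = classification_metrics(tp=30, fp=70, tn=880, fn=20)
```

With a rare outcome, accuracy and specificity can be high while precision stays low, because even a small false-positive rate among the many negatives outnumbers the true positives.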
To enhance interpretability, we used SHAP summary plots for the LR model, which showed the highest AUC. The Y-axis of SHAP summary plots lists the features used in the model, ordered by their importance. The X-axis indicates the SHAP values, showing the magnitude of contribution to the prediction. Positive SHAP values push the prediction higher, while negative values push it lower. A wide spread of dots means the feature’s impact varies across samples. All ML analyses were conducted using Python (version 3.7.10) [24].
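For a linear model such as LR, SHAP values have a simple closed form when features are treated as independent: each feature's contribution (on the log-odds scale) is its coefficient times the feature's deviation from its average in a background dataset. The sketch below illustrates this property with made-up coefficients and data; which SHAP explainer the study actually used is not stated, so this should be read as an illustration of the idea only.

```python
def linear_shap(weights, x, background):
    """SHAP values of a linear model under feature independence:
    phi_i = w_i * (x_i - mean of feature i in the background data)."""
    means = [sum(col) / len(col) for col in zip(*background)]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]

def log_odds(weights, bias, x):
    """Linear predictor of a logistic regression model."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

# Hypothetical model: two scaled features, e.g. depression and self-esteem.
weights, bias = [1.2, -0.8], -3.0
background = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
x = [1.0, 0.0]                      # high depression, low self-esteem

phi = linear_shap(weights, x, background)
baseline = sum(log_odds(weights, bias, b) for b in background) / len(background)
```

The contributions sum to the difference between this person's log-odds and the average log-odds, which is the additivity property that makes a summary plot readable: each dot is one person's contribution from one feature.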
RESULTS
Approximately half of the 8,992 participants were male (52.2%), with a mean age of 49.3±8.2 years (range, 40 to 64 years). The average rate of SI at 55,373 time points across 2011-2022 was 2.8%±1.2% (Table 2). Compared to participants without SI, those with SI tended to be slightly older, live with fewer family members, have lower educational attainment, report poorer subjective health, and exhibit higher levels of depression, lower self-esteem, and reduced overall life satisfaction.
The performance of the ML models in predicting future SI was moderate to good, with AUC values ranging from 0.739 (DT) to 0.806 (LR). The prediction of concurrent SI yielded good to excellent results, with AUCs ranging from 0.847 (DT) to 0.907 (LR) (Table 3). Among the models for future SI, XGB showed the highest accuracy (0.958), specificity (0.975), precision (0.259), and F1 score (0.285), whereas LR showed the highest sensitivity (0.642). In concurrent SI prediction, XGB had the highest accuracy (0.961), specificity (0.979), precision (0.313), and F1 score (0.325), whereas LR provided the highest sensitivity (0.797).
Figure 2 shows a schematic comparison of the AUCs across the four ML models for both future and concurrent SI.
Table 4 lists the confusion matrices for each model.
Figure 3 shows the SHAP summary plots of the LR model. These plots ranked the predictors by importance and indicated the direction of their effects. The key predictors of future SI included poor subjective health, depressive symptoms, low family satisfaction, low self-esteem, younger age, dissatisfaction with one’s spouse, low educational attainment, and dissatisfaction with the housing environment. For concurrent SI, the significant predictors included depressive symptoms, dissatisfaction with one’s spouse and health, family violence, younger age, low self-esteem, income dissatisfaction, smaller family size, and overall life dissatisfaction.
DISCUSSION
This study developed and evaluated four ML models (LR, DT, RF, and XGB) to predict future and concurrent SI in a nationally representative longitudinal sample of middle-aged Korean adults. The models showed good performance in predicting future SI (AUC up to 0.806) and excellent performance in predicting concurrent SI (AUC up to 0.907). These outcomes mark a substantial improvement over traditional statistical models, which typically showed only modest predictive abilities, such as an AUC value of 0.58 [11].
The performance of our models aligns with previous meta-analyses, which reported AUC values ranging from 0.66 to 0.97 across 15 ML-based SI studies [15], from 0.79 to 0.98 across 54 studies [12], and from 0.81 to 0.92 across 35 studies [10]. Although van Mens et al. [25] reported a slightly higher AUC (0.83) in predicting future SI in a young Scottish cohort, our models performed comparably well, particularly considering the differences in age group and cultural context.
Among the four ML prediction models, LR and XGB demonstrated superior performance compared to the other two models in terms of accuracy, sensitivity, specificity, precision, and AUC. Based on AUC values, LR exhibited the highest discriminative ability, although RF and XGB had been expected to show stronger predictive performance overall [21,22]. These findings may suggest that the relationship between SI and its predictors in our dataset is moderately linear, with minimal nonlinear interactions. Several previous studies have similarly reported that penalized LR achieved robust and reliable results in comparable settings [26-28].
Despite the low base rates of future (2.62%) and concurrent SI (2.76%), the XGB model achieved an approximately 10-fold improvement in precision (0.259 for future SI and 0.313 for concurrent SI) compared to the baseline rates. These results exceed those of several previous studies, which reported a three- to four-fold increase in predictive ability [25,29]. Addressing class imbalance is crucial in suicide prediction because ML algorithms typically assume balanced datasets [30]. To mitigate this, we applied oversampling within each fold of the cross-validation process to minimize the risk of overlap between training and validation data and of model overfitting [31].
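The fold improvement quoted above is the ratio of a model's precision to the base rate, since a classifier that flagged people at random would have a precision equal to the base rate. A short check of the arithmetic, using only the figures reported in the text (the function name is hypothetical):

```python
def precision_lift(precision, base_rate):
    """How many times better than random flagging the model's precision is."""
    return precision / base_rate

future_lift = precision_lift(0.259, 0.0262)       # XGB, future SI
concurrent_lift = precision_lift(0.313, 0.0276)   # XGB, concurrent SI
```

Both ratios come out near 10 (roughly 9.9 and 11.3), consistent with the "approximately 10-fold" description.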
Our results underscore the significance of health, family, and socioeconomic factors in predicting SI beyond well-known psychological risk factors such as depression and low self-esteem. Subjective health status, satisfaction with family and spousal relationships, housing environment, and educational attainment were particularly relevant for predicting future SI. More immediate or tangible stressors, such as family violence and income dissatisfaction, were stronger predictors of concurrent SI.
The importance of subjective health echoes previous findings that physical health problems, particularly those that impair daily functioning, increase suicide risk [7]. Hence, our results highlight the need for multilayered intervention strategies, including family-based programs [32], public health initiatives to enhance physical and mental health, expanded social welfare services, and policies aimed at reducing income inequality.
The impact of negative family environments, including conflict and lack of support, is well-documented in youth suicidality research [33]. In collectivist cultures such as South Korea, where family relationships are especially central to social identity [34], such dynamics may exert an even greater influence on mental health. While prior research on younger adults has emphasized psychological factors such as internal entrapment and interpersonal distress [25], our findings suggest that for middle-aged adults, familial and socioeconomic contexts are equally crucial.
Socioeconomic variables also showed differential predictive power. While education and housing were more relevant to future SI, income was more pertinent for concurrent SI, likely reflecting immediate financial stress as a proximal trigger. These patterns support earlier systematic reviews emphasizing the critical role of socioeconomic adversity in suicidal behaviors [18].
Although our comparison of near-future SI was confined to a 1-year follow-up, the results revealed distinct risk factor profiles that can guide targeted preventive interventions. Probert-Lindström et al. [35] stratified predictors of suicide attempts using a 5-year threshold, identifying psychosis, major depression, and a history of attempts as long-term predictors (>5 years), whereas attempt severity was associated with short-term risk (≤5 years).
This study used a nationally representative longitudinal dataset to predict future and concurrent SI among middle-aged Korean adults, enhancing generalizability. By applying four ML models and incorporating SHAP for explainability, the study achieved high predictive accuracy while maintaining interpretability. It addressed class imbalance using advanced techniques such as SMOTE during cross-validation to improve reliability. Importantly, it highlighted multidimensional risk factors beyond mental health, including health status, family relationships, and socioeconomic conditions, and offered actionable insights for prevention strategies.
This study has several limitations. First, it relied on a single-item, self-reported measure of SI, which may underestimate its prevalence or misclassify individuals. A more robust assessment incorporating clinical interviews and multi-item scales could improve the predictive accuracy. Second, the findings may not be generalizable to other age groups or cultural contexts because the sample was limited to middle-aged Korean adults. Third, although the ML models exhibited improved precision compared to the baseline, the values were still relatively low, which may limit their practical application in clinical settings.
Future research should explore more refined feature engineering and examine the causal relationships between the predictors and SI. Larger cross-cultural datasets and standardized clinical assessments can further enhance predictive models and their utility in suicide prevention.
This study developed predictive models for both future and concurrent SI using four ML methods applied to a nationally representative longitudinal dataset of middle-aged Korean adults. All models demonstrated good to excellent performance when evaluated using the AUC metric.
To address the significant class imbalance inherent in the SI data, we implemented oversampling within each iteration of cross-validation, thereby enhancing model reliability and reducing the risk of overfitting. SHAP analyses were used to improve model interpretability and highlight the key predictive variables.
In addition to psychological indicators, such as depression and self-esteem, our findings emphasize the importance of health, family, and socioeconomic factors in predicting SI. More immediate or tangible stressors, such as family violence and income dissatisfaction, were stronger predictors of concurrent SI.
These results underscore the need for comprehensive suicide prevention strategies that operate at both national and community levels. Such strategies should include family-centered interventions, efforts to enhance physical and mental health, and policy initiatives aimed at reducing socioeconomic disparities, all of which may contribute to reducing SI in middle-aged populations.