Relationships of Antidepressant Medication With Its Various Factors Including Nitrogen Dioxides Seasonality: Machine Learning Analysis Using National Health Insurance Data
Abstract
Objective
This study employs machine learning and population-based data to examine major factors of antidepressant medication, including the seasonality of nitrogen dioxide (NO2).
Methods
Retrospective cohort data came from Korea National Health Insurance Service claims data for 43,251 participants aged 15–79 years who resided in the same districts of Seoul and had no history of antidepressant medication during 2002–2012. The dependent variable was antidepressant-free months during 2013–2015, and 103 independent variables for 2012 or 2015 were considered, e.g., particulate matter less than 2.5 micrometers in diameter (PM2.5), PM10, NO2, ozone (O3), sulphur dioxide (SO2), and carbon monoxide (CO) in each of the 12 months of 2015.
Results
The Cox hazard ratios of NO2 were statistically significant and exceeded 10 at roughly three-month intervals: March, June–July, October, and December. Based on random forest variable importance, with Cox hazard ratios in brackets, the top 20 factors of antidepressant medication were age (0.0041 [1.69–2.25]), migraine and sleep disorder (0.0029 [1.82]), liver disease (0.0017 [1.33–1.34]), exercise (0.0014), thyroid disease (0.0013), cardiovascular disease (0.0013 [1.20]), asthma (0.0008 [1.19–1.20]), September NO2 (0.0008 [0.01]), alcohol consumption (0.0008 [1.31–1.32]), female gender (0.0007 [1.80–1.81]), July NO2 (0.0007 [14.93]), July PM10 (0.0007), the proportion of the married (0.0005), January PM2.5 (0.0004), September PM2.5 (0.0004), chronic obstructive pulmonary disease (0.0004), economic satisfaction (0.0004), January PM10 (0.0003), residents in welfare facilities per 1,000 (0.0003 [0.97]), and October NO2 (0.0003).
Conclusion
Antidepressant medication has strong associations with neighborhood conditions including NO2 seasonality and welfare support.
INTRODUCTION
Major depressive disorder (MDD) is a major contributor to the global disease burden [1-4]. MDD, “a mood disorder that causes a persistent feeling of sadness and loss of interest,” varies in severity (mild to severe) and duration (months to years) [1]. Its incidence grew rapidly worldwide, i.e., by 47.86% from 172 million to 258 million during 1990 to 2017 [2]. It ranked third in the world in years lost to disability for 2017 and fifth in Korea in disability-adjusted life years for 2015 [3,4]. It is reported to have various predictors such as demographic conditions (age, gender), socioeconomic status (education, employment, income), neighborhood conditions (crowding, housing, pollution, violence), and health-related predictors (drinking, exercise, smoking, diseases, genetics) [5-7].
Specifically, numerous studies and reviews report a positive association between MDD and particulate matter [8-13]. These cohort and cross-sectional studies were characterized by varying numbers of participants (4,008–71,271) and diverse origins including Africa, America, Asia, and Europe. Neuro-inflammation was the central component of a causal pathway between particulate matter and MDD in these studies. For example, a recent study [8] employed machine learning and population-based data to examine major predictors of antidepressant medication including the concentration of particulate matter under 2.5 μm (PM2.5). Based on random forest variable importance, the top 15 predictors of antidepressant medication during 2013–2015 included cardiovascular disease, age, household income, gender, the district-level proportion of recipients of national basic living security program benefits, district-level social satisfaction, diabetes mellitus, January 2012 PM2.5, district-level street ratio, drinker, chronic obstructive pulmonary disease, district-level economic satisfaction, exercise, March 2012 PM2.5, and November 2012 PM2.5. This recent study concluded that antidepressant medication had strong associations with neighborhood conditions including socioeconomic satisfaction and the seasonality of particulate matter.
However, the previous study covered a limited set of disease histories and air pollutants, i.e., diabetes mellitus, cardiovascular disease, chronic obstructive pulmonary disease, and PM2.5. Given the significant associations of MDD with these diseases and PM2.5, it is important to investigate the association of antidepressant medication with other disease histories and air pollutants as well, that is, PM10, nitrogen dioxide (NO2), ozone (O3), sulphur dioxide (SO2), and carbon monoxide (CO). Specifically, one study used time series analysis and 84,207 health insurance records in 57 cities in China, reporting NO2 to be a risk factor for MDD in terms of hospitalization [14]. However, no machine learning investigation has been done on this topic. In this context, this study uses machine learning and population-based data to analyze major factors of antidepressant medication including 16 disease predictors and 72 air pollution variables including NO2. To the best of our knowledge, this study presents the most comprehensive analysis of the determinants of antidepressant medication, using a population-based cohort of 43,251 participants and the richest collection of 103 predictors: 6 demographic/socioeconomic factors, 16 disease predictors, 9 district-level factors, and 72 district-level air pollution variables covering PM2.5, PM10, NO2, O3, SO2, and CO in each of 12 months (e.g., January PM2.5, …, December PM2.5, January NO2, …, December NO2).
METHODS
Participants
The source of retrospective cohort data for this study was Korea National Health Insurance Service sample research data for 1 million subscribers in Korea (for further description, see https://nhiss.nhis.or.kr/bd/ab/bdaba022eng.do). The final data for this study included 43,251 participants aged 15–79 years who resided in the same districts of Seoul and had no history of antidepressant medication during 2002–2012. This study was approved by the Institutional Review Board (IRB) of Korea University Anam Hospital on August 19, 2019 (2019AN0354). Informed consent was waived by the IRB.
Variables
The dependent variable was antidepressant-free months during 2013–2015. The following antidepressant medications for all psychiatric disorders (F01–F99) were included: selective serotonin reuptake inhibitor, serotonin-norepinephrine reuptake inhibitor, tricyclic antidepressant, monoamine oxidase inhibitor, and others (i.e., bupropion, trazodone, mirtazapine, and vortioxetine) [13]. A total of 103 independent variables for 2012 or 2015 were considered: 6 demographic/socioeconomic conditions, i.e., gender, age, household income (an insurance fee with the range of 1 [the lowest group] to 10 [the highest group]), smoker (never, former, current), alcohol consumption (0, 1–2, 3–4, ≥5 times per week), and exercise (0 vs. ≥1 times per week); 16 disease factors (each being coded as no vs. yes), i.e., diabetes mellitus, cardiovascular disease, chronic obstructive pulmonary disease, thyroid disease, fat and hyper nutrition, malnutrition, other disorders of glycoprotein metabolism, metabolism, asthma, renal failure, heart failure, migraine and sleep disorder, cerebral palsy and other paralysis syndrome, skin disease, liver disease, and malignant neoplasm; 9 district-level conditions, i.e., population, proportion of the married, economic satisfaction (0–10), social satisfaction (0–10), residents in welfare facilities per 1,000, deprivation index, crude birth rate, recipients of national basic living security program benefits per 1,000, and street ratio; and 72 district-level air pollution variables, i.e., PM2.5, PM10, NO2, O3, SO2, and CO in each of 12 months in 2015 [15-19].
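As a quick check on how the 103 predictors break down, the following minimal Python sketch enumerates the four groups listed above and confirms that 6 + 16 + 9 + 72 = 103. The short variable labels (e.g., `Jul_NO2`) are hypothetical names chosen for illustration; the actual column names in the study data are not given in the text.

```python
from itertools import product

# Hypothetical labels for illustration only; the study's column names are not given.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
POLLUTANTS = ["PM2.5", "PM10", "NO2", "O3", "SO2", "CO"]

# 6 demographic/socioeconomic conditions
individual = ["gender", "age", "household_income", "smoker",
              "alcohol_consumption", "exercise"]

# 16 disease factors, each coded no vs. yes
diseases = [
    "diabetes_mellitus", "cardiovascular_disease",
    "chronic_obstructive_pulmonary_disease", "thyroid_disease",
    "fat_and_hyper_nutrition", "malnutrition",
    "other_disorders_of_glycoprotein_metabolism", "metabolism",
    "asthma", "renal_failure", "heart_failure",
    "migraine_and_sleep_disorder",
    "cerebral_palsy_and_other_paralysis_syndrome",
    "skin_disease", "liver_disease", "malignant_neoplasm",
]

# 9 district-level conditions
district = [
    "population", "proportion_married", "economic_satisfaction",
    "social_satisfaction", "welfare_facility_residents_per_1000",
    "deprivation_index", "crude_birth_rate",
    "basic_living_security_recipients_per_1000", "street_ratio",
]

# 72 district-level air pollution variables: 6 pollutants x 12 months
air = [f"{month}_{pollutant}" for pollutant, month in product(POLLUTANTS, MONTHS)]

predictors = individual + diseases + district + air
print(len(air), len(predictors))  # prints "72 103"
```

Treating each pollutant-month pair as its own variable is what allows the seasonality of a pollutant to show up in the variable importance ranking.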
Analysis
A comparison was made between the random forest for survival analysis (the random forest) and the Cox proportional-hazards model (the Cox model) [8,20,21] regarding the prediction of antidepressant-free months. The training and validation sets consisted of 32,438 observations (75%) and 10,813 observations (25%), respectively. For the random forest, the number of trees was 1,000, the maximum depth of the tree was not predetermined, and the logrank splitting rule was used. Random forest permutation variable importance, the contribution of a variable to the performance of the model, was adopted to identify major predictors of antidepressant-free months. RStudio 1.1.453 (RStudio, Boston, MA, USA) was employed for the analysis.
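The two ideas in this paragraph, the 75%/25% split and permutation variable importance, can be sketched in plain Python. This is an illustrative sketch only: the study itself used R, the data below are synthetic, `risk_fn` is a hypothetical stand-in for a fitted model, and Harrell's concordance index is assumed as the performance measure (a random survival forest would typically compute importance on out-of-bag data).

```python
import random


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_importance(rows, times, events, risk_fn, column, rng):
    """Drop in C-index after shuffling one column across subjects."""
    baseline = concordance_index(times, events, [risk_fn(r) for r in rows])
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)
    permuted_rows = [dict(r, **{column: v}) for r, v in zip(rows, shuffled)]
    permuted = concordance_index(times, events, [risk_fn(r) for r in permuted_rows])
    return baseline - permuted


# 75%/25% split of the 43,251 participants reported in the text.
n_total = 43251
n_train = int(0.75 * n_total)
n_valid = n_total - n_train

# Synthetic toy data (not the study data): event-free months shrink with age,
# follow-up is censored at 36 months, and "noise" is unrelated to the outcome.
rng = random.Random(0)
rows = [{"age": rng.randint(15, 79), "noise": rng.random()} for _ in range(200)]
times = [min(36.0, 60.0 - 0.7 * r["age"]) for r in rows]
events = [t < 36.0 for t in times]


def risk_fn(row):
    # Hypothetical stand-in for a fitted model: risk depends on age only.
    return row["age"]


importance_age = permutation_importance(rows, times, events, risk_fn, "age", rng)
importance_noise = permutation_importance(rows, times, events, risk_fn, "noise", rng)
print(n_train, n_valid)  # prints "32438 10813"
print(importance_age, importance_noise)
```

In this toy setting, shuffling `age` lowers the concordance index, whereas shuffling `noise` leaves it unchanged because the stand-in model never reads that column. The same logic explains why small importance values, such as those reported in this study, indicate a modest contribution to predictive performance.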
RESULTS
The descriptive statistics of the 43,251 participants are presented in Tables 1 and 2. The proportion of those taking antidepressants during 2013–2015 was 2.26% (978). The proportions of men, those aged 60 years or older, and those with household income in the 7th decile or higher were 49% (21,014), 30% (12,698), and 51% (22,035) as of 2012, respectively. Current smokers, current drinkers, and those who exercised accounted for 21%, 48%, and 79% of the participants as of 2012, respectively. Likewise, the proportions of those with a disease history were 25% (10,688) for migraine and sleep disorder, 34% (14,694) for liver disease, 17% (7,494) for thyroid disease, 38% (16,584) for cardiovascular disease, 25% (10,654) for asthma, and 38% (16,525) for chronic obstructive pulmonary disease as of 2012. The yearly averages of air pollutant concentrations over the 25 districts of Seoul were 26 μg/m3 for PM2.5, 48 μg/m3 for PM10, 0.023 ppm for NO2, 0.027 ppm for O3, 0.005 ppm for SO2, and 0.517 ppm for CO as of 2015. Finally, the district averages for the proportion of the married, economic satisfaction, and residents in welfare facilities were 0.55, 5.54 (out of 10), and 19 per 1,000 as of 2012, respectively. The performance measures of the random forest and the Cox model are shown in Table 3. The accuracy of the random forest was around 59% and the C-index (accuracy) of the Cox models was 60% across the board.
Based on random forest variable importance (Table 4), with Cox hazard ratios in brackets (Table 5), the top 20 factors of antidepressant medication during 2013–2015 were age (0.0041 [1.69–2.25]), migraine and sleep disorder (0.0029 [1.82]), liver disease (0.0017 [1.33–1.34]), exercise (0.0014), thyroid disease (0.0013), cardiovascular disease (0.0013 [1.20]), asthma (0.0008 [1.19–1.20]), 2015 September NO2 (0.0008 [0.01]), alcohol consumption (0.0008 [1.31–1.32]), female gender (0.0007 [1.80–1.81]), 2015 July NO2 (0.0007 [14.93]), 2015 July PM10 (0.0007), the proportion of the married (0.0005), 2015 January PM2.5 (0.0004), 2015 September PM2.5 (0.0004), chronic obstructive pulmonary disease (0.0004), economic satisfaction (0.0004), 2015 January PM10 (0.0003), residents in welfare facilities per 1,000 (0.0003 [0.97]), and 2015 October NO2 (0.0003). Table 5 also shows that the hazard ratios of NO2 were statistically significant and exceeded 10 at roughly three-month intervals: 2015 March, June–July, October, and December.
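For readers interpreting the hazard ratios above, recall that a Cox hazard ratio is the exponential of the model coefficient and applies to a one-unit increase in the predictor. The minimal Python sketch below converts a hazard ratio back to its coefficient and rescales it to a smaller increment. The text does not state the unit of NO2 used in the Cox model; the sketch assumes ppm (the unit of the descriptive statistics), and the 0.001 ppm increment is a hypothetical choice for illustration.

```python
import math


def log_hazard(hazard_ratio):
    """Cox coefficient (log hazard ratio) implied by a reported hazard ratio."""
    return math.log(hazard_ratio)


def rescale_hazard_ratio(hazard_ratio, increment):
    """Hazard ratio for a change of `increment` units instead of one unit.

    Since HR = exp(beta) per unit, HR for `increment` units is
    exp(beta * increment) = HR ** increment.
    """
    return hazard_ratio ** increment


# Reported hazard ratios from the text.
hr_july_no2 = 14.93
hr_september_no2 = 0.01

# ASSUMPTION: the NO2 predictor is measured in ppm; 0.001 ppm is a
# hypothetical increment chosen only to illustrate the rescaling.
increment_ppm = 0.001

print(log_hazard(hr_july_no2))       # positive coefficient: higher hazard
print(log_hazard(hr_september_no2))  # negative coefficient: lower hazard
print(rescale_hazard_ratio(hr_july_no2, increment_ppm))
```

Under this assumption, a hazard ratio above 10 per full unit corresponds to a much smaller ratio per realistic increment, given that the yearly average NO2 concentration was 0.023 ppm.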
DISCUSSION
As a unique contribution of this study, the Cox hazard ratios of NO2 were found to be statistically significant and to exceed 10 at roughly three-month intervals: March, June–July, October, and December. In addition, based on random forest variable importance, the top 20 factors of antidepressant medication during 2013–2015 were age, migraine and sleep disorder, liver disease, exercise, thyroid disease, cardiovascular disease, asthma, 2015 September NO2, alcohol consumption, female gender, 2015 July NO2, 2015 July PM10, the proportion of the married, 2015 January PM2.5, 2015 September PM2.5, chronic obstructive pulmonary disease, economic satisfaction, 2015 January PM10, residents in welfare facilities per 1,000, and 2015 October NO2.
To the best of our knowledge, this study presents the most comprehensive analysis of the determinants of antidepressant medication, using a population-based cohort of 43,251 participants and the richest collection of 103 predictors: 6 demographic/socioeconomic factors, 16 disease predictors, 9 district-level factors, and 72 district-level air pollution variables covering PM2.5, PM10, NO2, O3, SO2, and CO in each of 12 months. Moreover, this study has four important clinical and policy implications. Firstly, the findings of this study agree with those of existing literature on the positive associations of antidepressant medication with age, cardiovascular disease, alcohol consumption, and female gender [8]. These four predictors ranked within the top 20 in random forest variable importance and were statistically significant in the Cox model hazard ratios in both studies. Secondly, the results of this study are consistent with those of previous reviews on the positive relationships of depression with sleep disorder [22], liver disease [23], and asthma [24]. The importance rankings of these three predictors were within the top 20 and their hazard ratios were statistically significant in this study, i.e., migraine and sleep disorder (1.82), liver disease (1.33–1.34), and asthma (1.19–1.20). Few population-based studies have been done and no machine learning literature has been available on these important issues. In this vein, this study makes a unique contribution.
Thirdly, this study brings new insights into the seasonality of NO2 and antidepressant medication. A recent meta-analysis [25] examined 39 original studies on the positive associations of depression with various air pollutants in terms of the relative risk for short-term exposure (less than one month): PM2.5 (1.009), PM10 (1.009), NO2 (1.022), SO2 (1.024), O3 (1.011), and CO (1.062). The studies reviewed were published during 2007–2021 and covered some of the six air pollutants included in this study (i.e., PM2.5, PM10, NO2, O3, SO2, and CO). However, most of these previous studies were cross-sectional and none of them addressed the issue of seasonality based on machine learning. In this study, 2015 July NO2 ranked within the top 20 in variable importance and its hazard ratio was statistically significant, i.e., 14.93. It was also found in this study (Table 5) that the hazard ratios of NO2 were statistically significant and exceeded 10 at roughly three-month intervals: March, June–July, October, and December. This finding has not been available in the existing literature. Finally, the variable importance of residents in welfare facilities per 1,000 was within the top 20 and its hazard ratio was statistically significant in this study (0.97). These findings affirm the importance of social determinants in the prediction of antidepressant medication [8].
However, this study had some limitations. Firstly, machine learning is a data-driven approach and this study did not consider pathways between antidepressant medication and its various predictors including NO2 seasonality. Little analysis has been done and more examination is needed in this direction. Secondly, it was beyond the scope of this study to evaluate the effects of different subsampling methods on random forest variable importance [26]. Thirdly, there is room for improvement in the performance measures of the random forest and the Cox model, which fell within the range of 59%–69%. Increasing the sample size would be an effective solution. Fourthly, the hyper-parameters of the random forest came from existing literature [8,20,21] and hyper-parameter tuning is expected to strengthen its performance. Fifthly, data on PM10, NO2, O3, SO2, and CO before 2015 were not available at the time of data collection. The model performance and validity would have been higher with these data as of 2012. Sixthly, the binary categories of psychiatric disorders were defined based on International Classification of Diseases 10th Revision codes and this could be a source of potential bias. Finally, combining various deep learning approaches for various kinds of MDD data would bring new innovations and deeper insights in this line of research.
In conclusion, antidepressant medication has strong associations with neighborhood conditions including NO2 seasonality and welfare support. Strong interventions targeting these factors are needed for the effective management of MDD.
Notes
Availability of Data and Material
The code and data presented in this study are not publicly available. However, they are available from the corresponding author upon reasonable request and with the permission of Korea National Health Insurance Service.
Conflicts of Interest
The authors have no potential conflicts of interest to disclose.
Author Contributions
Conceptualization: Kwang-Sig Lee, Byung-Joo Ham. Data curation: all authors. Formal analysis: all authors. Funding acquisition: Kwang-Sig Lee, Byung-Joo Ham. Investigation: all authors. Methodology: all authors. Project administration: Kwang-Sig Lee, Byung-Joo Ham. Resources: Kwang-Sig Lee, Byung-Joo Ham. Software: Kwang-Sig Lee, Byung-Joo Ham. Supervision: Kwang-Sig Lee, Byung-Joo Ham. Validation: all authors. Visualization: all authors. Writing—original draft: Kwang-Sig Lee, Byung-Joo Ham. Writing—review & editing: Kwang-Sig Lee, Byung-Joo Ham.
Funding Statement
This work was supported by (1) National Research Foundation of Korea (NRF) grant funded by the Ministry of Education, Science and Technology of South Korea (No.NRF-2020M3E5D9080792) and (2) Korea Health Industry Development Institute grant funded by the Ministry of Health and Welfare of South Korea (No.HI22C1302 [Korea Health Technology R&D Project]). The funders had no role in the design of the study, the collection, analysis and interpretation of the data and the writing of the manuscript.