PubMed ID:
34562867
Public Release Type:
Journal
Publication Year: 2021
Affiliation: Division of Endocrinology, Diabetes, & Metabolism, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA 21287
DOI:
https://doi.org/10.1016/j.dsx.2021.102278
Authors:
Santhanam P,
Khthir R
Request IDs:
22606
Studies:
Look AHEAD: Action for Health in Diabetes
Machine learning and artificial intelligence (AI) methods have recently become critical tools for research in metabolic diseases (such as hypertension and diabetes) in the unfolding era of big-data science (1). For example, a retrospective analysis of the SPRINT trial dataset showed that visit-to-visit blood pressure (BP) variability can be analyzed using k-means clustering with a high stability index (2). In another study using the same dataset, random forest was shown to have a good area under the curve (AUC) for assessing longitudinal systolic blood pressure (SBP) trends and variability and for elucidating the factors that determine them (3). The present study used random forest to identify the factors determining SBP in persons with diabetes from the Look AHEAD study cohort (4). Look AHEAD was a randomized controlled clinical trial that compared an intensive lifestyle intervention group (which achieved weight loss through dietary modification and enhanced physical activity) with a control group that received only traditional support and education (4).
De-identified data were obtained from the NIH-NIDDK repository after approval from the Johns Hopkins Institutional Review Board (IRB). Baseline Look AHEAD data were tabulated for the following variables: waist circumference (cm), SBP (mmHg), diastolic blood pressure (DBP, mmHg), duration of diabetes (years), age (years), albumin-to-creatinine ratio (ACR), A1C (%), serum creatinine (mg/dl), triglycerides (mg/dl), LDL (mg/dl), and maximal exercise capacity (METs). SBP was the dependent outcome of interest. Random forest was the machine learning technique used to study the impact of the other variables on SBP (5). The analysis was performed in the open-source statistical software R (version 4.0.3, The R Foundation for Statistical Computing, 2020) with the following packages: dplyr, Matrix, tidyverse, randomForest, and caret. After exclusion of incomplete records, 4677 observations remained.
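For readers unfamiliar with the technique, the following is a minimal sketch of random forest regression. It is not the authors' code: the study used the R randomForest package, whereas this is a toy pure-Python version written only for illustration. The data are synthetic (not Look AHEAD data), and every name in it (`grow_tree`, `fit_forest`, `predict_forest`, `min_node`, the simulated DBP-to-SBP relationship) is a hypothetical choice. It shows the two ingredients of the method: each tree is grown on a bootstrap sample, and only a random subset of `mtry` variables is considered at each split.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(rows, targets, mtry, rng, min_node=5):
    """Grow one regression tree; a leaf is a number, a split is a tuple."""
    if len(rows) < 2 * min_node or len(set(targets)) == 1:
        return mean(targets)
    # Only a random subset of mtry variables is tried at this split.
    features = rng.sample(range(len(rows[0])), mtry)
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [i for i, r in enumerate(rows) if r[f] <= thr]
            right = [i for i, r in enumerate(rows) if r[f] > thr]
            if len(left) < min_node or len(right) < min_node:
                continue
            score = (sse([targets[i] for i in left])
                     + sse([targets[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:
        return mean(targets)
    _, f, thr, left, right = best
    return (
        f,
        thr,
        grow_tree([rows[i] for i in left], [targets[i] for i in left],
                  mtry, rng, min_node),
        grow_tree([rows[i] for i in right], [targets[i] for i in right],
                  mtry, rng, min_node),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(rows, targets, ntree, mtry, seed=0):
    """Grow ntree trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], mtry, rng))
    return forest


def predict_forest(forest, row):
    """The forest prediction is the average of the tree predictions."""
    return mean(predict_tree(tree, row) for tree in forest)


# Synthetic illustration only: one informative variable (a simulated DBP)
# and three pure-noise variables. These are NOT Look AHEAD data.
data_rng = random.Random(42)
rows, targets = [], []
for _ in range(120):
    dbp = data_rng.uniform(60, 100)
    rows.append([dbp] + [data_rng.uniform(0, 1) for _ in range(3)])
    targets.append(60 + 0.8 * dbp + data_rng.gauss(0, 5))

forest = fit_forest(rows, targets, ntree=25, mtry=2, seed=1)
low = predict_forest(forest, [65, 0.5, 0.5, 0.5])
high = predict_forest(forest, [95, 0.5, 0.5, 0.5])
print(f"predicted SBP at DBP 65: {low:.1f}, at DBP 95: {high:.1f}")
```

The small values of `ntree` and `mtry` here only keep the toy example fast. Production implementations such as the randomForest package additionally report out-of-bag error and variable importance, which this sketch omits.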
The forest comprised 500 regression trees, with 3 variables tried at each split. After DBP (expectedly the most important predictor of SBP), the key variables that determined SBP were, in order, maximal exercise capacity, age, ACR, and serum creatinine. The error rate plateaued after about 300 trees (Figure 1). The mean of squared residuals was 178.2373, and 39.05% of the variation in SBP was explained by the model. This study shows that exercise capacity is an important determinant of SBP in type 2 diabetes (even more so than age).
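To put the two reported fit statistics on a clinical scale, the short calculation below converts them into a typical prediction error and an implied spread of SBP. It assumes that the figures follow the convention used in the randomForest summary output, where the percentage of variance explained equals 1 minus the mean of squared residuals divided by the variance of the outcome. That assumption, and the variable names, are not stated in the text.

```python
import math

# Values reported in the text.
mse = 178.2373        # mean of squared residuals, in mmHg^2
explained = 0.3905    # 39.05% of the variation in SBP explained

# Root mean squared error: typical size of a prediction error in mmHg.
rmse = math.sqrt(mse)

# Assumption: explained = 1 - mse / var(SBP), so var(SBP) = mse / (1 - explained).
implied_variance = mse / (1 - explained)
implied_sd = math.sqrt(implied_variance)

print(f"RMSE: {rmse:.1f} mmHg")
print(f"Implied SD of SBP: {implied_sd:.1f} mmHg")
```

Under that assumption, the model's typical error is roughly 13 mmHg against an SBP standard deviation of roughly 17 mmHg in the analyzed sample.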