Loading...
Thumbnail Image
Item

A New Scalable, Portable, and Memory-Efficient Predictive Analytics Framework for Predicting Time-to-Event Outcomes in Healthcare

Manyam, Ramesh
Citations
Altmetric:
Abstract

Time-to-event outcomes are prevalent in medical research. To handle these outcomes, as well as censored observations, statistical and survival regression methods are widely used based on the assumptions of linear association; however, clinicopathological features often exhibit nonlinear correlations. Machine learning (ML) algorithms have been recently adapted to effectively handle nonlinear correlations. One drawback of ML models is that they can model idiosyncratic features of a training dataset. Due to this overlearning, ML models perform well on the training data but are not so striking on test data. The features that we choose indirectly influence the performance of ML prediction models. With the expansion of big data in biomedical informatics, appropriate feature engineering and feature selection are vital to ML success. Also, an ensemble learning algorithm helps decrease bias and variance by combining the predictions of multiple models.

In this study, we newly constructed a scalable, portable, and memory-efficient predictive analytics framework, fitting four components (feature engineering, survival analysis, feature selection, and ensemble learning) together. Our framework first employs feature engineering techniques, such as binarization, discretization, transformation, and normalization on raw dataset. The normalized feature set was applied to the Cox survival regression that produces highly correlated features relevant to the outcome.The resultant feature set was deployed to “eXtreme gradient boosting ensemble learning” (XGBoost) and Recursive Feature Elimination algorithms. XGBoost uses a gradient boosting decision tree algorithm in which new models are created sequentially that predict the residuals of prior models, which are then added together to make the final prediction.

In our experiments, we analyzed a cohort of cardiac surgery patients drawn from a multi-hospital academic health system. The model evaluated 72 perioperative variables that impact an event of readmission within 30 days of discharge, derived 48 significant features, and demonstrated optimum predictive ability with feature sets ranging from 16 to 24. The area under the receiver operating characteristics observed for the feature set of 16 were 0.8816, and 0.9307 at the 35th, and 151st iteration respectively. Our model showed improved performance compared to state-of-the-art models and could be more useful for decision support in clinical settings.

Comments
Description
Date
2019-12-16
Journal Title
Journal ISSN
Volume Title
Publisher
Research Projects
Organizational Units
Journal Issue
Keywords
Ensemble Learning, Extreme Gradient Boosting, XGBoost, Gradient Decision Tree Algorithm, Coronary Artery Bypass Grafting Surgery, Predictive Analytics Modeling, Survival Regression, Advanced Machine Learning Algorithms, Deep Learning, Artificial Intelligence, Feature Engineering, Feature Selection, Recursive Feature Elimination, CABG, Cox Proportional Hazards Model
Citation
Manyam, Ramesh (2019). A New Scalable, Portable, and Memory-Efficient Predictive Analytics Framework for Predicting Time-to-Event Outcomes in Healthcare. Dissertation, Georgia State University. https://doi.org/10.57709/15936528
Embargo Lift Date
2021-12-04
Embedded videos