
Analysis and Prediction of Infectious Disease Outbreak Incidence and Related Mortality by Integrating Diverse Data Sources Using Statistical Modeling and Machine Learning Methods

Aleksandr Shishkin
Abstract

Accurate and timely forecasts of infectious disease incidence and related mortality are critical for effective public health responses. Traditional surveillance data, while invaluable, often suffer from reporting delays, which motivates the exploration of auxiliary data sources. This work combines internet search data, molecular epidemiological information, and traditional surveillance data to improve outbreak predictions.

The first study examines the burden of the COVID-19 pandemic in Ukraine from 2020 to 2021 using excess mortality analysis. By comparing observed all-cause and cause-specific mortality with expected values derived from historical trends, the study quantifies the pandemic's impact. Three distinct waves of excess mortality were identified, coinciding with peaks in laboratory-confirmed COVID-19 deaths. Cause-specific analyses revealed significant excess mortality from pneumonia and circulatory system diseases, highlighting health impacts of the pandemic beyond direct COVID-19 fatalities.

The second study investigates whether Google search queries related to COVID-19 can serve as supplementary data for forecasting incidence and mortality. Predictive keywords were identified through Granger causality tests and cross-correlation analyses. ARIMA, Prophet, and XGBoost models were then used to compare baseline forecasts, built only from traditional surveillance data, with enhanced models that also incorporate search query data. Including the top-ranked keywords significantly improved predictive accuracy, with gains of 50% to 90% in certain scenarios.

The third study develops a novel approach to outbreak investigation and forecasting that integrates molecular data with internet search trends. Hepatitis C virus (HCV) sequence data from the Scott County outbreak were analyzed with Bayesian evolutionary models to estimate historical viral population sizes. These estimates were correlated with Google Trends data, and predictive models were constructed to assess the added value of search data in forecasting disease prevalence. Integrating molecular and internet-based data sources showed potential for improved predictive performance.

Collectively, this dissertation underscores the value of combining traditional epidemiological data with innovative auxiliary data sources and advanced modeling techniques. The findings contribute to infectious disease epidemiology by offering improved methodologies for outbreak prediction and public health decision-making.
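The excess-mortality comparison described for the first study (observed deaths versus expected deaths from historical trends) can be illustrated with a minimal Python sketch. The abstract does not say how the expected baseline was modeled, so this sketch assumes a separate linear trend over pre-pandemic years for each period (for example, each month); the function names `expected_deaths` and `excess_mortality` are hypothetical, and the numbers in the usage lines are synthetic, not data from the dissertation.

```python
import numpy as np


def expected_deaths(history, years, target_year):
    """Project expected deaths for `target_year` from pre-pandemic data.

    history: array of shape (n_years, n_periods), e.g. monthly deaths per year.
    years:   array of shape (n_years,) with the calendar year of each row.
    Assumption (not from the source): one linear trend over years per period.
    """
    history = np.asarray(history, dtype=float)
    years = np.asarray(years, dtype=float)
    center = years.mean()  # center the years for a well-conditioned fit
    # With a 2-D `history`, np.polyfit fits one line per column (period);
    # row 0 of the result holds the slopes, row 1 the intercepts.
    slopes, intercepts = np.polyfit(years - center, history, deg=1)
    return intercepts + slopes * (target_year - center)


def excess_mortality(observed, expected):
    """Return per-period excess deaths, cumulative excess, and P-scores (%)."""
    observed = np.asarray(observed, dtype=float)
    excess = observed - expected
    p_scores = 100.0 * excess / expected
    return excess, excess.sum(), p_scores


# Synthetic example: five baseline years with a steady upward trend.
years = np.arange(2015, 2020)
months = np.arange(12)
history = 1000 + 10 * (years[:, None] - 2015) + months[None, :]
expected = expected_deaths(history, years, 2020)
observed = expected + 100  # pretend every month has 100 extra deaths
excess, total, p = excess_mortality(observed, expected)
```

Reporting P-scores alongside absolute excess makes periods with different baseline death counts comparable; a real analysis would also need uncertainty intervals around the baseline, which this sketch omits.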

Date
2025-05-06
Keywords
forecasting, epidemiology, COVID-19, SARS-CoV-2, Google Trends, ARIMA, Prophet, XGBoost, excess mortality, USA, Ukraine, MAE, RMSE, Box-Cox Transformation, incidence, mortality, outbreak
Citation
Aleksandr Shishkin. "Analysis and Prediction of Infectious Disease Outbreak Incidence and Related Mortality by Integrating Diverse Data Sources Using Statistical Modeling and Machine Learning Methods." Dissertation, Georgia State University, 2025. https://doi.org/10.57709/w1p9-ag37
Embargo Lift Date
2025-05-06