Date of Award

Spring 5-13-2016

Degree Type


Degree Name

Doctor of Philosophy (PhD)


Public Health

First Advisor

Christine Stauber, PhD

Second Advisor

Ruiyan Luo, PhD

Third Advisor

Ciara O'Reilly, PhD

Fourth Advisor

Robert M. Hoekstra, PhD


Worldwide, diarrheal disease is a leading cause of morbidity and mortality in children less than five years of age. Incidence and disease severity remain highest in sub-Saharan Africa. Kenya has an estimated 400,000 severe diarrhea episodes and 9,500 diarrhea-related deaths per year in children. Current statistical methods for estimating etiological and exposure risk factors for moderate-to-severe diarrhea (MSD) in children are constrained: they cannot assess a large number of parameters because of limited sample sizes, complex relationships, correlated predictors, and model assumptions of linearity. This dissertation examines machine learning statistical methods as a way to address these weaknesses of traditional logistic regression models. The studies presented here investigate data from a 4-year, prospective, matched case-control study of MSD among children less than five years of age in rural Kenya, conducted as part of the Global Enteric Multicenter Study. Three machine learning approaches were used to examine associations with MSD: the least absolute shrinkage and selection operator (LASSO), classification trees, and random forests.

A principal finding in all three studies was that machine learning approaches are useful and feasible to implement in epidemiological studies. Each approach provided information about, and understanding of, the data beyond what logistic regression models alone offered. The results from all three machine learning approaches were supported by comparable logistic regression results, indicating their usefulness as epidemiological tools. This dissertation offers an exploration of methodological alternatives that should be considered more frequently in diarrheal disease epidemiology and in public health in general.