Date of Award

Spring 5-13-2016

Degree Type


Degree Name

Master of Public Health (MPH)


Public Health

First Advisor

Ruiyan Luo

Second Advisor

Karen Conneely


Background: Gene expression is regulated via highly coordinated epigenetic changes, the most studied of which is DNA methylation (DNAm). Many studies have shown that DNAm is linearly associated with age, and some have used DNAm data to build predictive models of human age. These models are important because DNAm can predict health outcomes, such as all-cause mortality, better than chronological age does. Nevertheless, few studies have investigated non-linear relationships between DNAm and age, which could potentially improve these predictive models. Beyond their relevance to predicting health outcomes, non-linear relationships between DNAm and age can also add to our understanding of biological responses to late-life events, such as diseases that afflict the elderly.

Objectives: We aim to (1) examine non-linear relationships between DNAm and age at specific loci on the genome and (2) build upon regularization methods by comparing the prediction error of models that include both non-transformed and square-root-transformed predictors with that of models that include only non-transformed predictors. We made these comparisons using both sparse partial least squares (SPLS) regression and lasso regression.
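The comparison in objective (2) can be sketched for the lasso case as follows. This is a minimal, hypothetical illustration, not the thesis's implementation: it assumes methylation values are beta values in [0, 1] (so the square root is defined), uses a plain coordinate-descent lasso written in NumPy rather than any particular package, simulates synthetic data from a made-up generating model, fixes the penalty at an arbitrary `lam=0.1`, and introduces illustrative helper names (`lasso_cd`, `fit_predict`, `augment_sqrt`). It does not implement SPLS.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent (illustrative sketch).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes the columns of X and the response y are already centered.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-column scale used in the update
    r = y.astype(float).copy()         # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * b[j]        # partial residual excluding feature j
            rho = X[:, j] @ r / n
            # Soft-thresholding: small correlations are shrunk exactly to zero.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b


def fit_predict(X_tr, y_tr, X_te, lam):
    """Standardize with training statistics, fit the lasso, predict on test rows."""
    mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
    sd[sd == 0] = 1.0
    Z_tr, Z_te = (X_tr - mu) / sd, (X_te - mu) / sd
    y_mean = y_tr.mean()
    b = lasso_cd(Z_tr, y_tr - y_mean, lam)
    return Z_te @ b + y_mean, b


def augment_sqrt(X):
    """Append square-root-transformed copies of every predictor (hypothetical helper)."""
    return np.hstack([X, np.sqrt(X)])


# Synthetic data only: 20 "CpG sites" with beta values in (0.05, 0.95);
# the age model below is invented for illustration.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.uniform(0.05, 0.95, size=(n, p))
age = 20 + 40 * X[:, 0] + 30 * np.sqrt(X[:, 1]) + rng.normal(0, 2, n)
tr, te = slice(0, 150), slice(150, 200)

# Model A: non-transformed predictors only.
pred_raw, b_raw = fit_predict(X[tr], age[tr], X[te], lam=0.1)
# Model B: non-transformed plus square-root-transformed predictors.
X_aug = augment_sqrt(X)
pred_aug, b_aug = fit_predict(X_aug[tr], age[tr], X_aug[te], lam=0.1)

mse_raw = np.mean((age[te] - pred_raw) ** 2)
mse_aug = np.mean((age[te] - pred_aug) ** 2)
print(f"test MSE, raw predictors only:  {mse_raw:.2f} ({np.count_nonzero(b_raw)} nonzero)")
print(f"test MSE, raw + sqrt predictors: {mse_aug:.2f} ({np.count_nonzero(b_aug)} nonzero)")
```

In practice the penalty would be tuned (for example by cross-validation) rather than fixed, and the number of retained predictors would be controlled through it; the fixed value here only keeps the sketch short.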

Results: We found two sites that are differentially methylated with age and implicated in the regulation of the gene KLF14, which could be involved in an immunosenescent phenotype. Including the square-root-transformed variables had little effect on the prediction error of the SPLS model. In the lasso regression model, by contrast, the prediction error increased substantially, particularly when only a few predictors (70) were included.

Conclusion: The growing volume and complexity of biological data, coupled with advances in computational technology, are indispensable to our understanding of biological pathways and perplexing biological phenomena. Moreover, high-dimensional biological data have enormous implications for clinical practice. Our findings point to a possible biological pathway involved in immunosenescence. Although we were unable to improve the predictive models of human age, future research should investigate other possible non-linear relationships between DNAm and age, given that such statistical methods can improve predictions of health outcomes.