Author ORCID Identifier

0000-0001-7417-9937

Date of Award

8-8-2024

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Pavel Skums

Abstract

Many human diseases, including viral infections and cancers, are driven by the evolutionary dynamics of heterogeneous populations of genomic variants. A major type of evolutionary behavior is migration, encompassing viral transmissions and cancer metastasis. This study explores the connections between phylogenetic trees and migration trees through graph homomorphism and examines the relationship between maximum likelihood trees and maximum parsimony trees. It also demonstrates that machine learning can accurately identify coronaviruses using small portions of their sequence information.

The first part of this study investigates how structural constraints on migration patterns and tree topologies influence the relationship between phylogenies and migration trees. We propose algorithms to assess the compatibility of given phylogenetic and migration trees under various migration scenarios.
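As a rough illustration of the compatibility question, the sketch below handles only the simplest case, in which every node of the phylogeny already carries a site label. It checks whether that labeling acts as a homomorphism onto a given migration tree, allowing a phylogeny edge to collapse onto a single site. The function names, the parent-map encoding of the phylogeny, and this reading of "compatibility" are assumptions made for the example; the dissertation's algorithms address the harder settings in which labels are not given in advance.

```python
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]


def induced_migration_edges(
    parent: Dict[str, Optional[str]], location: Dict[str, str]
) -> Set[Edge]:
    """Return the (source_site, target_site) pairs implied by a labeled phylogeny.

    parent maps each node to its parent (None for the root);
    location maps each node to its site label (hypothetical encoding).
    """
    edges: Set[Edge] = set()
    for child, par in parent.items():
        if par is None:
            continue  # the root has no incoming edge
        if location[par] != location[child]:
            # The lineage changed site along this edge: a migration event.
            edges.add((location[par], location[child]))
    return edges


def is_compatible(
    parent: Dict[str, Optional[str]],
    location: Dict[str, str],
    migration_edges: Set[Edge],
) -> bool:
    """True if every migration implied by the labeling is an edge of the migration tree."""
    return induced_migration_edges(parent, location) <= migration_edges


# Toy phylogeny: root r with children a and b; a has child c.
phylogeny = {"r": None, "a": "r", "b": "r", "c": "a"}
sites = {"r": "P", "a": "P", "b": "M1", "c": "M2"}

parallel_seeding = {("P", "M1"), ("P", "M2")}
chain_seeding = {("P", "M1"), ("M1", "M2")}
```

In this toy case the labeling implies migrations from P to M1 and from P to M2, so it fits the first migration tree but not the second, where M2 would have to be seeded from M1.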

The second part examines the relationship between two-state character maximum likelihood trees and maximum parsimony trees, identifying conditions where an optimal solution for a maximum likelihood tree is also a parsimony tree. Properties that simplify maximum likelihood trees are proven, and a closed-form solution is provided for maximum likelihood trees with three taxa.

The third part uses machine learning models, including support vector machine, logistic regression, decision tree, and random forest, to predict the host specificity of coronaviruses based on their spike sequences. These models achieved high accuracy, F1 score, sensitivity, and specificity. Notably, the decision tree model identified protein regions with known biological importance, indicating that spike sequences alone can predict host specificity.
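For readers unfamiliar with the four evaluation measures named above, the following minimal sketch computes them from the confusion counts of a binary classifier. The binary framing (for example, 1 for one host class and 0 for all others), the function name, and the toy labels are assumptions made for illustration; they are not data or code from the dissertation.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, specificity, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity (recall): share of actual positives that were found.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: share of actual negatives that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # F1: harmonic mean of precision and recall, written in count form.
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0,
    }


# Made-up labels for eight hypothetical sequences.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_scores = binary_metrics(toy_true, toy_pred)
```

Reporting sensitivity and specificity alongside accuracy matters when host classes are imbalanced, since accuracy alone can look high for a model that rarely predicts the minority class.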
