Methods of Discrete Optimization and Machine Learning for the Analysis of Heterogeneous Genomic Populations

Kuzmin, Kiril
Abstract

Many human diseases, including viral infections and cancers, are driven by the evolutionary dynamics of heterogeneous populations of genomic variants. A major type of evolutionary behavior is migration, which encompasses viral transmission and cancer metastasis. This study explores the connections between phylogenetic trees and migration trees through graph homomorphism and examines the relationship between maximum likelihood trees and maximum parsimony trees. It also demonstrates that machine learning can accurately identify coronaviruses using small portions of their sequence information.

The first part of this study investigates how structural constraints on migration patterns and tree topologies influence the relationship between phylogenies and migration trees. We propose algorithms to assess the compatibility of given phylogenetic and migration trees under various migration scenarios.
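As a rough illustration of what a compatibility check can look like, the sketch below tests whether a fully labelled phylogeny maps onto a migration tree so that every phylogeny edge either stays within one location or follows a migration edge. This is a simplified reading of the homomorphism idea, not the dissertation's algorithm: it assumes internal nodes already carry location labels, and the names `is_compatible`, `phylo_edges`, `location`, and `migration_edges` are hypothetical.

```python
def is_compatible(phylo_edges, location, migration_edges):
    """Check a labelled phylogeny against a directed migration tree.

    phylo_edges: list of (parent, child) pairs of phylogeny nodes.
    location: dict mapping every phylogeny node to a location label
              (assumed to be known for internal nodes as well).
    migration_edges: list of (source, target) location pairs.
    """
    allowed = set(migration_edges)
    for parent, child in phylo_edges:
        src, dst = location[parent], location[child]
        # An edge inside one location is always fine; an edge between
        # two locations must correspond to a migration edge.
        if src != dst and (src, dst) not in allowed:
            return False
    return True


# Toy example with two locations, P and Q (illustrative data only).
phylo_edges = [("r", "x"), ("r", "y"), ("x", "a"), ("x", "b"), ("y", "c")]
location = {"r": "P", "x": "P", "y": "Q", "a": "P", "b": "Q", "c": "Q"}

print(is_compatible(phylo_edges, location, [("P", "Q")]))  # True
print(is_compatible(phylo_edges, location, [("Q", "P")]))  # False
```

Finding a labelling of the internal nodes that makes such a check succeed, under constraints on the migration pattern, is the harder problem; the sketch covers only the verification step.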

The second part examines the relationship between two-state character maximum likelihood trees and maximum parsimony trees, identifying conditions under which an optimal solution for a maximum likelihood tree is also a parsimony tree. Properties that simplify maximum likelihood trees are proven, and a closed-form solution is provided for maximum likelihood trees with three taxa.
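For readers unfamiliar with the parsimony side of this comparison, the following is a minimal sketch of the standard Fitch algorithm for scoring one two-state character on a fixed rooted binary tree. It is background illustration only and is not taken from the dissertation; the nested-tuple tree encoding and the name `fitch_score` are assumptions made here.

```python
def fitch_score(tree, leaf_states):
    """Return the minimum number of state changes for one character.

    tree: a leaf is a taxon name (str); an internal node is a pair
          (left_subtree, right_subtree).
    leaf_states: dict mapping each taxon name to its state (0 or 1).
    """
    def visit(node):
        if isinstance(node, str):
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on a state: no extra change needed.
            return common, left_cost + right_cost
        # The children disagree: one change is required on this node's edges.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


tree = (("A", "B"), ("C", "D"))
print(fitch_score(tree, {"A": 0, "B": 0, "C": 1, "D": 1}))  # 1
print(fitch_score(tree, {"A": 0, "B": 1, "C": 0, "D": 1}))  # 2
```

A maximum parsimony tree minimizes this score summed over all characters, whereas a maximum likelihood tree maximizes the probability of the observed states under a substitution model.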

The third part uses machine learning models, including support vector machines, logistic regression, decision trees, and random forests, to predict the host specificity of coronaviruses from their spike sequences. These models achieved high accuracy, F1 score, sensitivity, and specificity. Notably, the decision tree model identified protein regions of known biological importance, indicating that spike sequences alone can predict host specificity.
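The four evaluation measures named above can all be derived from a binary confusion matrix. The sketch below shows the standard definitions in plain Python; the function name `binary_metrics` and the toy labels are hypothetical and do not reproduce any result from the dissertation.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Guard against empty classes instead of dividing by zero.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy labels: 1 could stand for "infects humans", 0 for "does not".
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters when the host classes are imbalanced, since accuracy alone can look high for a classifier that favors the majority class.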

Date
2024-08-08
Keywords
Phylogenetic inference, Migration tree, Viral transmission, Maximum likelihood, Maximum parsimony, Coronavirus
Citation
Kuzmin, Kiril (2024). Methods of Discrete Optimization and Machine Learning for the Analysis of Heterogeneous Genomic Populations. Dissertation, Georgia State University. https://doi.org/10.57709/37330588
Embargo Lift Date
2024-07-15