Date of Award
Doctor of Philosophy (PhD)
Next Generation Sequencing is a set of relatively recent but already well-established technologies with a wide range of applications in the life sciences. Although these technologies are constantly being improved, multiple challenging problems remain in the analysis of high throughput sequencing data. In particular, genome assembly still suffers from the inability of the technologies to overcome issues related to structural properties of genomes, such as single nucleotide polymorphisms and repeats, not to mention drawbacks of the technologies themselves, such as sequencing errors, which also hinder the reconstruction of true reference genomes. Other issues arise in transcriptome quantification and differential gene expression analysis. Processing millions of reads requires sophisticated algorithms that can compute gene expression with high precision and in a reasonable amount of time. In downstream analysis, the ultimate computational task is to infer the activity of biological pathways (e.g., metabolic pathways). Because many pathways overlap, the challenge is to infer the role of each gene in the activity of a given pathway. Assigning the products of a gene to the wrong pathway may result in misleading differential activity analysis and, thus, wrong scientific conclusions. In this dissertation I present several algorithmic solutions to some of the problems enumerated above. In particular, I designed a scaffolding algorithm for genome assembly and created new tools for differential expression analysis of genes and biological pathways.
Mandric, Igor, "Assembly, quantification, and downstream analysis for high throughput sequencing data." Dissertation, Georgia State University, 2018.