Tony Yeung

Date of Award


Degree Type

Closed Thesis

Degree Name

Master of Science (MS)


Department

Computer Science

First Advisor

Dr. Rajshekhar Sunderraman - Chair

Second Advisor

Dr. Yanqing Zhang

Third Advisor

Dr. Anu G. Bourgeois


In recent years, the field of bioinformatics has grown at an unprecedented scale. The amount of data generated by genome projects demands new and efficient methods of information storage and retrieval, and the analysis and management of protein structure information has become one of the main focuses. It is well known that a protein's function depends on how its structure is arranged in 3-dimensional space. Because protein structures are exceedingly large, complex, and multi-dimensional, there is a need for a data model that can store protein structures in accordance with their spatial arrangement and topological relationships and, at the same time, provide tools to analyze the stored information. With the emergence of spatial databases, first used in the field of Geographic Information Systems, the data model for protein structure can be based on the geographic model, as the two share strikingly similar traits. The geometry of proteins can be modeled using the spatial types provided by a spatial database. Similarly, the geometric queries used for geographical analysis can be used to analyze the structure of proteins. This thesis explores the mechanics of extracting structural information for a protein from a Protein Data Bank (PDB) flat file, storing that information in a spatial data model, and analyzing it using the geometric operators provided by the spatial database. The database used is Oracle 9i, and most features are provided by the Oracle Spatial package. Queries based on these ideas are demonstrated.
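As an illustration of the extraction step described in the abstract, the following is a minimal Python sketch of reading atom coordinates from the fixed-column ATOM and HETATM records of a PDB flat file. It is not the thesis's implementation: the names `Atom`, `parse_atom_line`, `parse_pdb`, and `distance` are hypothetical, and the plain Euclidean distance merely stands in for the kind of geometric operator that the spatial database would provide.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable, List, Optional


@dataclass
class Atom:
    """One atom taken from an ATOM/HETATM record (hypothetical container)."""
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Optional[Atom]:
    """Parse a fixed-column ATOM/HETATM record; return None for other records."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return None
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def parse_pdb(lines: Iterable[str]) -> List[Atom]:
    """Collect every atom record from an iterable of PDB lines."""
    atoms = []
    for line in lines:
        atom = parse_atom_line(line)
        if atom is not None:
            atoms.append(atom)
    return atoms


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the file's coordinate units."""
    return sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)
```

In use, `parse_pdb(open(path))` would yield the list of atoms for a structure file at a caller-supplied `path`. Each atom's coordinates could then be loaded into a point geometry in the spatial database.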