Computer Science Dissertations

Discovery and Extraction of Protein Sequence Motif Information that Transcends Protein Family Boundaries

Bernard Chen

Date of Award

7-17-2009

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Dr. Yi Pan - Chair

Second Advisor

Dr. Yanqing Zhang

Third Advisor

Dr. Robert W. Harrison

Fourth Advisor

Dr. Phang C. Tai

Abstract

Protein sequence motifs are attracting increasing attention in the field of sequence analysis. These recurring patterns have the potential to determine the conformation, function, and activities of proteins. In this work, we obtain protein sequence motifs that are universally conserved across protein family boundaries. Unlike most popular motif discovery algorithms, our approach therefore takes an extremely large input dataset, which makes an efficient technique essential. We use two granular computing models, Fuzzy Improved K-means (FIK) and Fuzzy Greedy K-means (FGK), to generate protein motif information efficiently. We then develop an efficient Super Granular SVM Feature Elimination model to further extract the motif information. During the motif search, fixing the window size in advance may reduce computational complexity and increase efficiency. However, because of the fixed size, our model may deliver a number of similar motifs that are merely shifted by a few residues or that include mismatches. To address this problem, we develop a new strategy named Positional Association Super-Rule, which combines super-rule analysis with a novel Positional Association Rule algorithm. We use the super-rule concept to construct a Super-Rule-Tree (SRT) by a modified HHK clustering, which requires no parameter setup, to identify the similarities and dissimilarities between the motifs. The positional association rule is then created and applied to search for similar motifs that are shifted by some residues. Analysis of the motifs generated by our approaches shows that they are significant not only at the sequence level but also in secondary structure similarity and biochemical properties.
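
The fixed-window issue described in the abstract can be illustrated with a small sketch. This is not the dissertation's method: the function names, the toy sequence, and the window length of 9 are all assumptions chosen for illustration, and plain strings stand in for whatever segment representation the work actually uses. The sketch only shows why a fixed window yields near-duplicate segments that differ by a shift, and how such a shift can be detected by comparing the tail of one segment with the head of another.

```python
def sliding_windows(sequence, window_size):
    """Return every contiguous segment of length window_size (hypothetical helper)."""
    return [sequence[i:i + window_size]
            for i in range(len(sequence) - window_size + 1)]


def shifted_overlap(motif_a, motif_b, max_shift):
    """Return the smallest shift s in 1..max_shift at which the tail of
    motif_a equals the head of motif_b, or None if there is no such shift.
    Exact string matching is an illustrative simplification."""
    for s in range(1, max_shift + 1):
        if motif_a[s:] == motif_b[:len(motif_b) - s]:
            return s
    return None


# Toy sequence and window length are made up for this example.
windows = sliding_windows("MKVLAAGIVG", 9)
shift = shifted_overlap(windows[0], windows[1], max_shift=3)
```

Here the two windows share eight of their nine residues and differ only by a one-position shift, which is the kind of redundancy a positional comparison is meant to surface.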

DOI

https://doi.org/10.57709/1059452

Recommended Citation

Chen, Bernard, "Discovery and Extraction of Protein Sequence Motif Information that Transcends Protein Family Boundaries." Dissertation, Georgia State University, 2009.
doi: https://doi.org/10.57709/1059452

Included in

Computer Sciences Commons
