Discovery and Extraction of Protein Sequence Motif Information that Transcends Protein Family Boundaries

Chen, Bernard
Abstract

Protein sequence motifs are attracting increasing attention in the field of sequence analysis. These recurring patterns can help determine the conformation, function and activities of proteins. In this work, we obtain protein sequence motifs that are universally conserved across protein family boundaries. Consequently, unlike the inputs to most popular motif discovery algorithms, our input dataset is extremely large, so an efficient technique is essential. We use two granular computing models, Fuzzy Improved K-means (FIK) and Fuzzy Greedy K-means (FGK), to generate protein motif information efficiently. We then develop an efficient Super Granular SVM Feature Elimination model to further extract the motif information. During the motif search, fixing the window size in advance can reduce computational complexity and improve efficiency. However, because the window size is fixed, the model may return several similar motifs that are merely shifted by a few residues or that contain mismatches. We develop a new strategy named Positional Association Super-Rule to address the problem of motifs generated from a fixed window size. It combines super-rule analysis with a novel Positional Association Rule algorithm. We use the super-rule concept to construct a Super-Rule-Tree (SRT) with a modified HHK clustering algorithm, which requires no parameter setup, to identify the similarities and dissimilarities between the motifs. The positional association rule is then created and applied to search for similar motifs that are shifted by a few residues. Analysis of the motifs generated by our approaches shows that they are significant not only at the sequence level but also in terms of secondary structure similarity and biochemical properties.
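The shifted-motif problem that comes with a fixed window size can be illustrated with a short sketch. It is not the dissertation's Positional Association Rule or Super-Rule-Tree method. It is a hypothetical illustration, assuming that motifs are represented as plain residue strings and that similarity is measured as the fraction of identical residues over the overlapping region. The functions `sliding_windows` and `best_shift` and the parameters `max_shift` and `min_overlap` are names introduced here for illustration only.

```python
from typing import List, Tuple


def sliding_windows(seq: str, width: int) -> List[str]:
    """Cut a protein sequence into all overlapping fixed-width segments."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def best_shift(a: str, b: str, max_shift: int,
               min_overlap: int = 3) -> Tuple[int, float]:
    """Return the offset k (|k| <= max_shift) at which motif b best lines up
    with motif a, scored as the fraction of identical residues in the overlap.

    A positive k means b[0] aligns with a[k], i.e. b is the window that starts
    k residues later. Offsets whose overlap is shorter than min_overlap are
    skipped so that tiny overlaps cannot score a trivially perfect match.
    """
    best_k, best_score = 0, 0.0
    for k in range(-max_shift, max_shift + 1):
        # Compare a[i] with b[i - k] wherever both indices are valid.
        pairs = [(a[i], b[i - k]) for i in range(len(a)) if 0 <= i - k < len(b)]
        if len(pairs) < min_overlap:
            continue
        score = sum(x == y for x, y in pairs) / len(pairs)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Two hypothetical 9-residue motifs: the second is the first shifted by one.
print(best_shift("ACDEFGHIK", "CDEFGHIKL", max_shift=2))  # (1, 1.0)
```

At offset 0 the two hypothetical motifs share no residues, yet at offset 1 they match perfectly, which is exactly the kind of redundancy a fixed window can produce and that a shift-aware step is meant to catch.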

Date
2009-07-17
Keywords
Positional Association Rule, Super-Rule, protein sequence motif, FIK model, FGK model, Super GSVM-FE, HHK clustering algorithm
Citation
Chen, Bernard (2009). Discovery and Extraction of Protein Sequence Motif Information that Transcends Protein Family Boundaries. Dissertation, Georgia State University. https://doi.org/10.57709/1059452
Embargo Lift Date
2011-11-23