Author

Hae-Jin Hu

Date of Award

8-6-2007

Degree Type

Closed Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Yi Pan - Chair

Second Advisor

Yan-Qing Zhang

Third Advisor

Phang C. Tai

Fourth Advisor

Robert W. Harrison

Abstract

With the efforts to understand the protein structure, many computational approaches have been made recently. Among them, the Support Vector Machine (SVM) methods have been recently applied and showed successful performance compared with other machine learning schemes. However, despite the high performance, the SVM approaches suffer from the problem of understandability since it is a black-box model; the predictions made by SVM cannot be interpreted as biologically meaningful way. To overcome this limitation, a new association rule based classifier PCPAR was devised based on the existing classifier, CPAR to handle the sequential data. The performance of the PCPAR was improved more by designing the following two hybrid schemes. The PCPAR/SVM method is a parallel combination of the PCPAR and the SVM and the PCPAR_SVM method is a sequential combination of the PCPAR and the SVM. To understand the SVM prediction, the SVM_PCPAR scheme was developed. The experimental result presents that the PCPAR scheme shows better performance with respect to the accuracy and the number of generated patterns than CPAR method. The PCPAR/SVM scheme presents better performance than the PCPAR, PCPAR_SVM or the SVM_PCPAR and almost equal performance to the SVM. The generated patterns are easily understandable and biologically meaningful. The system sturdiness evaluation and the ROC curve analysis proved that this new scheme is robust and competent.

Share

COinS