Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)


Computer Science

First Advisor

Prof. Y. Pan - Chair

Second Advisor

Prof. G. Qin

Third Advisor

Prof. A. Bourgeois

Fourth Advisor

Prof. A. Zelikovski


Knowledge of cysteine oxidation states and disulfide bond connectivity is of great importance to protein chemistry and 3-D structure. This research aims to identify the most relevant features for predicting the oxidation states of cysteines and the disulfide bond connectivity of proteins. Models predicting the oxidation states of cysteines are developed with machine learning techniques such as Support Vector Machines (SVMs) and Associative Neural Networks (ASNNs). A record-high prediction accuracy for oxidation state, 95%, is achieved by incorporating the oxidation states of N-terminus cysteines, the flanking sequences of cysteines, and global information on the protein chain (number of cysteines, chain length, amino acid composition of the chain, etc.) into the SVM encoding. This is 5% higher than current methods. It indicates that the oxidation states of amino-terminal cysteines are indicative of the oxidation states of the other cysteines in the same protein chain. Satisfactory prediction results are also obtained with the newer and more inclusive SPX dataset, especially for chains with a higher number of cysteines. Compared with methods in the literature, our approach is a one-step prediction system, which is easier to implement and use. A side-by-side comparison of SVM and ASNN is conducted; the results indicate that SVM outperforms ASNN on this particular problem.

To predict the correct pairing of cysteines into disulfide bonds, we first study disulfide connectivity by calculating the local interaction potentials between the flanking sequences of the cysteine pairs. The resulting interaction potential is further adjusted by coefficients related to the binding motif of enzymes during disulfide formation and by the linear distance between the cysteine pairs. Finally, a maximum weight matching algorithm is applied and the performance of the interaction potentials is evaluated. The overall prediction accuracy is unsatisfactory compared with the literature.
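The matching step can be illustrated with a minimal sketch. It assumes each candidate cysteine pair already carries a single adjusted score and that the number of bonded cysteines is even. The function name `best_pairing` and the example scores are hypothetical, and the exhaustive search shown here is a stand-in for a general maximum weight matching algorithm, practical only for chains with few cysteines.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_pairing(cysteines: List[int], weight: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return (total score, pairs) of the highest-scoring perfect pairing.

    `cysteines` holds sorted sequence positions; `weight` maps (i, j) with
    i < j to a hypothetical adjusted interaction score. Exhaustive search:
    the first cysteine is paired with every possible partner in turn and
    the remaining cysteines are paired recursively.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("an even number of bonded cysteines is assumed")
    if not cysteines:
        return 0.0, []
    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = best_pairing(remaining, weight)
        score += weight[(first, partner)]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


# Hypothetical scores for four cysteines at positions 5, 12, 30 and 41.
example_weights = {
    (5, 12): 0.2, (5, 30): 0.9, (5, 41): 0.1,
    (12, 30): 0.3, (12, 41): 0.8, (30, 41): 0.4,
}
total, pairs = best_pairing([5, 12, 30, 41], example_weights)
print(pairs)
```

With four cysteines there are only three possible pairings, so the search is trivial; the number of pairings grows quickly with the number of cysteines, which is why a polynomial-time matching algorithm is preferable in practice.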
SVM is also used to predict disulfide connectivity under the assumption that the oxidation states of the cysteines in the protein are known. Information on the binding region during disulfide formation, the distance between cysteine pairs, global information on the protein chain, and the flanking sequences around the cysteine pairs is included in the SVM encoding. The prediction results illustrate the advantage of using possible anchor region information.
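As a rough illustration of what such an encoding might look like, the sketch below turns one candidate cysteine pair into a numeric feature vector. Everything specific here is an assumption rather than the encoding used in this work: the one-hot representation, the window half-width of 3, the feature order, and the function names are hypothetical, and the binding (anchor) region features are omitted because their form is not specified above.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(sequence: str, position: int, half_width: int = 3) -> List[float]:
    """One-hot encode the residues flanking sequence[position].

    The central cysteine is skipped; positions beyond either end of the
    chain are encoded as all-zero vectors.
    """
    features: List[float] = []
    for offset in range(-half_width, half_width + 1):
        if offset == 0:
            continue
        idx = position + offset
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= idx < len(sequence):
            aa_index = AMINO_ACIDS.find(sequence[idx])
            if aa_index >= 0:
                one_hot[aa_index] = 1.0
        features.extend(one_hot)
    return features


def encode_global(sequence: str) -> List[float]:
    """Global chain descriptors: cysteine count, length, amino acid composition."""
    length = len(sequence)
    composition = [sequence.count(aa) / length for aa in AMINO_ACIDS]
    return [float(sequence.count("C")), float(length)] + composition


def encode_pair(sequence: str, i: int, j: int, half_width: int = 3) -> List[float]:
    """Feature vector for the candidate pair of cysteines at indices i < j."""
    if sequence[i] != "C" or sequence[j] != "C":
        raise ValueError("both positions must be cysteines")
    separation = float(j - i)  # linear distance along the chain
    return (encode_window(sequence, i, half_width)
            + encode_window(sequence, j, half_width)
            + [separation]
            + encode_global(sequence))


vector = encode_pair("MKCAACGHC", 2, 8)
print(len(vector))
```

Vectors of this kind, one per candidate pair and labeled as bonded or not bonded, could then be passed to any standard SVM implementation for training.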