Date of Award

12-20-2004

Degree Type

Closed Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Dr. Yi Pan - Chair

Second Advisor

Dr. Rajshekhar Sunderraman

Third Advisor

Dr. Michael Weeks

Abstract

Training a Support Vector Machine (SVM) on a learning task with thousands of training examples demands large amounts of memory and time. SVMlight, by Dr. Thorsten Joachims, is implemented in C and uses a fast optimization algorithm to handle thousands of support vectors. SVMlight addresses classification, pattern recognition, regression, and the learning of ranking functions. The C code also provides methods for XiAlpha estimation of the error rate and precision. These two estimates give a measure of the generalization performance of the SVM, even for computation-intensive text classification tasks. The SVMlight code also allows users to define their own kernel functions. Although the software employs an efficient algorithm and minimizes cost, it still takes a considerable amount of time to process thousands of support vectors and training examples. This time can be reduced further by parallelizing the code. In our work, we first refined the SVMlight code by removing unnecessary iterations and rewriting it to be more cost-efficient. We then parallelized the code separately with two forms of shared-memory parallelism, OpenMP and POSIX Threads (Pthreads). Both versions were built with Intel’s C compiler 7.1 for Linux and run using Hyper-Threading Technology. The parallelized code was tested on protein structure prediction: different types of protein sequences were run through both versions while varying the number of training examples and support vectors. Execution time and speedup were calculated for both OpenMP and Pthreads. Together, the OpenMP and Pthreads implementations showed a good increase in speedup.
