Date of Award

12-2009

Degree Type

Closed Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Alexander Zelikovsky - Chair

Second Advisor

Raj Sunderraman

Third Advisor

Saeid Belkasim

Fourth Advisor

Victor Olman

Abstract

The availability of large gene expression microarray data sets has brought many challenges for biological data mining. Many different clustering methods have been proposed and widely used to analyze gene expression data. Biclustering identifies sets of genes that share similar expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and data sets. Several biclustering methods based on different techniques currently exist; however, it is not clear how to compare the resulting biclusters with respect to biological relevance, and there are so far no guidelines for choosing among the available techniques. In this work, we propose two new Mean Squared Residue (MSR) based biclustering methods. The first method is a dual biclustering algorithm that finds a set of biclusters using a greedy approach. The second method combines the dual biclustering algorithm with quadratic programming: the dual biclustering algorithm reduces the size of the matrix so that the quadratic program can find an optimal bicluster reasonably fast. We also describe the comparison method and explain how we handle overlap between biclusters and how we treat missing data.
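
For readers unfamiliar with the Mean Squared Residue score named in the abstract, the following is a minimal sketch of how it is commonly computed for a candidate bicluster. It assumes the widely used Cheng and Church definition (each entry minus its row mean and column mean, plus the overall mean of the submatrix, squared and averaged); the dissertation may use a variant. The function name `mean_squared_residue` and its arguments are illustrative and are not taken from the dissertation.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    Assumes the standard Cheng-Church definition; names are illustrative.
    matrix is a list of lists (genes x samples); rows and cols are index lists.
    """
    n_rows, n_cols = len(rows), len(cols)
    # Extract the candidate bicluster as a dense submatrix.
    sub = [[matrix[i][j] for j in cols] for i in rows]

    # Row means, column means, and the overall mean of the submatrix.
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    # Average the squared residue of every entry.
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# A purely additive pattern (each row is a shifted copy of the others)
# has zero residue, so it scores as a perfectly coherent bicluster.
additive = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
score_additive = mean_squared_residue(additive, [0, 1, 2], [0, 1, 2])

# A checkerboard pattern is not additive and scores above zero.
checker = [[1, 0], [0, 1]]
score_checker = mean_squared_residue(checker, [0, 1], [0, 1])
```

A lower score indicates a more coherent bicluster, which is why MSR-based methods typically search for large submatrices whose score stays below a chosen threshold.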
