Mathematics Theses

Parallel Computing in Statistical-Validation of Clustering Algorithm for the Analysis of High throughput Data

Mourad Atlas

Date of Award

5-12-2005

Degree Type

Closed Thesis

Degree Name

Master of Science (MS)

Department

Mathematics and Statistics

First Advisor

Dr. Susmita Datta - Chair

Second Advisor

Dr. Saied Belkasim

Third Advisor

Dr. Gengsheng Qin

Abstract

Currently, clustering applications use classical methods to partition a set of data (or objects) into a set of meaningful subclasses, called clusters. A cluster is therefore a collection of objects that are "similar" to one another, and can thus be treated collectively as one group, while being "dissimilar" to the objects belonging to other clusters. However, clustering raises a number of problems. Among them, as noted in [Datta03], handling a large number of dimensions and a large number of data items can be problematic because of the computational time required. In this thesis, we investigate all clustering algorithms used in [Datta03] and present a parallel solution to reduce the computational time. We apply parallel programming techniques to the statistical algorithms as a natural extension of sequential programming techniques in R. The proposed parallel model has been tested on a high throughput dataset: microarray data on the transcriptional profile of budding yeast during sporulation, covering more than 6,000 genes. Our evaluation covers the scalability of the clustering algorithms on datasets of varying dimensions, the speedup factor, and the efficiency of the parallel model relative to the sequential implementation. Our experiments show that the gene expression data follow the pattern predicted in [Datta03]: Diana appears to be a solid performer, and the group means for each cluster coincide with those in [Datta03]. We show that our parallel model is applicable to these clustering algorithms and is particularly useful in applications that deal with high throughput data, such as gene expression data.
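The abstract evaluates the parallel model by its speedup factor and efficiency. Assuming the standard definitions (the thesis's exact formulas are not given on this page), the speedup on p processors is S_p = T_1 / T_p, where T_1 is the sequential run time and T_p the parallel run time, and the efficiency is E_p = S_p / p. The minimal Python sketch below computes both; the function names and the timings are hypothetical and are not results from the thesis.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup factor S_p = T_1 / T_p (hypothetical helper, standard definition)."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, processors: int) -> float:
    """Parallel efficiency E_p = S_p / p; 1.0 would mean perfectly linear speedup."""
    if processors < 1:
        raise ValueError("processors must be >= 1")
    return speedup(t_sequential, t_parallel) / processors


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, NOT measurements from the thesis.
    t1, t4 = 120.0, 40.0
    print(speedup(t1, t4))        # 3.0
    print(efficiency(t1, t4, 4))  # 0.75
```

An efficiency below 1.0, as in this made-up example, reflects overhead such as communication between processors and any part of the computation that must still run serially.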

DOI

https://doi.org/10.57709/14344075

Recommended Citation

Atlas, Mourad, "Parallel Computing in Statistical-Validation of Clustering Algorithm for the Analysis of High throughput Data." Thesis, Georgia State University, 2005.
doi: https://doi.org/10.57709/14344075
