Date of Award

12-15-2016

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Dr. Yanqing Zhang

Second Advisor

Dr. Yi Pan

Third Advisor

Dr. Rajshekhar Sunderraman

Fourth Advisor

Dr. Yichuan Zhao

Abstract

To build reliable prediction models and identify useful patterns, it is increasingly common to assemble data sets from databases maintained by different sources, such as hospitals. However, doing so may divulge sensitive information about individuals, which raises privacy concerns and in turn discourages the parties from sharing information. Privacy Preserving Distributed Data Mining (PPDDM) addresses this issue by mining without access to the actual data values, so that no information is disclosed beyond the final result. In recent years, a number of state-of-the-art PPDDM approaches have been developed, most of which are based on Secure Multiparty Computation (SMC). SMC incurs high communication cost and requires sophisticated secure computation. In addition, the mining process inevitably slows down as the volume of aggregated data grows. In this work, a new framework named Privacy-Aware Non-linear SVM (PAN-SVM) is proposed to build a PPDDM model from multiple data sources. PAN-SVM employs the Secure Sum Protocol to protect privacy at the bottom layer, and reduces communication and computation complexity via Nyström matrix approximation and eigendecomposition at the middle layer. The top layer of PAN-SVM speeds up the whole algorithm for large-scale datasets. Based on the PAN-SVM framework, a Privacy Preserving Multi-class Classifier is built, and experimental results on several benchmark datasets and microarray datasets show that it improves classification accuracy compared with a regular SVM. In addition, two Privacy Preserving Feature Selection methods based on PAN-SVM are proposed and tested on benchmark data and real-world data. PAN-SVM does not depend on a trusted third party; all participants collaborate equally.
Experimental results show that PAN-SVM not only effectively solves the problem of collaborative privacy-preserving data mining by building non-linear classification rules, but also significantly improves the performance of the resulting classifiers.
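
The abstract names the Secure Sum Protocol as the privacy mechanism at the bottom layer but does not spell out its steps. As a rough illustration only, the sketch below simulates the textbook ring-based secure sum in a single process: the first party adds a random mask to its value, each later party adds its own value to the running total, and the first party removes the mask at the end. The function name, the modulus, and the example counts are hypothetical, and the exact variant used in the dissertation may differ.

```python
import secrets


def secure_sum(private_values, modulus=2**61):
    """Simulate a ring-based secure sum in one process (illustrative only).

    Assumptions: every value is an integer in [0, modulus), the true sum
    is below modulus, and the parties do not collude.
    """
    if len(private_values) < 3:
        # With only two parties, each can recover the other's value
        # by subtracting its own value from the published sum.
        raise ValueError("secure sum needs at least three parties")
    if any(not 0 <= v < modulus for v in private_values):
        raise ValueError("values must lie in [0, modulus)")

    # Party 1 hides its value behind a random mask before passing it on.
    mask = secrets.randbelow(modulus)
    running = (mask + private_values[0]) % modulus

    # Each later party sees only the masked running total and adds its value.
    for value in private_values[1:]:
        running = (running + value) % modulus

    # The total returns to party 1, which removes the mask.
    return (running - mask) % modulus


# Hypothetical per-site counts held by three data owners.
site_counts = [120, 75, 310]
total = secure_sum(site_counts)
```

In a real deployment each addition would happen at a different site and only the masked running total would travel over the network; the loop here stands in for that message passing.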

DOI

https://doi.org/10.57709/9444795

Recommended Citation

Lu, Yunmei, "Privacy Preserving Data Mining For Horizontally Distributed Medical Data Analysis." Dissertation, Georgia State University, 2016.
doi: https://doi.org/10.57709/9444795
