Date of Award
Summer 8-7-2024
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Science
First Advisor
Zhipeng Cai
Second Advisor
Daniel Takabi
Third Advisor
Wei Li
Fourth Advisor
Yi Ding
Abstract
Machine learning (ML) has emerged as a transformative technology with extensive applications in diverse fields such as computer vision, natural language processing, and audio and speech processing. The ML pipeline generally consists of two fundamental phases: training, where models are trained using labeled data, and inference, where these trained models are deployed to analyze and make predictions on unseen data. Neural networks, renowned for their exceptional performance, are extensively adopted in practice and necessitate large volumes of data for effective training. Although major technology companies provide Machine Learning as a Service (MLaaS) to facilitate access to ML services, these services present substantial privacy concerns, particularly when dealing with sensitive data such as health records or financial information. Regulations like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) further emphasize the need for privacy protection within MLaaS platforms.
This dissertation addresses these critical privacy concerns by proposing novel frameworks that protect user data throughout machine learning computations, with particular emphasis on the training phase. We explore the integration of advanced cryptographic techniques, specifically functional encryption (FE) and fully homomorphic encryption (FHE), into the ML workflow. One significant contribution of this work is the development of secure activation functions and the improvement of FE's performance for privacy-preserving machine learning (PPML), which enables ML algorithms to run directly on encrypted data and thereby safeguards sensitive information. Building on these techniques, we develop a privacy-preserving neural network pipeline for image classification that leverages FE to perform computations on encrypted data, ensuring that the underlying sensitive information remains confidential.
Additionally, we address the challenges associated with FE-based machine learning by developing fine-tuning systems using FHE. These systems enable the adaptation of pre-trained models to new tasks while maintaining data privacy. Finally, this work extends the application of privacy-preserving techniques to modern ML architectures, including vision transformer models and large language models (LLMs). By incorporating privacy-preserving mechanisms into these advanced models, we demonstrate the feasibility and effectiveness of our proposed systems in safeguarding user data across a wide range of ML applications. The findings of this dissertation significantly advance the field of privacy-preserving machine learning, providing a foundation for privacy-preserving MLaaS deployments and paving the way for future research in this critical area.
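The central idea running through the abstract, performing a neural network's arithmetic directly on ciphertexts so the server never sees plaintext inputs, can be illustrated with a toy additively homomorphic scheme. The sketch below uses a deliberately insecure, minimal Paillier cryptosystem (tiny demo primes, names of our own choosing) to evaluate a linear layer's dot product on encrypted features; it is an expository sketch, not the dissertation's actual FE/FHE constructions, which rely on hardened cryptographic libraries.

```python
# Toy Paillier cryptosystem (additively homomorphic), used here only to
# illustrate evaluating a linear layer's dot product on encrypted data.
# Parameters are insecure demo values; real PPML uses vetted FHE/FE libraries.
import math
import random

p, q = 293, 433                       # tiny demo primes (NOT secure)
n = p * q
n2 = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)

def L(x):
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic dot product: the server multiplies ciphertexts (adds
# plaintexts) and exponentiates (scales plaintexts) without decrypting.
x = [3, 1, 4]                         # client's private features
w = [2, 5, 7]                         # server's plaintext weights
cts = [encrypt(v) for v in x]
acc = encrypt(0)
for c, wi in zip(cts, w):
    acc = (acc * pow(c, wi, n2)) % n2  # accumulates wi * x_i under encryption

assert decrypt(acc) == sum(wi * xi for wi, xi in zip(w, x))  # 39
```

Only the client, holding the decryption key, can recover the result; this additive homomorphism covers linear layers, while nonlinearities such as activation functions require the additional machinery (secure activation functions, FHE) that the dissertation develops.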
DOI
https://doi.org/10.57709/37395685
Recommended Citation
Panzade, Prajwal, "Efficient Privacy-Preserving Machine Learning: From Training Neural Networks to Fine-Tuning Large Language Models." Dissertation, Georgia State University, 2024.
doi: https://doi.org/10.57709/37395685