
Efficient Privacy-Preserving Machine Learning: From Training Neural Networks to Fine-Tuning Large Language Models

Panzade, Prajwal
Abstract

Machine learning (ML) has emerged as a transformative technology with extensive applications in diverse fields such as computer vision, natural language processing, and audio and speech processing. The ML pipeline generally consists of two fundamental phases: training, where models are trained using labeled data, and inference, where these trained models are deployed to analyze and make predictions on unseen data. Neural networks, renowned for their exceptional performance, are extensively adopted in practice and necessitate large volumes of data for effective training. Although major technology companies provide Machine Learning as a Service (MLaaS) to facilitate access to ML services, these services present substantial privacy concerns, particularly when dealing with sensitive data such as health records or financial information. Regulations like the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) further emphasize the need for privacy protection within MLaaS platforms.

This dissertation addresses these critical privacy concerns by proposing novel frameworks that protect user data throughout machine learning computations, with particular emphasis on the training phase. We explore the integration of advanced cryptographic techniques, specifically functional encryption (FE) and fully homomorphic encryption (FHE), into the ML workflow. One significant contribution of this work is the development of secure activation functions and performance enhancements that make FE practical for privacy-preserving machine learning (PPML), enabling ML algorithms to run directly on encrypted data. Building on these components, we construct a privacy-preserving neural network pipeline for image classification that leverages FE to perform computations on encrypted inputs, ensuring that the underlying sensitive information remains confidential.
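A brief illustration of why "secure activation functions" matter: encryption schemes such as FE and FHE evaluate only additions and multiplications, so non-polynomial activations like the sigmoid must be replaced by low-degree polynomial approximations. The sketch below uses degree-3 coefficients that are a common least-squares fit from the PPML literature; it is illustrative only and not the dissertation's exact construction.

```python
import math

def sigmoid(x: float) -> float:
    """Standard sigmoid activation (not evaluable under FE/FHE)."""
    return 1.0 / (1.0 + math.exp(-x))

def poly_sigmoid(x: float) -> float:
    """Degree-3 polynomial stand-in: uses only + and *, so the same
    computation can be carried out over encrypted inputs."""
    return 0.5 + 0.197 * x - 0.004 * x ** 3

if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  poly={poly_sigmoid(x):.4f}")
```

Near zero the two functions agree closely (within about 0.04 on [-2, 2]); accuracy degrades for large |x|, which is why inputs are typically normalized before encrypted evaluation.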

Additionally, we address the challenges associated with FE-based machine learning by developing fine-tuning systems using FHE. These systems enable the adaptation of pre-trained models to new tasks while maintaining data privacy. Finally, this work extends the application of privacy-preserving techniques to modern ML architectures, including vision transformer models and large language models (LLMs). By incorporating privacy-preserving mechanisms into these advanced models, we demonstrate the feasibility and effectiveness of our proposed systems in safeguarding user data across a wide range of ML applications. The findings of this dissertation significantly advance the field of privacy-preserving machine learning, providing a foundation for privacy-preserving MLaaS deployments and paving the way for future research in this critical area.
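The compute-on-ciphertext principle underlying FHE-based fine-tuning can be sketched with a toy additively homomorphic scheme (Paillier), using only the Python standard library. This is a minimal demonstration, not the dissertation's FHE construction: real FHE schemes such as CKKS also support multiplication and use far larger, cryptographically secure parameters, whereas the tiny primes below offer no security.

```python
import math
import random

# Toy Paillier parameters: demo-sized primes, insecure by design.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)   # Carmichael's lambda(N)
MU = pow(LAM, -1, N)           # with g = N + 1, mu = lambda^-1 mod N

def encrypt(m: int) -> int:
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    # c = (1 + N)^m * r^N mod N^2
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    l = (pow(c, LAM, N2) - 1) // N   # L(x) = (x - 1) / N
    return (l * MU) % N

def add_ciphertexts(c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % N2

if __name__ == "__main__":
    c = add_ciphertexts(encrypt(5), encrypt(7))
    print(decrypt(c))   # 12, computed without ever decrypting the inputs
```

A server holding only ciphertexts can aggregate encrypted gradients or model updates this way; FHE extends the same idea to the multiplications needed for full training and fine-tuning.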

Date
2024-08-07
Keywords
Privacy-preserving Machine Learning, Functional Encryption, Fully Homomorphic Encryption, Privacy-preserving Neural Networks, Secure Computation, Privacy-preserving Fine-tuning
Citation
Panzade, Prajwal (2024). Efficient Privacy-Preserving Machine Learning: From Training Neural Networks to Fine-Tuning Large Language Models. Dissertation, Georgia State University. https://doi.org/10.57709/37395685
Embargo Lift Date
2027-07-26