Author ORCID Identifier

0000-0002-6154-1068

Date of Award

Summer 8-8-2024

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Zhipeng Cai

Second Advisor

Daniel Takabi

Abstract

State-of-the-art machine learning techniques, particularly pre-trained language models, have demonstrated exceptional performance across a range of computer science domains. These models, often fine-tuned for specific tasks, excel at natural language understanding, text generation, and related tasks such as sentiment analysis, machine translation, and question answering. Their success can be attributed to extensive training on vast amounts of text data, which enables them to capture intricate linguistic patterns and contextual information. Furthermore, the accessibility of these pre-trained models has democratized research and development, fostering innovation in fields such as information retrieval, chatbots, and automated content generation. However, it is crucial to recognize that these state-of-the-art language models are vulnerable to adversarial attacks. Recent studies have revealed their susceptibility to subtle input manipulations designed to mislead them: adversarial attacks on language models inject carefully crafted perturbations into the input, causing the model to produce incorrect or undesirable outputs. This vulnerability raises concerns about the reliability and security of these models, particularly in security-critical domains such as automated content moderation and fraud detection. Addressing and mitigating these vulnerabilities is therefore essential for the responsible and secure deployment of pre-trained language models in real-world applications. To tackle these security challenges, this dissertation focuses on developing efficient adversarial attacks and pre-trained language models that are robust to such attacks, underscoring the importance of this issue in natural language processing. In our first work, we introduce an efficient adversarial attack model that generates context-aware adversarial examples to strengthen the adversarial training of pre-trained language models. Our second work designs a robust sentence embedding framework that improves generalization and robustness across various text representation tasks, even under different adversarial attacks. Finally, as a more comprehensive effort, we propose and implement a simple yet effective sentence embedding method that combines a token-level perturbation generator with a novel adversarial token-detection objective. This approach produces high-quality, more resilient sentence embeddings, enhancing the overall robustness of language models in both adversarial and non-adversarial settings.
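
Illustrative sketch (not from the dissertation): to make the token-level idea concrete, the toy PyTorch example below pairs a random token-replacement "perturbation generator" with a binary, per-token "was this token replaced?" detection loss, in the spirit of the adversarial token-detection objective described above. The vocabulary, model sizes, and random replacement strategy are all invented for illustration and do not reproduce the dissertation's actual generator or objective.

```python
# Toy sketch: token-level perturbation + replaced-token detection (hypothetical).
import random
import torch
import torch.nn as nn

VOCAB = ["the", "movie", "was", "great", "terrible", "plot", "acting", "boring", "[PAD]"]
TOK2ID = {t: i for i, t in enumerate(VOCAB)}


def perturb(tokens, replace_prob=0.3):
    """Randomly replace tokens and record which positions were perturbed."""
    perturbed, labels = [], []
    for tok in tokens:
        if random.random() < replace_prob:
            perturbed.append(random.choice(VOCAB[:-1]))  # swap in a vocabulary token
            labels.append(1.0)                           # 1 = position was perturbed
        else:
            perturbed.append(tok)
            labels.append(0.0)                           # 0 = original token kept
    return perturbed, labels


class TokenDetector(nn.Module):
    """Tiny discriminator: embeds each token and predicts whether it was replaced."""

    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.classify = nn.Linear(dim, 1)

    def forward(self, token_ids):
        # One replacement logit per token position.
        return self.classify(self.embed(token_ids)).squeeze(-1)


sentence = ["the", "movie", "was", "great"]
adv_tokens, adv_labels = perturb(sentence)

model = TokenDetector(len(VOCAB))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

ids = torch.tensor([TOK2ID[t] for t in adv_tokens])
labels = torch.tensor(adv_labels)

optimizer.zero_grad()
logits = model(ids)             # per-token replacement logits
loss = loss_fn(logits, labels)  # adversarial token-detection loss
loss.backward()
optimizer.step()
print(adv_tokens, loss.item())
```

In a realistic setting the random replacement would be produced by a learned perturbation generator and the detector would sit on top of a pre-trained encoder, but the training signal has the same shape: a binary label per token indicating whether it was adversarially substituted.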
