Author

Yang Li

Author ORCID Identifier

https://orcid.org/0009-0006-4018-5582

Date of Award

8-7-2024

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Shihao Ji

Second Advisor

Rajshekhar Sunderraman

Third Advisor

Murray Patterson

Fourth Advisor

Wenzhan Song

Abstract

Deep Neural Networks (DNNs) have achieved significant success across various applications. However, the increasing number of parameters in state-of-the-art architectures presents challenges such as overfitting and high computational costs. Additionally, with the rising adoption of large language models (LLMs) and the growing demand for per-user or per-task model customization, parameter-efficient fine-tuning has become crucial. Consequently, the exploration of neural network efficiency has emerged as a vibrant and dynamic research area, focusing on optimizing model performance while minimizing resource usage.

This dissertation explores neural network efficiency in two directions: pruning and parameter-efficient fine-tuning. Three novel pruning algorithms are introduced: L0-ARM, NPN, and Dep-L0. L0-ARM enhances L0-based pruning with the Augment-Reinforce-Merge (ARM) gradient estimator, demonstrating superior performance in sparsifying networks. Building on L0-ARM, the Neural Plasticity Network (NPN) enables both network pruning and expansion within the same framework. To address the inconsistencies of L0-based methods on large-scale tasks, Dep-L0 introduces dependency-enabled L0 regularization, which models the dependencies among the binary gates rather than treating them as independent.
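
For intuition, the sketch below shows the Augment-Reinforce-Merge (ARM) estimator that L0-ARM builds on: it produces unbiased gradient estimates of an expected loss over Bernoulli gates with respect to the gate logits, which allows the logits to be optimized with stochastic gradient updates despite the discrete gates. This is a minimal PyTorch sketch under assumed toy settings; the function name, loss, and update loop are illustrative and are not the dissertation's implementation.

    import torch

    def arm_gradient(f, phi):
        # One-sample ARM estimate of d/d(phi) E_{z ~ Bernoulli(sigmoid(phi))}[f(z)].
        # f   : callable mapping a binary gate vector to a scalar loss
        # phi : 1-D tensor of gate logits (hypothetical name, for illustration only)
        u = torch.rand_like(phi)                    # shared uniform noise
        z1 = (u > torch.sigmoid(-phi)).float()      # antithetic gate sample
        z2 = (u < torch.sigmoid(phi)).float()       # standard gate sample
        return (f(z1) - f(z2)) * (u - 0.5)          # unbiased gradient w.r.t. phi

    # Toy usage: a loss that simply counts active gates, so the logits are driven
    # down and the gates switch off, mimicking an L0-style sparsity pressure.
    phi = torch.zeros(8)
    loss_fn = lambda z: z.sum()
    for _ in range(200):
        phi -= 0.5 * arm_gradient(loss_fn, phi)     # plain SGD step on the logits
    print(torch.sigmoid(phi))                       # gate probabilities shrink toward 0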

In the realm of parameter-efficient fine-tuning (PEFT), this dissertation introduces VB-LoRA, which implements a novel "divide-and-share" paradigm that breaks the limits of low-rank decomposition across matrix dimensions, modules, and layers by globally sharing parameters through a vector bank. The proposed method composes all of the low-rank matrices of LoRA from this shared vector bank using a differentiable top-k admixture module. This design enables VB-LoRA to achieve extreme parameter efficiency while maintaining performance comparable to or better than that of state-of-the-art PEFT methods.
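
To make the "divide-and-share" idea concrete, the sketch below composes a single LoRA factor from a globally shared vector bank: the factor is divided into fixed-length sub-vectors, and each sub-vector is a differentiable top-k admixture of bank vectors, so only the small bank plus per-sub-vector selection logits need to be stored and trained. Class and variable names, shapes, bank size, and initialization are assumptions made for illustration; they are not taken from the dissertation's code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKAdmixture(nn.Module):
        # Illustrative sketch: build one LoRA factor of length num_sub_vectors * sub_dim
        # from a shared bank of sub_dim-dimensional vectors.
        def __init__(self, bank, num_sub_vectors, k=2):
            super().__init__()
            self.bank = bank                        # shared nn.Parameter of shape (num_vectors, sub_dim)
            self.k = k
            self.logits = nn.Parameter(torch.zeros(num_sub_vectors, bank.shape[0]))

        def forward(self):
            topk_vals, topk_idx = self.logits.topk(self.k, dim=-1)  # pick k bank vectors per sub-vector
            weights = F.softmax(topk_vals, dim=-1)                  # differentiable admixture weights
            selected = self.bank[topk_idx]                          # (num_sub_vectors, k, sub_dim)
            sub_vectors = (weights.unsqueeze(-1) * selected).sum(dim=1)
            return sub_vectors.reshape(-1)                          # concatenated LoRA factor

    # Toy usage: compose a 768-dimensional factor from a bank of 256 vectors of length 256.
    bank = nn.Parameter(0.02 * torch.randn(256, 256))
    factor = TopKAdmixture(bank, num_sub_vectors=768 // 256, k=2)
    print(factor().shape)                           # torch.Size([768])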

DOI

https://doi.org/10.57709/37370410
