2407.01906.md

Background

  • Background This paper addresses parameter-efficient fine-tuning (PEFT) of large language models (LLMs), which is crucial when customizing LLMs under constrained computational resources. However, existing PEFT research has focused primarily on LLMs with dense architectures, while sparse-architecture LLMs remain relatively unexplored.

  • Existing Work Existing PEFT methods such as Low-Rank Adaptation (LoRA) and P-Tuning have been applied to dense-architecture LLMs, but they are not well adapted to sparse-architecture LLMs such as those with a Mixture-of-Experts (MoE) structure. An MoE architecture routes different inputs to different experts (see the minimal routing sketch below), and the existing literature lacks effective tuning techniques tailored to this structure.

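To make this routing behaviour concrete, the following is a minimal, illustrative sketch of top-K expert routing in a toy MoE layer; it is not the architecture studied in the paper, and all dimensions and module names are assumptions. Because each token activates only a few experts, different tasks tend to concentrate their routing on different subsets of experts, which is the property ESFT builds on.

```python
# Toy top-k MoE routing sketch (illustrative only; not the paper's architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyMoELayer(nn.Module):
    """Toy MoE feed-forward layer: each token is routed to its top-k experts."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        gate_probs = F.softmax(self.gate(x), dim=-1)        # token-to-expert affinities
        weights, idx = gate_probs.topk(self.top_k, dim=-1)  # each token keeps only top-k experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = idx[:, slot] == e                    # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Different inputs activate different experts:
layer = TinyMoELayer(d_model=16, n_experts=8, top_k=2)
tokens = torch.randn(4, 16)
print(layer(tokens).shape)  # torch.Size([4, 16])
```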
Core Contributions

  • Introduced Expert-Specialized Fine-Tuning (ESFT)
    • Challenge 1: Maintaining Expert Specialization One challenge in efficiently fine-tuning MoE LLMs is preserving the specialization of the individual expert components. ESFT tackles this by fine-tuning only the parameters of the experts most relevant to the downstream task while freezing the other experts and modules. Unlike full-parameter fine-tuning, this avoids updating experts unrelated to the target task and effectively maintains expert specialization (see the freezing sketch after this list).

    • Challenge 2: Saving Computational Resources Another significant challenge in LLM fine-tuning is reducing training resource consumption while maintaining high performance. Because ESFT fine-tunes only the parameters of the selected experts, it reduces storage by up to 90% and training time by up to 30% compared to full-parameter fine-tuning.

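The expert-freezing step described in Challenge 1 can be sketched as follows. This is a hedged illustration under assumed module names (model.layers, layer.mlp.experts), not the authors' released implementation; only the unfrozen expert parameters then need gradients and optimizer state, which is where the storage and training-time savings in Challenge 2 come from.

```python
# Hedged sketch of expert-specialized freezing; module layout is an assumption.
import torch.nn as nn


def apply_esft_style_freezing(model: nn.Module, relevant_experts: dict) -> None:
    """Leave only task-relevant experts trainable.

    relevant_experts maps layer index -> set of expert indices to fine-tune
    (how those indices are chosen is a separate scoring step).
    """
    # Freeze everything first: attention, embeddings, routers, and all experts.
    for param in model.parameters():
        param.requires_grad = False

    # Unfreeze only the experts judged relevant to the downstream task.
    for layer_idx, layer in enumerate(model.layers):              # assumed layout
        for expert_idx, expert in enumerate(layer.mlp.experts):   # assumed layout
            if expert_idx in relevant_experts.get(layer_idx, set()):
                for param in expert.parameters():
                    param.requires_grad = True


# Only the unfrozen parameters need gradients and optimizer state, e.g.:
# optimizer = torch.optim.AdamW(
#     [p for p in model.parameters() if p.requires_grad], lr=1e-5)
```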
Implementation and Deployment

The authors experimentally validated ESFT across a range of downstream tasks. The results show that ESFT matches or even exceeds the performance of full-parameter fine-tuning while requiring significantly fewer computational resources. They also examined the working mechanism of ESFT, studied the expert selection process and its advantages, and used ablation studies to highlight the importance of the expert relevance scores and of the fine-grained expert segmentation architecture.

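As a rough illustration of how expert relevance could be scored from router outputs on a sample of task data, the sketch below computes two simple per-expert statistics, an average gate weight and a top-K selection ratio. The exact relevance criteria and thresholds used in the paper may differ, and gate_probs is an assumed input collected by the caller.

```python
# Hedged sketch of per-expert relevance statistics; not the paper's exact criteria.
import torch


def expert_relevance_scores(gate_probs: torch.Tensor, top_k: int) -> dict:
    """gate_probs: (num_tokens, num_experts) router softmax outputs of one layer,
    collected over a small sample of the downstream task's data."""
    # Average gate weight each expert receives across the task sample.
    avg_gate_score = gate_probs.mean(dim=0)                         # (num_experts,)

    # Fraction of tokens for which each expert appears in the top-k selection.
    topk_idx = gate_probs.topk(top_k, dim=-1).indices               # (num_tokens, top_k)
    selected = torch.zeros_like(gate_probs).scatter_(1, topk_idx, 1.0)
    token_selection_ratio = selected.mean(dim=0)                    # (num_experts,)

    return {"avg_gate_score": avg_gate_score,
            "token_selection_ratio": token_selection_ratio}


# Experts whose score exceeds a chosen threshold per layer would be the ones
# left trainable in the freezing step sketched earlier.
```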
Summary

The paper proposed ESFT, an efficient fine-tuning method for sparse-architecture LLMs that fine-tunes only the experts most relevant to downstream tasks, maintaining expert specialization and significantly saving computational resources.