-
Background The paper addresses the limitations of Large Language Models (LLMs) in processing long text sequences, often due to the limited size of the context window during pre-training. Previous studies indicated that extending the context window size typically requires fine-tuning of large models, which is time-consuming and compute-intensive.
-
Existing Work Technologies like Rotary Position Embedding (RoPE) have been shown to be effective in extending context windows without fine-tuning, yet these methods generally still need fine-tuning to adapt to longer sequence lengths.
- Introduced Self-Extend
-
Challenge 1: How to extend the context window without fine-tuning The proposed Self-Extend method incorporates two types of attention mechanisms, grouped attention and normal attention, to achieve mapping of unseen positions into seen during pre-training, all without the need for fine-tuning.
-
Challenge 2: Ensuring no loss in performance Self-Extend has a unique design for merging both types of attentions, ensuring a smooth transition from normal to grouped attention ranges and demonstrating through experiments its effectiveness in enhancing model performance on long texts without additional training.
-
The experiments for Self-Extend show comparable or even superior performance on various benchmarks compared to methods that require fine-tuning. This demonstrates that the method is able to expand the context window while avoiding the high computational costs of fine-tuning. It particularly improved performance on different models, such as Llama-2-7b-chat and Mistral-7B-instruct, without additional training.
The paper successfully presents a method for extending the context window of LLMs without fine-tuning, which is crucial for improving the capability of large language models to process long texts when computational resources are limited.