2401.01325.md

Background

  • Background: The paper addresses the limitations of Large Language Models (LLMs) in processing long text sequences, which stem from the fixed context window size used during pre-training. Prior studies indicated that extending the context window typically requires fine-tuning the model, which is time-consuming and compute-intensive.

  • Existing Work: Techniques built on Rotary Position Embedding (RoPE), such as position interpolation, have proven effective at extending context windows, yet these methods generally still require fine-tuning to adapt the model to longer sequence lengths.

Core Contributions

  • Introduced Self-Extend
    • Challenge 1: extending the context window without fine-tuning. The proposed Self-Extend method combines two attention mechanisms, grouped attention and normal attention, to map relative positions unseen during pre-training onto positions the model has already seen (via floor division of position indices), all without any fine-tuning.

    • Challenge 2: ensuring no loss in performance. Self-Extend merges the two attention types so that normal attention covers nearby tokens and grouped attention takes over beyond a neighbor window, giving a seamless transition between the two ranges; experiments show this improves model performance on long texts without additional training. A minimal sketch of the position remapping follows this list.
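
To make the mechanism concrete, here is a minimal sketch of the merged position remapping, assuming the floor-division grouping the paper describes. The names `group_size` (the grouping factor G) and `window` (the neighbor-window size) are chosen here for illustration; this is a simplified sketch, not the paper's implementation.

```python
import numpy as np

def self_extend_rel_pos(seq_len: int, group_size: int, window: int) -> np.ndarray:
    """Merged relative-position matrix in the spirit of Self-Extend (sketch)."""
    q = np.arange(seq_len)[:, None]   # query index i
    k = np.arange(seq_len)[None, :]   # key index j
    rel = q - k                       # true relative distance i - j

    # Grouped distance: floor-divide the indices by the group size, then
    # shift so the grouped range lines up with the edge of the neighbor window.
    shift = window - window // group_size
    grouped = q // group_size - k // group_size + shift

    # Normal attention inside the neighbor window, grouped attention outside.
    return np.where(rel <= window, rel, grouped)
```

Because distant positions are compressed by `group_size`, the largest merged position grows roughly as `seq_len / group_size + window`, so it can stay within the pre-trained position range even for inputs far longer than the original context window.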

Implementation and Deployment

In experiments, Self-Extend matches or surpasses fine-tuning-based methods on a variety of long-context benchmarks, demonstrating that the context window can be expanded while avoiding the high computational cost of fine-tuning. In particular, it improves the long-text performance of models such as Llama-2-7b-chat and Mistral-7B-instruct without any additional training; a rough estimate of the reachable context length is sketched below.
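
As a back-of-the-envelope illustration (an assumption based on the paper's analysis, not stated in the summary above): if the maximum extended length is (L - w_n) * G + w_n for pre-training length L, group size G, and neighbor window w_n, a quick check shows how far a 4096-token model can reach. The parameter values below are purely illustrative.

```python
def max_extended_length(pretrain_len: int, group_size: int, window: int) -> int:
    # Beyond the neighbor window, positions consume pre-trained position
    # slots at a rate of 1/group_size, hence the bound below (assumed form).
    return (pretrain_len - window) * group_size + window

# e.g., a Llama-2-style 4096-token window with group_size=8, window=1024:
print(max_extended_length(4096, 8, 1024))  # 25600
```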

Summary

The paper presents a method for extending the context window of LLMs without fine-tuning, which is crucial for enabling large language models to process long texts when computational resources are limited.