2406.08464.md

Background

  • Background The paper discusses the remarkable performance of Large Language Models (LLMs) in following instructions across a wide array of tasks, a capability that depends on high-quality instruction data for fine-tuning. However, the alignment data used to tune models like Llama-3-Instruct remains proprietary even when the model weights are open, constraining the democratization of AI and limiting research into understanding and improving LLM alignment.

  • Existing Work Existing approaches to constructing alignment datasets face two primary challenges. The first relies on humans to generate and curate data, which is time- and labor-intensive. The second relies on LLMs to produce synthetic instructions; although this reduces human labor, it hinges on prompt engineering and careful selection of seed questions, and data diversity drops as dataset size grows.

Core Contributions

  • Introduced a self-synthesis method called MAGPIE
    • Challenge 1: How to synthesize high-quality, diverse instruction data without relying on human effort or prompt engineering? MAGPIE relies on neither prompt engineering nor seed questions. Instead, it constructs instruction data directly by prompting aligned LLMs with only a pre-query template, from which the model samples instructions on its own (a minimal sketch follows this list); the models self-synthesize user queries by leveraging the capabilities acquired during their alignment. Applying MAGPIE to Llama-3-8B-Instruct and Llama-3-70B-Instruct yields two datasets, MAGPIE-Air and MAGPIE-Pro.

    • Challenge 2: How to assess the effectiveness of the MAGPIE datasets against other public instruction datasets? A detailed comparison was conducted: Llama-3-8B-Base was fine-tuned separately on each dataset and evaluated on alignment benchmarks such as AlpacaEval 2, Arena-Hard, and WildBench. Models fine-tuned with MAGPIE surpass the official Llama-3-8B-Instruct model, which was tuned with over 10 million data points through supervised fine-tuning (SFT) and feedback learning.
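
The following is a minimal sketch of the pre-query idea, assuming the Hugging Face `transformers` library and access to `meta-llama/Meta-Llama-3-8B-Instruct`; the template string and sampling parameters are illustrative and not the authors' exact configuration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Pre-query template: the chat format up to the start of the user turn, with no
# user content. The aligned model then "fills in" a plausible user instruction.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"

# add_special_tokens=False avoids prepending a second <|begin_of_text|> token.
inputs = tokenizer(pre_query, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,   # sampling (rather than greedy decoding) yields diverse instructions
    temperature=1.0,
    top_p=1.0,
)
instruction = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(instruction)  # a self-synthesized user query
```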

Implementation and Deployment

With MAGPIE, diverse, high-quality instruction data can be generated efficiently from just the pre-query template. Experiments indicate that models fine-tuned with MAGPIE data achieve superior alignment on benchmarks such as AlpacaEval 2, outperforming the official Llama-3-8B-Instruct. Moreover, using MAGPIE data for SFT alone surpasses prior public datasets used for both SFT and preference optimization. This advantage on alignment benchmarks is attained without compromising performance on tasks such as MMLU-Redux.
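
Continuing the sketch above, a second step feeds the sampled instruction back through the standard chat template so the same aligned model produces the corresponding response, yielding one (instruction, response) SFT example. The snippet reuses `model`, `tokenizer`, and `instruction` from the earlier block; the sampling settings are illustrative, not the paper's exact configuration:

```python
# Wrap the sampled instruction in the chat template and ask the aligned model
# to answer it, completing one instruction-response pair.
messages = [{"role": "user", "content": instruction}]
prompt_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(
    prompt_ids,
    max_new_tokens=512,
    do_sample=True,   # hypothetical sampling settings
    temperature=0.7,
    top_p=0.9,
)
response = tokenizer.decode(out[0][prompt_ids.shape[1]:], skip_special_tokens=True)

# Repeating the two steps at scale yields datasets in the spirit of
# MAGPIE-Air / MAGPIE-Pro, which can then be used for supervised fine-tuning.
```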

Summary

This paper presents MAGPIE, a novel self-synthesis method for generating large-scale, high-quality alignment data without human intervention or prompt engineering. Experiments show that models fine-tuned with MAGPIE data excel across alignment benchmarks, demonstrating the latent capability of aligned LLMs for automatic data generation and model alignment.