Skip to content

Latest commit

 

History

History
20 lines (15 loc) · 3.45 KB

2404.09937.md

File metadata and controls

20 lines (15 loc) · 3.45 KB

Background

  • Background This paper explores the link between language models (LLMs) and compression, and whether compression capability can be an indicator of intelligence. There's been a longstanding view that good compression ability is tightly related to intelligence and recent research has equated language modeling with compression, providing a rational explanation for the success of LLMs. The objective of this paper is to empirically investigate this link and explore whether better compression capability signifies higher intelligence.

  • Existing Work Currently, there is a lack of sufficient empirical research on the relationship between compression and intelligence. Although theoretical discussions exist, empirical support for the relationship between compression and intelligence, specifically within the context of LLMs, is scarce—it's questioned whether a language model encoding a text corpus with fewer bits losslessly indeed indicates higher intelligence.

Core Contributions

  • Introduced a method to examine the relationship between LLMs' intelligence and compression capability
    • Challenge 1: Quantifying 'intelligence' The concept of intelligence is abstract and difficult to quantify directly. The paper uses average downstream benchmark scores as a proxy for intelligence, focusing on intelligence related to knowledge and commonsense, coding, and mathematical reasoning; it evaluates 30 public LLMs from different organizations across 12 benchmarks. The study found that the LLMs' intelligence—reflected by average benchmark scores—almost linearly correlates with their ability to compress external text corpora, providing substantial evidence for the belief that better compression suggests greater intelligence.

    • Challenge 2: Ensuring universality and practicality of the research Past studies might only explore the relationship between benchmark scores and compression-like metrics such as validation loss within the same model series sharing configurations like model designs, tokenizers, etc. However, this study is the first to document a linear correlation between compression and intelligence across LLMs of varying sizes, tokenizers, context window lengths, and pretraining data distributions, establishing the linear correlation as a universal principle. It also proposes compression efficiency as an unsupervised metric linearly correlated with the models' abilities.

Implementation and Deployment

Experiments across 30 different language models and 12 different benchmarks revealed that the downstream capabilities of LLMs are almost linearly correlated with their compression efficiency, with a Pearson correlation coefficient of about -0.95. This indicates that better compression efficiency indeed suggests higher intelligence. The compression efficiency of LLMs, as an unsupervised metric, can serve as a stable, flexible, and reliable standard to evaluate LLMs, linearly correlated with model capabilities. The authors have also open-sourced their compression dataset and data collection pipelines to enable future researchers to properly assess compression.

Summary

The paper provides empirical evidence that there is almost a linear correlation between LLMs' performance on downstream tasks and their compression efficiency, supporting the long-held belief that "better compression indicates higher intelligence". It also proposes using compression efficiency as an unsupervised metric for assessing LLM performance.