2402.16153.md
Background

  • Background: The paper identifies a lack of publicly available music-related natural language corpora, which limits the ability of language models (LMs) to comprehend and generate music.

  • Existing Work: Previous research could not address this gap, mainly because no dedicated music pre-training corpus existed, which limited how models could be applied in the music domain.

Core Contributions

  • Introduced a pre-training dataset called MusicPile
    • Challenge 1: Lack of music-related corpora. The researchers curated MusicPile by filtering existing large-scale corpora with music-related terms and topics, in order to inject musical capability into LLMs.

    • Challenge 2: Limited performance in understanding and generating music. The researchers established MusicTheoryBench, a benchmark for evaluating LLMs' music understanding and reasoning capabilities. They also showed that the ABC notation system can compress and encode musical structure effectively.
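The term-based selection described under Challenge 1 can be illustrated with a minimal sketch. The keyword list, the `min_hits` threshold, and the function names below are hypothetical; this summary does not give the terms or filtering rules the authors actually used.

```python
import re

# Hypothetical keyword list for illustration only; the real term list
# used to build MusicPile is not given in this summary.
MUSIC_TERMS = ["chord", "melody", "harmony", "tempo", "sonata"]

# Case-insensitive, whole-word match against any of the terms.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in MUSIC_TERMS) + r")\b",
    re.IGNORECASE,
)


def count_music_terms(text: str) -> int:
    """Return the number of music-term occurrences in the text."""
    return len(_PATTERN.findall(text))


def filter_music_documents(documents, min_hits=2):
    """Keep documents with at least `min_hits` music-term matches.

    `min_hits` is an assumed threshold, not a value from the paper.
    """
    return [doc for doc in documents if count_music_terms(doc) >= min_hits]


if __name__ == "__main__":
    corpus = [
        "The melody rises over a C major chord.",
        "The stock market fell sharply today.",
        "Tempo matters.",
    ]
    for doc in filter_music_documents(corpus):
        print(doc)
```

A plain keyword count is a deliberately simple stand-in; a real pipeline would likely combine it with topic classification and deduplication.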

Implementation and Deployment

The evaluation results indicate that the proposed methods and datasets improve music understanding and generation. On the understanding tests, all systems exceeded the random baseline, and GPT-4 reached the highest music knowledge score, 58.2. On music reasoning, scores were modest overall, but ChatMusician-Base and ChatMusician scored 27.1 and 26.3 in a zero-shot setting, surpassing GPT-4. For music generation, the proposed ABC notation approach achieved a higher compression ratio, and human evaluators rated its generated music as more musical.
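To make the comparison against the random baseline concrete, here is a minimal sketch of how a multiple-choice benchmark is typically scored. It assumes four answer options per question, which this summary does not state, and the function names are illustrative rather than taken from the paper's code.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that match the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("answer key must not be empty")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def random_baseline(num_choices):
    """Expected percentage score from uniform random guessing."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return 100.0 / num_choices


if __name__ == "__main__":
    # Toy answer key and model outputs, not data from the paper.
    key = ["A", "B", "C", "A"]
    preds = ["A", "B", "C", "D"]
    print(accuracy(preds, key))   # 3 of 4 correct
    print(random_baseline(4))     # assumes four options per question
```

Under the four-option assumption, random guessing yields 25%, so the reported reasoning scores of 27.1 and 26.3 would sit only slightly above chance, which matches the description of those results as modest.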

Summary

The paper advances an under-researched area by creating the first music pre-training dataset and evaluation benchmark for language models, improving LLMs' performance in understanding and generating music.