-
Background The paper discusses the current lack of publicly available music-related natural language corpora, which hinders the ability of language models (LMs) to comprehend and generate music.
-
Existing Work Previous research could not address this issue mainly due to the absence of a dedicated music corpora pre-training dataset, which limits the application of models in the music domain.
- Introduced a pre-training dataset called MusicPile
-
Challenge 1: Lack of music-related corpora Researchers curated the MusicPile dataset by selecting from existing large-scale corpora using music-related terms and topics to inject musical capability into LLMs.
-
Challenge 2: Limited performance in understanding and generating music The researchers established MusicTheoryBench, a benchmarking test to evaluate LLMs' music understanding and reasoning capabilities. They also demonstrated the use of the ABC musical notation system for effective compression and encoding of musical structures.
-
The evaluation results demonstrated that the proposed methods and datasets significantly improve music understanding and generation capabilities. For the understanding capability tests, all systems exceeded the random baseline, with GPT-4 reaching the highest score of 58.2 in the music knowledge metric. In music reasoning, even though performances were modest, ChatMusician-Base and ChatMusician achieved scores of 27.1 and 26.3 in a zero-shot setting, surpassing GPT-4. In terms of music generation, the proposed ABC notation compressing approach achieved a higher compression ratio and was proven by human evaluation to offer superior musicality in its generated music.
The paper made substantial progress in an under-researched domain by creating the first music pre-training dataset and assessment benchmark for language models, enhancing LLMs' performance in understanding and generating music.