Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

This repository contains the code and dataset accompanying the paper "Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model" by Dr. Jaeyong Kang, Prof. Soujanya Poria, and Prof. Dorien Herremans.

Introduction

We propose Video2Music, a novel AI-powered multimodal music generation framework. It uses video features as conditioning input and generates matching music with a Transformer architecture, with the goal of giving video creators a seamless and efficient way to produce tailor-made background music.
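
To make the conditioning idea concrete, below is a minimal, hypothetical PyTorch sketch of a music-token decoder that cross-attends to per-frame video features. It only illustrates video-conditioned generation in general; it is not the AMT architecture from this repository, and all module names, feature dimensions, and vocabulary sizes are placeholders.

```python
import torch
import torch.nn as nn

class VideoConditionedDecoder(nn.Module):
    """Minimal illustration: a music-token decoder cross-attending to video features.
    NOT the repository's AMT model; names and sizes are hypothetical."""
    def __init__(self, vocab_size=512, d_model=256, video_feat_dim=768,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Project raw per-frame video features (e.g. CNN embeddings) to the model width
        self.video_proj = nn.Linear(video_feat_dim, d_model)
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, music_tokens, video_feats):
        # music_tokens: (batch, seq_len) int64; video_feats: (batch, n_frames, video_feat_dim)
        tgt = self.token_emb(music_tokens)
        memory = self.video_proj(video_feats)
        # Causal mask so each music token only attends to earlier tokens
        causal = nn.Transformer.generate_square_subsequent_mask(music_tokens.size(1))
        h = self.decoder(tgt, memory, tgt_mask=causal)
        return self.out(h)  # (batch, seq_len, vocab_size) next-token logits

if __name__ == "__main__":
    model = VideoConditionedDecoder()
    tokens = torch.randint(0, 512, (2, 16))
    feats = torch.randn(2, 30, 768)      # 30 hypothetical video frames
    print(model(tokens, feats).shape)    # torch.Size([2, 16, 512])
```

In a setup like this, training would typically minimize cross-entropy between the predicted logits and the ground-truth next music token.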

Directory Structure

  • saved_models/: saved model files
  • utilities/
    • run_model_vevo.py: code for running the AMT model
    • run_model_regression.py: code for running the bi-GRU regression model
  • model/
    • video_music_transformer.py: Affective Multimodal Transformer (AMT) model
    • video_regression.py: Bi-GRU regression model used for predicting note density/loudness
    • positional_encoding.py: positional encoding (a generic sketch follows this list)
    • rpr.py: code for RPR (Relative Positional Representation)
  • dataset/
    • vevo_dataset.py: Dataset loader
  • train.py: training script (AMT)
  • train_regression.py: training script (bi-GRU)
  • evaluate.py: evaluation script
  • generate.py: inference script
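
For orientation, positional_encoding.py referenced above implements positional encoding for the Transformer. The sketch below shows the standard sinusoidal formulation (Vaswani et al., 2017) as a generic reference; the repository's own implementation may differ in details.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEncoding(nn.Module):
    """Standard sinusoidal positional encoding.
    A generic sketch; the repository's positional_encoding.py may differ."""
    def __init__(self, d_model, max_len=2048):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
        self.register_buffer("pe", pe)

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for each position
        return x + self.pe[: x.size(1)]

if __name__ == "__main__":
    enc = SinusoidalPositionalEncoding(d_model=256)
    print(enc(torch.zeros(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```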

Preparation

  • Clone this repo

  • Obtain the dataset:

    • MuVi-Sync (features) (Link)
    • MuVi-Sync (original video) (Link)
  • Put all directories whose names start with vevo from the dataset under this folder (dataset/)

  • Download the processed training data AMT.zip from HERE, extract the zip file, and put the two extracted files directly under this folder (saved_models/AMT/)

  • Install dependencies: pip install -r requirements.txt

    • Choose the correct version of torch based on your CUDA version (a quick check is shown after this list)
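
Once dependencies are installed, you can confirm that the installed torch build actually sees your GPU with a quick generic check (standard PyTorch calls, not a script from this repository):

```python
import torch

# Confirm torch is importable and whether a CUDA-enabled GPU is visible
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```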

Training

python train.py

Inference

python generate.py

Citation

If you find this resource useful, please cite the original work:

  @article{kang2023video2music,
    title={Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model},
    author={Kang, Jaeyong and Poria, Soujanya and Herremans, Dorien},
    journal={arXiv preprint arXiv:2311.00968},
    year={2023}
  }

Kang, J., Poria, S. & Herremans, D. (2023). Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model. arXiv preprint arXiv:2311.00968.
