
STL1

Daniel Stoller¹, Sebastian Ewert², Simon Dixon¹

¹Queen Mary University of London

²Spotify London

Contact: d.stoller (AT) qmul.ac.uk

Additional Info

  • is_blind: no
  • additional_training_data: yes

Supplemental Material

Method

Task: Singing voice separation. For the same model applied to multi-track separation, see the STL2 submission.

We use the Wave-U-Net, an adaptation of the U-Net architecture to the one-dimensional time domain, to perform end-to-end audio source separation. Through a series of downsampling and upsampling blocks, each combining convolutions with a down- or upsampling step, features are computed at multiple scales/levels of abstraction and time resolution and then combined to make a prediction (see the sketch below). Training uses 75 songs from the MUSDB training set together with the CCMixter Vocal Separation Database; the remaining 25 MUSDB training songs are used for validation with early stopping. The training loss is the MSE on the raw audio source outputs.
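To make the block structure concrete, here is a minimal PyTorch sketch of a Wave-U-Net-style model. The framework choice, depth, kernel sizes, and channel widths are illustrative assumptions, not the submission's actual configuration:

```python
# Minimal Wave-U-Net-style sketch (illustrative only: framework, depth,
# kernel sizes and channel widths are assumptions, not the submission's
# actual configuration).
import torch
import torch.nn as nn
import torch.nn.functional as F


class WaveUNetSketch(nn.Module):
    """1D U-Net on raw waveforms: convolution + decimation on the way
    down; interpolation + skip concatenation + convolution on the way up."""

    def __init__(self):
        super().__init__()
        self.down = nn.ModuleList([
            nn.Conv1d(1, 24, 15, padding=7),
            nn.Conv1d(24, 48, 15, padding=7),
            nn.Conv1d(48, 96, 15, padding=7),
        ])
        self.bottleneck = nn.Conv1d(96, 96, 15, padding=7)
        self.up = nn.ModuleList([          # inputs include the skip channels
            nn.Conv1d(96 + 96, 48, 5, padding=2),
            nn.Conv1d(48 + 48, 24, 5, padding=2),
            nn.Conv1d(24 + 24, 24, 5, padding=2),
        ])
        self.out = nn.Conv1d(24, 1, 1)     # one-channel vocal estimate

    def forward(self, x):                  # x: (batch, 1, time), time % 8 == 0
        skips = []
        for conv in self.down:
            x = F.leaky_relu(conv(x))
            skips.append(x)                # keep high-resolution features
            x = x[:, :, ::2]               # downsample by decimation
        x = F.leaky_relu(self.bottleneck(x))
        for conv in self.up:
            x = F.interpolate(x, scale_factor=2, mode="linear",
                              align_corners=False)
            x = torch.cat([x, skips.pop()], dim=1)  # multi-scale combination
            x = F.leaky_relu(conv(x))
        return torch.tanh(self.out(x))


# One training step with MSE on the raw audio output, as described above.
model = WaveUNetSketch()
mix = torch.randn(4, 1, 16384)      # dummy mixture excerpts
vocals = torch.randn(4, 1, 16384)   # dummy vocal targets
loss = F.mse_loss(model(mix), vocals)
loss.backward()
```

The sketch only mirrors the multi-scale skip structure described above; the actual Wave-U-Net uses more levels and different hyperparameters.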

A paper with more details, experiments, and analysis is currently under review elsewhere.
