Has anyone trained a good sounding SoundStream model on music? #151

vican9000 · 2023-03-29T12:27:07Z


First of all, great repo and great discussions!

As many people have reported here, the overall loss curve when training SoundStream flattens out quite early (after 1-2k steps), and I am seeing the same with the MagnaTagATune dataset.
After training for 20k steps (batch size = 64, gradient accumulation = 8, 2-second clips of 24 kHz audio), the results are very similar to what they were at step 2000, both in my own listening tests and on several objective audio metrics from torchmetrics.
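To put those settings in perspective, here is a quick back-of-the-envelope calculation of how much audio the model sees per optimizer step and by the point where the loss flattens. It assumes that "batch size = 64" is the per-pass (micro-)batch and that each optimizer step accumulates 8 such passes; the constant names are just for illustration.

```python
SAMPLE_RATE = 24_000   # Hz, from the post
SEGMENT_SECONDS = 2    # length of each training clip in seconds
BATCH_SIZE = 64        # clips per forward/backward pass (assumed micro-batch)
GRAD_ACCUM = 8         # micro-batches accumulated per optimizer step

samples_per_clip = SAMPLE_RATE * SEGMENT_SECONDS         # waveform length per clip
effective_batch = BATCH_SIZE * GRAD_ACCUM                # clips per optimizer step
audio_seconds_per_step = effective_batch * SEGMENT_SECONDS


def hours_seen(steps: int) -> float:
    """Total hours of audio processed after `steps` optimizer steps."""
    return steps * audio_seconds_per_step / 3600


for steps in (2_000, 20_000):
    print(f"{steps:>6} steps -> {hours_seen(steps):,.1f} h of audio")
```

Under these assumptions each step covers 512 clips (about 17 minutes of audio), so the plateau at around 2k steps already corresponds to roughly 570 hours of audio, and 20k steps to roughly 5,700 hours.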

I am very curious whether anyone has managed to train something that sounds alright, even after 1M steps. I've seen some discussions here about the missing loss balancer (among other proposed fixes), and I just hope that the implementation checks out.
Don't get me wrong, this is very, very good work; I am just curious whether there are things in the original SoundStream paper that weren't addressed in the implementation here.
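For context on the "loss balancer" point: as far as I know, the balancer usually meant in those discussions is the gradient balancer described in the EnCodec paper, not something from the SoundStream paper, which combines its losses with fixed weights. That attribution is an assumption. Below is a minimal, framework-free sketch of the idea: each loss's gradient with respect to the decoder output is rescaled by an exponential moving average of its own norm, so that its contribution is proportional to its weight regardless of the loss's raw scale. `GradBalancer` and its parameters are hypothetical names, not part of this repo.

```python
import math
from typing import Dict, List


def l2_norm(vec: List[float]) -> float:
    return math.sqrt(sum(v * v for v in vec))


class GradBalancer:
    """Hypothetical EnCodec-style balancer operating on flat gradient lists."""

    def __init__(self, weights: Dict[str, float], beta: float = 0.999,
                 total_norm: float = 1.0):
        self.weights = weights          # relative weight lambda_i per loss
        self.beta = beta                # EMA decay for gradient norms
        self.total_norm = total_norm    # reference norm R
        self.ema_norms: Dict[str, float] = {}

    def combine(self, grads: Dict[str, List[float]]) -> List[float]:
        """Return sum_i R * lambda_i / sum(lambda) * g_i / ema(||g_i||)."""
        weight_sum = sum(self.weights[name] for name in grads)
        dim = len(next(iter(grads.values())))
        out = [0.0] * dim
        for name, g in grads.items():
            norm = l2_norm(g)
            prev = self.ema_norms.get(name)
            # Moving average of this loss's gradient norm; the first
            # observation initialises it directly.
            ema = norm if prev is None else self.beta * prev + (1 - self.beta) * norm
            self.ema_norms[name] = ema
            scale = self.total_norm * self.weights[name] / weight_sum / max(ema, 1e-12)
            for i, v in enumerate(g):
                out[i] += scale * v
        return out


balancer = GradBalancer({"rec": 1.0, "adv": 1.0})
# A large reconstruction gradient and a tiny adversarial one end up
# contributing equally (norm 0.5 each) to the combined gradient.
print(balancer.combine({"rec": [3.0, 4.0], "adv": [0.0, 0.01]}))
```

In a PyTorch training loop, the per-loss gradients would typically come from `torch.autograd.grad(loss, output, retain_graph=True)` with respect to the reconstructed waveform, and the combined gradient would be passed to `output.backward(combined)`; whether this actually helps SoundStream on music is exactly the open question here.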
