
Bad result with vqgan #418

Open
shizhediao opened this issue Feb 23, 2022 · 5 comments

shizhediao commented Feb 23, 2022

Hi, I am using VQGAN on the MSCOCO training dataset (I also tried adding Visual Genome to build a 1-million-image dataset), but I got a bad result. The pixels are weird.

[Four attached sample images showing the noisy, garbled outputs.]

Here are my settings, thanks!

    transformer_dim = 512
    rotary_emb = False
    image_fmap_size = 32

    self.transformer = Transformer(
        dim = transformer_dim,
        causal = True,
        seq_len = seq_len,
        depth = self.config_visual_decoder.num_hidden_layers,
        heads = 8,
        dim_head = 64,
        reversible = False,
        attn_dropout = 0.0,
        ff_dropout = 0.0,
        attn_types = 'full',
        image_fmap_size = image_fmap_size,
        sparse_attn = False,
        stable = False,
        sandwich_norm = False,
        shift_tokens = False,
        rotary_emb = rotary_emb,
        # shared_attn_ids = None,
        # shared_ff_ids = None,
        # optimize_for_inference = False,
    )

I looked through several previous issues and reports and noticed that people usually get a loss below 4.5, while mine is around 5.4.

I use a large batch size (more than 3,000), while others use a far smaller one (like 16). Does that matter?
Thanks!

rom1504 (Contributor) commented Feb 23, 2022

Yes. Being able to use such a large batch size either means you have hundreds of GPUs or that your model is too small.
You should increase the depth parameter and probably train for longer. How long did you train?

shizhediao (Author) commented Feb 23, 2022

Thanks for your quick reply!
I have tried three settings, all based on a 6-layer Transformer:

  1. MSCOCO training data: 400,000 image-text pairs
  2. MSCOCO + VG + some private data: around 6M image-text pairs
  3. MSCOCO + VG + CC15M: around 16M image-text pairs

To speed up training, I use many A100 GPUs and train for only 15 epochs; the first setting finishes in a few hours.

The loss went from 7.0 to 6.5 to 5.4, and has been stuck at 5.4 since epoch 5.

shizhediao (Author) commented:
What could I do if I want both the speed-up from a large batch size and only a 6-layer Transformer?
Maybe a larger learning rate and longer training?
It seems that even with lr=1e-3, the loss gets stuck at 5.0.

rom1504 (Contributor) commented Feb 23, 2022

Try the default lr and depth 16.
What distributed training setup do you use? DeepSpeed? Horovod?

Increasing the depth usually gives much better results.
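
Roughly, that would mean keeping everything from the config you posted and only raising depth (a sketch, assuming the Transformer class from dalle_pytorch that your snippet uses; seq_len is not in your snippet, so the text length below is only a placeholder):

    from dalle_pytorch.transformer import Transformer

    transformer_dim = 512
    rotary_emb = False
    image_fmap_size = 32
    text_seq_len = 256                             # placeholder, use your own value
    seq_len = text_seq_len + image_fmap_size ** 2  # assuming seq_len = text tokens + image tokens

    transformer = Transformer(
        dim = transformer_dim,
        causal = True,
        seq_len = seq_len,
        depth = 16,              # raised from the 6 layers you are using now
        heads = 8,
        dim_head = 64,
        reversible = False,
        attn_dropout = 0.0,
        ff_dropout = 0.0,
        attn_types = 'full',
        image_fmap_size = image_fmap_size,
        sparse_attn = False,
        stable = False,
        sandwich_norm = False,
        shift_tokens = False,
        rotary_emb = rotary_emb,
    )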

shizhediao (Author) commented:

> Try the default lr and depth 16.
> What distributed training setup do you use? DeepSpeed? Horovod?
>
> Increasing the depth usually gives much better results.

Neither; I am using an in-house setup built on NCCL by my company.
I just found an issue saying that the "adamw" optimizer + weight decay = poor generations (#170).
I am using AdamW with weight decay 0.01. Does that matter?
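
If it does, I guess the change on my side would be something like this (just a sketch of my optimizer setup; build_optimizer is a made-up helper name, and lr=1e-3 is the value I mentioned above):

    import torch

    def build_optimizer(model, lr = 1e-3, remove_weight_decay = True):
        # Current setup: AdamW with weight_decay = 0.01, which #170 suggests can hurt generations.
        if not remove_weight_decay:
            return torch.optim.AdamW(model.parameters(), lr = lr, weight_decay = 0.01)
        # Candidate fix: drop the weight decay entirely.
        # (Plain Adam; AdamW with weight_decay = 0 would behave the same.)
        return torch.optim.Adam(model.parameters(), lr = lr)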
