Noise on all image for training #315
Hey! I was having the same issue for a while. Try wrapping your imagen with the `ImagenTrainer` class:

```python
import torch
from imagen_pytorch import Unet, Imagen, ImagenTrainer

# unets for imagen
unet1 = Unet(
    dim = 32,
    cond_dim = 512,
    dim_mults = (1, 2, 4, 8),
    num_resnet_blocks = 3,
    layer_attns = (False, True, True, True),
)

unet2 = Unet(
    dim = 32,
    cond_dim = 512,
    dim_mults = (1, 2, 4, 8),
    num_resnet_blocks = (2, 4, 8, 8),
    layer_attns = (False, False, False, True),
    layer_cross_attns = (False, False, False, True)
)

# imagen, which contains the unets above (base unet and super resoluting ones)
imagen = Imagen(
    unets = (unet1, unet2),
    text_encoder_name = 't5-large',
    image_sizes = (64, 256),
    timesteps = 1000,
    cond_drop_prob = 0.1
).cuda()

# wrap imagen with the trainer class
trainer = ImagenTrainer(imagen)

# mock images (get a lot of these) and text encodings from large T5
text_embeds = torch.randn(64, 256, 1024).cuda()
images = torch.randn(64, 3, 256, 256).cuda()

# feed images into imagen, training each unet in the cascade
loss = trainer(
    images,
    text_embeds = text_embeds,
    unet_number = 1,    # training unet number 1 in this example, but you will have to also save checkpoints and then reload and continue training on unet number 2
    max_batch_size = 4  # auto divide the batch of 64 up into batch size of 4 and accumulate gradients, so it all fits in memory
)

trainer.update(unet_number = 1)

# do the above for many many many many steps

# now you can sample an image based on the text embeddings from the cascading ddpm
images = trainer.sample(texts = [
    'a puppy looking anxiously at a giant donut on the table',
    'the milky way galaxy in the style of monet'
], cond_scale = 3.)

images.shape # (2, 3, 256, 256)
```

More info and an example training script is mentioned in this thread: #305
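The `unet_number = 1` comment above mentions saving checkpoints and reloading before training unet number 2. `ImagenTrainer` has its own save/load helpers, but the underlying pattern is the standard `torch.save` / `torch.load` round trip; a minimal sketch with a tiny stand-in module (the `nn.Linear` stand-in and the `unet1.pt` filename are illustrative only, not part of the library):

```python
import os
import tempfile

import torch
import torch.nn as nn

# tiny stand-in for a trained unet; in practice this would be the trainer state
model = nn.Linear(8, 8)
opt = torch.optim.Adam(model.parameters())

path = os.path.join(tempfile.mkdtemp(), 'unet1.pt')

# save after finishing the training steps for unet 1
torch.save({'model': model.state_dict(), 'opt': opt.state_dict()}, path)

# later: reload the checkpoint before continuing with unet 2
ckpt = torch.load(path)
restored = nn.Linear(8, 8)
restored.load_state_dict(ckpt['model'])

# the weights survive the round trip unchanged
same = torch.equal(model.weight, restored.weight)
print(same)  # True
```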
Thanks, it helped a lot @alif-munim! But I have a dataset on Hugging Face:
These are the dimensions for the text:

```python
print(train_db[0][0].shape)
print(train_db[0][1].shape)
```
@axel578 if you use the built-in t5 text encoding functions, you should get the correct dimensionality for your text embeddings. See https://github.com/lucidrains/imagen-pytorch#L26
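On those dimensions: t5-large produces one 1024-dimensional vector per token, which is why the mock `text_embeds` earlier in the thread are shaped `(batch, seq_len, 1024)`. A pure-torch sketch of roughly the padding a batch encoder has to perform so variable-length captions can be stacked (the sequence lengths 5 and 9 here are made up for illustration):

```python
import torch

# two "encoded captions" of different token lengths, 1024 dims per token
embeds = [torch.randn(5, 1024), torch.randn(9, 1024)]

# pad every sequence to the longest one in the batch, then stack
max_len = max(e.shape[0] for e in embeds)
padded = torch.stack([
    torch.cat([e, e.new_zeros(max_len - e.shape[0], 1024)])
    for e in embeds
])

print(padded.shape)  # torch.Size([2, 9, 1024]): (batch, seq_len, embed_dim)
```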
Hello,
I'm trying to train on 50,000 16x16 images with alpha channels (RGBA), but training for multiple steps doesn't give me any decent result, and training from the CLI is very slow.
After training on an A100 for 12 hours I still get a completely noisy image.
How could I make smaller unets that are suited to 16x16 images with an alpha channel?
I know this architecture doesn't like small images, but is it still possible to make it more efficient in training time without producing noisy images?
Thanks by the way for the work you've done :)
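For 16x16 RGBA data, one possible direction is a single small unet with no super-resolution stage. This is only a configuration sketch, not a tested recipe: the `channels = 4`, the reduced `dim_mults`, and `condition_on_text = False` are my assumptions about sensible settings for small unconditional RGBA images, not values confirmed in this thread:

```python
from imagen_pytorch import Unet, Imagen, ImagenTrainer

# a deliberately small unet: a 16x16 input leaves little room to downsample,
# so only two resolution levels are used (assumption, not a tested setting)
unet = Unet(
    dim = 32,
    dim_mults = (1, 2),
    num_resnet_blocks = 2,
    layer_attns = (False, True),
)

imagen = Imagen(
    unets = (unet,),
    image_sizes = (16,),          # single stage, no super-resolution unet
    channels = 4,                 # RGBA instead of the default 3 channels
    condition_on_text = False,    # assumption: unconditional generation
    timesteps = 1000,
)

trainer = ImagenTrainer(imagen)
```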