acgan: Add batch normalization to the Generator, etc #8616
Conversation
Generated images are better than the previous best (#8482 (comment)). The biggest difference is much larger diversity in line thickness.
Note that if you want to completely freeze a model that has BN layers (like here), you need to do more than just set it to non-trainable; you also need to disable its updates. Otherwise the batch statistics will still get updated during training. This is not a bug; it is simply a manifestation of the fact that non-backprop updates and layer trainability are independent (for instance, if you set a stateful RNN to non-trainable, that will not freeze its state). It is possible that we should have a
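To illustrate the point, here is a minimal sketch (assuming the Keras 2 API of that era; in recent versions, setting trainable = False on a BatchNormalization layer also switches it to inference mode, so the outcome differs):

```python
import numpy as np
from keras.layers import Input, Dense, BatchNormalization
from keras.models import Model

inp = Input(shape=(8,))
bn_layer = BatchNormalization()
hidden = bn_layer(inp)
out = Dense(1)(hidden)
model = Model(inp, out)

bn_layer.trainable = False       # freezes only gamma/beta (the trainable weights)
model.compile(optimizer='sgd', loss='mse')

before = bn_layer.get_weights()  # [gamma, beta, moving_mean, moving_variance]
model.train_on_batch(np.random.rand(32, 8), np.random.rand(32, 1))
after = bn_layer.get_weights()

# gamma/beta stay fixed, but the moving statistics may still have been updated,
# because those updates are not part of backprop.
print(np.allclose(before[2], after[2]), np.allclose(before[3], after[3]))
```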
Could you elaborate on that, please? Do you mean setting every layer's trainable attribute to False?
No. You'll need to clear the attribute
I don't think it ever makes sense to freeze those 2 trainable parameters while updating the 2 non-trainable parameters. This approach should properly be called "Breaking Deep Network Training by Introducing Uncorrectable Internal Covariate Shift". In terms of calling things trainable or non-trainable, I don't see what difference it makes conceptually whether the parameter updates happen during forward prop or backprop. I think those non-trainable weights should always be frozen as well, and renamed _trainable_ when they aren't frozen. This issue does not affect this PR (which has BN only on the Generator, which is never frozen), but may be the reason why my attempts to put BN into the Discriminator broke things very badly. The following workaround #4762 (comment) seems to be working for the GAN use case, and I'll try it, but not for a while.
@ozabluda Jeremy Howard did a study on this recently ("should we update the batch statistics during fine-tuning?") and the answer was "it depends". It's not clear-cut. Beyond batch norm specifically, you seem confused by the difference between trainability and stateful behavior. Setting a layer to non-trainable means its trainable weights will not be taken into account during training. It does not affect the parts of the layer's state that are independent from training. For instance, a layer that maintains a counter that is incremented by one with every batch will not stop doing that if you set it to non-trainable. If you want to run your BN layers in inference mode, the way to do it is to pass a static boolean as the training argument in the call:
y = BatchNormalization()(x, training=False)
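A small sketch of that pattern (layer sizes and names here are made up, not from the example script):

```python
from keras.layers import Input, Dense, BatchNormalization
from keras.models import Model

x = Input(shape=(16,))
h = Dense(32, activation='relu')(x)
# Static boolean: this BN call always uses the moving statistics and
# registers no update ops, regardless of the learning phase.
h = BatchNormalization()(h, training=False)
y = Dense(1, activation='sigmoid')(h)
frozen_bn_model = Model(x, y)

# In the Keras 2 API of that era, no BN update ops should be collected here.
print(len(frozen_bn_model.updates))
```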
Unrelated: @ozabluda, I don't know of any references. Tbh, it always just seemed more natural on an intuitive basis.
@fchollet, I can't find that study by Jeremy Howard. Is there a recommended way to freeze those two non_trainable weights that would work for the acgan example, if I put BatchNormalization into the Discriminator? Right now, the best advice seems to be to go through the discriminator model, immediately after. Also see tensorflow/tensorflow#10857
Yes, please!
There is no "clean" way at the moment, but we need one. The simplest would be a layer/model attribute that regulates whether or not the layer/model will always return its updates. To make sure we have separation of concerns between trainability and updates, it's probably best for this attribute to only act on updates (no effect on trainability) and to explicitly mention updates in the name. Like,
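Purely for illustration, a hypothetical sketch of what such an attribute might look like (the name collect_updates is invented here and is not a real Keras API):

```python
# Hypothetical, invented attribute name -- not actual Keras API.
discriminator.trainable = True          # trainability is untouched
discriminator.collect_updates = False   # but BN moving statistics would stay fixed
```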
FWIW, none of the following works to freeze those 2 weights in the BatchNormalization I put into the Discriminator in the acgan example: immediately after
before and after
@fchollet I agree,
@ahundt, I have no immediate plans to work on it, since I don't understand that part of the Keras code, and therefore the suggestions (as of now). I also don't understand the performance implications, for example compared to native tensorflow/tensorflow#12580. So if you feel like doing it, it would be great. For reference, this is how I check if freezing actually works in the acgan example after putting
A much simpler example can't be made, because this is the use case for which none of the hacks, etc. work, but see the example in #8676 (put
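A hedged sketch of one way such a check can be done (it assumes the variable names of the example script -- discriminator, combined, batch_size, latent_size -- and is illustrative rather than the exact code used):

```python
import numpy as np

# Collect the BatchNormalization layers of the (supposedly frozen) discriminator.
bn_layers = [l for l in discriminator.layers
             if l.__class__.__name__ == 'BatchNormalization']
# get_weights() returns [gamma, beta, moving_mean, moving_variance]
before = [l.get_weights() for l in bn_layers]

# Train the stacked generator+discriminator model for one batch.
noise = np.random.uniform(-1, 1, (batch_size, latent_size))
sampled_labels = np.random.randint(0, 10, (batch_size, 1))
combined.train_on_batch([noise, sampled_labels],
                        [np.ones(batch_size), sampled_labels])

after = [l.get_weights() for l in bn_layers]
for b, a in zip(before, after):
    # If freezing works, the moving statistics (indices 2 and 3) are unchanged.
    print(np.allclose(b[2], a[2]), np.allclose(b[3], a[3]))
```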
@ozabluda sorry, I meant to @-mention fchollet on that previous post; I was addressing the feature he was suggesting. I've edited it in now.
@fchollet Also, I'm not sure I mentioned this, but freezing batch normalization updates is quite helpful when fine-tuning segmentation problems from pre-trained weights. I think something like this modified BatchNormalization class with freeze would do the trick. Would it be acceptable to place the
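For reference, a rough sketch of how such a freeze switch is commonly implemented (an assumption about the linked class, not a copy of it): force inference-mode behavior so the moving statistics are used but never updated.

```python
from keras.layers import BatchNormalization

class FreezableBatchNormalization(BatchNormalization):
    """BatchNormalization with a freeze switch (illustrative sketch only)."""

    def __init__(self, freeze=False, **kwargs):
        super(FreezableBatchNormalization, self).__init__(**kwargs)
        self.freeze = freeze

    def call(self, inputs, training=None):
        if self.freeze:
            # Force inference mode: use the moving statistics and register
            # no update ops, so the statistics never change.
            return super(FreezableBatchNormalization, self).call(inputs,
                                                                 training=False)
        return super(FreezableBatchNormalization, self).call(inputs,
                                                             training=training)

    def get_config(self):
        config = super(FreezableBatchNormalization, self).get_config()
        config['freeze'] = self.freeze
        return config
```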
For the record, on master you can now set
@lukedeo would you like to review this PR?
yep @fchollet, can check by EOW
@ahundt @ozabluda FYI, I think the
Thus I am reverting
@lukedeo if you're still available to review this PR, we're waiting for your input. Otherwise, please let us know and we will find another reviewer. Thanks!
@fchollet thanks for the update, the new functionality you describe is the behavior I imagined when I first saw the name of the flag a year ago when I was first learning the code base. I look forward to using it!
For reference, the changes described in #8616 (comment) were made in 24246ea.
Specifically, the reverse commit that makes
Hey @fchollet, back from vacation. Will look today - sorry.
@fchollet this seems fine to me overall. @ozabluda one question, though not necessarily needed for this PR, is whether we should also add BN to the discriminator, since it's technically closer to the paper. If we see good performance, which we do, we don't necessarily need it, but I'm just raising it since people might question it, now that I think about it. The one place I could see this maybe helping is if people want to adapt this script to CIFAR10 or that ilk.
I tried adding BN to the discriminator (the very first message in this PR, item 1). With recent fixes to the BN, maybe I'll try it again later. An attempt to adapt it to CIFAR-10 is going on here: #8937
I've created animations of acgan training. Resolutions are 1080p (2K), 1440p (3K), 2160p (4K), and 7680p (8K). There are two types of video. Note that on YouTube you can change the playback speed and, when paused, go frame-by-frame with ',' and '.'. Real images are on the bottom; fake generated ones are on top. Epoch/iteration is shown in the gray bar in between. For every column of 10 digits (0-9), the latent noise vector is the same (the class is different). For every row of the same digit, the latent noise vector is different (the class is the same). The discriminator's probability real/fake is shown as a grayscale square around each digit. It's scaled such that p=0.5 is pure black and p=0.75 is pure white, with p<0.5 and p>0.75 clipped to the min and max values; otherwise one wouldn't be able to see what is going on. Staring at these videos for a while falsified a lot of my preconceived notions of what the GAN is actually doing. Too many to describe here. I highly recommend anyone who is interested to stare for a while as well.
@ozabluda can you fix the YouTube link? It doesn't seem to work for me.
@ahundt, fixed.
thanks! looks neat
Two quick things:
P.S. I've added an acgan0 8K video - 30 min, 100 GB, baby!
I present you with: 🥇 Awarded for the highest-resolution 28x28 image of all time. 👍 P.S. Any hints on what rendering tools you used to generate these? Sounds like something useful.
Just ffmpeg. The code to generate PNGs is an embarrassing mess; not sure if it's worthwhile to clean it up and add it to the example.
A PNG utility could go in some non-Keras repo or keras-contrib.
@ahundt
1. Add batch normalization to the Generator. This makes the example closer to the referenced paper and improves generated images. Adding batch normalization to the Discriminator breaks training so badly that I suspect a bug (maybe "Add tests exposing BatchNormalization bug(s)? when used in GANs" #5647 is fixed incompletely or something). Not adding batch normalization to the Discriminator also side-steps the issue of correlation of samples within a batch (https://github.com/soumith/ganhacks#4-batchnorm).
2. Use one-sided soft labels and a harder soft_one=0.95 (vs 0.9). The referenced paper says they don't need one-sided soft labels. This example also doesn't "need" them any more, but the generated images are better. (A short label-smoothing sketch follows this list.)
3. Add reference to a paper.
4. Increase epoch=100 from 50, as good images often appear between epochs 50 and 100. Note that the training time per epoch is half that of the original example, after 67cd3b0.
5. Increase output precision of various losses from 2 decimal digits to 4. You can't really tell what is going on with just 2.
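A hedged sketch of the one-sided label-smoothing idea from item 2 (variable names such as soft_one and batch_size follow the spirit of the example script but are assumptions here, not the exact code):

```python
import numpy as np

soft_one = 0.95   # "harder" one-sided soft label (previously 0.9)
batch_size = 100

# Discriminator targets: real images get the softened label, fake images keep
# a hard 0 -- the smoothing is one-sided.
y_real = np.full(batch_size, soft_one)
y_fake = np.zeros(batch_size)
y = np.concatenate([y_real, y_fake])

# When training the generator through the combined model, the generator is
# rewarded for making the discriminator output the soft "real" label.
trick = np.full(2 * batch_size, soft_one)
```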
@lukedeo, I see a lot of examples online which use embedding with Hadamard, but do you know of any paper(s) we can reference? I haven't seen it in any of the GAN papers. I really like embedding with Hadamard, as replacing them would require multiple (3-5?) additional layers, but to be thorough I did make a half-hearted attempt to remove them (to make it closer to the acgan paper), just to see if I could, and failed (generated images are much worse).
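For context, a minimal sketch of the embedding-with-Hadamard conditioning being discussed (shapes and names are illustrative; the actual example script may differ slightly):

```python
from keras.layers import Input, Embedding, Flatten, multiply

latent_size = 100
num_classes = 10

latent = Input(shape=(latent_size,))
image_class = Input(shape=(1,), dtype='int32')

# Learn a dense vector per class, then condition the generator by taking an
# element-wise (Hadamard) product with the latent noise vector.
cls = Flatten()(Embedding(num_classes, latent_size,
                          embeddings_initializer='glorot_normal')(image_class))
h = multiply([latent, cls])  # shape: (batch, latent_size)
```

Compared with concatenating a one-hot class vector and stacking extra Dense layers, the Hadamard product keeps the conditioning in the same dimensionality as the latent vector.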