
acgan: Use Generator/Discriminator more closely resembling the ones from the referenced paper, and don't train discriminator to produce class labels for generated images #8482

Merged: 7 commits, Nov 25, 2017

Conversation

ozabluda
Contributor

@ozabluda ozabluda commented Nov 14, 2017

This depends on PR #8452, and should be merged after it.

Generator:

  1. Remove extra embedding layer.
  2. Remove extra 1x1 convolutions.
  3. Use transposed convolutions instead of NN upsample + conv (i.e., convolutions with fractional strides). Note that this corresponds to the paper, but runs contrary to the authors' own later advice in Deconvolution and Checkerboard Artifacts. Still, I found that the paper's network works better on MNIST in this example.

Discriminator:

  1. Use a LeakyReLU slope of 0.2 instead of the Keras default of 0.3 (a sketch of both changes follows below).
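
Purely as illustration, here is a minimal Keras sketch of the two architectural points above: transposed-convolution upsampling in the generator and a 0.2 LeakyReLU slope in the discriminator. It assumes the Keras 2 API; the layer sizes and depths are illustrative, not the exact values from this PR:

```python
from keras.layers import (Conv2D, Conv2DTranspose, Dense, Flatten,
                          LeakyReLU, Reshape)
from keras.models import Sequential

latent_size = 100

# Generator upsampling stage: transposed convolutions (convolutions
# with fractional strides, as in the AC-GAN paper) instead of
# UpSampling2D followed by Conv2D.
generator = Sequential([
    Dense(7 * 7 * 128, input_dim=latent_size, activation='relu'),
    Reshape((7, 7, 128)),
    Conv2DTranspose(64, 5, strides=2, padding='same',
                    activation='relu'),                 # 7x7 -> 14x14
    Conv2DTranspose(1, 5, strides=2, padding='same',
                    activation='tanh'),                 # 14x14 -> 28x28
])

# Discriminator front end: LeakyReLU with slope 0.2 instead of the
# Keras default of 0.3.
discriminator = Sequential([
    Conv2D(32, 3, strides=2, padding='same', input_shape=(28, 28, 1)),
    LeakyReLU(0.2),
    Flatten(),
    Dense(1, activation='sigmoid'),
])
```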

These changes:

  1. Made the generated images more diverse (for example, thick and thin digits appear in the same epoch; compare acgan: Use Generator/Discriminator more closely resembling the ones from the referenced paper, and don't train discriminator to produce class labels for generated images #8482 (comment) to acgan: Use same latent vector for all classes in a row #8409 (comment)).
  2. Tolerate harder soft labels (all the way to 1.0 works without generator collapse, though the results are not as good as with 0.9; see the sketch after this list).
  3. Reduced the number of trainable parameters in the generator by 65%, from 8,698,356 to 3,182,580.
  4. Reduced training time per epoch by 50%, from 90 sec to 45 sec on a Titan X (Maxwell).
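
For reference on point 2, a minimal sketch of the soft (one-sided smoothed) real/fake labels; the technique comes from PR #8383, and the specific values here are illustrative:

```python
import numpy as np

batch_size = 100
soft_one = 0.9  # "harder" labels push this toward 1.0

# Real/fake targets for a combined batch of [real, generated] images:
# real samples get the smoothed label, generated samples stay at 0.
y = np.array([soft_one] * batch_size + [0.0] * batch_size)
```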

@ozabluda
Contributor Author

ozabluda commented Nov 14, 2017

Below, the left image is generated and the right image is from the actual MNIST dataset (generated side by side with PR #8483). In the generated images, the latent vector is the same within each row (see PR #8409).

Epoch 32:
plot_epoch_032_generated

Epoch 45:
plot_epoch_045_generated

Epoch 48:
plot_epoch_048_generated

@ozabluda ozabluda changed the title Use Generator/Discriminator more closely resembling the ones from the referenced paper. acgan: Use Generator/Discriminator more closely resembling the ones from the referenced paper Nov 14, 2017
@fchollet
Member

@lukedeo if your time allows, could you check out this PR as well?

@lukedeo
Contributor

lukedeo commented Nov 20, 2017

@fchollet Overall, this looks good and is a welcome change.

@ozabluda you mention that you remove the extra embedding, but I don't see it removed here... the embedding wasn't mentioned in the paper, but I found the Hadamard-y interaction to be useful. Not sure what the intention was here.
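
For context, a minimal sketch of the embedding and Hadamard-style interaction under discussion, assuming the example's pattern of multiplying the Embedding output element-wise with the latent vector (names are illustrative):

```python
from keras.layers import Embedding, Flatten, Input, multiply

latent_size = 100
num_classes = 10

latent = Input(shape=(latent_size,))
image_class = Input(shape=(1,), dtype='int32')

# Map the class label to a dense vector the same size as the latent
# code...
cls = Flatten()(Embedding(num_classes, latent_size)(image_class))

# ...then combine it with the latent vector by element-wise
# (Hadamard) product.
h = multiply([latent, cls])
```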

@ozabluda
Contributor Author

@lukedeo
Contributor

lukedeo commented Nov 20, 2017

Ah, OK, I see. Sure, that extra hidden rep is extraneous, and removing it makes this more in line with the paper.

Re: #8452, how noticeable is the improvement in quality when the discriminator receives zero weight for the auxiliary task on generated samples? If we're going to go full paper-reproduction mode, my thought is to go all the way. I could be fairly easily convinced, though, as having the discriminator minimize classification accuracy on generated samples can lead to the generator not examining sections of the support where the likelihood ratio approaches a flat posterior.

@ozabluda
Contributor Author

A brief history:

Recently, the acgan example was broken, generating all-black images. The minimal first fix was to introduce soft real/fake labels (PR "acgan: Fix generator producing pure black images" #8383; see the generated examples there). This fix was either mandatory for the later fixes (to prevent all-black images) or produced better results. So far, all further fixes have made the soft labels harder and harder, which is evidence that they are good fixes.

The second fix was PR "acgan: don't train discriminator to produce class labels for generated images" #8452; see the generated examples there for how much more diverse they are than before. See #8452 (comment) for alternatives, and #8452 (comment) for the history of how it ended up in the paper and the justification for its removal from the example (initially I was also hesitant to remove it; see the second sentence here: #8452 (comment)). It's better to keep the discussion of that PR in that PR.

The third fix is this PR. It depends on the first and second.

Use same latent vector for all classes in a row
don't train discriminator to produce class labels for generated images
@ozabluda ozabluda changed the title acgan: Use Generator/Discriminator more closely resembling the ones from the referenced paper acgan: Use Generator/Discriminator more closely resembling the ones from the referenced paper, and don't train discriminator to produce class labels for generated images Nov 24, 2017
@ozabluda
Contributor Author

ozabluda commented Nov 24, 2017

The following PRs were merged into this PR:

acgan: Use same latent vector for all classes in a row #8409
acgan: don't train discriminator to produce class labels for generated images #8452

See #8409 (comment)

@@ -185,6 +182,13 @@ def build_discriminator():
num_batches = int(x_train.shape[0] / batch_size)
progress_bar = Progbar(target=num_batches)

# don't train discriminator to produce class labels for generated
# images. To preserve total weight of the auxilary classifier,
# take real image samples with weight 2.
Member

The meaning of this comment is not clear, please rephrase/clarify

# don't train discriminator to produce class labels for generated
# images. To preserve total weight of the auxilary classifier,
# take real image samples with weight 2.
disc_sample_weight = [np.ones(2 * batch_size, dtype=np.float32),
Member

This uses float32 for one but not the other; it's not really necessary for either (memory is not an issue, and they get cast anyway). In any case, it's better to be consistent.
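
For concreteness, a minimal sketch of the sample-weight construction under review, with a consistent (default float64) dtype for both arrays; it assumes the example's [real, generated] batch layout:

```python
import numpy as np

batch_size = 100

# Per-output sample weights for the discriminator:
# - real/fake output: every sample has weight 1;
# - auxiliary class output: generated samples get weight 0 (the
#   discriminator is not trained to classify them) and real samples
#   get weight 2, preserving the total weight of the classifier.
disc_sample_weight = [np.ones(2 * batch_size),
                      np.concatenate((2 * np.ones(batch_size),
                                      np.zeros(batch_size)))]
```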

@ozabluda
Contributor Author

Review comments addressed.

@fchollet fchollet (Member) left a comment

LGTM, thanks
