
Added data_format to flatten layer. #9696

Merged
merged 11 commits into keras-team:master on Apr 1, 2018
Conversation

joeyearsley
Contributor

If you create an inference graph in NHWC but have trained in NCHW, you will find that your inference predictions differ from those at train time.

This is because the flatten layer does not respect channel orderings, so I have added a data_format keyword arg such that the input is always reshaped as if it were NHWC - for the dense layers that follow, the ordering then no longer matters.
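The layout problem can be illustrated in plain NumPy (a hypothetical sketch, not the Keras code itself): flattening the same activations in NCHW vs NHWC order produces different element orderings, so the weights of a following Dense layer would pair with different features at inference time.

```python
import numpy as np

# Hypothetical NumPy sketch (not the Keras implementation): the same
# activations flattened in NCHW vs NHWC order feed different features
# into each Dense weight, so predictions diverge between the two graphs.
x_nchw = np.arange(24).reshape(1, 2, 3, 4)       # (batch, C, H, W)
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))      # same data as (batch, H, W, C)

flat_nchw = x_nchw.reshape(1, -1)
flat_nhwc = x_nhwc.reshape(1, -1)
print(np.array_equal(flat_nchw, flat_nhwc))      # False: element orders differ

# What the patched Flatten does for data_format='channels_first':
# permute to channels_last first, so both graphs flatten identically.
flat_fixed = np.transpose(x_nchw, (0, 2, 3, 1)).reshape(1, -1)
print(np.array_equal(flat_fixed, flat_nhwc))     # True
```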

@fchollet
Member

The CNTK/Theano backends seem unhappy with the implementation. You need to use K.ndim(inputs), I think.
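A sketch of the point about K.ndim, in illustrative NumPy (`channels_last_pattern` is a hypothetical helper name, not Keras API): building the NCHW-to-NHWC permutation from the tensor's rank rather than a hard-coded length keeps the layer working for 3D, 4D, and 5D inputs, which is what K.ndim(inputs) provides on symbolic tensors.

```python
import numpy as np

# Illustrative sketch of the rank-based fix; `channels_last_pattern` is a
# hypothetical helper, not Keras API. The permutation (0, 2, ..., n-1, 1)
# keeps the batch axis first and moves channels to the end for any rank.
def channels_last_pattern(ndim):
    return (0,) + tuple(range(2, ndim)) + (1,)

x4 = np.zeros((2, 3, 4, 5))       # NCHW
x5 = np.zeros((2, 3, 4, 5, 6))    # NCDHW
print(np.transpose(x4, channels_last_pattern(x4.ndim)).shape)  # (2, 4, 5, 3)
print(np.transpose(x5, channels_last_pattern(x5.ndim)).shape)  # (2, 4, 5, 6, 3)
```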

@joeyearsley
Contributor Author

You are correct about K.ndim; I'm unsure whether that will resolve the dynamic-axis issue with CNTK.
If it doesn't, I will set up a CNTK Docker image to debug that issue in.

@fchollet
Member

The issue persists with CNTK. @souptc could you please advise on how to fix it?

@souptc
Contributor

souptc commented Mar 20, 2018

My bad, it is a bug in the CNTK backend. In cntk_backend.py, in the method permute_dimensions, line 1115:
if num_dynamic_axis > 0 and pattern[:num_dynamic_axis] != current_layout[:num_dynamic_axis]:

Here current_layout is a tuple but pattern could be a list, so the comparison may fail unexpectedly.

Would you mind fixing this in this PR?
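The comparison pitfall described above reproduces in a few lines of plain Python: a list slice never compares equal to a tuple slice, even when the elements are identical.

```python
# Minimal reproduction of the bug: pattern arrives as a list while
# current_layout is stored as a tuple, so the inequality guard fires
# even when the leading (dynamic) axes actually match.
current_layout = (0, 1, 2)
pattern = [0, 1, 2]
num_dynamic_axis = 2

print(pattern[:num_dynamic_axis] != current_layout[:num_dynamic_axis])  # True (buggy)

# Normalising both sides to the same type fixes the check:
print(tuple(pattern[:num_dynamic_axis]) != current_layout[:num_dynamic_axis])  # False
```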

@joeyearsley
Contributor Author

Thanks @souptc! - that has fixed it.

@@ -116,6 +116,9 @@ def layer_test(layer_cls, kwargs={}, input_shape=None, input_dtype=None,

# test as first layer in Sequential API
layer_config = layer.get_config()
# deals with data_format in flatten
if 'data_format' in kwargs:
layer_config['data_format'] = kwargs['data_format']
Member

Revert this change and add support for data_format in get_config for this layer
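A sketch of what this review suggestion amounts to (using a minimal stand-in for the Keras Layer base class, not the exact merged code): the layer serializes its own constructor argument in get_config, so tests and model re-loading no longer need the special case.

```python
class Layer(object):
    # Minimal stand-in for keras.engine.Layer, just enough for this sketch.
    def __init__(self, **kwargs):
        self.name = kwargs.get('name', 'flatten')

    def get_config(self):
        return {'name': self.name}

class Flatten(Layer):
    def __init__(self, data_format=None, **kwargs):
        super(Flatten, self).__init__(**kwargs)
        self.data_format = data_format

    def get_config(self):
        # Serialize data_format alongside the base config so that
        # Flatten.from_config round-trips without external patching.
        config = {'data_format': self.data_format}
        base_config = super(Flatten, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

print(Flatten(data_format='channels_first').get_config())
```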

Contributor Author

Forgot about that func, I've updated to reflect your comments.

@joeyearsley
Contributor Author

@fchollet Any more to do?

@@ -465,6 +466,13 @@ def get_config(self):
class Flatten(Layer):
"""Flattens the input. Does not affect the batch size.

Arguments:
Member

Format: it's # Arguments (and no colon)

Contributor Author

Done

Member

@fchollet left a comment

LGTM, thanks!

@fchollet fchollet merged commit aedad39 into keras-team:master Apr 1, 2018
dschwertfeger added a commit to dschwertfeger/keras that referenced this pull request Apr 6, 2018
…ack-embeddings-from-layer-outputs

* upstream/master: (68 commits)
  fit/evaluate_generator supporting native tensors (keras-team#9816)
  keras-team#9642 Add kwarg and documentation for dilation_rate to SeparableConvs (keras-team#9844)
  Document that "same" is inconsistent across backends with strides!=1 (keras-team#9629)
  Improve tests by designating dtype of sample data (keras-team#9834)
  Add documentation for 'subset' and interpolation' arguments (ImageDataGenerator) (keras-team#9817)
  Revert default theme to readthedocs
  Various docs fixes.
  Fix conflict
  Add support for class methods documentation (keras-team#9751)
  Add missing verbose opt for evaluate_generator (keras-team#9811)
  Added `data_format` to flatten layer. (keras-team#9696)
  Allow saving models directly to binary stream (keras-team#9789)
  Fix ctc_batch_cost() error when batch_size = 1 (keras-team#9775)
  Fix keras-team#9802 (keras-team#9803)
  Fix error in ImageDataGenerator documentation (keras-team#9798)
  fix typo (keras-team#9792)
  keras-team#9733: Extend RemoteMonitor to send data as application/json (keras-team#9734)
  Fixed inconsistencies regarding ReduceLROnPlateau (keras-team#9723)
  Fix doc issue.
  General stateful metrics fixes (keras-team#9446)
  ...
Vijayabhaskar96 pushed a commit to Vijayabhaskar96/keras that referenced this pull request May 3, 2018
* Added data_format to flatten

* Added flatten tests

* Fixed Tests

* Added more dimension tests

* Reverted TF backend change

* Reverted

* Fixed CI Problems

* Altered to K.ndim for compatability

* Updated CNTK backend

* Updated to match comments

* Updated Docs
@TimZaman
Contributor

I don't understand why we would always convert channels_first to channels_last. That seems suboptimal for channels_first models, for both training and inference.

@joeyearsley
Contributor Author

In what way?

This is to make it easier to productise going from NCHW to NHWC.

The dimension ordering of your 4d matrix matters before your dense layers; if you don't account for these two formats, you get lots of regression tests breaking when you do the conversion at inference time.

Why is it suboptimal if it’s a flatten layer? - ignoring the one extra transpose added when using NCHW.

I don’t fully understand your final point.

@TimZaman
Contributor

TimZaman commented Nov 13, 2018

It's suboptimal because it creates ops in the graph when training channels_first (which is more optimal for GPUs) that effectively do nothing. I think we agree on that, though: your original intent here was good, but a later change by Francois altered the behaviour.

I think this happened: with good intent, you created this initializer:

+   def __init__(self, data_format='channels_last', **kwargs):

The data_format value here is odd, as Keras usually defaults this to None. The behaviour with channels_last here is good though, because it's backwards compatible and won't add any extraneous ops.

However, this change then happened in 243338d , where I think someone assumed it was an error, since data_format defaults to None everywhere else.

-    def __init__(self, data_format='channels_last', **kwargs):
+    def __init__(self, data_format=None, **kwargs):

This is suddenly no longer backwards compatible, and it changes the behaviour of both training and inference for channels_first models!

I suggest reverting 243338d
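The backwards-compatibility argument above can be sketched as follows (`resolve` is a stand-in for Keras' internal normalize_data_format, which maps None to the global image_data_format(); the helper names are hypothetical):

```python
# Sketch of why the default matters; `resolve` stands in for Keras'
# conv_utils.normalize_data_format, which maps None to the global setting.
def resolve(data_format, global_default):
    return global_default if data_format is None else data_format

def adds_permute(data_format, global_default):
    # The Flatten layer only inserts a transpose for channels_first input.
    return resolve(data_format, global_default) == 'channels_first'

# Old default 'channels_last': never permutes, regardless of the global
# setting -> backwards compatible, no extraneous ops.
print(adds_permute('channels_last', 'channels_first'))   # False

# New default None: on a channels_first configuration the layer now
# inserts a permute where it previously did not.
print(adds_permute(None, 'channels_first'))              # True
```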

@joeyearsley
Contributor Author

Ah, now I see the confusion! Yes, I agree.
