
Added data_format to flatten layer. #9696

Merged
merged 11 commits into keras-team:master on Apr 1, 2018
Conversation

joeyearsley
Contributor

If you create an inference graph in NHWC but have trained in NCHW, you will find that your inference predictions differ from those at train time.

This is because the flatten layer does not respect channel orderings, so I have added a data_format keyword arg such that the input is always reshaped as if it were NHWC - for the dense layers that follow, the ordering then no longer matters.
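The layout problem can be illustrated in plain NumPy (a hypothetical sketch, not the Keras code itself): flattening the same activations in NCHW vs NHWC order produces different element orderings, so the weights of a following Dense layer would pair with different features at inference time.

```python
import numpy as np

# Hypothetical NumPy sketch (not the Keras implementation): the same
# activations flattened in NCHW vs NHWC order feed different features
# into each Dense weight, so predictions diverge between the two graphs.
x_nchw = np.arange(24).reshape(1, 2, 3, 4)       # (batch, C, H, W)
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))      # same data as (batch, H, W, C)

flat_nchw = x_nchw.reshape(1, -1)
flat_nhwc = x_nhwc.reshape(1, -1)
print(np.array_equal(flat_nchw, flat_nhwc))      # False: element orders differ

# What the patched Flatten does for data_format='channels_first':
# permute to channels_last first, so both graphs flatten identically.
flat_fixed = np.transpose(x_nchw, (0, 2, 3, 1)).reshape(1, -1)
print(np.array_equal(flat_fixed, flat_nhwc))     # True
```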

@fchollet
Member

The CNTK/Theano backends seem unhappy with the implementation. You need to use K.ndim(inputs), I think.
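A sketch of the point about K.ndim, in illustrative NumPy (`channels_last_pattern` is a hypothetical helper name, not Keras API): building the NCHW-to-NHWC permutation from the tensor's rank rather than a hard-coded length keeps the layer working for 3D, 4D, and 5D inputs, which is what K.ndim(inputs) provides on symbolic tensors.

```python
import numpy as np

# Illustrative sketch of the rank-based fix; `channels_last_pattern` is a
# hypothetical helper, not Keras API. The permutation (0, 2, ..., n-1, 1)
# keeps the batch axis first and moves channels to the end for any rank.
def channels_last_pattern(ndim):
    return (0,) + tuple(range(2, ndim)) + (1,)

x4 = np.zeros((2, 3, 4, 5))       # NCHW
x5 = np.zeros((2, 3, 4, 5, 6))    # NCDHW
print(np.transpose(x4, channels_last_pattern(x4.ndim)).shape)  # (2, 4, 5, 3)
print(np.transpose(x5, channels_last_pattern(x5.ndim)).shape)  # (2, 4, 5, 6, 3)
```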

@joeyearsley
Contributor Author

You are correct about K.ndim; I'm unsure whether that will resolve the dynamic-axis issue with CNTK.
If it doesn't, I will set up a CNTK Docker image to debug that issue in.

@fchollet
Member

The issue persists with CNTK. @souptc could you please advise on how to fix it?

@souptc
Contributor

souptc commented Mar 20, 2018

My bad, it is a bug in the CNTK backend. In cntk_backend.py, in the method permute_dimensions, line 1115:
if num_dynamic_axis > 0 and pattern[:num_dynamic_axis] != current_layout[:num_dynamic_axis]:

Here current_layout is a tuple but pattern could be a list, so the comparison may fail unexpectedly.

Would you mind fixing this in this PR?
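The comparison pitfall described above reproduces in a few lines of plain Python: a list slice never compares equal to a tuple slice, even when the elements are identical.

```python
# Minimal reproduction of the bug: pattern arrives as a list while
# current_layout is stored as a tuple, so the inequality guard fires
# even when the leading (dynamic) axes actually match.
current_layout = (0, 1, 2)
pattern = [0, 1, 2]
num_dynamic_axis = 2

print(pattern[:num_dynamic_axis] != current_layout[:num_dynamic_axis])  # True (buggy)

# Normalising both sides to the same type fixes the check:
print(tuple(pattern[:num_dynamic_axis]) != current_layout[:num_dynamic_axis])  # False
```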

@joeyearsley
Contributor Author

Thanks @souptc! - that has fixed it.

@@ -116,6 +116,9 @@ def layer_test(layer_cls, kwargs={}, input_shape=None, input_dtype=None,

# test as first layer in Sequential API
layer_config = layer.get_config()
# deals with data_format in flatten
if 'data_format' in kwargs:
layer_config['data_format'] = kwargs['data_format']
Member

Revert this change and add support for data_format in get_config for this layer
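A sketch of what this review suggestion amounts to (using a minimal stand-in for the Keras Layer base class, not the exact merged code): the layer serializes its own constructor argument in get_config, so tests and model re-loading no longer need the special case.

```python
class Layer(object):
    # Minimal stand-in for keras.engine.Layer, just enough for this sketch.
    def __init__(self, **kwargs):
        self.name = kwargs.get('name', 'flatten')

    def get_config(self):
        return {'name': self.name}

class Flatten(Layer):
    def __init__(self, data_format=None, **kwargs):
        super(Flatten, self).__init__(**kwargs)
        self.data_format = data_format

    def get_config(self):
        # Serialize data_format alongside the base config so that
        # Flatten.from_config round-trips without external patching.
        config = {'data_format': self.data_format}
        base_config = super(Flatten, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

print(Flatten(data_format='channels_first').get_config())
```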

Contributor Author

Forgot about that func, I've updated to reflect your comments.

@joeyearsley
Contributor Author

@fchollet Any more to do?

@@ -465,6 +466,13 @@ def get_config(self):
class Flatten(Layer):
"""Flattens the input. Does not affect the batch size.

Arguments:
Member

Format: it's # Arguments (and no colon)

Contributor Author

Done

Member

@fchollet left a comment

LGTM, thanks!

@fchollet fchollet merged commit aedad39 into keras-team:master Apr 1, 2018
dschwertfeger added a commit to dschwertfeger/keras that referenced this pull request Apr 6, 2018
…ack-embeddings-from-layer-outputs

* upstream/master: (68 commits)
  fit/evaluate_generator supporting native tensors (keras-team#9816)
  keras-team#9642 Add kwarg and documentation for dilation_rate to SeparableConvs (keras-team#9844)
  Document that "same" is inconsistent across backends with strides!=1 (keras-team#9629)
  Improve tests by designating dtype of sample data (keras-team#9834)
  Add documentation for 'subset' and interpolation' arguments (ImageDataGenerator) (keras-team#9817)
  Revert default theme to readthedocs
  Various docs fixes.
  Fix conflict
  Add support for class methods documentation (keras-team#9751)
  Add missing verbose opt for evaluate_generator (keras-team#9811)
  Added `data_format` to flatten layer. (keras-team#9696)
  Allow saving models directly to binary stream (keras-team#9789)
  Fix ctc_batch_cost() error when batch_size = 1 (keras-team#9775)
  Fix keras-team#9802 (keras-team#9803)
  Fix error in ImageDataGenerator documentation (keras-team#9798)
  fix typo (keras-team#9792)
  keras-team#9733: Extend RemoteMonitor to send data as application/json (keras-team#9734)
  Fixed inconsistencies regarding ReduceLROnPlateau (keras-team#9723)
  Fix doc issue.
  General stateful metrics fixes (keras-team#9446)
  ...
Vijayabhaskar96 pushed a commit to Vijayabhaskar96/keras that referenced this pull request May 3, 2018
* Added data_format to flatten

* Added flatten tests

* Fixed Tests

* Added more dimension tests

* Reverted TF backend change

* Reverted

* Fixed CI Problems

* Altered to K.ndim for compatability

* Updated CNTK backend

* Updated to match comments

* Updated Docs
@TimZaman
Contributor

I don't understand why we would always convert channels_first to channels_last. That seems suboptimal for channels_first models, for both training and inference.

@joeyearsley
Contributor Author

In what way?

This is to make it easier to productise going from NCHW to NHWC.

The dimension ordering of your 4d matrix matters before your dense layers; if you don't account for these two formats, you get lots of regression tests breaking when you do the conversion at inference time.

Why is it suboptimal if it’s a flatten layer? - ignoring the one extra transpose added when using NCHW.

I don’t fully understand your final point.

@TimZaman
Contributor

TimZaman commented Nov 13, 2018

It's suboptimal because it creates ops in the graph when training channels_first (which is more optimal for GPUs) that effectively do nothing. I think we agree on that, though: your original intent here was good, but a later change by Francois altered the behaviour.

I think this happened: with good intent, you created this initializer:

+   def __init__(self, data_format='channels_last', **kwargs):

The data_format value here is odd, as Keras usually defaults this to None. The behaviour with channels_last here is good though, because it's backwards compatible and won't add any extraneous ops.

However, this change then happened in 243338d , where I think someone assumed it was an error, since data_format defaults to None everywhere else.

-    def __init__(self, data_format='channels_last', **kwargs):
+    def __init__(self, data_format=None, **kwargs):

This is suddenly no longer backwards compatible, and it changes the behaviour of both training and inference for channels_first models!

I suggest reverting 243338d
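The backwards-compatibility argument above can be sketched as follows (`resolve` is a stand-in for Keras' internal normalize_data_format, which maps None to the global image_data_format(); the helper names are hypothetical):

```python
# Sketch of why the default matters; `resolve` stands in for Keras'
# conv_utils.normalize_data_format, which maps None to the global setting.
def resolve(data_format, global_default):
    return global_default if data_format is None else data_format

def adds_permute(data_format, global_default):
    # The Flatten layer only inserts a transpose for channels_first input.
    return resolve(data_format, global_default) == 'channels_first'

# Old default 'channels_last': never permutes, regardless of the global
# setting -> backwards compatible, no extraneous ops.
print(adds_permute('channels_last', 'channels_first'))   # False

# New default None: on a channels_first configuration the layer now
# inserts a permute where it previously did not.
print(adds_permute(None, 'channels_first'))              # True
```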

@joeyearsley
Contributor Author

Ah, now I see the confusion! Yes, I agree.
