
Dense Image Warp tests are flaky #138

Closed
seanpmorgan opened this issue Apr 3, 2019 · 17 comments · Fixed by #187
Labels
bug Something isn't working image

Comments

@seanpmorgan
Member

Recently we've seen that #53 is causing flaky failures in the CI. See:
https://source.cloud.google.com/results/invocations/8f31faef-505a-440e-b75f-e6edf1071269/targets/tensorflow_addons%2Fubuntu%2Fgpu%2Fpy3%2Fpresubmit/log

Do you mind taking a look when time allows @WindQAQ ?

@seanpmorgan seanpmorgan added bug Something isn't working image labels Apr 3, 2019
@WindQAQ
Member

WindQAQ commented Apr 4, 2019

It seems that the behavior of tf.debugging.assert* that we investigated in #53 results in the error, but I'm not sure why the test fails even when it runs in eager mode only.

@WindQAQ
Member

WindQAQ commented Apr 4, 2019

Okay, I found that the following script also fails with tf==2.0.0-dev20190403 on my machine. CC @tomerk for visibility.

import tensorflow as tf
from tensorflow.python.framework import test_util


@tf.function
def foo(x):
    tf.debugging.assert_greater_equal(x.shape[0], 2, message="fail")
    y = x[1]
    return y


class TestAssert(tf.test.TestCase):

    def test_assert(self):
        with self.assertRaisesRegexp(tf.errors.InvalidArgumentError, "fail"):
            self.evaluate(foo(tf.random.uniform(shape=(1, 2, 3))))


if __name__ == "__main__":
    tf.test.main()

@seanpmorgan
Member Author

seanpmorgan commented Apr 5, 2019

@WindQAQ So while this does look like a bug... is there any reason we need to use tf.debugging for the checks in dense_image_warp? I believe those checks can all be done with the Python interpreter (or, in graph mode with tf.function, with ordinary Python syntax)?

@WindQAQ
Member

WindQAQ commented Apr 5, 2019

The checks are all about shape inference. If tf.function can handle None shapes, or if this is no longer a problem in TF 2.0, I think the checks can definitely use simple assert, if, and raise instead.

@seanpmorgan
Member Author

Could we try a PR that uses just if and raise, and make sure there is a test case that will fail if it doesn't properly trap the shape issue?
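A sketch of such a guard test (the helper is a hypothetical stand-in for the real check in dense_image_warp; the error message matches the one the existing test expects), which fails if the plain-Python check ever stops raising:

```python
import unittest


def _validate_grid_width(width):
    # Hypothetical stand-in for the if/raise check in dense_image_warp.
    if width < 2:
        raise ValueError("Grid width must be at least 2.")


class SizeExceptionTest(unittest.TestCase):

    def test_size_exception(self):
        # This test fails if the check no longer raises for a
        # too-small grid, trapping regressions in the shape check.
        with self.assertRaisesRegex(ValueError,
                                    "Grid width must be at least 2"):
            _validate_grid_width(1)
```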

@WindQAQ
Member

WindQAQ commented Apr 5, 2019

Sure, I'll create a PR in one or two days :-)

@WindQAQ WindQAQ mentioned this issue Apr 8, 2019
@WindQAQ
Member

WindQAQ commented Apr 8, 2019

In the latest Docker image (tf==2.0.0-dev20190405), I find that the error is caused by test_zero_flows. The workaround is to disable the test in graph mode. I'm still working on replacing tf.debugging with pure if and raise statements for future extension.

@armando-fandango
Contributor

@WindQAQ one more test is failing now; sorry, it was not failing earlier:

[  FAILED  ] DenseImageWarpTest.test_size_exception
[ RUN      ] DenseImageWarpTest.test_zero_flows
[       OK ] DenseImageWarpTest.test_zero_flows
======================================================================
FAIL: test_size_exception (__main__.DenseImageWarpTest)
test_size_exception (__main__.DenseImageWarpTest)
Make sure it throws an exception for images that are too small.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/absl/third_party/unittest3_backport/case.py", line 37, in testPartExecutor
    yield
  File "/usr/local/lib/python2.7/dist-packages/absl/third_party/unittest3_backport/case.py", line 162, in run
    testMethod()
  File "/root/.cache/bazel/_bazel_root/e051f2f195071208ea7f081f88730f4f/execroot/__main__/bazel-out/k8-opt/bin/tensorflow_addons/image/dense_image_warp_test.runfiles/__main__/tensorflow_addons/image/dense_image_warp_test.py", line 223, in test_size_exception
    self._check_interpolation_correctness(shape, "float32", "float32")
AssertionError: "Grid width must be at least 2." does not match "indices[0,0] = -1 is not in [0, 2)
	 [[{{node dense_image_warp/StatefulPartitionedCall/interpolate_bilinear/gather-top_left/GatherV2}}]] [Op:__inference_dense_image_warp_9536]"

----------------------------------------------------------------------
Ran 8 tests in 52.030s

FAILED (failures=1)
================================================================================

@WindQAQ
Member

WindQAQ commented Apr 9, 2019

@armando-fandango I cannot reproduce this error now. Is this on the CI build or your local machine? Thanks!

@seanpmorgan Hi Sean, I suppose it is not easy due to tf.function's state. It seems the following code snippet does not result in multiple distinct graphs being traced, and thus ValueError is raised even when shape=[3, 3, 3]. Sorry, I am not familiar with tf.function and not sure if this use case is correct.

import tensorflow as tf


@tf.function
def foo(x):
    if tf.shape(x)[0] < 3:
        raise ValueError("fail")
    return x[2]


for shape in [[1, 1, 1], [2, 2, 2], [3, 3, 3]]:
    try:
        foo(tf.random.uniform(shape=shape))
    except ValueError:
        print(shape)

for shape in [[3, 3, 3], [2, 2, 2], [1, 1, 1]]:
    try:
        foo(tf.random.uniform(shape=shape))
    except ValueError:
        print(shape)

'''
output: 
[1, 1, 1]
[2, 2, 2]
[3, 3, 3]
[3, 3, 3]
[2, 2, 2]
[1, 1, 1]
'''

@armando-fandango
Contributor

This is on the latest custom-ops container with the master branch. I have two MacBooks, and it fails on only one of them.

@armando-fandango
Contributor

Do we need to add make clean in case something is getting cached?

@WindQAQ
Member

WindQAQ commented Apr 10, 2019

It's weird. The behavior seems to vary from machine to machine. I think another workaround is to temporarily not use tf.function here, but that might not meet the requirements for Addons. Or we can just wait for tf.debugging.assert* to work as expected...

@seanpmorgan
Member Author

Yeah, I believe it'll fail on every machine given enough runs. I'm fine with removing tf.function for the time being... but we should look into removing tf.debugging as a long-term fix.

@seanpmorgan
Member Author

Hmmm, the tests appear to be passing repeatedly. Possibly there has been an upstream fix... such is life building against an alpha release.

Closing, but happy to re-open if we see issues again.

@seanpmorgan
Member Author

Welp... too quick on calling that (nightly fail). We should either remove tf.debugging or remove the tf.function decorator.

@khatchad

Welp... too quick on calling that (nightly fail). We should either remove tf.debugging or remove the tf.function decorator.

You should be able to use tf.debugging.assert_equal() with tf.function if you follow it up with a control dependency.
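A sketch of that pattern, reusing the toy foo from the earlier repro (not the actual dense_image_warp fix): the assertion op is created inside the tf.function and attached as a control dependency of the value being read, so graph execution cannot prune it away.

```python
import tensorflow as tf


@tf.function
def foo(x):
    # Build the assertion on the dynamic shape, then force it to run
    # before the read by making it a control dependency of the output.
    assert_op = tf.debugging.assert_greater_equal(
        tf.shape(x)[0], 2, message="fail")
    with tf.control_dependencies([assert_op]):
        return tf.identity(x[1])
```

Note that tf.shape(x)[0] (a tensor) is used rather than x.shape[0] (a Python int), since only a graph-level assertion op can be wired in as a control dependency.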

@khatchad

Seems that the behavior of tf.debugging.assert* that we investigated in #53 results in the error. But I'm not so sure why the test fails even if it runs in eager mode only.

I don't think you need tf.debugging.assert_equal() in eager mode; it returns None there. I believe you can use a normal Python assert statement in eager mode, as there is no graph involved.
