Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS test failures #2331

Merged
merged 8 commits into from
Nov 9, 2018
Merged

TLS test failures #2331

merged 8 commits into from
Nov 9, 2018

Conversation

mrocklin
Copy link
Member

@mrocklin mrocklin commented Nov 1, 2018

Do not merge

@mrocklin
Copy link
Member Author

mrocklin commented Nov 2, 2018

@jcrist @mariusvniekerk do either of you have time to check out these test failures? They're breaking tests currently.

@mrocklin mrocklin changed the title A trivial change to test CI TLS test failures Nov 3, 2018
@mrocklin
Copy link
Member Author

mrocklin commented Nov 3, 2018

Also cc @pitrou

This was referenced Nov 3, 2018
@pitrou
Copy link
Member

pitrou commented Nov 3, 2018

You should be able to pinpoint when the failures started to occur? Is it on a specific changeset?

@mrocklin
Copy link
Member Author

mrocklin commented Nov 3, 2018

I haven't tried doing this yet. They didn't show up during any PR and now they fail consistently, so I don't think that they snuck in as some intermittent failures do.

My guess is that some dependency changed upstream.

@pitrou
Copy link
Member

pitrou commented Nov 3, 2018

Do you conda list as part of CI jobs? That's very useful for cases like this, so that you can compare versions.

@pitrou
Copy link
Member

pitrou commented Nov 3, 2018

By the way, I don't think those failures are intermittent. They should even be reproducible locally, with the right software setup.

@mariusvniekerk
Copy link
Collaborator

@pitrou Pretty certain these are related to openssl 1.1 vs 1.0

@pitrou
Copy link
Member

pitrou commented Nov 8, 2018

@mariusvniekerk do you have a proof of that? e.g. conda list output?

@mariusvniekerk
Copy link
Collaborator

mariusvniekerk commented Nov 8, 2018

yep, and the test failures are around TLS 1.3 ciphers

See: https://travis-ci.org/dask/distributed/jobs/449255237#L523

Our issue seems to be mozilla/server-side-tls#191 (comment)

@pitrou
Copy link
Member

pitrou commented Nov 8, 2018

The cipher tests should be easy to relax a bit, but the certification reject test shouldn't fail.

@mariusvniekerk
Copy link
Collaborator

mariusvniekerk commented Nov 8, 2018

it does "warn" about it failing, so it is aware of that but doesn't raise?

except EnvironmentError as e:
# The handshake went wrong, log and ignore
logger.warning("Listener on %r: TLS handshake failed with remote %r: %s",
self.listen_address, address,
getattr(e, "real_error", None) or e)
else:
raise gen.Return(stream)

@pitrou
Copy link
Member

pitrou commented Nov 8, 2018

Right, but it should raise in that case. The warning is in the case the handshake error is because of transient network issues. But if the certificate is reject, that's a permanent error and it should raise an exception. So you must inspect the exception to decide what to do with it.

@mariusvniekerk
Copy link
Collaborator

mariusvniekerk commented Nov 8, 2018

We should change the test to catch and reraise the exception (within that test) if present to see what it is, we could still be raising something but pytest just says it's not EnvironmentError. Want to just rule that one out first.
I'm not sure if i have push rights to this pr

@mrocklin
Copy link
Member Author

mrocklin commented Nov 8, 2018 via email

@mrocklin
Copy link
Member Author

mrocklin commented Nov 8, 2018

Tests seem to pass. The Python 2 failure was due to the print(..., flush=True) syntax in debug code.

@mrocklin
Copy link
Member Author

mrocklin commented Nov 8, 2018

To the extent that I'm able to judge this seems fine to me. There were a couple of test failures but I think that they are intermittent and unrelated.

@mariusvniekerk is there any additional work that you think should be done here?

@@ -329,6 +333,13 @@ def connect(self, address, deserialize=True, **connection_args):
stream = yield client.connect(ip, port,
max_buffer_size=MAX_BUFFER_SIZE,
**kwargs)
# Under certain circumstances tornado will have a closed connnection with an error and not raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it would be worth reporting the issue to the Tornado bug tracker.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrocklin I think this gets us working for now.
@pitrou I don't think tornado is tested a python built against openssl 1.1+ atm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, but it's still worth reporting. I generally find it annoying when people don't report issues upstream.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mrocklin
Copy link
Member Author

mrocklin commented Nov 8, 2018

I removed my initial dummy commit and added a couple small style changes. I hope that now that the first commit belongs to @mariusvniekerk that the "Squash and merge" feature will appropriately assign ownership to him

@mrocklin
Copy link
Member Author

mrocklin commented Nov 8, 2018

This does mean a force push though, so you may want to git reset --hard before pulling/pushing if there is any more to do

@mrocklin mrocklin merged commit 3fed641 into dask:master Nov 9, 2018
mrocklin pushed a commit that referenced this pull request Nov 9, 2018
* Check ciphers less aggresively for TLS1.3 ciphers

When python is compiled against openssl 1.1 we have TLS1.3 ciphers in the
mix as well.  We should treat these appropriately

* Attempt to mitigate the failed connection case

* Introduced a FatalCommClosedError

This deals with TLS errors that cannot be recovered.

* Relaxed forced cipher test
@mrocklin mrocklin deleted the test-ci branch November 9, 2018 12:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants