Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate SSL settings at parse time #49196

Merged
merged 18 commits into from
Mar 4, 2020

Conversation

danhermann
Copy link
Contributor

Provides parse-time validation for monitoring SSL/TLS settings as described in #47711.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Monitoring)

@danhermann danhermann changed the title Validate SSL settings at parse time [WIP] Validate SSL settings at parse time Nov 15, 2019
@danhermann
Copy link
Contributor Author

@elasticmachine update branch

@danhermann danhermann changed the title [WIP] Validate SSL settings at parse time Validate SSL settings at parse time Nov 28, 2019
@danhermann
Copy link
Contributor Author

@rjernst, this is the last of the settings validation PRs if you or someone else from your team would like to review it.

@rjernst
Copy link
Member

rjernst commented Dec 5, 2019

@danhermann Perhaps I am misunderstanding this change. I thought the point was to add validators to the Setting instances, yet this seems to still do validation much later? Is the problem the "ssl" setting is a group setting, and thus we don't try to look at the settings underneath it until later?

@danhermann
Copy link
Contributor Author

@rjernst, the problem with adding the validation to SSL_SETTING itself is that the validation required instance data that was not available at the creation time of the static SSL_SETTING. Adding the validator on the cluster setting itself appears to have the same effect of rejecting invalid dynamic updates to that setting through the PUT /_cluster/settings API before they are applied to cluster state.

@danhermann
Copy link
Contributor Author

@elasticmachine update branch

@danhermann
Copy link
Contributor Author

@rjernst, is this approach to validation acceptable to you or do I need to revisit it?

@rjernst
Copy link
Member

rjernst commented Dec 16, 2019

There are two issues I see here:

  • I don't understand what validation is actually being done. The soft check (only a warning if http is set with user/pass set) seems to have been moved to not be done on validation. The validateSslSettings method seems to essentially check a precondition, as secure settings would never be passed through to the cluster state setting update validators. And configureSslStrategy does not appear to do any validation.
  • I believe we need validators in the setting objects themselves, otherwise we will not check when the node is instantiated. In the case of these settings, part of the problem may be that SSL_SETTING is something intended to capture all of these settings. We should be registering the individual settings instead of using a group setting (which IMO should go away as it was a precursor to affix settings, but that is a separate issue).

@danhermann
Copy link
Contributor Author

@rjernst, thank you for the additional info. Let me add a few details around the issues you raised above.

I don't understand what validation is actually being done. The soft check (only a warning if http is set with user/pass set) seems to have been moved to not be done on validation. The validateSslSettings method seems to essentially check a precondition, as secure settings would never be passed through to the cluster state setting update validators. And configureSslStrategy does not appear to do any validation.

The validation that's happening there is a check that none of the SSL settings involve secure settings since they are not reloadable and so may not be dynamically changed.

If there are no secure settings, the configureSslStrategy method checks that the provided settings produce a valid SSLIOStrategy instance since there are a number of exceptions that can be thrown from supporting classes in org.elasticsearch.xpack.core.ssl around unsupported cyphers, invalid verification modes, etc., and I did not think it made sense to duplicate all those checks in HttpExporter. These latter SSL settings were not previously being validated. If a dynamic update with invalid SSL settings was attempted, it would poison the cluster state and bring down the cluster in a way that recovery was difficult (#47038).

I believe the warning about http with user/pass is still logged at the same point as before.

I believe we need validators in the setting objects themselves, otherwise we will not check when the node is instantiated. In the case of these settings, part of the problem may be that SSL_SETTING is something intended to capture all of these settings. We should be registering the individual settings instead of using a group setting (which IMO should go away as it was a precursor to affix settings, but that is a separate issue).

Fully validating the SSL settings requires an instance of SSLService which, insofar as I could tell, is not statically available at class load time for the HttpExporter class and that's the main reason I stuck with a cluster settings validator.

That said, there are three scenarios I can think of in which invalid values could be provided for these settings:

  • dynamic updates via PUT /_cluster/settings: this is the reason this issue was originally opened and the cluster settings validator does successfully reject dynamic updates with invalid setting values.

  • elasticsearch.yml: I believe these are validated on node startup and invalid values prevent the node from starting.

  • preexisting cluster state: for a node to get into the state where the on-disk cluster state contained invalid SSL settings, an invalid value would have had to be dynamically provided in a version of ES without this validation. In that case, the cluster would be essentially down until the recovery method (disable monitoring, null out the invalid setting, re-enable monitoring and provide valid SSL settings) for No easy way to fix incorrect persistent cluster setting #47038 was followed. At that point, this validator would prevent the cluster from getting back into that state.

I thought it might be acceptable given that it prevents the first scenario, the second scenario is already handled, and the third scenario would be possible only when upgrading from previous ES versions and after recovery would subsequently be prevented. Let me know if you think I missed another option here that would be preferable.

@elasticmachine
Copy link
Collaborator

merge conflict between base and head

@danhermann
Copy link
Contributor Author

@elasticmachine update branch

@danhermann
Copy link
Contributor Author

@rjernst, do any of the comments I made above address either of the issues you raised?

@rjernst
Copy link
Member

rjernst commented Feb 27, 2020

Dan and I spoke offline and agreed the validation should be simpler if each SSL setting is an affix setting instead of pulling them all together as a group setting. I'm ok with this PR going in if there is a compelling need to get this validation in. @danhermann Do you think we should get this in or close it in favor of the cleanup work described?

@danhermann
Copy link
Contributor Author

Yes, I'll close this one out in favor of separate affix settings for the SSL settings.

@danhermann danhermann closed this Feb 28, 2020
@jakelandis
Copy link
Contributor

@rjernst I agree with the your earlier comment, however, after discussion with @danhermann, ideally these SSL settings would be re-usable and validate-able affix settings which would require a decent investment. This PR as-is provides protection from invalid cluster state and would like to get this in sooner then later. Any objections to re-open and merge this, and log a follow up issue ?

@rjernst
Copy link
Member

rjernst commented Mar 3, 2020

Any objections to re-open and merge this

I'm fine with that.

@danhermann
Copy link
Contributor Author

@elasticmachine update branch

@danhermann
Copy link
Contributor Author

@elasticmachine update branch

@danhermann danhermann merged commit 15a6442 into elastic:master Mar 4, 2020
@danhermann danhermann deleted the 47711_ssl_settings branch March 4, 2020 17:01
danhermann added a commit to danhermann/elasticsearch that referenced this pull request Mar 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants